Social stereotypes in pre-trained language models

When algorithms make decisions, they can discriminate because of the preferences built into them. Author Mascha Kurpicz-Briki investigated what happens when social stereotypes are hidden in language models and presented the results at the SwissText & KONVENS 2020 conference in June 2020. Natural language processing (or computational linguistics) is a branch of computer science that deals with the automated processing of human language in text or speech data. Typical tasks include performing automatic spelling and grammar checks, automatically extracting information from large amounts of data (text mining), or conducting linguistic communication with a user (e.g. voice control). Machine learning is often used to master such challenges efficiently and to give the computer the best possible understanding of human language. Subtleties of language, such as complex correlations or irony, remain a very challenging task for automatic text processing.

Alongside the possibilities, however, there are also challenges, especially with regard to the fairness of such systems. One example is the widely used Google Translate, as shown in Figure 1.

Figure 1: Bias in Google Translate (Kurpicz-Briki, 2020).

If you translate “She is an engineer. He is a nurse.” into a language whose personal pronouns do not distinguish gender (such as Turkish), an assumption must be made about the gender of the persons described when translating back into English. Based on this assumption, the system concludes that “He is an engineer. She is a nurse.”

But how do systems arrive at such decisions? Pre-trained models are often used to develop natural language processing applications. These are publicly available, and using them saves the training effort, which is very time- and resource-consuming. A typical example of such language models are so-called word embeddings, in which words are represented as mathematical vectors. Based on this, mathematical operations can be used to calculate semantic relationships between the words. This is a great advantage for automatic processing, because computers can handle such mathematical models much better than natural text. It is then possible to solve “puzzles” with such models, as shown for example in Figure 2.

Fig. 2: Determining word relationships using mathematical relationships on vectors (Bolukbasi et al., 2016)
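The kind of word-relationship “puzzle” shown in Figure 2 boils down to vector arithmetic plus cosine similarity. Here is a minimal sketch with invented 3-dimensional vectors; real embeddings such as word2vec or GloVe have hundreds of dimensions and are trained on large text corpora, so these numbers serve only to show the mechanics:

```python
import numpy as np

# Toy embedding table: the values are invented purely for illustration.
embeddings = {
    "man":   np.array([0.9, 0.1, 0.2]),
    "woman": np.array([0.1, 0.9, 0.2]),
    "king":  np.array([0.9, 0.1, 0.8]),
    "queen": np.array([0.1, 0.9, 0.8]),
    "apple": np.array([0.5, 0.5, 0.0]),
}

def cosine(u, v):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Solve "man is to king as woman is to ?" by vector arithmetic:
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]

# The answer is the remaining word whose vector lies closest to the target.
best = max(
    (w for w in embeddings if w not in ("king", "man", "woman")),
    key=lambda w: cosine(embeddings[w], target),
)
print(best)  # queen
```

The same nearest-neighbour search over the full vocabulary is what surfaces the problematic analogies discussed below: the arithmetic is neutral, but the neighbourhoods it lands in reflect the training data.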

Such methods are very useful for many kinds of applications in the field of automatic text analysis. However, the relationships encoded in such models can also be problematic. It can easily be calculated that “man is to computer programmer as woman is to homemaker” (Bolukbasi et al., 2016) or that “man is to doctor as woman is to nurse” (Lu et al., 2018).

In order to make such stereotypes and prejudices in language models measurable, a statistical test was developed, the so-called WEAT method (Caliskan et al., 2017). The method is based on the Implicit Association Test (IAT) (Greenwald et al., 1998), which is used in the field of psychology to detect implicit biases in people. The human test subjects have to associate terms with each other, and from the reaction time it can be determined whether an implicit prejudice is present. Analogous to the reaction time of humans in the IAT, the WEAT method uses the distance between the vectors of two words in the language model. With this method it could be shown that commonly used language models exhibit prejudices regarding a person’s origin and gender. This is measured using various word groups; Figure 3 shows an example. The experiment investigates whether there is a statistically significant difference between female and male first names with regard to career words and family words. While the original experiment uses common names from the USA (Caliskan et al., 2017), the study by author Mascha Kurpicz-Briki examines the most common names from Switzerland per language region for the German and French languages (Kurpicz-Briki, 2020).

Fig. 3: Example of a WEAT experiment for English, German and French. Typical male and female first names are related to words related to career and family (Kurpicz-Briki, 2020).
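The core of a WEAT experiment like the one in Figure 3 is an effect size computed over two target sets (e.g. male vs. female first names) and two attribute sets (e.g. career vs. family words). A sketch following the definition in Caliskan et al. (2017) is shown below; the 2-dimensional vectors and word groups are invented stand-ins, not the actual data from the study:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    # s(w, A, B): how much closer word vector w is to attribute set A than to B.
    return (np.mean([cosine(w, a) for a in A])
            - np.mean([cosine(w, b) for b in B]))

def weat_effect_size(X, Y, A, B):
    # Difference of mean associations of the two target sets,
    # normalised by the standard deviation over all target words.
    s = [association(w, A, B) for w in X + Y]
    return (np.mean(s[:len(X)]) - np.mean(s[len(X):])) / np.std(s, ddof=1)

# Invented vectors that deliberately cluster "names" with "attributes":
male   = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]   # target set X
female = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]   # target set Y
career = [np.array([0.95, 0.15])]                       # attribute set A
family = [np.array([0.15, 0.95])]                       # attribute set B

d = weat_effect_size(male, female, career, family)
print(d)  # strongly positive: "male names" sit nearer the career words
```

A positive effect size indicates that the first target set is more strongly associated with the first attribute set; in the real experiments the vectors come from a pre-trained embedding model and the result is additionally checked for statistical significance with a permutation test.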

While existing research has mostly been devoted to English, the study showed that this problem also exists for German and French language models. Author Mascha Kurpicz-Briki from the Institute for Data Applications and Security at the Bern University of Applied Sciences applied the WEAT method to German and French language models and was able to show that social biases regarding gender and origin can also be detected there (Kurpicz-Briki, 2020).

The study was also able to show that prejudices in word embeddings may well differ in form in different languages, presumably due to cultural and social differences. While not all tests from the English language models could be confirmed for German or French, the study was able to identify new word groups that are specifically based on social realities of the German-speaking cultural area.

For example, one experiment in the study investigated whether the different study choices of women and men are also reflected in the word embeddings. It related the five fields of study in Switzerland with the highest and the lowest proportion of women to female and male words, as shown in Figure 4. Again, it was shown that these real-world imbalances are present in the word embeddings, and there is therefore a risk that applications will adopt this bias as reality and reinforce it rather than merely reflect it.

Fig. 4: Example of the WEAT experiment on study choice in Switzerland (Kurpicz-Briki, 2020).

The big challenges for future research in this area are therefore, on the one hand, to make the effects of such biases measurable in the applications themselves. On the other hand, the cooperation between humans and algorithms must be discussed so that the automated decisions of software are sufficiently scrutinised.


  1. Bolukbasi, Tolga; Kai-Wei Chang; James Y. Zou; Venkatesh Saligrama; and Adam T. Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pages 4349–4357.
  2. Caliskan, Aylin; Joanna J. Bryson; and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186.
  3. Greenwald, Anthony G.; Debbie E. McGhee; and Jordan L. K. Schwartz. 1998. Measuring individual differences in implicit cognition: The Implicit Association Test. Journal of Personality and Social Psychology, 74(6):1464.
  4. Kurpicz-Briki, Mascha. 2020. Cultural differences in bias? Origin and gender bias in pre-trained German and French word embeddings. 5th SwissText & 16th KONVENS Joint Conference 2020, Zurich, Switzerland.
  5. Lu, Kaiji; Piotr Mardziel; Fangjing Wu; Preetam Amancharla; and Anupam Datta. 2018. Gender bias in neural natural language processing. arXiv preprint arXiv:1807.11714.

Further links to the study

  1. Direct link to the paper
  2. Video presentation at the conference
  3. URL of the conference