Bias in Language Models and Data Augmentation for AI in Mental Health

Are societal stereotypes encoded in German language models, and how can data augmentation techniques support classification tasks in the context of mental health? Recent results on these topics were presented at the annual SwissText conference in 2024.

The Applied Machine Intelligence research group at the Bern University of Applied Sciences (BFH) is involved in many innovative projects in Natural Language Processing (NLP). At the recent SwissText conference, which brings together researchers from the field, results from two of these projects were presented. This year's conference took place in Chur, at the Fachhochschule Graubünden.

Figure 1: Leander Rankwiler during his presentation.

The event aims to bring together text analytics experts from industry and academia. It is organized by the Swiss Association for Natural Language Processing (SwissNLP) in collaboration with the Zurich University of Applied Sciences (ZHAW) and the local university hosting each year's edition.

Bias in German Word Embeddings

The BIAS project investigates how societal stereotypes are reflected in technology. The focus lies on European languages and on their linguistic and regional particularities. This is especially relevant for models used to process written language, such as word embeddings (see this article for details). For example, stereotypes encountered in the US and reflected in English word embeddings might differ from those in Norwegian word embeddings. As part of the BIAS project, co-creation workshops were organized in the partner countries, including Switzerland. Stakeholders including HR professionals, members of NGOs, AI specialists and workers exchanged views in interdisciplinary groups.

The paper presented at SwissText 2024 mainly describes results based on the German-language co-creation workshop in Switzerland. The objects of investigation were German word embeddings, the models underlying modern text processing and text generation applications. The analysis showed that both static and contextualized German embeddings exhibit significant biases along several dimensions.
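To make the idea of measuring bias in static embeddings more concrete, the sketch below computes a WEAT-style effect size (Word Embedding Association Test), a widely used association measure. This article does not state which metric the paper uses, so treat this as an illustration under that assumption rather than the authors' method. The two-dimensional vectors and the variable names are hypothetical toy values; real embeddings would be loaded from a trained German model.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """How much closer w is to attribute set A than to attribute set B."""
    return mean(cosine(w, a) for a in attrs_a) - mean(cosine(w, b) for b in attrs_b)


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """WEAT-style effect size: difference of the mean associations of the two
    target sets, divided by the sample standard deviation over all targets."""
    sx = [association(x, attrs_a, attrs_b) for x in targets_x]
    sy = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


# Hypothetical toy vectors standing in for real word embeddings.
career_words = [[0.9, 0.1], [0.8, 0.2]]   # target set X (e.g. job-related terms)
family_words = [[0.1, 0.9], [0.2, 0.8]]   # target set Y (e.g. family-related terms)
male_terms = [[1.0, 0.0], [0.9, 0.1]]     # attribute set A
female_terms = [[0.0, 1.0], [0.1, 0.9]]   # attribute set B

effect = weat_effect_size(career_words, family_words, male_terms, female_terms)
print(f"effect size: {effect:.2f}")
```

A positive value means that the first target set lies closer to attribute set A than the second target set does; swapping the two target sets flips the sign.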

Data Augmentation for the Classification of Eating Disorders

The second paper presented at SwissText discussed results from the AI4ED project, which investigates how natural language processing can be used to analyze text snippets and detect different types of eating disorders. This is part of the research direction Augmented Intelligence for Mental Health of the Applied Machine Intelligence research group, which investigates the potential of AI technologies for future clinical tools.

Ghofrane Merhbene speaks at SwissText Conference 2024.

The paper presented at SwissText addresses the challenges of an imbalanced dataset in this context. Back translation, a data augmentation technique in which a text is translated into another language and back again to obtain a paraphrase, was applied to reduce the class imbalance. This process significantly enhanced the dataset’s utility. Through a comprehensive grid search, a Support Vector Machine (SVM) model was identified as the most effective, achieving an average F1-score of 0.83.
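The following minimal sketch illustrates how back translation can be used to top up under-represented classes. It is not the pipeline from the paper: the article names neither the translation system nor the pivot language, so the two dictionary-based translators, the example sentences and the labels are hypothetical stand-ins for a real machine translation model and a real dataset.

```python
from collections import Counter

# Hypothetical word-level "translators" standing in for a real MT system.
EN_DE = {"i": "ich", "often": "oft", "feel": "fühle", "tired": "müde"}
DE_EN = {"ich": "i", "oft": "frequently", "fühle": "feel", "müde": "exhausted"}


def to_pivot(text):
    """Toy translation into the pivot language (unknown words are kept)."""
    return " ".join(EN_DE.get(word, word) for word in text.split())


def from_pivot(text):
    """Toy translation back from the pivot language."""
    return " ".join(DE_EN.get(word, word) for word in text.split())


def back_translate(text):
    """Round trip through the pivot language to obtain a paraphrase."""
    return from_pivot(to_pivot(text))


def augment_minority_classes(samples):
    """Add back-translated paraphrases until each class reaches the size of the
    largest class, or until no new paraphrases can be produced.

    samples: list of (text, label) pairs.
    """
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    augmented = list(samples)
    seen = {text for text, _ in samples}
    for label, count in counts.items():
        needed = target - count
        for text in [t for t, l in samples if l == label]:
            if needed == 0:
                break
            paraphrase = back_translate(text)
            if paraphrase not in seen:  # skip round trips that change nothing
                augmented.append((paraphrase, label))
                seen.add(paraphrase)
                needed -= 1
    return augmented


samples = [
    ("the weather is nice today", "majority"),
    ("we cook dinner together", "majority"),
    ("i often feel tired", "minority"),
]
augmented = augment_minority_classes(samples)
print(Counter(label for _, label in augmented))
```

With a single pivot language each original text yields at most one paraphrase, so strongly imbalanced classes may remain smaller than the largest class; using several pivot languages is one common way to generate more variants.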


Acknowledgements

The authors acknowledge the funding received for the research projects related to the work presented in this article from Inventus Bern Stiftung, the European Commission and the SERI.


References

  1. Merhbene, G. & Kurpicz-Briki, M. (2024). Data Augmentation for Multi-Class Eating Disorders Text Classification. In: Proceedings of SwissText 2024, Chur, Switzerland.
  2. Rankwiler, L. & Kurpicz-Briki, M. (2024). Evaluating Labor Market Biases Reflected in German Word Embeddings. In: Proceedings of SwissText 2024, Chur, Switzerland.

The two papers are available in the conference proceedings: https://www.swisstext.org/wp-content/uploads/2024/06/Proceedings_Preprint.pdf

Creative Commons Licence

AUTHOR: Ghofrane Merhbene

Ghofrane Merhbene is a student in the Master of Science in Engineering programme (Data Science profile) and works as a research assistant in the Applied Machine Intelligence research group at BFH.

AUTHOR: Mascha Kurpicz-Briki

Dr Mascha Kurpicz-Briki is Professor of Data Engineering at the Institute for Data Applications and Security IDAS at Bern University of Applied Sciences, and Deputy Head of the Applied Machine Intelligence research group. Her research focuses, among other things, on the topic of fairness and the digitalisation of social and community challenges.
