Hidden Bias in AI Writing Tools: What Educators Need to Know


Are AI writing tools helping or harming student creativity? A study led by Prof. Dr. Thiemo Wambsganss from the Human-Centered AI-based Learning Systems (HAIS) Lab[1] at BFH Wirtschaft explored hidden biases in popular AI feedback tools like ChatGPT and their impact on student writing. Learn why educators should be cautious, how biases can subtly affect learning, and what can be done to ensure AI tools are used responsibly to foster growth and maintain diversity in students’ writing.

Introduction

Artificial intelligence (AI)-based tools are becoming common in our classrooms. One of the most popular uses of AI in education is writing assistance. Tools like ChatGPT and other large language models (LLMs) can help students write better essays by giving them feedback on their work. However, there is a growing concern that easy and quick feedback can come at a significant cost for learners: the inherent biases and uncontrollability of LLMs may affect writing styles, educational concepts, and, in the worst case, learners’ beliefs in harmful and unanticipated ways. Our research takes a closer look at the impact of these biases on students’ writing (i.e., whether the biases are passed on to students), comparing feedback from advanced AI tools like LLMs to older machine learning (ML) models.

Why Should We Care About Bias?

Bias is inherent in LLMs and can have real effects on students’ learning (Baker and Hawn, 2021)[2]. If an AI tool gives different types of feedback based on gender or other traits, it could reinforce stereotypes or even discourage some students from pursuing their interests. For example, if an AI tool unintentionally uses language that is biased toward a particular gender, students might adopt that language in their own writing without even realizing it (e.g., Wambsganss et al. 2023[3]). This could create a cycle where biased feedback leads to biased student work, affecting the learning concepts and beliefs of the upcoming generation in subtle but impactful ways.

What We Wanted to Find Out

Our study set out to answer a simple but important question: Do AI writing assistants like GPT-3.5 (the large language model behind the earlier versions of ChatGPT) introduce more or less bias into student writing compared to traditional ML-based feedback tools? We conducted four different classroom studies involving 254 students. These studies included two main types of writing tasks: persuasive essays (where students had to argue for a business idea) and reflective journals (where students reflected on their learning experiences). To investigate bias in language models, a clear definition of “bias” is essential, given its varied interpretations across research. In our study, we adopt the view of algorithmic bias as “situations where model performance is substantially better or worse across mutually exclusive groups” (Baker and Hawn, 2021, p. 4).

How We Did the Study

We divided the students into groups and gave them either LLM-based feedback (using GPT-3.5) or ML-based feedback. The ML feedback used simpler AI models that highlight areas of text and provide basic suggestions, while GPT-3.5 provided more advanced, conversational feedback. We wanted to see if there were differences in the amount of gender bias found in the feedback itself and in the students’ final writing.

We used two methods to measure bias:

  1. GenBit Gender Bias Analysis[4]: This method helps us understand if certain words are used more often with male or female references. A higher score means that the writing shows a tendency to use male-associated language more frequently.
  2. WEAT Co-Occurrence Analysis[5]: This method checks if certain words (e.g., “career” vs. “family”) are more likely to appear next to male or female terms. This helps us understand if there are underlying patterns that might reflect gender stereotypes.
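The core idea behind a WEAT-style co-occurrence analysis can be illustrated with a small sketch: count how often attribute words such as “career” or “family” appear within a fixed token window of male- or female-associated terms. The word lists, window size, and tokenization below are illustrative choices for this post, not the exact configuration used in the study.

```python
from collections import Counter

# Illustrative word sets (not the lists used in the study).
MALE_TERMS = {"he", "him", "his", "man", "male", "boy"}
FEMALE_TERMS = {"she", "her", "hers", "woman", "female", "girl"}
ATTRIBUTES = {"career", "family"}

def cooccurrence_counts(text, window=5):
    """Count how often each attribute word appears within
    `window` tokens of a male- or female-associated term."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok not in ATTRIBUTES:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            if tokens[j] in MALE_TERMS:
                counts[(tok, "male")] += 1
            elif tokens[j] in FEMALE_TERMS:
                counts[(tok, "female")] += 1
    return counts

sample = ("He focused on his career while she stayed with the family. "
          "Her career mattered to her as much as his family did to him.")
print(cooccurrence_counts(sample))
```

A large imbalance in these counts across a corpus of student essays (e.g., “career” co-occurring far more often with male terms) would hint at the kind of underlying gendered pattern this method is designed to surface; the actual WEAT test works on word embeddings and adds a statistical significance measure on top of such associations.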

What We Found

To our surprise, we did not find significant differences in gender bias between the feedback from GPT-3.5 and the traditional ML models. However, we noticed that students who received GPT-based feedback tended to produce writing that was more similar to each other. In other words, GPT feedback made the students’ writing less diverse, which can be problematic because it might stifle individual expression.

The reflective writing tasks produced particularly interesting results. Feedback from GPT-3.5 often led to student writing that was highly consistent in its language use: most students showed no obvious gender bias, while the few who did showed very distinct biases. This suggests that GPT-3.5-based feedback tends to produce less varied writing, potentially limiting the diversity of student thought.

What This Means for Educators

For teachers and educators considering using AI tools for student feedback, our research offers some important takeaways:

  • AI Tools Are Not Perfect: While AI tools like GPT-3.5 can provide personalized and detailed feedback, they can also reinforce certain biases, even if they aren’t obvious. This means that educators need to use these tools thoughtfully, perhaps combining them with human feedback to ensure balanced learning outcomes.
  • Encourage Diversity in Writing: Our study showed that GPT feedback can lead to less diverse writing styles among students. To counteract this, educators can encourage students to critically evaluate AI-generated feedback and think about how they can make their writing unique. AI should be seen as a helpful assistant, not the ultimate authority.

Recommendations for Using AI Writing Tools

  1. Combine AI Feedback with Human Insights: AI tools can be great for pointing out areas for improvement, but human teachers are essential for providing context, understanding students’ unique needs, and fostering creativity.
  2. Teach Students to Be Critical Users: Students should learn to use AI tools critically, understanding both the benefits and the limitations. This includes recognizing that AI feedback may sometimes be biased and that they should not accept suggestions without thinking them through.

Conclusion

AI writing assistants like ChatGPT are powerful tools that can support students in their writing journey. However, as our study shows, they come with their own set of challenges, especially when it comes to bias and the diversity of student writing. Educators who decide to use these tools should do so with caution, combining AI feedback with human oversight to ensure that all students receive fair and supportive guidance. By being aware of these potential pitfalls, we can make AI a positive force in education, helping students grow without unintentionally reinforcing biases.

References

[1] https://haislab.com

[2] Ryan S. Baker and Aaron Hawn. 2021. Algorithmic bias in education. International Journal of Artificial Intelligence in Education, pages 1–41.

[3] Thiemo Wambsganss, Xiaotian Su, Vinitra Swamy, Seyed Neshaei, Roman Rietsche, and Tanja Käser. 2023. Unraveling downstream gender bias from large language models: A study on AI educational writing assistance. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10275–10288, Singapore. Association for Computational Linguistics.

[4] Kinshuk Sengupta, Rana Maher, Declan Groves, and Chantal Olieman. 2021. Genbit: measure and mitigate gender bias in language datasets. Microsoft Journal of Applied Research, 16:63–71.

[5] Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186.


AUTHOR: Thiemo Wambsganss

Prof. Dr. Thiemo Wambsganss is Professor of Digital Technology Management at the Institute Digital Technology Management (IDTM) at the Bern University of Applied Sciences, and head of the research group of the Human-Centered AI-based Learning Systems (HAIS) Lab. In his research, he focuses on the human-centric design, development, and evaluation of digital learning systems based on artificial intelligence.


