Traumatic Click Work: The People Behind ChatGPT
The capabilities of new generative AI models, and of products based on them such as ChatGPT, are fascinating. Alongside concerns about what this technology will do to our society, there is growing criticism of the working conditions of the people involved in such projects. We talked about this with our experts.
SocietyByte: What is behind the criticism of these workers' jobs? What did they have to do?
Mascha Kurpicz-Briki: In early 2023, TIME Magazine published an investigation into the development of ChatGPT [1]. It showed that OpenAI, the US company behind ChatGPT, had hired a firm in Kenya to filter toxic content, such as descriptions of physical and sexual violence, suicide and animal cruelty, out of the system's replies. In the process, the contracted workers had to read sometimes shocking content for less than 2 dollars an hour. According to the investigation, the workers read up to 250 text passages of up to 1,000 words each in a nine-hour shift. The report also criticised that the workers received too little support in coping with this material; they complained of psychological problems caused by the distressing texts.
What was this preparatory work used for, and what purpose did it serve?
MKB: Because of the large amount of training data needed for such language models, quality control is difficult. The content of the training data can therefore lead a chatbot to generate discriminatory or offensive statements. To prevent this, such responses can, for example, be marked as undesirable. If this is done for a large number of responses, the system can learn from it. However, to mark these responses, someone has to read all of these unwanted texts, which can include detailed descriptions of abuse, torture or murder.
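To make this concrete, here is a minimal sketch of what collecting such human judgments could look like. The JSONL file, the field names and the record_label function are illustrative assumptions for this article, not OpenAI's actual pipeline.

```python
# Minimal sketch: a human annotator reads each model response and
# flags the unwanted ones; each judgment is appended to a JSONL file.
# The format and field names are illustrative assumptions.
import json

def record_label(response_text: str, is_undesirable: bool,
                 path: str = "labels.jsonl") -> None:
    """Append one human judgment about a model response."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"text": response_text,
                            "label": int(is_undesirable)}) + "\n")

# The annotator's shift consists of reading passages like these and
# deciding, one by one, whether they are acceptable.
record_label("A helpful answer about a cooking recipe.", is_undesirable=False)
record_label("A graphic description of violence ...", is_undesirable=True)
```

A large collection of such labelled examples is what makes the automatic filtering discussed next possible.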
Why is such filtering necessary in the first place?
MKB: The basic problem is that the large amounts of training data required for such models are drawn from the internet. The volume is so large that checking the data manually is no longer possible, yet some of it is highly problematic material from the very dark corners of the internet. To recognise this material automatically and prevent it from ending up in the chatbot's responses, the system in turn needs a large number of examples of such harmful texts from which it can learn what is undesirable (see the sketch below). These texts have to be provided and sorted by humans.
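As a rough illustration, a filter of this kind can be trained as an ordinary text classifier on the human-labelled examples. The sketch below uses scikit-learn with a tiny invented dataset; this is an assumption chosen for readability, as production systems rely on far larger models and datasets.

```python
# A minimal sketch of a toxicity filter trained on human-labelled
# examples. scikit-learn and the tiny dataset are illustrative
# assumptions; real systems use far more data and larger models.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented dataset: 1 = undesirable, 0 = acceptable.
texts = [
    "A helpful explanation of a cooking recipe.",
    "A friendly answer about tomorrow's weather.",
    "A graphic description of violence ...",
    "An abusive, threatening message ...",
]
labels = [0, 0, 1, 1]

# TF-IDF features plus logistic regression: a classic text classifier.
toxicity_filter = make_pipeline(TfidfVectorizer(), LogisticRegression())
toxicity_filter.fit(texts, labels)

# The trained filter can then screen candidate chatbot replies.
candidate = "A friendly answer about cooking."
if toxicity_filter.predict([candidate])[0] == 1:
    print("blocked: flagged as undesirable")
else:
    print("allowed")
```

The quality of such a filter depends entirely on its labelled examples, which is exactly why so much of this reading and sorting work falls to humans.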
Why wasn't the training phase designed more inclusively from the start, i.e. with racism and bias already taken into account?
MKB: The data used for training were generated by people and therefore also contain the stereotypes of our society. In the training phase, the data are processed automatically, and we are talking about hundreds of billions of words. The choice of available training sets is therefore limited. The quality of the training data is very important, and there are increasing calls for it to be documented in more detail and more transparently [2].
Is this a problem specific to OpenAI or ChatGPT?
MKB: This type of work exists elsewhere in the context of AI and digital transformation as well. Filtering content on social media or in online forums often requires humans too, in some cases to check content directly, in others to train AI systems to do it. It is, of course, difficult to make a general assessment of the working conditions in each case, especially when the work is outsourced to the Global South.
Beyond this problem, there are other challenges, especially in the area of language models. Much of the progress is made primarily for English or for a few other privileged languages. A large share of the languages spoken worldwide cannot benefit from it, because research and development pays less attention to them.
Why is this work outsourced and so little valued?
Caroline Straub: No one in Switzerland could make a living from this work. It is microwork: simple, repetitive tasks that can be completed quickly online and require no special skills (e.g. data cleaning, coding, classifying content for AI). The pay for microjobs is usually very low (around 5 centimes per click). For many people without formal education in developing countries of the Global South, microjobs are a way to earn money. Microwork is also called ghost work: work that is done by a human but that the client believes is done by an automated process.
What difficulties arise from these employment conditions, up to and including ghost work?
CS: Artificial intelligence relies on human labour for tasks such as data cleansing, coding and content classification. This on-demand work is offered and performed online on platforms such as Amazon Mechanical Turk, with payment depending on the task. Described as 'ghost work', this fast-growing, platform-based work is largely invisible: workers cannot talk to managers, receive no feedback, and lack health and safety protections.
References
[1] B. Perrigo, "OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic", TIME, January 2023. https://time.com/6247678/openai-chatgpt-kenya-workers/
[2] T. Gebru et al., "Datasheets for Datasets", Communications of the ACM, December 2021. https://cacm.acm.org/magazines/2021/12/256932-datasheets-for-datasets/abstract
About the experts
Mascha Kurpicz-Briki is Professor of Data Engineering at Bern University of Applied Sciences and Deputy Head of the Applied Machine Intelligence research group. She researches how AI can be used responsibly.
Prof. Dr Caroline Straub is a professor at the Institute New Work at BFH Wirtschaft. She researches platform-based work, digital HRM, and diversity & inclusion.