AI-Powered Workforce: The Impact of Using Bots in Crowdsourcing

TL;DR:

  • Crowdsourced workers are using AI systems to complete tasks on platforms like Amazon Mechanical Turk.
  • An experiment by EPFL researchers revealed that a significant portion of the workers’ text submissions was generated using large language models.
  • AI-generated responses allow workers to complete tasks faster and increase their earnings.
  • The researchers developed a methodology to detect synthetic text in their specific scenario.
  • Training AI models on their own output poses risks to model performance and quality.
  • The implications extend to language models trained on fake content collected from crowdsourcing platforms.
  • The use of bots in crowdsourcing platforms challenges the idea of human data as the gold standard.
  • AI-generated responses currently lack the complexity and diversity of human creativity.
  • Crowdsourced data helps researchers study human imperfections and behavior.
  • The future may involve collaboration between humans and large language models.

Main AI News:

In the world of artificial intelligence (AI), data is the lifeblood that fuels innovation. To build accurate and reliable machine learning systems, developers rely on clean and high-quality datasets. However, compiling such valuable data can be a time-consuming task. That’s why companies often turn to third-party platforms like Amazon Mechanical Turk, leveraging a pool of inexpensive workers to perform repetitive assignments such as labeling objects, transcribing passages, and annotating text.

These tasks performed by human workers serve a crucial purpose: they generate vast amounts of training examples for AI systems. These models, in turn, empower corporations to reap enormous profits. Yet, a recent experiment conducted by researchers at the École polytechnique fédérale de Lausanne (EPFL) in Switzerland has revealed a troubling trend within this paradigm.

The EPFL team discovered that crowdsourced workers, hired through platforms like Mechanical Turk, are themselves employing AI systems to complete their online assignments. In particular, workers were observed using OpenAI’s chatbot, ChatGPT, to perform various tasks. If those AI-generated submissions later become training data, models end up being trained on their own output, which poses risks to their performance and quality.

The experiment conducted by the EPFL researchers shed light on this phenomenon. They recruited 44 Mechanical Turk workers to summarize the abstracts of 16 medical research papers. The team estimated that between 33 and 46 percent of the text passages submitted by these workers were generated using large language models. AI-generated responses allow workers to complete tasks more quickly and take on additional assignments, thereby increasing their earnings.

To identify AI-generated text, the researchers trained a classifier capable of distinguishing between human and AI-generated submissions. They also analyzed the workers’ keystrokes to detect whether text was pasted in from an AI model or typed manually. While some individuals may have used chatbots and then typed the responses out by hand, the researchers found this to be unlikely.
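As an illustration only, a classifier of this kind can be sketched with off-the-shelf tools. The toy training texts and the choice of TF-IDF features with logistic regression below are assumptions for demonstration; the EPFL team’s actual features and model are not described in this article.

```python
# Hedged sketch: a human-vs-synthetic text classifier built from
# off-the-shelf components. The toy texts and the TF-IDF + logistic
# regression setup are illustrative assumptions, not the study's method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up examples standing in for labeled worker submissions.
human_texts = [
    "study looked at heart patients, results kinda mixed tbh",
    "they tested it on 200 ppl and it helped a bit",
]
ai_texts = [
    "In conclusion, the study demonstrates significant improvements in patient outcomes.",
    "Overall, the findings underscore the importance of further research in this domain.",
]
texts = human_texts + ai_texts
labels = ["human"] * len(human_texts) + ["synthetic"] * len(ai_texts)

# Fit a simple pipeline: text -> TF-IDF features -> logistic regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# The fitted pipeline can then score new, unseen submissions.
print(model.predict(["Overall, the findings demonstrate significant outcomes."]))
```

In practice such a classifier would need far more labeled data and careful validation, since stylistic cues of AI text shift as models change.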

Manoel Ribeiro, co-author of the study and a PhD student at EPFL, explained their methodology: “We developed a very specific methodology that worked very well for detecting synthetic text in our scenario. While traditional methods try to detect synthetic text ‘in any context,’ our approach is focused on detecting synthetic text in our specific scenario.”

By combining the classifier’s output with the keystroke data, the researchers increased their certainty in identifying AI-generated text. They found that texts that had not been copy-pasted were classified as “real,” indicating a low false-positive rate.
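The combination of the two signals can be illustrated with a small decision rule. The threshold and the exact rule below are assumptions for illustration only, not the paper’s actual procedure.

```python
# Hedged sketch of combining a synthetic-text classifier score with
# keystroke evidence (was the text pasted in?). The threshold and the
# decision rule are illustrative assumptions, not the study's parameters.

def label_submission(classifier_prob: float, was_pasted: bool,
                     threshold: float = 0.5) -> str:
    """Return 'synthetic' or 'human' for one worker submission."""
    if not was_pasted:
        # Typed manually: treated here as human-written, regardless of score.
        return "human"
    # Pasted text: defer to the classifier's probability.
    return "synthetic" if classifier_prob >= threshold else "human"

# Example submissions as (classifier probability, pasted?) pairs.
examples = [(0.92, True), (0.10, False), (0.70, False), (0.30, True)]
print([label_submission(p, pasted) for p, pasted in examples])
# -> ['synthetic', 'human', 'human', 'human']
```

Using two independent signals this way lowers the false-positive rate relative to the classifier alone, which matches the article’s account of why the researchers paired the two.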

However, it is important to note that the experiment’s design may not provide a fully representative picture of the extent to which workers utilize AI to automate crowdsourced tasks. The specific task of text summarization lends itself well to large language models like ChatGPT, potentially biasing the results towards a higher number of workers leveraging such tools. Moreover, the dataset used in the study consisted of only 46 responses from 44 workers, making it relatively small.

Nonetheless, the implications of this experiment are significant. If AI models are increasingly trained on synthetic content generated by AI itself and collected from crowdsourcing platforms, the performance of those models could suffer. Although some organizations, such as OpenAI, keep their training methodologies confidential and may not rely heavily on platforms like Mechanical Turk, many other AI developers do depend on human workers who, in turn, may use bots to generate training data. This poses a significant challenge.

Mechanical Turk, for instance, positions itself as a provider of “data labeling solutions to power machine learning models.” However, according to Ribeiro, “Human data is the gold standard because it is humans that we care about, not large language models.” He emphasized the importance of basing research and development on human-centric data rather than relying solely on AI-generated responses.

Presently, responses generated by AI models often lack the complexity and diversity of human creativity, appearing rather bland and trivial. The researchers argue that the value of crowdsourced data lies in studying the imperfections and intricacies of human behavior. Robert West, co-author of the paper and an assistant professor at EPFL’s School of Computer and Communication Sciences, highlighted this perspective: “Sometimes what we want to study with crowdsourced data is precisely the ways in which humans are imperfect.”

As AI technology continues to advance, the landscape of crowdsourced work is likely to undergo transformation. Ribeiro speculated that large language models might eventually replace workers in certain tasks. Paradoxically, this shift could make human data even more valuable, prompting platforms to implement measures that prevent excessive reliance on language models and ensure the continued availability of human-generated data.

Conclusion:

The use of AI systems by crowdsourced workers highlights a growing trend in the AI-driven workforce. While it offers workers benefits such as increased efficiency and higher earnings, it raises serious concerns about the quality and reliability of the resulting data. Training AI models on their own output, coupled with reliance on synthetic data collected from crowdsourcing platforms, can degrade model performance and exacerbate issues such as bias. Companies and organizations must weigh the impact on data quality and invest in human-centric approaches to preserve the integrity and diversity of AI systems. Striking the right balance between AI and human collaboration will be crucial for the future of work in an AI-driven landscape.
