AI Models Misjudge Rule Violations: A Comparative Analysis of Human and Machine Decision-Making

TL;DR:

  • Machine-learning models often fail to replicate human judgments regarding rule violations.
  • The choice of training data is crucial, with normative data outperforming descriptive data in judging rule violations.
  • Using descriptive data to train models leads to over-prediction of rule violations.
  • Inaccuracy in machine-learning models can have significant real-world implications, such as harsher judgments in criminal justice systems.
  • Transparency in dataset collection is essential to address the discrepancy between descriptive and normative labels.
  • Fine-tuning models with a small amount of normative data and exploring transfer learning are potential strategies to mitigate the issue.
  • Conducting similar studies with expert labelers can provide further insights into labeling disparities.
  • Using data collected in the specific setting of interest is crucial for reproducing human judgment and avoiding excessively stringent moderation by AI systems.

Main AI News:

In an endeavor to enhance equity and address backlogs, machine-learning models have been designed to emulate human decision-making processes, including the evaluation of whether social media content violates toxic content policies. However, a recent study conducted by researchers from MIT and other institutions reveals that these models often fail to replicate human judgments concerning rule violations. When trained with inadequate data, these models tend to yield dissimilar and frequently more severe assessments compared to human evaluators.

The crux of the matter lies in the “right” training data: information that has been explicitly labeled by humans who were asked to determine whether certain items violate a specific rule. The training process entails exposing machine-learning models to millions of examples of this “normative data” to facilitate learning a particular task.

The challenge arises when the data used for training machine-learning models is labeled descriptively, meaning humans are tasked with identifying factual attributes, such as the presence of fried food in a photograph. If such “descriptive data” is employed to train models that assess rule violations, such as determining whether a meal infringes a school policy prohibiting fried food, these models tend to exhibit an over-prediction of rule violations.

The decrease in accuracy resulting from this discrepancy can have profound real-world implications. For instance, if a descriptive model is utilized to make decisions regarding an individual’s likelihood of reoffending, the researchers’ findings suggest that it may render more stringent judgments compared to those made by humans. This, in turn, could lead to higher bail amounts or longer criminal sentences.

Marzyeh Ghassemi, an assistant professor and leader of the Healthy ML Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), emphasizes the significance of these findings for machine learning systems employed in human processes.

Ghassemi states, “I think most artificial intelligence/machine-learning researchers assume that the human judgments in data and labels are biased, but this result is saying something worse. These models are not even reproducing already-biased human judgments because the data they’re being trained on has a flaw: Humans would label the features of images and text differently if they knew those features would be used for a judgment. This has huge ramifications for machine learning systems in human processes.”

The research paper outlining these discoveries, published today in Science Advances, lists Marzyeh Ghassemi as the senior author. The team of authors includes lead author Aparna Balagopalan, a graduate student in electrical engineering and computer science, David Madras, a graduate student from the University of Toronto, David H. Yang, a former graduate student and co-founder of ML Estimation, Dylan Hadfield-Menell, an assistant professor at MIT, and Gillian K. Hadfield, Schwartz Reisman Chair in Technology and Society and professor of law at the University of Toronto.

The issue of labeling discrepancies surfaced during a separate project investigating how machine-learning models can justify their predictions. While gathering data for that study, the researchers noticed that humans often give divergent responses when asked to provide descriptive or normative labels for the same data.

To collect descriptive labels, researchers instruct labelers to identify factual features, such as determining whether a text contains obscene language. In contrast, for normative labels, labelers are provided with a specific rule and asked to assess whether the data violates that particular rule, for instance, identifying if a text infringes upon a platform’s explicit language policy.
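As a rough illustration of the two annotation setups described above, the same item might be presented to labelers as follows; the field names and prompt wording here are hypothetical, not taken from the study.

```python
# Hypothetical annotation tasks for one item; field names and wording are illustrative.
item = {"text": "Example forum comment under review."}

descriptive_task = {
    "item": item,
    # No policy is shown; the labeler only reports factual attributes.
    "questions": ["Does the text contain obscene language? (yes/no)"],
}

normative_task = {
    "item": item,
    "policy": "Posts that use explicit language are not allowed.",
    "question": "Does this text violate the policy above? (yes/no)",
    "follow_up": "Briefly explain your reasoning.",
}
```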

This intriguing finding prompted the researchers to initiate a user study to delve deeper into the phenomenon. They assembled four distinct datasets, each simulating different policies. For instance, one dataset consisted of dog images that could potentially violate an apartment’s policy against aggressive breeds. They then recruited groups of participants to assign descriptive or normative labels to these datasets.

In each scenario, the descriptive labelers were instructed to indicate the presence or absence of three factual features in the image or text, such as determining if the dog appears aggressive. These labelers’ responses were subsequently utilized to form judgments. If a user identified the presence of an aggressive dog in a photo, it was deemed a violation of the policy.

Importantly, these labelers were unaware of the specific pet policy. The normative labelers, by contrast, were shown the policy prohibiting aggressive dogs and were then asked to evaluate whether each image violated it and to explain their reasoning.
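A minimal sketch of how descriptive responses can be converted into policy judgments, assuming, as in the dog example above, that flagging the policy-relevant feature is what counts as a violation; the feature names are illustrative.

```python
# Hypothetical feature names; in the study, descriptive labelers marked three
# factual features per item without ever seeing the policy.
POLICY_RELEVANT = {"dog_appears_aggressive"}

def judgment_from_descriptive(feature_labels):
    """Infer a rule-violation judgment from factual feature labels."""
    return any(feature_labels.get(f, False) for f in POLICY_RELEVANT)

labels = {"dog_appears_aggressive": True, "dog_is_large": True, "dog_is_on_leash": False}
print(judgment_from_descriptive(labels))  # True -> counted as a policy violation
```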

The researchers discovered that humans were considerably more inclined to label an object as a violation when operating in the descriptive setting. The gap between the descriptive and normative labels, measured as the average absolute difference between them, ranged from 8 percent for a dataset used to assess dress-code violations to 20 percent for the dog images.
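One way to read that figure: compare the per-item violation rates implied by the two labeling modes and take the average absolute difference. A rough sketch of that calculation, which may differ in detail from the paper's exact aggregation:

```python
def label_disparity(descriptive_rates, normative_rates):
    """Average absolute difference between per-item violation rates (values in [0, 1])."""
    pairs = list(zip(descriptive_rates, normative_rates))
    return sum(abs(d - n) for d, n in pairs) / len(pairs)

# Toy example: three items rated by both labeler groups.
print(label_disparity([0.9, 0.7, 0.4], [0.6, 0.5, 0.3]))  # 0.2, i.e. a 20 percent gap
```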

A probable explanation for this phenomenon, as proposed by Aparna Balagopalan, is that people’s perceptions of rule violations might differ from their approach to descriptive data. Generally, when making normative decisions, individuals tend to be more lenient.

However, it is noteworthy that data used to train machine-learning models for specific tasks are typically collected with descriptive labels. Subsequently, these same data are often repurposed to train different models that are designed to perform normative judgments, such as assessing rule violations.

In order to investigate the potential consequences of repurposing descriptive data, the researchers conducted an experiment where they trained two models to assess rule violations using one of the four data settings available. One model was trained using descriptive data, while the other model was trained using normative data. The performance of these models was then compared.
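A minimal sketch of that comparison, not the authors' code, using a simple text classifier; the variable names stand in for parallel descriptive and normative annotations of the same training items.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def train_and_evaluate(train_texts, train_labels, test_texts, test_normative_labels):
    """Train a simple rule-violation classifier and score it against normative judgments."""
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)
    model = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
    return accuracy_score(test_normative_labels, model.predict(X_test))

# Hypothetical usage: the same training texts annotated two ways.
# acc_descriptive = train_and_evaluate(train_texts, descriptive_labels, test_texts, test_normative)
# acc_normative   = train_and_evaluate(train_texts, normative_labels,  test_texts, test_normative)
# The study reports the descriptive-trained model scoring lower and over-predicting violations.
```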

The findings revealed that a model trained on descriptive data performs worse than one trained on normative data when making the same judgments. Specifically, the descriptive model is more likely to misclassify inputs by falsely predicting rule violations, and its accuracy drops even further on items that human labelers disagreed about.

“This demonstrates that the choice of data is indeed crucial. It is imperative to align the training context with the deployment context when training models to detect rule violations,” highlights Aparna Balagopalan.

Determining how data have been collected can be a challenging task for users, as this information is often buried in the appendix of a research paper or not disclosed by private companies, explains Marzyeh Ghassemi. Enhancing transparency regarding datasets is one potential approach to mitigate this issue. By knowing how the data were gathered, researchers can better understand how the data should be used.

Another potential strategy is to fine-tune a model that was initially trained with descriptive data using a small amount of normative data. This concept, known as transfer learning, is an avenue the researchers aim to explore in future endeavors. Additionally, they are interested in conducting a similar study with expert labelers, such as doctors or lawyers, to investigate if it yields the same disparities in labeling.
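A rough sketch of that fine-tuning idea, assuming a simple linear classifier stands in for the real model: pretrain on a large descriptive-labeled set, then continue training on a small normative-labeled set so the decision boundary shifts toward human judgments. The synthetic data here only illustrates the mechanism.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Large descriptive-labeled set: labels follow a stricter (over-flagging) rule.
X_desc = rng.normal(size=(1000, 20))
y_desc = (X_desc[:, 0] > -0.5).astype(int)

# Small normative-labeled set: labels follow the more lenient human judgment.
X_norm = rng.normal(size=(50, 20))
y_norm = (X_norm[:, 0] > 0.0).astype(int)

model = SGDClassifier(random_state=0)
model.partial_fit(X_desc, y_desc, classes=[0, 1])  # pretrain on descriptive labels

for _ in range(20):                                # fine-tune on the small normative set
    model.partial_fit(X_norm, y_norm)
```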

To address this challenge, Marzyeh Ghassemi emphasizes the need to transparently acknowledge that if the goal is to reproduce human judgment, only data collected in that specific setting should be used. Otherwise, there is a risk of developing systems whose moderation is stricter than the judgments humans would make. Humans can perceive nuance and make distinctions, whereas these models cannot, which further underscores the importance of addressing this issue.

Conclusion:

The findings from this study on the misjudgment of rule violations by machine-learning models have significant implications for the market. The discrepancy between machine judgments and human judgments underscores the importance of carefully selecting and training models using the appropriate data context. Businesses operating in industries where rule enforcement or policy compliance is crucial must recognize the potential risks associated with relying solely on descriptive data for training machine-learning models.

Understanding the limitations of such models and the need for normative data in training can help organizations make more informed decisions regarding their AI systems. Transparency in dataset collection and the exploration of fine-tuning strategies, such as transfer learning, are essential for improving the accuracy and reliability of these models. By addressing these challenges, businesses can ensure fairer, more accurate, and more nuanced assessments, reducing potential biases and legal implications in their operations.

Source