TL;DR:
- Chris Chen and S. Shyam Sundar published a paper examining the relationship between data labeling quality, perceived training data credibility, and trust in AI.
- The study found that visibility into the data fed into AI systems increased perceived credibility and trust in AI, but performance bias decreased aspects of trust.
- The study aimed to determine if users would factor in labeling accuracy when evaluating training data credibility and forming trust in AI.
- The study confirmed that high-quality labeling results in higher perceived credibility and trust in AI, but only with unbiased AI performance.
- The study also found that involving users in the labeling process does not add value to perceived training data credibility.
- Chen and Sundar have a strong track record of research exploring the use of automated features and the impact of social media on behavior.
Main AI News:
An exciting new paper has been published in the field of artificial intelligence, examining the critical relationship between data labeling quality, perceived training data credibility, and trust in AI. The paper, titled “Is this AI trained on Credible Data? The Effects of Labeling Quality and Performance Bias on User Trust,” was presented by co-author S. Shyam Sundar at the 2023 ACM CHI Conference on Human Factors in Computing Systems, a highly regarded annual international venue for human-computer interaction research, held this year in Hamburg, Germany.
The study, led by Chris (Cheng) Chen, Assistant Professor in the Communication Design Department, delves into the impact that the nature of training data can have on algorithmic bias and user trust. By focusing on the accuracy of labeling, the study found that giving users visibility into the data fed into AI systems increased perceived training data credibility and trust in AI. However, when the system showed signs of bias, some aspects of users’ trust decreased while others remained high.
The study’s innovative design involved showing users the labeling practice and a snapshot of labeled data before their interaction with the AI system. This approach aimed to determine if users would factor in labeling accuracy when evaluating training data credibility and forming their trust in AI.
As Chen explains, supervised machine learning requires labeled data, which are often produced by crowd workers who assign pre-defined values, such as “happy” and “unhappy,” to each facial image in the dataset. However, data labeling can be subjective and often lacks supervision, raising concerns about labeling accuracy.
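The paper itself does not publish a labeling pipeline, but a minimal sketch can make the setup concrete. The snippet below, with invented image names and labels, shows how several crowd workers’ votes for a facial image might be aggregated by majority vote into the single “ground truth” label used for supervised training.

```python
# Illustrative only: invented image names and labels, not data from the study.
from collections import Counter

# Each facial image receives labels from several crowd workers.
crowd_labels = {
    "img_001.jpg": ["happy", "happy", "unhappy"],
    "img_002.jpg": ["unhappy", "unhappy", "unhappy"],
    "img_003.jpg": ["happy", "unhappy", "happy"],
}

def majority_vote(labels):
    """Return the most common label (ties resolve to the first one counted)."""
    return Counter(labels).most_common(1)[0][0]

# Aggregated labels become the "ground truth" for supervised training
# of a facial-expression classifier.
training_labels = {img: majority_vote(votes) for img, votes in crowd_labels.items()}
print(training_labels)
# {'img_001.jpg': 'happy', 'img_002.jpg': 'unhappy', 'img_003.jpg': 'happy'}
```

Because the final label is only as reliable as the votes behind it, subjective or careless labeling propagates directly into the trained model, which is the accuracy concern Chen describes.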
The findings confirm the crucial role that labeling quality plays in shaping user trust in AI.
As predicted, the study found that high-quality labeling results in a higher perceived credibility of the training data and trust in AI, but only when the AI demonstrates unbiased performance. However, if the AI system shows signs of performance bias, such as racial bias in facial expression classification, priming users with credible training data does not maintain their cognitive trust in AI.
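To see what performance bias looks like in practice, consider the hedged sketch below. The group names, labels, and predictions are invented rather than taken from the study; the point is simply that comparing a classifier’s accuracy across demographic groups exposes the kind of uneven performance that undermined participants’ cognitive trust.

```python
# Illustrative only: group names, labels, and predictions are invented.
from collections import defaultdict

# Each record: (demographic_group, true_label, predicted_label)
predictions = [
    ("group_a", "happy", "happy"),
    ("group_a", "unhappy", "unhappy"),
    ("group_a", "happy", "happy"),
    ("group_b", "happy", "unhappy"),
    ("group_b", "unhappy", "unhappy"),
    ("group_b", "happy", "unhappy"),
]

correct = defaultdict(int)
total = defaultdict(int)
for group, truth, pred in predictions:
    total[group] += 1
    correct[group] += int(truth == pred)

for group in sorted(total):
    print(f"{group}: accuracy = {correct[group] / total[group]:.2f}")
# group_a: accuracy = 1.00
# group_b: accuracy = 0.33  -> a large gap like this signals biased performance
```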
Chen explains that this is a positive outcome as it demonstrates that labeling quality can calibrate users’ cognitive trust in AI, aligning it with actual AI performance. However, the study also found that users tend to blindly trust the AI system emotionally and behaviorally when they perceive the training data to be credible, a phenomenon known as automation bias. Future studies will aim to address this issue through novel design solutions.
Contrary to their predictions, the study found that involving users in the labeling process, either by asking them to review crowd-worker labeled data or participate in data labeling themselves, does not add value to perceived training data credibility. As such, the authors do not recommend designers add labeling tasks before user interaction with the AI system.
Chen, a former doctoral student in mass communication at Penn State, has a strong track record of collaboration with Sundar, Co-Founder of Penn State’s Media Effects Research Laboratory. Their previous research has explored the use of automated features such as autocorrect on iPhones, Smart Reply on Gmail, and autoplay on YouTube, as well as the habitual and problematic use of Instagram.
Conclusion:
The study led by Chris (Cheng) Chen and S. Shyam Sundar has important implications for the AI market. The findings demonstrate the crucial role that labeling quality plays in shaping user trust in AI and highlight the need for accurate labeling to increase perceived training data credibility and trust in AI systems. The study also underscores the need for future research to address the issue of automation bias, where users tend to blindly trust AI systems when they perceive the training data to be credible.
For businesses and organizations that are investing in AI technology, the study highlights the importance of ensuring the quality of their training data and the labeling process. Companies should prioritize the accuracy of their labeling practices to increase user trust in their AI systems and ensure their investment in AI technology pays off.
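One common, low-cost way to audit a labeling process is to measure inter-annotator agreement. The sketch below is offered only as an illustration of that general practice, not as a method from the study; it uses invented labels and scikit-learn’s Cohen’s kappa metric.

```python
# Illustrative only: a simple inter-annotator agreement check.
# Requires scikit-learn; the labels below are invented.
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["happy", "unhappy", "happy", "happy", "unhappy"]
annotator_2 = ["happy", "unhappy", "unhappy", "happy", "unhappy"]

kappa = cohen_kappa_score(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")  # values near 1.0 indicate strong agreement
```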
Additionally, businesses should consider the impact of performance bias on user trust and take steps to mitigate its effects. Overall, the study provides valuable insights for businesses looking to leverage AI technology effectively and efficiently.