Transforming Biased Data into Healthcare AI Insights

TL;DR:

  • Biased data in healthcare AI is a growing concern.
  • Computer science and bioethics experts propose a novel approach in a recent NEJM article.
  • They advocate viewing biased clinical data as “artifacts” to reveal societal biases and historical influences.
  • This approach raises awareness and promotes more inclusive healthcare AI models.
  • Involving bioethicists and clinicians early in the process is recommended.
  • The NIH emphasizes ethically sourced data for next-gen AI technologies.
  • Treating biased datasets as artifacts can lead to context-aware AI, better serving specific populations.
  • An artifact-based approach may inform new policies to eliminate bias in healthcare data.

Main AI News:

In the realm of healthcare AI, the maxim “garbage in, garbage out” oversimplifies the complex issue of biased data. Computer science and bioethics scholars from MIT, Johns Hopkins University, and the Alan Turing Institute address this nuanced concern in a recent op-ed in the New England Journal of Medicine (NEJM). As artificial intelligence gains prominence, scrutiny is intensifying around biased AI models and the algorithmic discrimination they can produce. The White House Office of Science and Technology Policy identified this as a critical challenge in its Blueprint for an AI Bill of Rights.

When confronted with biased data, especially in medical AI, the conventional responses are to collect more data from underrepresented demographics or to generate synthetic data that fills the gaps, so that models perform equitably across diverse patient populations. The authors contend, however, that this technical approach should be complemented by a sociotechnical perspective, one that accounts for both historical legacies and contemporary societal influences. This, they argue, is the key to effectively addressing bias in public health.
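
For reference, one simplified form of that conventional technical fix is to rebalance training data by oversampling the underrepresented group. The sketch below is purely illustrative, using synthetic arrays rather than any real clinical dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, synthetic dataset in which group 1 is underrepresented.
X = rng.normal(size=(1000, 3))                 # toy clinical features
group = (rng.random(1000) < 0.1).astype(int)   # ~10% belong to group 1

# The conventional fix: oversample the underrepresented group
# (resampling with replacement) so a model trained on the result
# sees both groups equally often.
idx_minority = np.flatnonzero(group == 1)
idx_majority = np.flatnonzero(group == 0)
resampled = rng.choice(idx_minority, size=idx_majority.size, replace=True)
balanced_idx = np.concatenate([idx_majority, resampled])
X_balanced, group_balanced = X[balanced_idx], group[balanced_idx]

print(f"before: {group.mean():.1%} in group 1; "
      f"after: {group_balanced.mean():.1%} in group 1")
```

The authors' point is precisely that a rebalancing step like this addresses representation counts but not why the data looked that way in the first place.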

Marzyeh Ghassemi, a co-author and assistant professor of electrical engineering and computer science at MIT, explains the team’s motivation: “We often perceive data-related issues in machine learning as mere nuisances that demand technical fixes. We liken data to artifacts providing only a partial glimpse of past practices or a flawed mirror reflecting our society. The data may not align with our assumptions about societal behavior, but uncovering this historical understanding enables us to progress and rectify shortcomings.”

In their paper titled “Considering Biased Data as Informative Artifacts in AI-Assisted Health Care,” Ghassemi, Kadija Ferryman, and Maxine Mackintosh advocate for viewing biased clinical data as “artifacts,” akin to how anthropologists or archaeologists interpret physical objects. These artifacts, they argue, unveil practices, belief systems, and cultural values, shedding light on existing disparities within the healthcare system.

For instance, a 2019 study exposed a widely used algorithm that employed healthcare expenditures as a proxy for medical need. Because unequal access to care means Black patients tend to incur lower costs at the same level of illness, the algorithm concluded that sicker Black patients required the same level of care as healthier white patients. Rather than treating such biased datasets or data gaps as mere problems to be fixed or discarded, Ghassemi and her colleagues suggest the “artifacts” approach as a way to raise awareness of the societal and historical influences on data collection. They also emphasize the need to involve bioethicists, or clinicians with relevant training, early in the problem formulation process.
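
The mechanism is easy to demonstrate on synthetic data. In the hedged sketch below (hypothetical groups and parameters, not the study’s data), spending stands in for need, and a group facing access barriers is systematically under-prioritized as a result:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two hypothetical groups with identical distributions of true
# (unobserved) health need.
group = rng.integers(0, 2, n)                    # 0 = group A, 1 = group B
need = rng.gamma(shape=2.0, scale=1.0, size=n)   # true health need

# Assumption for illustration: group B faces access barriers, so
# the same need translates into lower observed spending.
access = np.where(group == 1, 0.6, 1.0)
cost = need * access + rng.normal(0.0, 0.1, n)

# A cost-based "risk score" flags the top quartile of spenders for
# extra care. Group B is under-flagged, and the group B patients who
# do get flagged are sicker than their group A counterparts.
flagged = cost >= np.quantile(cost, 0.75)
for g, name in [(0, "group A"), (1, "group B")]:
    mask = group == g
    print(f"{name}: share flagged = {flagged[mask].mean():.1%}, "
          f"mean true need among flagged = {need[mask & flagged].mean():.2f}")
```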

However, implementing an artifact-based approach presents challenges, particularly in determining whether data have been racially corrected, that is, adjusted using white, male bodies as the standard against which other bodies are measured. As Ghassemi notes, researchers must be prepared to scrutinize race-based corrections as part of the research process.
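
One well-documented case of such a correction is the MDRD eGFR equation, which historically multiplied kidney-function estimates by 1.212 for patients recorded as Black. The sketch below uses toy, hypothetical records to show what an audit for a baked-in correction of that kind might look like; undoing it is only valid if the 1.212 factor is actually documented for the dataset in question:

```python
import pandas as pd

# Toy records: "egfr_reported" was race-corrected upstream. The
# values are hypothetical, built around the MDRD-style 1.212 factor.
df = pd.DataFrame({
    "creatinine": [1.1, 1.1, 0.9, 0.9],
    "race": ["Black", "White", "Black", "White"],
    "egfr_reported": [84.8, 70.0, 109.1, 90.0],
})

# Red flag: identical creatinine but systematically different
# reported eGFR by recorded race suggests a correction was baked in.
print(df.groupby(["creatinine", "race"])["egfr_reported"].mean())

# Undo the correction for affected rows (assumes the documented
# 1.212 factor applies to this dataset).
FACTOR = 1.212
df["egfr_uncorrected"] = df["egfr_reported"].where(
    df["race"] != "Black", df["egfr_reported"] / FACTOR)
print(df[["race", "egfr_reported", "egfr_uncorrected"]])
```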

In another paper, authored by Ghassemi’s PhD student Vinith Suriyakumar and University of California at San Diego Assistant Professor Berk Ustun, the researchers found that including self-reported race in machine learning models, often assumed to improve performance, can actually worsen risk scores and metrics for minority populations.
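
That paper’s models and datasets aren’t reproduced here, but the practice it motivates, auditing performance per group with and without the attribute rather than assuming inclusion helps, can be sketched on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 20_000

# Synthetic data: two clinical features plus a self-reported group
# attribute; in this toy setup the outcome depends only on the
# clinical features.
X_clin = rng.normal(size=(n, 2))
group = rng.integers(0, 2, n)
p = 1.0 / (1.0 + np.exp(-(X_clin @ np.array([1.0, -1.0]))))
y = (rng.random(n) < p).astype(int)

for name, X in [("with attribute", np.column_stack([X_clin, group])),
                ("without attribute", X_clin)]:
    X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
        X, y, group, test_size=0.3, random_state=0)
    scores = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    # The key step: report performance per group, not just overall.
    for g in (0, 1):
        auc = roc_auc_score(y_te[g_te == g], scores[g_te == g])
        print(f"{name} | group {g}: AUC = {auc:.3f}")
```

Whether adding the attribute helps or hurts depends on the data; the point of the audit is to measure it per group instead of assuming.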

It’s important to emphasize that while biased datasets should not be endorsed and biased algorithms require rectification, quality training data remains pivotal to developing secure, high-performing clinical AI models. The National Institutes of Health (NIH) has played a crucial role in driving ethical practices in this domain. Lawrence Tabak, the acting director of the NIH, emphasized the importance of ethically sourced datasets for enabling next-generation AI technologies, highlighting the agency’s commitment to accounting for factors such as environmental influences and social determinants of health in its data.

Elaine Nsoesie, an associate professor at the Boston University School of Public Health, underscores the advantages of treating biased datasets as artifacts. She highlights the importance of considering local contexts when training AI, both to better serve specific populations and to surface latent discriminatory practices coded into algorithms or systems. Nsoesie believes that an artifact-based approach could catalyze the development of policies and structures aimed at eliminating the root causes of bias in datasets.

Conclusion:

Embracing the “artifacts” approach to biased healthcare data has the potential to revolutionize the AI market. By acknowledging the historical and societal factors influencing data, businesses can develop more ethical and inclusive AI solutions. This shift aligns with the growing demand for responsible AI in healthcare and positions companies that adopt these practices as leaders in the industry.
