- Mass General Brigham research delves into the use of large language models (LLMs) for patient message drafting.
- LLMs show potential in reducing physician workload and enhancing patient education but raise concerns about safety.
- The study highlights the benefits of LLM integration while underscoring the need for vigilant oversight.
- Researchers utilized GPT-4 to generate patient scenarios and responses, with mixed perceptions from physicians.
- Findings reveal that LLM-generated responses, though informative, may lack urgency in critical situations.
- Mass General Brigham pioneers responsible AI integration and is currently testing LLMs in ambulatory practices.
- Future research aims to explore patient perceptions and demographic influences on LLM-generated responses.
Main AI News:
In a recent investigation, researchers from Mass General Brigham shed light on the potential benefits and pitfalls of using large language models (LLMs), a form of generative artificial intelligence, to draft responses to patient messages. While LLMs show promise in alleviating physician workload and enhancing patient education, the findings, published in The Lancet Digital Health, underscore the need for careful oversight to ensure patient safety in LLM-generated communications and the importance of a cautious approach to LLM integration.
Mounting administrative tasks and documentation requirements have contributed to the growing issue of physician burnout. In response, electronic health record (EHR) vendors have turned to generative AI algorithms to aid clinicians in crafting patient messages, aiming to streamline workflows. However, until now, the efficacy, safety, and clinical impact of this technology remained uncertain.
Dr. Danielle Bitterman, corresponding author of the study and a faculty member in the Artificial Intelligence in Medicine (AIM) Program at Mass General Brigham, emphasized the potential of generative AI to strike a balance between easing clinician burden and enhancing patient education. Yet, drawing from their experience with LLMs, the research team highlighted concerns regarding potential risks associated with integrating LLMs into messaging systems. Thus, the study sought to delineate the benefits and drawbacks of LLM implementation.
To conduct the study, researchers used OpenAI’s GPT-4, a prominent LLM, to generate scenarios involving cancer patients along with accompanying queries. Radiation oncologists manually responded to these queries; GPT-4 then generated its own responses to the same queries, which the oncologists reviewed and edited. Notably, physicians often could not distinguish LLM-generated responses from human-authored ones.
Analysis revealed that, on average, physician-drafted responses were briefer than those generated by GPT-4. While the latter provided more educational context, it tended to be less directive in its guidance. Physicians perceived LLM assistance as enhancing efficiency and deemed most LLM-generated responses safe for transmission to patients. However, the study identified shortcomings, with a small percentage of unedited LLM-generated responses posing potential risks to patients, including cases where urgent medical attention wasn’t emphasized.
Furthermore, responses that underwent physician editing remained similar in length and content to the original LLM drafts, suggesting that physicians saw value in the educational content the LLMs provided. Nevertheless, the researchers caution against overreliance on LLMs given their inherent limitations.
The integration of AI tools in healthcare presents opportunities for transformative change in care delivery. However, Mass General Brigham advocates for a balanced approach that prioritizes patient safety and quality. The institution is at the forefront of responsible AI utilization, conducting rigorous research to inform the integration of AI into various aspects of healthcare delivery.
Currently, Mass General Brigham is spearheading a pilot program to integrate generative AI into EHRs for drafting patient portal message responses, with a focus on testing the technology in ambulatory practices. Looking ahead, the researchers plan to delve into patient perceptions of LLM-based communications and explore how demographic factors influence LLM-generated responses, considering known biases in LLM algorithms.
Dr. Bitterman emphasized the importance of maintaining human oversight in AI utilization within healthcare. While LLMs offer significant potential, they also introduce risks that must be carefully managed. This study underscores the necessity of robust quality monitoring systems, clinician training, and increased AI literacy among both patients and providers to ensure the safe and effective use of AI in medicine.
Conclusion:
The study sheds light on the promise and challenges of incorporating generative AI into patient messaging systems. While offering efficiencies and educational benefits, LLMs require careful monitoring to mitigate potential risks to patient safety. Organizations in the healthcare market should prioritize comprehensive oversight and training to ensure the responsible and effective use of AI technologies in patient care.