PolyU’s Study Reveals Enhanced Training of AI Large Language Models for Better Alignment with Human Brain Activity

  • PolyU research highlights the significance of training AI large language models (LLMs) with next sentence prediction (NSP) alongside word prediction.
  • NSP training improves LLMs’ alignment with human brain activity, enhancing comprehension of complex discourse.
  • Findings underscore the limitations of solely scaling LLMs and advocate for integrating diverse learning tasks for more human-like intelligence.
  • Collaboration between AI and neurocognitive research fields holds promise for future studies in AI-informed brain research and brain-inspired AI development.

Main AI News:

In recent years, the landscape of social interaction has been significantly transformed by generative artificial intelligence (GenAI). At the heart of this transformation are large language models (LLMs), which utilize deep-learning algorithms to power GenAI platforms in processing language. A groundbreaking study conducted by The Hong Kong Polytechnic University (PolyU) has illuminated a crucial insight: LLMs demonstrate a closer resemblance to the human brain when trained to process language in ways more akin to human cognition. This discovery holds profound implications for both brain studies and the advancement of AI models.

Traditionally, large language models have predominantly relied on a singular form of pretraining: contextual word prediction. While this approach has yielded remarkable success, especially when coupled with extensive training data and model parameters—as evidenced by the likes of ChatGPT—it fails to fully capture the intricacies of human language comprehension. Unlike machines, humans don’t merely predict the next word; they also integrate higher-level contextual information.
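The contextual word prediction objective described above can be illustrated with a toy sketch, assuming a simple count-based bigram predictor — real LLMs use deep networks over far longer contexts, but the shape of the task (predict the next word from preceding context) is the same:

```python
from collections import Counter, defaultdict

# Toy illustration of contextual word prediction: a count-based
# bigram model that predicts the next word from the previous one.
# This is only a sketch of the objective, not how real LLMs work.
corpus = "the brain reads the sentence and the brain predicts the next word".split()

# Count how often each word follows each context word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often in training."""
    candidates = following[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))  # "brain" follows "the" most often in this corpus
```

The point of the example is that prediction here depends only on local word statistics — nothing in the objective requires the model to track whether whole sentences cohere with one another.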

Led by Prof. Li Ping, Dean of the Faculty of Humanities and Sin Wai Kin Foundation Professor in Humanities and Technology at PolyU, a research team investigated the next sentence prediction (NSP) task. This task mirrors a fundamental process in discourse-level comprehension within the human brain: evaluating the coherence between a pair of sentences. By incorporating NSP into model pretraining and scrutinizing the correlation between the model’s data and brain activation, the team arrived at findings recently published in the esteemed academic journal Science Advances.
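NSP training data is commonly built by construction: positive pairs are genuinely consecutive sentences, negative pairs mix in a random non-adjacent sentence. A minimal sketch of that data-building step follows — this is an assumed recipe in the style of BERT-era NSP, as the article does not detail the team's exact setup:

```python
import random

# Sketch of building NSP training pairs. Positive examples (label 1)
# are consecutive sentences; negatives (label 0) pair a sentence with
# a random, non-adjacent one. Assumed recipe, not the study's exact one.
def make_nsp_pairs(sentences, seed=0):
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        # Positive pair: sentence i really is followed by sentence i+1.
        pairs.append((sentences[i], sentences[i + 1], 1))
        # Negative pair: sentence i paired with a random other sentence.
        j = rng.choice([k for k in range(len(sentences)) if k not in (i, i + 1)])
        pairs.append((sentences[i], sentences[j], 0))
    return pairs

doc = ["It started to rain.", "She opened her umbrella.",
       "The meeting ran late.", "Everyone hurried home."]
pairs = make_nsp_pairs(doc)
# Each entry is (first sentence, candidate next sentence, label);
# a model trained on such pairs learns to judge discourse coherence.
```

A model trained to classify these labeled pairs must attend to sentence-level coherence, which is exactly the discourse-level signal the word-prediction objective alone does not supply.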

The study involved the training of two models—one enhanced with NSP and the other without—both of which also underwent word prediction training. Functional magnetic resonance imaging (fMRI) data were gathered from individuals tasked with reading connected and disconnected sentences. Through a meticulous analysis, the research team discerned the degree of alignment between the patterns generated by each model and the corresponding brain patterns derived from the fMRI data.
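The alignment analysis described above comes down to correlating a model-derived activity pattern with a brain-derived one. A minimal sketch using Pearson correlation on invented vectors — the study's actual fMRI pipeline is considerably more involved, and the numbers below are purely illustrative:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical pattern vectors: one from a model's responses to a set
# of sentences, one from fMRI responses to the same sentences
# (all values invented for illustration).
model_pattern = [0.2, 0.8, 0.5, 0.9, 0.1]
brain_pattern = [0.3, 0.7, 0.6, 0.8, 0.2]
r = pearson(model_pattern, brain_pattern)
# r near 1 indicates the model's pattern tracks the brain's closely;
# r near 0 indicates no linear relationship.
```

Comparing such correlations across the NSP-trained and word-prediction-only models, region by region, is the kind of analysis that would reveal which model aligns better with brain activity.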

The findings clearly underscored the advantages conferred by NSP training. The model enriched with NSP exhibited a markedly superior correspondence with human brain activity across multiple regions compared to its counterpart trained solely on word prediction. Furthermore, its operational framework dovetailed with established neural models of human discourse comprehension. These revelations shed fresh light on the mechanisms underlying our brain’s processing of complex discourse, such as conversations. Notably, regions beyond the left hemisphere, where language processing is conventionally localized, were found to play a pivotal role in comprehending longer discourses. Moreover, the NSP-trained model better predicted participants’ reading speed, underscoring that simulating discourse comprehension enables AI to better understand human behavior.

While recent advancements in LLMs have predominantly focused on augmenting training data and model size, Prof. Li Ping advocates for a more nuanced approach. He asserts, “Relying solely on scaling poses inherent limitations. True progress lies in enhancing the efficiency of models by leveraging diverse learning tasks such as NSP, thereby moving towards a more human-like intelligence.” Importantly, these findings not only pave the way for neurocognitive researchers to leverage LLMs in unraveling higher-level language mechanisms but also foster synergistic collaborations between AI and neurocognitive domains, promising future endeavors in AI-informed brain studies and brain-inspired AI development.

Conclusion:

PolyU’s study underscores the importance of enhancing AI large language models (LLMs) through diversified training methodologies, particularly incorporating next sentence prediction (NSP). This not only fosters a closer alignment with human brain activity but also opens avenues for collaborative exploration between AI and neurocognitive research fields. Moving forward, leveraging these insights will be crucial in developing more sophisticated and human-like AI models, with profound implications for various market sectors reliant on AI technologies.

Source