Unraveling Linguistic Enigmas: AI’s Quest in The New York Times Connections Puzzle

  • NYU Tandon researchers investigate AI’s ability to solve The New York Times’ Connections puzzle.
  • Two AI approaches, leveraging GPT-3.5, GPT-4, and sentence embedding models, were explored.
  • Results show GPT-4 outperformed other models, solving 29% of puzzles.
  • Guiding GPT-4 through puzzles step by step boosted its performance to over 39%.
  • Researchers explore the potential of AI models like GPT-4 in aiding humans to create new word puzzles.

Main AI News:

Can artificial intelligence (AI) rival human prowess in uncovering intricate connections among words? NYU Tandon School of Engineering researchers delved into the daily Connections puzzle from The New York Times to probe this question.

The Connections puzzle challenges players to categorize 16 words into four cohesive sets, progressing from simple associations to more abstract connections. In a forthcoming presentation at the IEEE 2024 Conference on Games in Milan, Italy, researchers will unveil their investigation into whether modern natural language processing (NLP) systems can crack these linguistic conundrums. The study’s findings are also available on the arXiv preprint server.

Led by Julian Togelius, NYU Tandon’s Associate Professor of Computer Science and Engineering (CSE) and Director of the Game Innovation Lab, the team explored two AI methodologies. The first used GPT-3.5 and the newly introduced GPT-4, large language models (LLMs) from OpenAI known for their human-like language comprehension and generation abilities.

The second method employed sentence embedding models, such as BERT, RoBERTa, MPNet, and MiniLM, which encode semantic data as vector representations but lack the comprehensive language understanding of LLMs.
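
To make the embedding idea concrete, here is a minimal sketch of how word vectors could be used to propose groups of four. It is an illustration only, not the researchers' procedure: the word list, the hand-built toy vectors, and the helper names `toy_vectors`, `group_score`, and `greedy_groups` are all assumptions made for this example. A real attempt would replace the toy vectors with embeddings from a model such as MPNet or MiniLM.

```python
import math
from itertools import combinations

# Hypothetical toy puzzle: four categories of four words each.
CATEGORIES = {
    "fruit": ["apple", "pear", "plum", "fig"],
    "color": ["red", "blue", "green", "pink"],
    "metal": ["iron", "gold", "zinc", "tin"],
    "bird": ["crow", "dove", "wren", "hawk"],
}


def toy_vectors():
    """Build stand-in 'embeddings': one axis per category plus a small offset.

    These are NOT real sentence embeddings; they only mimic the property
    that related words point in similar directions.
    """
    vectors = {}
    for axis, words in enumerate(CATEGORIES.values()):
        for offset, word in enumerate(words):
            vec = [0.0] * len(CATEGORIES)
            vec[axis] = 1.0
            vec[(axis + offset) % len(CATEGORIES)] += 0.2
            vectors[word] = tuple(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_score(words, vectors):
    """Average pairwise cosine similarity within a candidate group."""
    pairs = list(combinations(words, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def greedy_groups(vectors, group_size=4):
    """Repeatedly pick the most mutually similar group among the remaining words."""
    remaining = sorted(vectors)
    groups = []
    while len(remaining) >= group_size:
        best = max(
            combinations(remaining, group_size),
            key=lambda group: group_score(group, vectors),
        )
        groups.append(set(best))
        remaining = [w for w in remaining if w not in best]
    return groups


if __name__ == "__main__":
    for group in greedy_groups(toy_vectors()):
        print(sorted(group))
```

On clean toy data like this, the greedy search recovers the intended categories. Real puzzles are harder because words are often chosen to fit more than one plausible group, so the most similar four words are not always a correct answer.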

Results revealed that while AI systems could tackle some Connections puzzles, the task remained formidable overall. GPT-4 outperformed the other models, solving approximately 29% of puzzles, but it fell short of mastery. Interestingly, the models’ success mirrored human proficiency across the puzzle’s difficulty spectrum, from simple categories to tricky ones.

“LLMs are increasingly ubiquitous, and analyzing their limitations in tackling the Connections puzzle sheds light on how they process semantic information,” remarked Graham Todd, Ph.D. student in the Game Innovation Lab and lead author of the study.

Further analysis showed that guiding GPT-4 through puzzles step by step significantly enhanced its performance, with over 39% of puzzles solved.

“Our research underscores the efficacy of ‘chain-of-thought’ prompting in encouraging structured thinking in language models,” noted Timothy Merino, Ph.D. student in the Game Innovation Lab and study author. “Prompting these models to reason about their tasks enhances their performance.”
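
As a rough illustration of what the difference between a direct prompt and a chain-of-thought prompt might look like, the sketch below builds both versions and parses a reply. The prompt wording, the `GROUP:` answer format, and the function names `build_prompt` and `parse_groups` are assumptions invented for this example; the study's actual prompts are not reproduced here, and no model is called.

```python
def build_prompt(words, chain_of_thought=False):
    """Return a puzzle prompt, optionally asking the model to reason first."""
    if len(words) != 16:
        raise ValueError("a Connections puzzle has 16 words")
    lines = [
        "Sort these 16 words into four groups of four related words.",
        "Words: " + ", ".join(words),
    ]
    if chain_of_thought:
        # The extra instruction is the only difference between the two prompts.
        lines.append(
            "First, reason step by step about possible categories "
            "and which words fit each one."
        )
    lines.append("Finish with four lines of the form 'GROUP: word, word, word, word'.")
    return "\n".join(lines)


def parse_groups(reply):
    """Extract the final groups from a reply, ignoring any reasoning text."""
    groups = []
    for line in reply.splitlines():
        if line.strip().upper().startswith("GROUP:"):
            members = [w.strip().lower() for w in line.split(":", 1)[1].split(",")]
            groups.append(frozenset(w for w in members if w))
    return groups
```

The resulting string would be sent to whichever model is being tested. Because `parse_groups` reads only the `GROUP:` lines, the same scoring code can be used for both prompt styles.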

Beyond assessing AI capabilities, researchers are investigating whether models like GPT-4 could aid humans in crafting original word puzzles. This endeavor could push the boundaries of machine learning systems in conceptual representation and contextual inference.

Conclusion:

The study underscores the evolving capabilities of AI in deciphering complex linguistic challenges like The New York Times’ Connections puzzle. GPT-4’s ability to outperform other models signals a promising trajectory for AI’s role in linguistic tasks, potentially revolutionizing language-based industries such as education, gaming, and content generation. However, further research is needed to fully unlock the potential of AI systems in enhancing human creativity and problem-solving abilities.
