Cornell University study reveals ChatGPT’s ability to memorize entire poems

TL;DR:

  • Cornell University study reveals ChatGPT’s capacity to memorize entire poems, particularly famous ones found online.
  • Ethical concerns arise regarding the training of AI models like ChatGPT, which may use data scraped from the internet.
  • Research explores the legal intricacies of poems, often technically under copyright but widely available.
  • Norton Anthology inclusion emerges as a key predictor of poem memorization.
  • ChatGPT’s responses evolve over time, becoming more cautious about copyrighted material.
  • Future research will examine chatbots’ responses in different languages and factors influencing poem memorization.

Main AI News:

Cornell University researchers have found that ChatGPT, the widely used AI chatbot, can memorize entire poems and reproduce them on request. The effect is strongest for well-known poems that are readily available online. The finding raises substantial ethical concerns about how proprietary AI models like ChatGPT are trained, since they are likely trained on data scraped from the internet.

Lyra D’Souza, the first author of the study and a former computer science major, noted that the memorization of large portions of text has privacy implications. “We don’t know what they’re trained on, and a lot of times, private companies can train proprietary models on our private data,” she said.

The research, titled “The Chatbot and the Canon: Poetry Memorization in LLMs,” was presented by D’Souza at the Computational Humanities Research Conference. Poetry was a strategic choice of subject: poems are short enough to fit within a language model’s context window. The legal status of many poems is also complicated, since they may technically be under copyright even though they are widely accessible through reputable sources like the Poetry Foundation.

D’Souza extensively tested ChatGPT’s poem-retrieval abilities and compared it with three other language models: PaLM from Google AI; Pythia from the non-profit AI research institute EleutherAI; and GPT-2, an earlier OpenAI model in the line that eventually gave rise to ChatGPT. She selected poems by 60 American poets from various eras, backgrounds, and levels of recognition, and prompted the models to provide the text of these poems.
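
How the study scored a response as memorized is not detailed here. As a minimal sketch, assuming a simple word-level overlap measure, the comparison between a model’s response and a poem’s published text could look like the following. The function names, the 0.9 threshold, and the choice of Python’s difflib are illustrative assumptions, not the researchers’ method, and the model’s response is passed in as a plain string rather than fetched from any chatbot.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace so casing and line breaks are ignored."""
    return text.lower().split()


def memorization_score(reference: str, response: str) -> float:
    """Return a similarity ratio in [0, 1] between the poem and the response.

    Hypothetical metric: word-level sequence similarity, not the study's own.
    """
    matcher = SequenceMatcher(
        None, normalize(reference), normalize(response), autojunk=False
    )
    return matcher.ratio()


def is_memorized(reference: str, response: str, threshold: float = 0.9) -> bool:
    """Treat a response as memorized when it clears an assumed threshold."""
    return memorization_score(reference, response) >= threshold


if __name__ == "__main__":
    # Opening lines of Walt Whitman's "O Captain! My Captain!" (public domain).
    poem = (
        "O Captain! my Captain! our fearful trip is done,\n"
        "The ship has weather'd every rack, the prize we sought is won,"
    )
    print(is_memorized(poem, poem))
    print(is_memorized(poem, "I do not know that poem."))
```

A word-level ratio tolerates small slips such as a dropped word, while an invented poem or a poem by a different author scores far below the threshold.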

One noteworthy finding was that the most reliable predictor of a poem being memorized was whether it had appeared in a Norton Anthology of Poetry, particularly the 1983 edition. ChatGPT’s responses also changed over time. In February 2023, when first queried, the chatbot would not admit that it did not know a poem; instead, it would invent one or recycle a poem by a different author. By July 2023, if ChatGPT was unfamiliar with a poem, it would ask whether the poem even existed, deflecting responsibility onto the user.

ChatGPT’s handling of copyrighted material shifted as well. In February, the model had no qualms about producing copyrighted poems; by July, it occasionally declined, stating that it couldn’t provide a copyrighted poem. Nevertheless, it would typically reproduce the poem if asked again, according to D’Souza’s observations.

The study focused primarily on American poets. The next phase of research will investigate how chatbots respond to requests in different languages and whether factors such as poem length, meter, and rhyming patterns influence the likelihood that a poem is memorized, D’Souza said.

As ChatGPT continues to play a pivotal role in our lives, D’Souza stressed the importance of using this powerful new tool carefully. “Figuring out how to use it responsibly and use it transparently is going to be really important,” she said.

Conclusion:

The study sheds light on ChatGPT’s ability to memorize poems and the ethical questions this raises about AI training data sources. It underscores the need for responsible and transparent use of AI across market sectors as the technology plays an increasingly significant role in our lives.
