More than 9,000 authors have penned an open letter voicing concern over tech companies using copyrighted works to train chatbots without consent or compensation

TL;DR:

  • Over 9,000 authors sign an open letter, expressing concerns about tech companies using copyright-protected works to train chatbots without permission, credit, or compensation.
  • Generative AI tools powered by large language models are trained on massive amounts of text so they can produce natural-sounding responses to user prompts.
  • Authors demand permission and fair compensation for the use of their works in AI programs, warning that machine-written content could flood the market.
  • Lawsuits are emerging: authors Mona Awad and Paul Tremblay are suing OpenAI for using their novels without consent.
  • The Authors Guild seeks to protect intellectual property and ensure ethical collaboration between AI companies and authors.

Main AI News:

More than 9,000 authors have joined forces to voice their dismay with the tech companies behind generative AI. In an open letter, they object to the use of copyright-protected works to train chatbots without consent, credit, or compensation.

The issue stems from what these systems can do. GPT-4, for example, can summarize works by authors such as Roxane Gay or Margaret Atwood, even breaking them down chapter by chapter, and users can ask ChatGPT to write a story in the style of literary figures such as Maya Angelou.

At the core of these generative AI tools are large language models. Rather than being explicitly programmed with rules, these models are trained on vast amounts of text, which lets them produce responses that read as natural and human-like.

In the letter, organized by the Authors Guild, the authors write that “Generative AI technologies built on large language models owe their existence to our writings. These technologies mimic and regurgitate our language, stories, style, and ideas. Millions of copyrighted books, articles, essays, and poetry provide the ‘food’ for AI systems, endless meals for which there has been no bill.” The letter notes that tech companies including OpenAI, Alphabet, Meta, Stability AI, IBM, and Microsoft are investing billions of dollars in AI technology, and the authors argue that they deserve fair compensation because, without their writing, AI would be banal and severely limited.

Novelist and essayist Jonathan Franzen praised the Authors Guild for advancing the rights of all Americans whose data, words, and images are being exploited for massive profit without their consent, a group that, as he put it, includes nearly every American over the age of six.

Prominent signatories, including Dan Brown, James Patterson, Margaret Atwood, Roxane Gay, Celeste Ng, Viet Thanh Nguyen, George Saunders, and Rebecca Makkai, are calling on AI industry leaders to address these concerns. The letter’s demands are straightforward:

  1. Obtain permission for the use of copyrighted material in generative AI programs.
  2. Fairly compensate writers for both past and ongoing use of their works in generative AI programs.
  3. Fairly compensate writers for the use of their works in AI output, whether or not the outputs infringe current law.

The letter also notes that many of the books used to develop AI systems came from notorious piracy websites, and argues that copying illegally sourced works cannot be excused as fair use.

The Authors Guild also warns that generative AI could flood the market with mediocre, machine-written content, further undermining authors’ livelihoods. Authors’ incomes have already fallen by 40% over the past decade, pushing many full-time writers below the federal poverty level.

Adding fuel to the fire, novelists Mona Awad and Paul Tremblay recently sued OpenAI, alleging that ChatGPT was trained in part by “ingesting” their novels without consent. As evidence, the suit points to ChatGPT’s ability to produce detailed summaries of their books when prompted.

The lawsuit also highlights OpenAI’s use of BookCorpus, a controversial dataset assembled by AI researchers in 2015. It contained more than 7,000 unique unpublished books drawn from websites such as Smashwords.com, which hosts copyrighted novels.

According to the lawsuit, later OpenAI models such as GPT-3 were trained on much larger collections of copyright-protected books known as “Books1” and “Books2,” estimated to contain roughly 63,000 and 294,000 titles, respectively.

As AI systems increasingly draw on material from the web to generate new content, experts expect a wave of legal actions that will shape how AI companies may use copyrighted works.

Conclusion:

The united stand of more than 9,000 authors marks a pivotal moment for the market. It highlights the ethical implications of AI companies using copyrighted works without consent or compensation. As demand for AI-generated content grows, the market must adapt to address the concerns raised by the creative community, and striking a fair balance between technological innovation and intellectual property rights will be crucial to the AI industry’s long-term growth and sustainability.
