TL;DR:
- Joseph Saveri Law Firm files class-action lawsuits against OpenAI and Meta on behalf of authors Sarah Silverman, Christopher Golden, and Richard Kadrey.
- Lawsuits accuse both companies of illegally using copyrighted material to train AI language models.
- Allegations include violations of the Digital Millennium Copyright Act, unfair competition laws, and negligence.
- Previous legal actions by the same law firm against GitHub Copilot and AI image generators demonstrate its commitment to addressing copyright concerns in the AI domain.
- ChatGPT and LLaMA are described as “industrial-strength plagiarists” with the ability to generate text resembling copyrighted materials.
- Authors demand jury trials and seek permanent injunctive relief that could require Meta and OpenAI to modify their AI tools.
- The lawsuits allege unauthorized access to copyrighted works through shadow libraries and the use of controversial data sets such as BookCorpus.
- The intentional removal of copyright-management information (CMI) raises concerns about profiting from unattributed reproductions.
- The lawsuits raise questions about whether ChatGPT and LLaMA themselves constitute infringing derivative works.
- Restitution for alleged lost profits and the protection of copyright ownership are key objectives for the authors.
Main AI News:
OpenAI and Meta face legal battles after the Joseph Saveri Law Firm filed class-action lawsuits on behalf of prominent authors, including Sarah Silverman, Christopher Golden, and Richard Kadrey. The lawsuits accuse both companies of illegally using copyrighted material to train their AI language models, ChatGPT and LLaMA. The allegations involve violations of the Digital Millennium Copyright Act, unfair competition laws, and negligence.
This is not the first time the Joseph Saveri Law Firm has taken legal action against generative AI technology. The firm previously filed lawsuits concerning GitHub Copilot and AI image generators, demonstrating its commitment to addressing copyright concerns in the AI domain. The lawsuit against GitHub Copilot is progressing toward trial, while the Stable Diffusion lawsuit remains in procedural maneuvering and awaits a resolution.
ChatGPT and LLaMA, referred to by the law firm as “industrial-strength plagiarists,” have been raising concerns among authors and publishers since March 2023. The AI tools’ remarkable ability to generate text resembling copyrighted materials, including excerpts from thousands of books, has prompted worries about the violation of intellectual property rights.
Sarah Silverman, Christopher Golden, and Richard Kadrey, among others, have recently filed their lawsuits in a US district court in San Francisco. Demanding jury trials, the authors seek permanent injunctive relief that could potentially compel Meta and OpenAI to make alterations to their AI tools.
A spokesperson for the Saveri Law Firm emphasized the significance of the lawsuits, stating, “If this alleged behavior is allowed to continue, these models will eventually replace the authors whose stolen works power these AI products with whom they are competing. This novel suit represents a larger fight for preserving ownership rights for all artists and other creators.”
Meta and OpenAI have faced allegations regarding the origin of the data sets used to train LLaMA and ChatGPT. Authors claim to have deduced the likely sources based on statements and papers released by the companies and related researchers. It is alleged that both OpenAI and Meta accessed copyrighted materials without consent or permission by downloading works from prominent e-book pirate sites.
According to the OpenAI lawsuit, ChatGPT appears to have been trained on a data set consisting of 294,000 books downloaded from well-known “shadow library” websites such as Library Genesis, Z-Library, Sci-Hub, and Bibliotik. Meta, on the other hand, has disclosed that LLaMA was trained on part of a data set known as ThePile, which allegedly includes the entirety of Bibliotik and encompasses 196,640 books.
In addition to the unauthorized use of copyrighted works from shadow libraries, OpenAI is accused of utilizing a controversial data set called BookCorpus. Assembled in 2015, this data set allegedly included self-published novels from Smashwords, a website offering free books to readers. However, these novels are still protected by copyright, and it is claimed that they were copied into the BookCorpus data set without the authors’ consent, credit, or compensation.
The lawsuits argue that OpenAI and Meta infringed upon the copyrights of authors such as Sarah Silverman, Christopher Golden, and Richard Kadrey by utilizing these “flagrantly illegal” data sets. The authors point out that the accuracy with which ChatGPT can summarize specific copyrighted books suggests that the AI model retained knowledge from the training data set. Furthermore, the intentional removal of copyright-management information (CMI) raises concerns about companies profiting unfairly from unattributed reproductions of stolen writing and ideas.
The lawsuits raise “numerous questions of law,” including whether ChatGPT and LLaMA themselves constitute infringing derivative works based on the works of thousands of authors. The authors are not only seeking damages but also restitution for alleged lost profits, particularly as Meta plans to release a commercial version of LLaMA in the near future.
Joseph Saveri and Matthew Butterick, the lawyers representing the authors, emphasized the gravity of the situation, stating, “Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works—including books written by plaintiffs—that were copied by OpenAI and Meta without consent, without credit, and without compensation.” The legal actions aim to address these copyright concerns and enforce the rights of authors in the face of advancing AI technology.
Conclusion:
The class-action lawsuits against OpenAI and Meta mark a significant legal challenge in the AI market. The accusations of copyright infringement and the alleged use of unauthorized data sets highlight the importance of protecting intellectual property rights. These lawsuits could shape the future of AI development by pressing companies to use legally obtained training data and to respect copyright law. Authors are asserting their rights and demanding accountability from AI companies, a sign of growing awareness of AI's potential impact on creative industries and of the need to safeguard artistic ownership as the technology advances. Businesses in the AI market should take note of these legal developments and ensure compliance with copyright law to avoid similar legal repercussions and maintain trust with content creators.