Tech Giants Face Legal Battle Over Alleged Use of Pirated Books in AI Development

TL;DR:

Microsoft, Meta, and Bloomberg face a class-action lawsuit for allegedly using copyrighted books in AI development without permission.
The lawsuit centers around the use of a plaintext dataset called Books3, containing data from 197,000 books, to train AI systems.
Large Language Models (LLMs) play a pivotal role in AI development by understanding and generating human language.
Tech companies like Microsoft and Meta benefit from LLMs by reducing reliance on human labor and creating personalized offerings.
The dataset Books3 was later incorporated into “The Pile” and hosted by EleutherAI, a grassroots collective.
Meta’s Large Language Model Meta AI (LLaMa) and Bloomberg’s AI system allegedly used the dataset without acknowledging copyrighted content.
The lawsuit includes claims of copyright infringement, conversion, and negligence, seeking class certification and damages.

Main AI News:

A recent lawsuit filed in the US District Court for the Southern District of New York has sent shockwaves through the tech industry. Microsoft, Meta (formerly Facebook), and Bloomberg stand accused of utilizing copyrighted books without permission in the development of their Artificial Intelligence (AI) technology. This class action, led by prominent authors including former Arkansas Governor and presidential candidate Mike Huckabee, alleges that their intellectual property was employed in training AI systems, using a plaintext dataset known as Books3.

Books3, a collection of data scraped from approximately 197,000 nonfiction books and novels published within the last two decades, was initially compiled by independent developer Shawn Presser. Presser, along with a team of developers, intended to make this dataset available for the creation of generative AI tools. These tools, known as Large Language Models (LLMs), are instrumental in understanding and generating human language, making them indispensable in AI development.

LLMs have become a cornerstone of AI technology, providing businesses with the means to enhance profitability through personalized offerings and reducing reliance on human labor. These models are trained on extensive datasets that draw from various sources, including the internet, books, and articles.

Books3 eventually became a component of a more extensive dataset called “The Pile,” which was hosted on the internet by EleutherAI, a grassroots collective of natural language processing researchers. EleutherAI, also named in the complaint, presented the dataset as a “free, open-source data set for the training of LLMs.“

Meta, through its Large Language Model Meta AI (LLaMa), incorporated “The Pile” into its original dataset. However, according to the lawsuit, Meta failed to acknowledge the presence of copyrighted works within the dataset. The initial release of LLaMa was powerful, and Meta later announced its partnership with Microsoft in July 2023, with the specific goal of competing with OpenAI’s ChatGPT.

In August 2023, Meta unveiled LLaMa 2, an upgraded version of the model. Notably, the company made no mention that “The Pile” dataset contained copyrighted works. The authors’ attorneys argue that Microsoft and Meta significantly benefited from earlier iterations of the LLM, as they did not have to invest additional resources in training their LLM from scratch with their own content or properly licensed material.

Following the release of OpenAI GPT-3, Bloomberg also entered the arena by working on its own LLM, announced in March. The lawsuit alleges that Bloomberg used “Books3” to train its LLM in understanding and responding in natural language. Despite stating that it would not use the dataset in future versions of “BloombergGPT,” the authors argue that their copyrighted works have already been integrated into Bloomberg’s LLM and all subsequent versions.

The lawsuit asserts that the copyright-protected works now serve as a foundation for all future LLM models developed by these companies. Furthermore, it claims that none of the defendants sought or obtained licenses to use the copyrighted works from Books3, and they were aware that those responsible for assembling Books3 lacked the necessary licenses for dissemination.

The authors, therefore, allege that Microsoft, Meta, and Bloomberg deliberately used pirated and stolen works to profit from AI development, causing harm to the plaintiffs and the class. The lawsuit includes claims of direct and vicarious copyright infringement, conversion, and negligence. The authors seek class certification, an injunction to prevent the defendants from using their works in AI training, and damages.

Conclusion:

This legal battle highlights the significance of copyright protection in the AI industry. It could influence how companies source data for AI development and have broader implications for the market’s approach to intellectual property rights in AI technology. Companies may need to be more cautious about using datasets without proper licensing, potentially impacting their AI innovation strategies.

Source

OpenAI Fast-Tracks Release of New AI Model “Strawberry,” Focuses on Advanced Reasoning

Revolutionizing AI: Efficient Diffusion Models for High-Dimensional Data

Digital Dubai Partners with RIT Dubai to Advance AI Skills and Drive Digital Transformation

CAST AI Launches Enhanced Kubernetes Security Solution to Boost Runtime Threat Detection

Dubai’s AI Hub: Paving the Way for Global Technological Leadership

Glean Technologies Secures $260M in Series E Funding, Valued at $4.6B as Enterprise AI Adoption Grows

Dubai’s AI Hub: Paving the Way for Global Technological Leadership

AI’s Role in Transforming the Banking Industry

Fintech: The Future of Finance and Technology Careers

AI’s Impact on the Workforce: Risks, Opportunities, and the Path Forward

Ford’s Advanced Technologies Aim to Tackle Quality Issues and Boost Efficiency

Aifleet Secures $16.6M to Revolutionize Trucking Industry with AI Solutions

SiMa Technologies Advances Edge AI with High-Performance Multimodal Chip

Microsoft’s FPDT Breakthrough Extends Long-Context LLM Training Capabilities

Apple Intelligence: Will Delays Impact the iPhone 16’s Supercycle Potential?

AI’s Role in Defense: Opportunities and Challenges Ahead

JFrog and Nvidia Partner to Secure AI Models with New Runtime Security Solution

ServiceNow Unveils Advanced AI Features and Platform Enhancements to Boost Enterprise Productivity

Med-MoE: A Scalable AI Framework Revolutionizing Healthcare Efficiency

Deloitte Launches AI Factory as a Service, Partnering with NVIDIA and Oracle for Scalable AI Solutions

Vietnam’s AI Rise: A Path Toward Technological Independence

AI Unlocks Pig Communication: A Step Toward Better Animal Welfare

Abu Dhabi’s Sustainable Aquaculture Initiative: A New Approach to Marine Conservation and Economic Growth

Rising AI Demand Escalates Water Consumption in Data Centers, Poses Sustainability Concerns

Leaf: Modernizing Farm Data Management with Cutting-Edge Technology

Tech Giants Face Legal Battle Over Alleged Use of Pirated Books in AI Development

TL;DR:

Main AI News:

Conclusion:

Tech Giants Face Legal Battle Over Alleged Use of Pirated Books in AI Development

TL;DR:

Main AI News:

Conclusion:

Subscribe Now