Fairly Trained: A Nonprofit Initiative Addressing AI Copyright Issues

TL;DR:

  • Ed Newton-Rex left his executive role at the AI company Stability AI due to a dispute over copyright issues.
  • The use of copyrighted material to train AI models has led to legal battles between creators and tech giants.
  • ‘Fairly Trained,’ a nonprofit founded by Newton-Rex, certifies AI companies that use data with creators’ consent for model training.
  • Nine AI models, including those by Endel, have been certified by ‘Fairly Trained.’
  • Critics raise concerns about the trust-based certification process.
  • Major AI companies, like OpenAI, have faced legal challenges over copyright issues.
  • OpenAI is working to license news articles for AI training data.

Main AI News:

In the world of artificial intelligence, ethical concerns and copyright issues have come to the forefront of discussion. The debate over the use of copyrighted material to train AI models has ignited fierce battles between creators and tech giants. Ed Newton-Rex, a former executive at a prominent AI company, made headlines last year when he publicly resigned from his position due to a fundamental disagreement over the company’s approach to copyright.

The company in question, Stability AI, is renowned for its popular AI image generation model, Stable Diffusion. However, controversy surrounds its methodology, as it trained the model by utilizing millions of images “scraped” from the internet without the consent of their original creators. The prevailing argument within many leading AI companies, including Stability, is that this practice falls under the umbrella of “fair use” of copyrighted work and is thus legally acceptable.

Newton-Rex, who led Stability’s audio team, vehemently disagreed. In a bold statement on X, formerly known as Twitter, he declared, “Companies with valuations in the billions are employing generative AI models trained on creators’ works without permission. These models are then used to generate new content that can directly compete with the original creations. I fail to see how this aligns with a society that values the economic rights of creative artists, who heavily rely on copyright protection.”

This clash of ideologies marked the opening salvo in a raging battle over the utilization of copyrighted material to train AI systems. Notably, in December, The New York Times filed a lawsuit against OpenAI in a Manhattan court, alleging that OpenAI had illegally utilized millions of the newspaper’s articles to train AI systems, essentially positioning them as competitors to the Times as a trusted source of information.

In a separate legal confrontation, comedian Sarah Silverman and other writers took legal action against both OpenAI and Meta in July 2023, accusing these tech giants of employing their written works to train AI models without obtaining proper consent. The turmoil extended further, with artists Kelly McKernan, Sarah Andersen, and Karla Ortiz suing Midjourney, Stability AI, and DeviantArt, companies that offer image-generating AI models. These artists alleged that their creations served as the basis for these AI models’ training, a practice they vigorously opposed.

Visual artists have also retaliated by embracing innovative tools designed to “taint” AI models trained on their work without consent. These tools disrupt the models, rendering them unpredictable and resistant to mimicking the artists’ unique styles.

OpenAI, amidst the legal disputes, maintained that it believed The New York Times’ lawsuit held “no merit.” Additionally, the company asserted that although it considered data scraping from the internet for training purposes as fair use, it offered publishers an opt-out option as a matter of ethical principle.

However, the narrative took a significant turn on January 17, when Ed Newton-Rex unveiled a groundbreaking initiative aimed at encouraging AI companies to uphold creators’ rights. His nonprofit organization, ‘Fairly Trained,’ introduced a certification system designed to reward AI companies that exclusively employ data with the consent of its creators for model training. By elevating companies with ethical data-sourcing practices, Newton-Rex hopes to drive the entire AI ecosystem towards a more equitable treatment of creators. “There is a strong ethical dimension to this industry, and the purpose of this certification is to spotlight that,” Newton-Rex emphasized in an interview with TIME.

To coincide with its launch, ‘Fairly Trained’ certified nine AI models, most of which belong to companies operating in the music-generation domain. Among the certified models are those developed by Endel, a company specializing in “sound wellness,” known for its collaborations with artists such as Grimes and James Blake. The certification signifies that these companies have acquired legal licenses for the data used to train their models, rather than relying on the notion of fair use.

Ed Newton-Rex, who is not only a technology expert but also a classical composer of choral music, revealed that his personal connection to the creative arts fueled his determination to champion creators’ rights. “This issue has always resonated with me, undoubtedly stemming from my own experiences as a musician,” he acknowledged. “Understanding the emotions involved in pouring your heart and soul into a creation, only to receive modest royalties while AI companies amass billions, is a sentiment shared by countless artists.”

Newton-Rex further explained, “Many creators, including myself, would vehemently reject the notion of their work being exploited by a company for profit without adequate compensation. However, if the opportunity exists for consent, negotiation of terms, and the potential to earn a fair share, it could be a transformative development.”

Critics have raised concerns that ‘Fairly Trained’ relies on trust when certifying companies, as it does not require access to their datasets for auditing. Instead, it demands detailed written submissions from companies outlining the contents and sources of their datasets, as well as their due diligence processes and record-keeping practices. Newton-Rex defended this approach, stating, “Trust plays a pivotal role here, especially at the outset. Companies will be disincentivized from providing inaccurate information, as it could lead to decertification down the line.”

Nevertheless, some experts argue that trusting companies’ claims about their datasets’ contents and origins may leave room for exploitation. Abeba Birhane, a scholar specializing in the examination of large datasets used to train AI systems, contended, “Verifying the adequacy of this approach requires access to the datasets themselves. It’s challenging to assess its sufficiency without direct examination.”

Against the backdrop of these developments, it’s worth noting that most major AI companies, including OpenAI, Google DeepMind, Meta, and Anthropic, have traditionally kept the contents and details of their datasets closely guarded. This lack of transparency has posed significant challenges for creators seeking to ascertain whether their work has been used to train AI models without their consent.

In a bid to address some of these concerns, OpenAI has struck agreements with news organizations, such as the Associated Press and Axel Springer, to license news articles for use as training data. Additionally, discussions with other prominent news outlets, including CNN, Fox, and TIME, are reportedly underway.

Conclusion:

The emergence of ‘Fairly Trained’ and the ongoing copyright debate in the AI sector reflect a growing demand for ethical data practices. AI companies must consider the implications of their data usage on creators’ rights and adapt their strategies accordingly to maintain public trust and align with evolving ethical standards. This shift underscores the need for transparency and responsible data sourcing in the AI market, potentially reshaping the industry’s future dynamics.
