TL;DR:
- Fan fiction writers and artists are rebelling against AI systems that use their work without permission.
- Social media platforms, news organizations, authors, and actors are also protesting against AI data collection.
- Protests take various forms, including locking files, boycotting websites, and filing lawsuits.
- Tech companies have been scraping the internet for data to train their AI systems.
- Legal actions have been taken against AI companies for the unauthorized use of creative works.
- Larger companies are also pushing back by charging for access to data.
- The revolt highlights the value of online information and the need to protect creators’ interests.
- The impact on the market is yet to be fully determined, but smaller AI startups and nonprofits may face challenges in obtaining data for training their systems.
Main AI News:
Discontent has erupted among groups ranging from fan fiction writers to renowned artists over the pervasive reach of artificial intelligence (AI) systems. As the fever over this transformative technology grips Silicon Valley and the world, voices are rising against the unauthorized collection and exploitation of creative work by data companies and tech giants.
One such tale of discontent involves Kit Loffstadt, a talented voice actor from South Yorkshire in Britain, who, for more than two decades, has captivated readers with her fan fiction stories set in alternate universes of popular franchises like “Star Wars” and “Buffy the Vampire Slayer.” However, her enthusiasm waned when she discovered that a data company had shamelessly copied her stories and fed them into the underlying technology of ChatGPT, a viral chatbot. Appalled by this blatant violation, she resorted to hiding her writing behind a locked account, shielding her creativity from unscrupulous data harvesters.
Joining forces with dozens of other fan fiction writers, Ms. Loffstadt orchestrated an act of rebellion. Together, they flooded the internet with irreverent stories designed to overwhelm and confuse the data-collection services that feed writers’ work into AI technology. Their shared aim is to demonstrate that the fruits of their imagination are not mere commodities for machines to exploit.
However, fan fiction writers are not the only ones raising their voices against AI systems. Social media platforms such as Reddit and Twitter, prestigious news organizations like The New York Times and NBC News, and even acclaimed authors like Paul Tremblay and Sarah Silverman have all taken a stand against the unsanctioned appropriation of their data by AI.
Their resistance takes various forms, depending on their creative medium. Some writers and artists have resorted to locking their files, safeguarding their creations from unauthorized use. Others have chosen to boycott websites that disseminate AI-generated content. In a bold move, Reddit is even considering charging for access to its valuable data. The mounting discontent has led to a surge in lawsuits against AI companies, with at least ten cases filed this year alone, accusing them of training their systems on artists’ creative works without consent. Notably, OpenAI, the creator of ChatGPT, has recently faced legal action from figures such as Sarah Silverman, Christopher Golden, and Richard Kadrey regarding the AI’s unauthorized use of their work.
At the core of these rebellions lies a newfound realization that online information, including stories, artwork, news articles, message board posts, and photos, possesses significant untapped value. The advent of generative AI, which produces text, images, and other content, has spurred tech companies like Google, Meta, and OpenAI to scour the internet for more data to feed their ever-growing systems. They have harvested vast amounts of information from diverse sources such as fan fiction databases, news archives, and online book collections, much of it freely available. In industry parlance, this practice is known as “scraping” the internet. OpenAI’s GPT-3, launched in 2020, is a mammoth AI system trained on an astonishing 500 billion “tokens,” segments of words sourced predominantly from online material. Some AI models even span over one trillion tokens.
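To make the scale above concrete, here is a toy sketch of what “scraping and tokenizing” means in practice. It is illustrative only: it strips HTML with a crude regex and counts tokens by splitting on words and punctuation, whereas real systems like GPT-3 use learned subword (byte-pair encoding) vocabularies, so its counts differ from a production tokenizer’s.

```python
import re

def strip_tags(html: str) -> str:
    """Remove HTML tags, leaving only visible text (crude illustration)."""
    return re.sub(r"<[^>]+>", " ", html)

def count_tokens(text: str) -> int:
    """Rough token estimate: each word or punctuation mark counts as one.
    Real pipelines use learned BPE tokenizers, not this heuristic."""
    return len(re.findall(r"\w+|[^\w\s]", text))

# A stand-in for one scraped page of fan fiction.
page = "<html><body><p>Fan fiction, freely posted online.</p></body></html>"
text = strip_tags(page)
print(count_tokens(text))  # prints 7 under this crude scheme
```

Multiplied across billions of pages, counts like this are how training corpora reach the hundreds of billions of tokens cited above.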
While scraping the internet has been an established practice, the unveiling of ChatGPT in November brought the underlying AI models into the public eye. The revelation sparked a fundamental realignment of the value attributed to data. According to Brandon Duderstadt, founder and CEO of Nomic, an AI company, the old model of deriving value from data by making it freely available and running ads has given way to a new one: locking up data to extract greater value by using it as an input for AI systems.
These data revolts may not significantly impact the long-term prospects of deep-pocketed tech giants like Google and Microsoft, as they possess substantial proprietary information and the resources to secure more data through licensing. However, as the era of easily accessible content comes to an end, smaller AI startups and nonprofits that hope to compete with industry giants may find it increasingly challenging to obtain sufficient content to train their systems effectively.
OpenAI responded to the uproar, stating that ChatGPT was trained on a combination of “licensed content, publicly available content, and content created by human AI trainers.” The company expressed its commitment to respecting the rights of creators and authors while aiming to collaborate with them to safeguard their interests. Google, on the other hand, revealed ongoing discussions on how publishers could manage their content in the future, emphasizing the importance of a vibrant content ecosystem. As of this writing, Microsoft has yet to respond to inquiries.
The data rebellions gained momentum in the wake of ChatGPT’s meteoric rise to prominence. In November, a group of programmers filed a proposed class-action lawsuit against Microsoft and OpenAI, alleging copyright violations after their code was employed to train an AI-powered programming assistant. Getty Images, a prominent stock photo and video provider, later sued Stability AI, accusing the startup of using copyrighted photos to train its systems. Subsequently, Clarkson, a Los Angeles-based law firm, initiated a 151-page proposed class-action suit against OpenAI and Microsoft, claiming that OpenAI had collected data from minors and asserting that web scraping infringed upon copyright law and amounted to “theft.” Clarkson also recently filed a similar suit against Google.
Ryan Clarkson, the firm’s founder, characterized the ongoing data rebellion as society’s pushback against the notion that Big Tech is entitled to indiscriminately appropriate information from any source and claim it as its own. Legal experts are more cautious: Eric Goldman, a professor at Santa Clara University School of Law, doubts that courts will accept the lawsuits’ more expansive arguments, but he acknowledges that this wave of litigation is just the beginning, with subsequent waves expected to shape the future of AI.
Not only individual creators but also larger corporations are beginning to resist AI scrapers. In April, Reddit announced its intention to charge for access to its application programming interface (API), which enables third parties to download and analyze the platform’s extensive database of person-to-person conversations. Steve Huffman, Reddit’s CEO, stressed that the company should not be giving away substantial value to the world’s largest companies without compensation. Similarly, Stack Overflow, a renowned question-and-answer platform for programmers, has decided to ask AI companies to pay for access to its data, a move Wired reported on earlier.
Even news organizations are taking a stand against AI systems. In an internal memo issued in June, The New York Times emphasized that AI companies should respect the intellectual property of the organization. While the Times declined to elaborate on the matter, it is clear that the publication is determined to protect its creative output.
For individual artists and writers, fighting back against AI systems entails reevaluating their publishing strategies. Nicholas Kole, an esteemed illustrator from Vancouver, British Columbia, was deeply troubled by the ease with which an AI system replicated his distinctive art style, leading him to suspect that his work had been scraped. Although he continues to share his creations on platforms like Instagram and Twitter to attract clients, he has ceased publishing on sites like ArtStation that host AI-generated content alongside human work. To him and many others, this sense of wanton theft instills a profound existential dread.
At Archive of Our Own, a vast fan fiction database boasting over 11 million stories, writers have intensified pressure on the site to ban data scraping and AI-generated stories. In May, when Twitter accounts shared examples of ChatGPT mimicking the style of popular fan fiction on Archive of Our Own, dozens of writers united in outrage. They promptly blocked access to their stories and crafted subversive content aimed at misleading AI scrapers, and they rallied behind calls to stop allowing AI-generated content on Archive of Our Own.
Betsy Rosenblatt, a legal advisor to Archive of Our Own and a professor at the University of Tulsa College of Law, explained that the site adheres to a policy of “maximum inclusivity” and refrains from determining which stories were authored with the assistance of AI.
For Ms. Loffstadt, the fan fiction writer leading the charge, her battle against AI unfolded while she was immersed in crafting a story set in the video game “Horizon Zero Dawn.” In this post-apocalyptic world, humans clash with AI-powered robots, some benevolent, others malicious. Ms. Loffstadt noted that, tragically, corporate greed and hubris had twisted AI’s potential in the real world, turning it into a tool for nefarious purposes.
Conclusion:
The data rebellions and resistance against AI systems signify a fundamental shift in the perception and value of online information. The protests highlight the need to protect the creative works of artists, writers, and content creators from unauthorized use and exploitation by AI companies. This growing movement may have implications for the market, potentially leading to a reevaluation of data ownership and access. Smaller AI startups and nonprofits may face difficulties in acquiring sufficient data to compete with industry giants, while larger companies may explore new approaches to monetize and protect their valuable data assets. The outcome of these developments will shape the future of the AI industry and its relationship with content creators and data providers.