Google Unveils AI Training Opt-Out Tool for Publishers

TL;DR:

  • Google introduced Google-Extended, a control that lets publishers decide whether Google may use their content to train its AI models.
  • Websites can remain in Google Search results while keeping their content out of AI model training.
  • Publishers can decide whether their content helps improve Bard and Vertex AI generative APIs.
  • Google-Extended works through robots.txt, giving web publishers choice and control.
  • Google says it will explore more machine-readable options for publishers.
  • Some publishers, including The New York Times, have updated their terms of service to bar the use of their content for AI training.

Main AI News:

Google has introduced a way for website publishers to control whether their content is used to train Google's AI models. The control, named Google-Extended, lets a site remain accessible in Google Search while keeping its content out of AI model training.

Google-Extended lets publishers decide whether their sites contribute to improving Bard and Vertex AI generative APIs, giving them a toggle to manage this use of their content. Since July, Google has been training its AI chatbot, Bard, on publicly available data collected from the web.

Google-Extended works through robots.txt, the text file that tells web crawlers which parts of a site they may access, so publishers can exercise this choice with a mechanism they already use. Google says it will explore additional machine-readable approaches to choice and control for web publishers as AI applications expand, and that it will share more soon.
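As an illustration, a site that wants to stay in Search but opt out of AI training can name the Google-Extended token in robots.txt. The sketch below is a minimal example, assuming Python 3.9+, that checks such a file with Python's standard-library `urllib.robotparser`. That parser is not Google's own and handles rule precedence differently in edge cases, so treat it as an approximation. The `check_access` helper and the `example.com` URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that opts the whole site out of Google-Extended only.
# Google-Extended is a user-agent token, not a separate crawler; no other
# agent is named, so every other agent falls through to "allowed".
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def check_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user-agent token, whether robots_txt lets it fetch url.

    Hypothetical helper for illustration; uses the stdlib parser, which
    approximates (but does not replicate) Google's robots.txt handling.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # example.com is a placeholder domain.
    result = check_access(
        ROBOTS_TXT,
        ["Googlebot", "Google-Extended"],
        "https://example.com/news/story.html",
    )
    print(result)  # {'Googlebot': True, 'Google-Extended': False}
```

Because the file names only Google-Extended, Googlebot matches no group and is allowed by default, so the page stays available for Search indexing.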

The announcement comes as a growing number of websites block the web crawlers that companies such as OpenAI use to collect data for training ChatGPT. The New York Times, CNN, Reuters, and Medium are among the sites that have restricted access to their content. Blocking Google's crawlers is harder, however, because denying them access entirely could drop a site from search indexing. As an alternative, some sites, including The New York Times, have turned to legal measures, updating their terms of service to prohibit companies from using their content for AI training.
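The search-indexing trade-off can be illustrated by comparing two hypothetical robots.txt policies: a blanket `User-agent: *` block, which also shuts out Googlebot, and a targeted block that names only AI-training tokens. `GPTBot` is the token OpenAI documents for its crawler and `Google-Extended` is Google's token; the policy names, the `allowed` helper, and the URL are hypothetical. This is a minimal sketch using Python's standard-library `urllib.robotparser`, which approximates but does not replicate the crawlers' own parsers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy 1: block every crawler, including Googlebot (and so Search).
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical policy 2: refuse only AI-training tokens; search crawling continues.
# GPTBot is OpenAI's crawler token; Google-Extended is Google's AI-training token.
BLOCK_AI_ONLY = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

AGENTS = ["Googlebot", "GPTBot", "Google-Extended"]
URL = "https://example.com/news/story.html"  # placeholder URL


def allowed(robots_txt: str) -> dict[str, bool]:
    """Map each user-agent token in AGENTS to whether robots_txt lets it fetch URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, URL) for agent in AGENTS}


for name, policy in [("block all", BLOCK_ALL), ("block AI only", BLOCK_AI_ONLY)]:
    print(name, allowed(policy))
```

Running it shows Googlebot refused under the blanket policy but allowed under the targeted one, while GPTBot and Google-Extended are refused under both.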

Conclusion:

Google-Extended lets publishers balance visibility in Google Search against keeping their content out of AI model training. It reflects growing concern over how publishers' data is used in AI applications and gives web publishers more say over how their work is used.
