ChatGPT Vulnerability Exposes Privacy Risks: A Wake-Up Call for AI Development

TL;DR:

  • Researchers discover ChatGPT’s vulnerability to data leakage when prompted with certain queries.
  • Extractable memorization phenomenon enables adversaries to access training data.
  • A diverse range of data, including personal information, NSFW content, and copyrighted material, can be unintentionally disclosed.
  • A methodology involving an auxiliary dataset confirms data leakage.
  • ChatGPT’s susceptibility underscores concerns about AI model security and privacy.
  • Questions arise about the adequacy of current safeguards and transparency in AI development.
  • Gary Marcus highlights the need for improved AI competence in protecting privacy.

Main AI News:

In a recent preprint research paper, a startling revelation emerges regarding the potential privacy implications of OpenAI’s ChatGPT. The study sheds light on how a seemingly innocuous request to have ChatGPT repeat a single word indefinitely, such as “poem,” could inadvertently lead to the leakage of sensitive training data, including personally identifiable information and content scraped from the web. While these findings have yet to undergo peer review, they cast a shadow of doubt over the safety and security of ChatGPT and similar large language model (LLM) systems.

Justin Sherman, the founder of Global Cyber Strategies, a research and advisory firm, underscores the significance of these findings by stating, “This research underscores the inadequacy of the ‘publicly available information’ approach to web scraping and training data, emphasizing its outdated nature.

A multi-institutional team, including researchers from Google DeepMind, the University of Washington, Cornell, Carnegie Mellon, University of California Berkeley, and ETH Zurich, delves into the concept of “extractable memorization.” This phenomenon occurs when adversaries exploit machine learning models, such as ChatGPT, to extract training data. Surprisingly, models like ChatGPT, despite making their model weights and training data publicly available, are theoretically designed to prevent the model from regurgitating training data through their alignment with human feedback.

To verify if the generated output indeed originated from the training data, despite ChatGPT’s training set remaining inaccessible, the researchers took a novel approach. They compiled a massive auxiliary dataset by downloading 9 terabytes of text from the Internet and cross-referenced it with the text generated by ChatGPT. If a sequence of words matched verbatim in both datasets, it served as a reliable proxy for testing whether the generated text was part of the training data. This methodology mirrors previous efforts in training data extraction, such as manual Google searches conducted by researchers like Carlini et al.

Recovering training data from ChatGPT proved to be a challenging task. The researchers had to coax the model to “escape” from its alignment training and revert to its base model, causing it to produce responses mirroring its initial training data. When prompted to repeat the word “poem” indefinitely, ChatGPT initially complied by repeating the word “several hundred times.” However, it eventually diverged and began generating “nonsensical” information. Yet, a small fraction of the text produced was “directly copied from the pre-training data.”

Even more concerning is the fact that, with a mere $200 investment in issuing queries to ChatGPT, the authors managed to extract over 10,000 unique verbatim memorized training examples. These memorizations encompassed a wide array of content, including:

  1. Personal Identifiable Information: The chatbot inadvertently revealed the personal details of numerous individuals, including names, email addresses, phone numbers, and personal website URLs.
  2. NSFW Content: When instructed to repeat an NSFW word instead of “poem,” the authors discovered explicit content, dating websites, and content related to guns and war. These findings challenge the safety checks claimed by AI developers.
  3. Literature and Academic Articles: The model generated verbatim excerpts from published books and poems, such as Edgar Allan Poe’s “The Raven.” Additionally, it successfully extracted text snippets from academic articles and bibliographic information from various authors. The presence of proprietary content in training data raises questions regarding ownership and the protection of creators’ and academics’ rights.

Interestingly, as the researchers expanded the size of their auxiliary dataset, the magnitude of memorizations from the training data increased. However, given that this dataset does not precisely mirror ChatGPT’s actual training data, the authors believe that their findings may underestimate the extent of potential data leakage. To address this concern, they randomly selected 494 text generations by the model and conducted Google searches to determine if the exact text sequences were present on the internet. Their results indicated that 150 out of 494 generations were discoverable online, compared to the 70 found in the auxiliary dataset.

In conclusion, if these findings stand scrutiny, they unveil the alarming ease with which training data can be extracted from ostensibly sophisticated and closed-source large language model systems like ChatGPT, highlighting their vulnerability. The implications for privacy are grave, demanding immediate attention and effective mitigation measures from model developers.

This study raises fundamental questions: Are there superior alternatives for constructing training datasets that prioritize privacy? Are existing safeguards and oversight mechanisms adequate to address the concerns illuminated in this research? The answer, it appears, is uncertain. Recent policy developments, such as the Biden administration’s AI executive order, allude to concerns regarding model transparency. However, the effectiveness of future measures remains to be seen, leaving the path forward in a state of uncertainty.

Cognitive scientist and entrepreneur Gary Marcus, commenting on X (formerly Twitter), underscores the urgency of the matter, asserting that it provides additional reasons to be skeptical of contemporary AI systems. He emphasizes the critical importance of privacy as a basic human right and calls for AI to rise to the challenge and deliver on that right with competence and understanding of facts, truth, and privacy. He concludes by demanding better from the AI industry, characterizing it as a reckless force that requires significant improvement.

A figure from the preprint paper reveals the extraction of pre-training data from ChatGPT. Source: Tech Policy Press

Conclusion:

The revelation of ChatGPT’s privacy vulnerability serves as a stark warning for the AI market. As the demand for AI systems in various industries continues to grow, ensuring the security and privacy of sensitive data becomes paramount. Developers must invest in robust safeguards and transparency measures to address these concerns, and consumers should remain vigilant when using AI-powered services that handle personal information. Failure to do so could result in serious privacy breaches and legal consequences, potentially undermining the trust and adoption of AI technologies.

Source