Microsoft accidentally exposed 38 terabytes of confidential data via its AI GitHub repository

TL;DR:

  • Microsoft accidentally exposed 38 terabytes of confidential data in its AI GitHub repository.
  • The breach occurred during the publication of open-source training data.
  • The exposed data included secrets, keys, passwords, and internal messages.
  • The repository “robust-models-transfer” is now inaccessible.
  • The breach was due to an overly permissive Azure SAS token.
  • Microsoft responded promptly and found no evidence of unauthorized data exposure.
  • They revoked the token, blocked external access, and improved security measures.
  • This incident highlights the importance of stringent data security.

Main AI News:

In a recent incident that sent shockwaves through the cybersecurity landscape, Microsoft inadvertently exposed a staggering 38 terabytes of confidential data. The tech giant swiftly responded to rectify this glaring security lapse, which occurred within its AI GitHub repository.

The breach was discovered during the publishing of an open-source training data bucket, and the exposure encompassed a disk backup from two former employees’ workstations. This backup contained a treasure trove of sensitive information, including secrets, keys, passwords, and over 30,000 internal Teams messages.

The repository in question, labeled “robust-models-transfer,” has since been rendered inaccessible. Before its removal, it housed valuable source code and machine learning models associated with a 2020 research paper titled “Do Adversarially Robust ImageNet Models Transfer Better?”

The crux of the issue lay in an overly permissive SAS token—a feature from Microsoft Azure that facilitates data sharing but can be challenging to track and revoke. The breach was brought to Microsoft’s attention on June 22, 2023.

To compound matters, the repository’s README.md file mistakenly guided developers to download models from an Azure Storage URL that inadvertently granted access to the entire storage account, exposing additional confidential data.

Adding to the gravity of the situation, the misconfigured token provided “full control” permissions rather than read-only access. This meant that not only could an attacker view all the files within the storage account, but they could also delete and overwrite existing files.

Microsoft’s swift response to this breach included an investigation that found no evidence of unauthorized exposure of customer data. Additionally, no other internal services were compromised as a result of this incident. The company revoked the SAS token and blocked external access to the storage account, resolving the issue within two days of its responsible disclosure.

To bolster future security measures, Microsoft has expanded its secret scanning service to include SAS tokens with overly permissive expirations or privileges. They also identified a bug in their scanning system that flagged the specific SAS URL as a false positive.

The incident underscores the need for heightened vigilance when it comes to handling Account SAS tokens. The researchers involved stressed the importance of treating them as sensitive as the account key itself, recommending against their use for external sharing due to the potential for unnoticed token creation mistakes leading to data exposure.

This unfortunate incident is not the first time misconfigured Azure storage accounts have come to light, highlighting the ongoing challenges in maintaining robust cybersecurity defenses. As the tech industry increasingly leans on AI solutions, the need for stringent security checks and safeguards becomes paramount, particularly when dealing with vast datasets.

Ami Luttwak, CTO and co-founder of Wiz, aptly summarizes the situation: “AI unlocks huge potential for tech companies. However, as data scientists and engineers race to bring new AI solutions to production, the massive amounts of data they handle require additional security checks and safeguards.”

Conclusion:

This incident highlights the critical importance of robust cybersecurity measures in the tech industry, particularly when handling vast amounts of sensitive data. Microsoft’s swift response and subsequent security enhancements demonstrate its commitment to safeguarding data. However, it serves as a stark reminder for all market players to prioritize data security and be vigilant against potential breaches in an era where data is the cornerstone of technological advancement.

Source