Title: Microsoft AI Researchers Accidentally Expose Terabytes of Sensitive Data on GitHub
Microsoft AI researchers inadvertently exposed terabytes of sensitive data, including private keys and passwords, while publishing a storage bucket of open-source training data on GitHub. Cloud security startup Wiz made the discovery during its ongoing research into the accidental exposure of cloud-hosted data.
In a research report shared with TechCrunch, Wiz described a GitHub repository belonging to Microsoft’s AI research division. The repository provided open-source code and AI models for image recognition and instructed readers to download the models from an Azure Storage URL. However, Wiz found that the URL was misconfigured to grant access to the entire storage account, inadvertently exposing additional private data.
The exposure amounted to 38 terabytes of sensitive data, including backups of two Microsoft employees’ personal computers. It also contained other highly sensitive information, such as passwords to Microsoft services, secret keys, and more than 30,000 internal Microsoft Teams messages exchanged among hundreds of Microsoft employees.
The misconfigured URL, which had been exposing this data since 2020, granted “full control” rather than the intended read-only permissions. That meant anyone who knew where to look could not only view the files but also delete, replace, or inject malicious content into them without detection.
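To make that risk concrete, here is a minimal sketch using Python’s azure-storage-blob SDK. The account name, container, and token below are hypothetical placeholders, not the values Wiz found; the point is that a token-bearing query string appended to a storage URL is the only credential a client needs.

```python
from azure.storage.blob import ContainerClient

# Hypothetical container URL carrying a full-control token
# (sp=racwdl grants read, add, create, write, delete, and list).
sas_url = (
    "https://exampleaccount.blob.core.windows.net/models"
    "?sv=2020-08-04&sp=racwdl&se=2051-10-06T00:00:00Z&sig=PLACEHOLDER"
)
container = ContainerClient.from_container_url(sas_url)

# A read-only token would permit only downloads. Full control also allows:
for blob in container.list_blobs():                      # enumerate everything
    print(blob.name)
container.upload_blob("model.ckpt", b"tampered bytes",   # silently overwrite
                      overwrite=True)
container.delete_blob("employee-backup.tar")             # delete outright
```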
Wiz clarified that the storage account itself was not directly exposed. The vulnerability arose from the Microsoft AI developers including an overly permissive shared access signature (SAS) token in the URL. SAS tokens are a mechanism used by Azure to create shareable links that provide access to an Azure Storage account’s data.
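For contrast, here is a sketch of how a narrowly scoped SAS token might be issued with the same SDK, assuming a hypothetical account name and key: read-only permission on a single blob, expiring in an hour, rather than account-wide full control with a far-future expiry.

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Hypothetical account details; in practice the key comes from a secret store.
ACCOUNT = "exampleaccount"
KEY = "base64-account-key"

# Scope the token to one blob, read-only, valid for one hour.
token = generate_blob_sas(
    account_name=ACCOUNT,
    container_name="models",
    blob_name="image-classifier.ckpt",
    account_key=KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)
share_url = (
    f"https://{ACCOUNT}.blob.core.windows.net/models/"
    f"image-classifier.ckpt?{token}"
)
```

Because a SAS token is signed with the account key rather than registered anywhere in Azure, revoking an individual token generally means rotating the key itself, which is part of why overly broad, long-lived tokens are so hard to track and contain.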
Ami Luttwak, co-founder and CTO of Wiz, emphasized the challenges developers and engineers face as they race to build new AI solutions on vast amounts of data. He argued that additional security measures and safeguards are needed as development teams handle massive amounts of data, share it with peers, and collaborate on public open-source projects, noting that cases like the one discovered at Microsoft are becoming increasingly difficult to monitor and prevent.
Wiz reported its findings to Microsoft on June 22, and Microsoft revoked the SAS token two days later, on June 24. The company completed its investigation into the potential organizational impact on August 16.
In a blog post shared with TechCrunch, Microsoft’s Security Response Center said that no customer data was exposed and no other internal services were put at risk. As a result of Wiz’s research, Microsoft has expanded GitHub’s secret scanning service, which monitors all public open-source code changes for plaintext exposure of credentials and other secrets, to cover any SAS tokens with overly permissive expirations or privileges.
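GitHub’s scanning internals aren’t public, but a toy heuristic in the same spirit might look like the following sketch: it flags URL query strings carrying the telltale SAS parameters (sv and sig) and warns when the sp permissions or se expiry look overly broad. The pattern and thresholds are assumptions for illustration only.

```python
import re
import sys
from urllib.parse import parse_qs

# Heuristic: a query string containing both an sv= (service version) and a
# sig= (signature) parameter is likely an Azure SAS token.
SAS_PATTERN = re.compile(r"\?(?=[^\s\"']*\bsv=)(?=[^\s\"']*\bsig=)[^\s\"']+")

RISKY_PERMS = set("wdc")  # write, delete, create

def scan(text: str) -> None:
    for match in SAS_PATTERN.finditer(text):
        params = parse_qs(match.group(0).lstrip("?"))
        perms = params.get("sp", [""])[0]
        expiry = params.get("se", [""])[0]
        issues = []
        if RISKY_PERMS & set(perms):
            issues.append(f"grants more than read access (sp={perms})")
        if expiry[:4].isdigit() and int(expiry[:4]) >= 2030:
            issues.append(f"expires far in the future (se={expiry})")
        print("possible SAS token:",
              "; ".join(issues) or "scoped, but still a leaked credential")

if __name__ == "__main__":
    scan(open(sys.argv[1], encoding="utf-8", errors="ignore").read())
```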
Microsoft’s prompt response to the disclosure reflects its stated commitment to protecting user data and preventing similar incidents.
Overall, the incident underscores the importance of robust security measures in AI development, particularly when handling large volumes of sensitive data. Companies must continually strengthen their security protocols to avoid potentially disastrous data breaches and to ensure the privacy and security of their users.