Microsoft AI Researchers Accidentally Expose Terabytes of Sensitive Data on GitHub


Microsoft AI researchers have inadvertently exposed terabytes of sensitive data, including private keys and passwords, while publishing a storage bucket of open source training data on GitHub. Cloud security startup Wiz made the discovery during its ongoing investigation into accidental exposures of cloud-hosted data.

In a research report shared with TechCrunch, Wiz described a GitHub repository belonging to Microsoft's AI research division. The repository provided open source code and AI models for image recognition and instructed readers to download the models from an Azure Storage URL. However, Wiz found that the URL was misconfigured: it granted permissions to the entire storage account, inadvertently exposing additional private data.

The exposed data totaled 38 terabytes and included backups of two Microsoft employees' personal computers. It also contained other highly sensitive information, such as passwords to Microsoft services, secret keys, and more than 30,000 internal Microsoft Teams messages exchanged among hundreds of employees.

The misconfigured URL, which had been exposing this data since 2020, allowed full control access instead of the intended read-only permissions. This meant that anyone with knowledge of where to find this data could potentially delete, replace, or inject malicious content without detection.

Wiz clarified that the storage account itself was not directly exposed. The vulnerability arose from the Microsoft AI developers including an overly permissive shared access signature (SAS) token in the URL. SAS tokens are a mechanism used by Azure to create shareable links that provide access to an Azure Storage account’s data.
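Because a SAS token carries its permissions, scope, and expiry directly in the URL's query string, an over-permissive link can be spotted by inspecting those parameters. As an illustration only, and not Microsoft's or Wiz's actual tooling, here is a minimal Python sketch that audits the standard SAS query parameters: `sp` (signed permissions), `se` (signed expiry), and `srt` (signed resource types, present only on account-level tokens). The URL and the one-year expiry threshold are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

def audit_sas_url(url: str) -> list[str]:
    """Return warnings for an Azure Storage SAS URL that looks over-permissive.

    Inspects the standard SAS query parameters:
      sp  - signed permissions (e.g. "r" read-only, "racwdl" near-full control)
      se  - signed expiry (ISO 8601 timestamp)
      srt - signed resource types (account SAS only; "sco" covers the
            service, its containers, and every object in them)
    """
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    warnings = []

    perms = params.get("sp", "")
    # Anything beyond read/list on a shared download link is suspicious.
    if set(perms) - set("rl"):
        warnings.append(f"write-capable permissions granted: sp={perms}")

    # An account-level SAS exposes far more than a single blob or container.
    if "srt" in params:
        warnings.append(f"account-scoped token: srt={params['srt']}")

    expiry = params.get("se")
    if expiry:
        exp = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if (exp - datetime.now(timezone.utc)).days > 365:
            warnings.append(f"expiry more than a year away: se={expiry}")
    return warnings

# A hypothetical token resembling the failure mode described above:
# account-scoped, full permissions, and a far-future expiry date.
risky = ("https://example.blob.core.windows.net/models?"
         "sv=2021-08-06&srt=sco&sp=racwdl&se=2051-10-05T00:00:00Z&sig=x")
for w in audit_sas_url(risky):
    print("WARNING:", w)
```

A correctly scoped sharing link would instead carry `sp=r` (read-only), target a single blob, and expire within days, yielding no warnings from a check like this.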


Ami Luttwak, the co-founder and CTO of Wiz, emphasized the challenges faced by developers and engineers working with vast amounts of data in their race to develop new AI solutions. Luttwak highlighted the need for additional security measures and safeguards to protect sensitive information as development teams handle massive data, share it with peers, and collaborate on public open source projects. Cases like the one discovered at Microsoft are becoming increasingly difficult to monitor and prevent.

Following Wiz’s findings, Microsoft was promptly informed on June 22 and took immediate action by revoking the SAS token two days later, on June 24. The company completed its investigation on August 16, assessing any potential impact on its organization.

In a blog post shared with TechCrunch, Microsoft's Security Response Center stated that no customer data was exposed and no other internal services were put at risk. As a result of Wiz's research, Microsoft has expanded GitHub's secret scanning service, which monitors all public open source code changes for plaintext exposure of credentials and other secrets, to also flag any SAS tokens with overly permissive expirations or privileges.
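Secret scanning of this kind typically works by pattern-matching committed text against known credential formats. The sketch below is not GitHub's actual scanner, only an illustrative Python regex that flags query strings shaped like SAS tokens (containing both the `sv=` version and `sig=` signature parameters); the variable names and snippet are hypothetical.

```python
import re

# Match a URL query string that carries both the "sv" (signed version)
# and "sig" (signature) parameters characteristic of Azure SAS tokens.
SAS_PATTERN = re.compile(
    r"""\?                   # start of the query string
        (?=[^\s"']*\bsv=)    # must contain the signed service version
        (?=[^\s"']*\bsig=)   # must contain the signature
        [^\s"']+             # the token itself, up to whitespace/quote
    """,
    re.VERBOSE,
)

def find_sas_tokens(source: str) -> list[str]:
    """Return substrings of source text that look like SAS token query strings."""
    return SAS_PATTERN.findall(source)

snippet = ('MODEL_URL = "https://acct.blob.core.windows.net/data'
           '?sv=2020-08-04&sp=racwdl&sig=abc123"')
print(find_sas_tokens(snippet))
```

A production scanner would go further, e.g. validating the signature format and checking the `sp`/`se` parameters for the over-broad grants mentioned above, but the matching step looks roughly like this.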

Microsoft’s commitment to addressing and rectifying this vulnerability demonstrates its dedication to protecting user data and preventing future incidents.

Overall, this incident underscores the significance of robust security measures in AI development, particularly when handling extensive volumes of sensitive data. Companies must continually strengthen their security protocols to avoid potentially disastrous data breaches and ensure the privacy and security of their users.

Frequently Asked Questions (FAQs) Related to the Above News

What sensitive data was accidentally exposed by Microsoft AI researchers?

The researchers inadvertently exposed 38 terabytes of sensitive data, including personal backups, passwords to Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages.

How did the exposure occur?

The exposure occurred through a misconfigured Azure Storage URL, which granted permissions to the entire storage account instead of providing read-only access as intended.

How long was the data exposed?

The data had been exposed since 2020 before it was discovered by the cloud security startup Wiz.

What could someone potentially do with the exposed data?

With knowledge of where to find the data, someone could potentially delete, replace, or inject malicious content without detection.

Was the storage account itself directly exposed?

No, the storage account itself was not directly exposed. The vulnerability came from including an overly permissive shared access signature (SAS) token in the URL, which granted access to the data.

How did Microsoft respond to the discovery?

Microsoft was promptly informed and took immediate action by revoking the SAS token. They completed their investigation and enhanced GitHub's secret scanning service to monitor public open source code changes for plaintext exposure of credentials and other secrets.

Was any customer data exposed or internal services put at risk?

No customer data was exposed, and no other internal services were put at risk, according to Microsoft's Security Response Center.

What steps should companies take to prevent incidents like this?

Companies handling extensive volumes of sensitive data should strengthen their security protocols, continuously monitor and prevent data breaches, and ensure the privacy and security of their users.

