ByteDance, the parent company of popular social media platform TikTok, has had its account suspended by OpenAI due to allegations of violating the developer license of both OpenAI and Microsoft. It is believed that ByteDance used data generated by OpenAI’s language model GPT through Microsoft Azure to train its own artificial intelligence (AI) model in China. OpenAI has taken action to suspend ByteDance’s account while it conducts a deeper investigation into the matter.
The suspension came after The Verge reported that ByteDance was using the OpenAI API to train its AI model. OpenAI spokesperson Niko Felix confirmed the suspension, stating that all API customers must adhere to the company’s usage policies. Felix said that although ByteDance’s use of the API was minimal, OpenAI has suspended its account pending further investigation. If the investigation finds that ByteDance violated those policies, the company will need to make changes or risk permanent account closure.
According to a document shared with The Verge, ByteDance extensively used OpenAI’s API in the development of its foundational large language model (LLM) called Project Seed. The usage of OpenAI’s technology spanned different stages, including training and model evaluation. During the process, ByteDance employees were reportedly aware of the potential consequences and even devised strategies to whitewash evidence through data desensitization.
ByteDance’s reliance on OpenAI’s platform was most prominent in the early stages of Project Seed. The company later directed its team to stop using GPT-generated text at any stage of model development. Despite this directive, the API is reportedly still being used in ways that go against the terms of service of both OpenAI and Microsoft, including to assess the performance of ByteDance’s model within Doubao, the chatbot platform that ByteDance received regulatory approval to launch in China.
In response to The Verge’s report, ByteDance spokesperson Jodi Seth acknowledged the use of GPT-generated data in the early stages of Project Seed’s development but stressed that it was subsequently removed from the training data in mid-2023. Seth claimed that ByteDance is authorized by Microsoft to use the GPT APIs, primarily for driving products and features in non-China markets. However, ByteDance relies on its internally developed model to support its China-exclusive Doubao platform.
ByteDance’s use of OpenAI’s technology raises concerns within the AI community, as it goes against OpenAI’s terms of service, which explicitly prohibit using model output to develop competing AI models or services. Notably, most of ByteDance’s GPT usage happened through Microsoft’s Azure platform rather than directly with OpenAI. This raises the question of whether Microsoft will follow OpenAI in suspending ByteDance’s access to its services. As of now, Microsoft has yet to make a statement or take any action in response to the allegations.
ByteDance rose to global prominence with its popular app TikTok and its captivating For You feed. However, despite its previous AI leadership, ByteDance appears to have fallen behind in the generative AI race, leading to its discreet reliance on OpenAI’s technology to create its own rival large language model. Such an approach is highly frowned upon in the AI community and has now resulted in the suspension of ByteDance’s account by OpenAI.
It remains to be seen how ByteDance and Microsoft will respond to these allegations and whether further action will be taken. As the investigation unfolds, it is clear that the use of AI technology carries significant ethical and contractual obligations, particularly when it involves the unauthorized development of competing models.