OpenAI & The New York Times: A Wake-Up Call For Ethical Data Practices
In a tumultuous turn of events, OpenAI and The New York Times (NYT) find themselves engaged in a federal lawsuit that puts the spotlight on ethical data practices. The NYT alleges that OpenAI infringed on its copyrights by utilizing its articles to train AI technologies like ChatGPT. This isn’t the first time OpenAI has faced such accusations, but this case presents the strongest evidence yet.
The battle between news publishers and big tech has been ongoing for some time. AI, often dubbed the future of search, has the potential to overshadow publishers as GenAI models are trained on their content for free. If users no longer engage with publishers’ sites, the consequences could be dire, threatening the integrity of free press and democratic societies.
While Google and Facebook have already hollowed out publishers, OpenAI could potentially deliver the final blow. Whether it is The New York Times or OpenAI that prevails, the outcome of this case promises to be a saga worth following.
Finding the right balance between copyright protection and enabling innovation is essential. Many have advocated for clear, transparent rules that apply universally to big tech companies. Responsible data stewardship is not only a social responsibility but also a competitive necessity in risky markets. Upholding principles such as privacy, agency, transparency, fairness, and accountability throughout the data lifecycle is vital.
OpenAI’s defense boils down to three main counterarguments. First, they claim that training on publicly available content is fair use, and they point to an opt-out feature for those who disagree. Second, they argue that regurgitation is a rare bug they are committed to eliminating. Third, they suggest that The New York Times is not telling the whole story. While both sides may present valid points, OpenAI has still violated three crucial ethical data principles: agency, fairness, and transparency.
Fairness requires businesses to identify and mitigate potential bias and disparate impact in data systems and AI outputs. However, even OpenAI’s own computer scientists may not fully understand the inner workings of their models. Large language models are effectively black boxes, making it difficult to explain how training data shapes their outputs. Rite Aid serves as a cautionary tale: the FTC proposed banning the company from using facial recognition technology over allegations of biased deployment.
Transparency requires businesses to communicate clearly how they collect, use, share, and store data. OpenAI falls short in this regard: it does not disclose what data its models are trained on, nor does it notify copyright holders when their materials have been used for training. In this, OpenAI follows the Silicon Valley tradition of cutting corners on data standards to maximize profits.
The principle of agency ensures that individuals retain control over how their data is used and can make decisions about it. OpenAI’s new opt-out feature, which lets users restrict the sharing of their input with ChatGPT, seems promising, but its implementation appears designed to discourage use. Moreover, ChatGPT was trained on a wide range of data, including material obtained without consent and copyrighted content used without a license. OpenAI justifies this by claiming fair use, but training a model on copyrighted content in order to profit from its output hardly falls within the boundaries of fair use.
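On the training side, OpenAI has also documented a crawler-level opt-out: publishers can block its GPTBot web crawler via robots.txt. A site wanting to keep future content out of training data would add something like the following (note that, as OpenAI describes it, this applies only to future crawling, not to content already collected):

```text
# robots.txt — block OpenAI's GPTBot crawler site-wide
User-agent: GPTBot
Disallow: /
```

This places the burden on each publisher to discover the mechanism and act, which is part of why critics see such opt-outs as agency in name only.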
In the race for AI dominance, companies often guard their data sources as trade secrets. However, Apple sets an example by pursuing fair and transparent licensing agreements with publishers. Companies can ethically source data that respects creators while driving innovation.
As history has shown, copyright law always becomes relevant when new technologies emerge. The implications for AI are equally significant. This case not only highlights the need for ethical data practices but also emphasizes the importance of innovative solutions that connect content creators and data licensors to marketplaces and application companies.
With the substantial financial backing OpenAI has received from Microsoft, it can afford to settle this case and others like it. One hopes that OpenAI and Sam Altman will address these concerns head-on instead of postponing them. The consequences of inaction could haunt them for years to come.