OpenAI Bolsters Safety Measures and Grants Board Veto Power to Address Risky AI
OpenAI, the prominent developer of artificial intelligence (AI) models, is expanding its internal safety protocols to mitigate the potential dangers of AI development. As part of this effort, OpenAI has established a safety advisory group that will make recommendations to leadership and the board. Although it remains to be seen whether the board will actually exercise its new veto power, the move underscores OpenAI's commitment to addressing risk and marks a notable development in the ongoing debate over the potential hazards of AI.
While the inner workings of such policies typically go unnoticed, recent leadership changes and the evolving discourse around AI risks have drawn attention to OpenAI's safety considerations. In a newly released document and accompanying blog post, OpenAI outlines its updated Preparedness Framework, which appears to have been revised following the November reshuffle that removed two board members seen as decelerationist.
The primary objective of this framework is to establish a clear procedure for identifying, evaluating, and managing potential catastrophic risks associated with the models OpenAI is developing. OpenAI defines catastrophic risks as those that may result in substantial economic damage amounting to billions of dollars or pose a severe threat to human life. This includes the concept of existential risk, often associated with a future dominated by AI-powered machines.
The framework consists of three key teams: the safety systems team, which oversees models already in production; the preparedness team, which assesses risks during the development stage; and the superalignment team, focused on establishing guidelines for the safe utilization of superintelligent models.
The safety systems team manages potential abuses of existing models through API restrictions and tuning. The preparedness team analyzes risks associated with models in the development phase, aiming to identify and quantify potential hazards prior to release. Finally, the superalignment team is dedicated to creating theoretical guardrails for superintelligent models, a class of systems that may still be a distant prospect.
The framework incorporates a risk assessment system that grades each model across a set of risk categories, including cybersecurity, persuasion (e.g., disinformation campaigns), model autonomy, and CBRN (chemical, biological, radiological, and nuclear threats). Known mitigations are factored into the rating: a model assessed as high risk after mitigation cannot be deployed, and a model with any critical risks is not developed further.
OpenAI has provided detailed documentation outlining these risk levels to ensure transparency and avoid subjective interpretations by individual engineers or managers. For instance, in the cybersecurity category, increasing operator productivity on cyber operation tasks by a certain factor is classified as a medium risk. Meanwhile, a high risk would involve the development of high-value exploits against fortified targets without human intervention. Lastly, a critical risk is defined as a model’s ability to conceive and execute novel cyberattack strategies against hardened targets based solely on high-level desired goals—an outcome OpenAI aims to prevent.
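To make the gating logic described above concrete, here is a minimal sketch of how per-category risk ratings could translate into deployment and development decisions. The category names and the high/critical thresholds follow the framework as described; the enum values, the mitigation adjustment, and the `can_deploy` / `can_develop_further` helpers are illustrative assumptions, not OpenAI's actual tooling.

```python
from enum import IntEnum

# Illustrative sketch only: tiers and categories follow the framework's public
# description; the data structures and decision helpers are assumptions.

class RiskLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

CATEGORIES = ["cybersecurity", "persuasion", "model_autonomy", "cbrn"]

def post_mitigation_scores(raw_scores: dict[str, RiskLevel],
                           mitigations: dict[str, int]) -> dict[str, RiskLevel]:
    """Apply known mitigations (modeled here as tier reductions) to raw ratings."""
    return {
        cat: RiskLevel(max(raw_scores[cat] - mitigations.get(cat, 0), RiskLevel.LOW))
        for cat in CATEGORIES
    }

def can_deploy(scores: dict[str, RiskLevel]) -> bool:
    # A model rated high risk (or worse) in any category is not deployed.
    return all(level < RiskLevel.HIGH for level in scores.values())

def can_develop_further(scores: dict[str, RiskLevel]) -> bool:
    # A model with any critical risk is not developed further.
    return all(level < RiskLevel.CRITICAL for level in scores.values())

if __name__ == "__main__":
    raw = {
        "cybersecurity": RiskLevel.HIGH,
        "persuasion": RiskLevel.MEDIUM,
        "model_autonomy": RiskLevel.LOW,
        "cbrn": RiskLevel.MEDIUM,
    }
    scores = post_mitigation_scores(raw, mitigations={"cybersecurity": 1})
    print("deploy:", can_deploy(scores))            # deploy: True
    print("develop:", can_develop_further(scores))  # develop: True
```

The point of the sketch is simply that deployment is gated on post-mitigation ratings, with a stricter bar (critical) halting development altogether.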
To further strengthen these safeguards, OpenAI has introduced a cross-functional Safety Advisory Group, whose responsibility is to review reports from the technical teams and offer recommendations from a broader vantage point. The group also aims to surface risks that have not yet been identified, acknowledging that unknown unknowns are, by definition, difficult to uncover.
The Advisory Group's recommendations will be forwarded simultaneously to the board and to leadership, with final decisions made by the leadership, including CEO Sam Altman and CTO Mira Murati, and subject to board approval. This process is intended to prevent scenarios where high-risk products or processes are greenlit without the board's knowledge. OpenAI, however, faces questions about how empowered the board truly is and whether it would exercise its veto power against a decision backed by the CEO.
Given the gravity of the risks involved and OpenAI's professed concerns, it remains unclear how the company will handle models that earn a critical rating, or whether it will disclose when such models are shelved. OpenAI has previously withheld powerful models, recognizing their potential for harm, but the updated framework does not explicitly address this question, leaving the fate of the riskiest models uncertain.
OpenAI’s commitment to soliciting audits from independent third parties speaks to its intention to provide transparency, though the explicit details of this external evaluation process remain undisclosed.
As OpenAI's efforts to promote AI safety continue to evolve, this latest push to strengthen internal safeguards is significant. By involving a safety advisory group and granting the board veto power, OpenAI aims to strike a balance between technological advancement and potential risk. How these measures will be implemented, and whether they will successfully prevent future AI hazards, remains to be seen, but concerns about AI safety are undeniably gaining urgency.