OpenAI, the renowned artificial intelligence lab, has made a significant commitment to advancing the field of artificial superintelligence alignment research. In a bid to ensure that superintelligent AI systems do not pose a threat to humanity, OpenAI has established a new alignment research division. The company predicts that the first superintelligent AI will emerge in the coming years, surpassing human intelligence and potentially causing harm if not aligned with human values.
Termed superalignment, OpenAI’s initiative aims to achieve scientific and technical breakthroughs that will enable the control and governance of AI systems that exceed human intelligence. This dedicated division will dedicate 20% of its current compute power to tackling the alignment problem and conducting essential calculations.
Ilya Sutskever, co-founder of OpenAI, and Jan Leike, head of alignment at OpenAI, expressed concern in a blog post about the potential disempowerment or even extinction of humanity if superintelligent AI systems go rogue. They emphasize that current methods used for AI alignment, such as reinforcement learning from human feedback, may not be effective when dealing with AI systems that outperform humans and can outwit their overseers.
To address these challenges, OpenAI is shifting its focus beyond artificial general intelligence (AGI) and towards the future of AI that surpasses human intelligence. It is anticipated that superintelligent AI will emerge within the next decade, posing a more significant threat compared to AGI. Consequently, new approaches are required, as current techniques and technologies are insufficient for aligning superintelligent AI.
OpenAI aims to build a human-level automated alignment researcher while leveraging vast amounts of compute power to facilitate scalable efforts in aligning superintelligence. The company has outlined three key steps to achieve this goal. Firstly, they intend to use AI systems to assess tasks that are difficult for humans to evaluate, effectively employing AI to evaluate other AI systems. Additionally, OpenAI plans to explore how their models can oversee tasks that they themselves cannot supervise. Lastly, they aim to validate system alignment by automating the search for problematic behavior both within and outside AI systems.
The company also intends to test the entire alignment pipeline through adversarial testing by training deliberately misaligned models and then using the new AI trainer to correct them. OpenAI anticipates that their research priorities will evolve as they gain more insights into the core technical challenges of superintelligence alignment. They aim to accomplish this within the next four years.
The increased focus on AI safety has led to the emergence of a new industry, with nations recognizing the need to align AI systems with human values. For instance, the UK has allocated a budget of £100 million to its Foundation Model AI Taskforce, which aims to investigate AI safety issues. Additionally, the UK will host a global AI summit later this year, which is expected to address the immediate risks associated with current AI models, as well as the likely arrival of artificial general intelligence in the near future.
As OpenAI commits significant resources to superalignment research, the broader industry anticipates advancements in ensuring the responsible development and deployment of superintelligent AI. With the rise of AI systems surpassing human intelligence, addressing alignment and safety issues is crucial to prevent any potential harm to humanity.