OpenAI has launched a new initiative called Superalignment, aimed at addressing the potential challenges posed by superintelligent artificial intelligence (AI). The company envisions a future where AI systems surpass human intelligence, and it believes that aligning these systems with humanity’s best interests is crucial. OpenAI is assembling a team of top machine learning researchers and engineers to tackle this issue.
The focus of Superalignment is on mitigating the risks associated with superintelligent AI, as opposed to artificial general intelligence (AGI). OpenAI co-founders Ilya Sutskever and Jan Leike, who are leading the new team, assert that controlling a superintelligent AI is currently impossible. Existing alignment strategies like reinforcement learning from human feedback are not applicable when dealing with AI systems that surpass human capabilities.
OpenAI has made a significant commitment to this initiative, dedicating 20% of the compute resources it has secured over the next four years to the pursuit of superintelligence alignment. This commitment is considered the largest investment in alignment research to date, exceeding the total resources humanity has spent on this area thus far.
The Superalignment team aims to solve the core technical challenges of superintelligence alignment within four years. Their work will involve improving the safety of current AI models, understanding and mitigating various AI risks (such as economic disruption, bias, disinformation, and addiction), and addressing sociotechnical problems related to human-machine interaction.
One of the team’s initial goals is to build an automated alignment researcher that is at roughly human-level intelligence. This would allow them to scale their efforts and iteratively align superintelligence. To achieve this, the researchers will need to develop a scalable training method, validate the resulting model, and stress test the entire alignment pipeline. Stress testing would involve training AI systems to evaluate other AI systems, automating the search for problematic behavior, and detecting misalignments through adversarial testing.
Considering the concerns regarding progress in AI alignment, OpenAI is committed to monitoring and measuring progress based on empirical data. They will closely observe various aspects of their research roadmap and the development of GPT-5, a future iteration of their language model. This approach will help them assess their achievements and address any potential challenges along the way.
OpenAI’s Superalignment initiative highlights the company’s determination to ensure that superintelligent AI aligns with human intent and serves humanity’s best interests. With a dedicated team and a significant allocation of resources, OpenAI is taking proactive measures to address the complex challenges that lie ahead as AI systems continue to advance. By focusing on alignment, the company aims to promote the responsible development and deployment of AI technology, ultimately benefiting society as a whole.