OpenAI is taking a significant step forward in its efforts to address the potential risks associated with highly advanced artificial intelligence (AI) models. The research organization is establishing a new team focused on tackling the dangers that may arise from superintelligent machine learning models. Chief Scientist Ilya Sutskever and Head of Alignment Jan Leike will jointly lead the team.
In a recent blog post, Sutskever and Leike expressed their belief that AI models exhibiting superintelligence could become a reality by the end of this decade. However, while the development of such technologies holds immense potential for humanity, it also poses grave risks, including the disempowerment or even extinction of humans.
To mitigate these risks, OpenAI recognizes the need for a new approach to supervising AI. The research team’s primary objective will be to develop a roughly human-level automated alignment researcher powered by AI. The existing methods of preventing AI harms rely on human scrutiny, but OpenAI argues that humans will not be capable of effectively supervising AI systems that surpass their own intelligence.
OpenAI’s research team plans to focus on three main priorities. Firstly, they aim to devise a method for training the automated alignment researcher. This will involve teaching the system to oversee aspects of superintelligent AI models, even when the scientists themselves may lack a comprehensive understanding of those models.
Once the automated alignment researcher is developed, OpenAI intends to validate its reliability through two main methods. The first method involves searching for robustness issues, which refer to harmful outputs generated by AI models. The second method, called interpretability research, requires analyzing the internal components of AI neural networks to identify potential malfunctions that may not be apparent from the input and output alone.
Lastly, OpenAI plans to stress test the system by training misaligned models to evaluate the effectiveness of the automated alignment researcher.
While OpenAI anticipates evolving research priorities as they gain a deeper understanding of the problem, Chief Scientist Ilya Sutskever is set to make this initiative his core focus. The new research team will consist of Sutskever, Leike, members from OpenAI’s existing alignment group, as well as researchers and engineers from other OpenAI units and new hires.
By establishing this dedicated research team, OpenAI aims to proactively address the potential risks associated with advancing AI technologies. Their efforts will contribute to the development of secure and responsible AI capable of positively benefiting humanity.