OpenAI, the renowned artificial intelligence research organization, has recently announced the formation of a new team dedicated to controlling and steering superintelligent AI systems. The team will be led by Ilya Sutskever, OpenAI's chief scientist and co-founder, and its main objective is to develop strategies for managing AI systems that surpass human intelligence.
Sutskever and Jan Leike, a lead on the alignment team at OpenAI, stated in a blog post that they predict superintelligent AI could emerge within the next decade. They emphasize the need for research into governing such systems and preventing them from acting against human interests. Their concern is that current alignment techniques, such as reinforcement learning from human feedback (RLHF), may not suffice, because humans are unlikely to be able to reliably supervise AI systems far more intelligent than themselves.
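To make the RLHF concern concrete: at the core of that technique is a reward model trained on human preference comparisons between pairs of responses. The sketch below shows the standard pairwise preference loss such training minimizes; it is a minimal illustration in PyTorch, and the module names, shapes, and the pooled-embedding stand-in for a language-model backbone are assumptions for exposition, not OpenAI's implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch: a reward model scores responses, and a pairwise loss
# pushes the score of the human-preferred response above the rejected one.
# All names and shapes here are illustrative.

class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        # Stand-in for a language-model backbone: maps a pooled
        # representation of (prompt, response) to a scalar reward.
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.scorer(pooled).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected): minimized when the model
    # assigns higher reward to the response humans preferred.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

model = RewardModel()
chosen = torch.randn(8, 128)    # embeddings of preferred responses
rejected = torch.randn(8, 128)  # embeddings of rejected responses
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
```

The worry Sutskever and Leike raise is visible in this setup: the loss is only as good as the human preference labels, which become unreliable once the responses being compared exceed what humans can evaluate.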
To tackle these challenges, OpenAI has established a new team called Superalignment, co-led by Sutskever and Leike. Comprising scientists and engineers from OpenAI's previous alignment team, as well as researchers from other organizations affiliated with the company, the Superalignment team will dedicate the next four years to resolving the core technical issues of controlling superintelligent AI. It will have access to 20% of the compute OpenAI has secured to date.
The team aims to build a roughly human-level automated alignment researcher, first by training AI systems using human feedback, then by training AI to assist humans in evaluating model outputs, and ultimately by training AI systems to conduct alignment research themselves. OpenAI believes AI can make faster progress on alignment research than humans alone.
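The middle step, AI-assisted human evaluation, is often framed as scalable oversight: rather than judging a complex output from scratch, the human only has to verify critiques that a helper model produces. The sketch below is a hypothetical illustration of that loop; the `Critique` type, the `critic` callable, and the toy critic are assumptions made for exposition, not a real OpenAI API.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch of AI-assisted evaluation: a critic model drafts
# objections to a candidate answer, and the human verifies the critiques
# instead of re-deriving the answer. All names here are illustrative.

@dataclass
class Critique:
    claim: str      # the part of the answer being challenged
    objection: str  # why the critic thinks it may be wrong

def assisted_review(
    answer: str,
    critic: Callable[[str], List[Critique]],
    human_accepts: Callable[[Critique], bool],
) -> bool:
    """Return True if the answer survives every critique the human upholds."""
    for critique in critic(answer):
        if human_accepts(critique):  # human confirms the flaw is real
            return False
    return True

# Toy usage: a critic that flags any answer containing an unqualified "always".
toy_critic = lambda a: (
    [Critique("always", "unqualified universal claim")] if "always" in a else []
)
print(assisted_review("This function always terminates.", toy_critic,
                      human_accepts=lambda c: True))  # -> False
```

The design intuition is that checking a specific, pointed objection is easier for a human than evaluating the whole answer, which is what lets oversight scale as the evaluated systems grow more capable.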
OpenAI anticipates that, as AI systems improve, they will take on a growing share of alignment work, conceiving, implementing, and refining better alignment techniques. Human researchers, in turn, will shift their focus to reviewing alignment research produced by AI systems rather than performing it themselves.
While acknowledging the limitations and potential risks of using AI for evaluation, the OpenAI team remains optimistic about its approach. It believes that machine learning experts, even those not currently working on alignment, will play a crucial role in addressing these challenges. The team intends to share its findings widely and considers contributing to the alignment and safety of non-OpenAI models an important part of its work.
As the race towards developing superintelligent AI continues, OpenAI’s initiative to ensure its control and steerability marks a significant step. By fostering collaboration between human researchers and AI systems, they hope to navigate the complex challenges surrounding the alignment of AI with human values.