A new research paper from OpenAI describes early progress by its Superalignment team, co-led by Ilya Sutskever, toward techniques for controlling a future superhuman AI. The paper emphasizes the importance of aligning future AI systems with human values. OpenAI acknowledges that superintelligence, an AI surpassing human intelligence, could become a reality within the next decade. To address the potential risks, the Superalignment team proposes using smaller AI models to supervise the training of superhuman AI models. Through this approach, OpenAI aims to build AI systems that adhere to human rules and guidelines.
Currently, OpenAI aligns models like ChatGPT with human feedback, ensuring they don't provide dangerous instructions or harmful outputs. However, once AI systems become more capable than their human supervisors, human feedback alone will no longer be sufficient, because people cannot reliably evaluate outputs they don't fully understand. Hence, the team studies smaller AI models supervising larger, more capable ones as an empirical stand-in for humans supervising a superhuman AI.
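The basic recipe is simple: a small "weak" model produces labels for unlabeled data, and a much larger "strong" model is fine-tuned on those labels. Below is a minimal sketch of that loop in PyTorch, with toy classifiers standing in for the language models the paper actually uses; the model sizes, data, and hyperparameters are illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: in the paper the "weak" and "strong" models are language
# models (roughly a GPT-2-class model supervising a GPT-4-class model);
# here they are simple MLP classifiers so the loop is easy to follow.
def make_model(hidden: int) -> nn.Module:
    return nn.Sequential(nn.Linear(32, hidden), nn.ReLU(), nn.Linear(hidden, 2))

weak = make_model(hidden=16)     # assumed already fine-tuned on ground truth
strong = make_model(hidden=256)  # more capable, but not yet aligned

x = torch.randn(512, 32)  # unlabeled inputs

# Step 1: the weak supervisor labels the data.
with torch.no_grad():
    weak_labels = weak(x).argmax(dim=-1)

# Step 2: fine-tune the strong model on the weak labels.
opt = torch.optim.Adam(strong.parameters(), lr=1e-3)
for _ in range(100):
    loss = F.cross_entropy(strong(x), weak_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The question the paper asks is whether the strong student merely imitates the weak labels, errors included, or generalizes beyond them.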
Ilya Sutskever's involvement in the project raises questions about his current role at OpenAI. While he is listed as a lead author on the paper, his position at the company remains unclear. Speculation about Sutskever's status grew after he became less visible at the company and reportedly hired a lawyer. Nevertheless, members of the Superalignment team, including Jan Leike, have praised Sutskever for his contribution to the project.
The study demonstrates that a strong model fine-tuned on a weak model's labels, a setup OpenAI calls weak-to-strong generalization, consistently outperforms its weak supervisor, recovering much of the capability gap between the two models. OpenAI states that this framework shows promise for aligning superhuman AI but does not consider it a definitive solution. The primary concern remains that misaligned or misused superhuman AI models could have devastating consequences if not properly controlled.
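One ingredient the paper reports as effective is an auxiliary confidence loss, which blends the weak supervisor's labels with the strong student's own hardened predictions so the student can override weak labels it is confident are wrong. The sketch below is a simplified reading of that idea; in particular, keeping the mixing weight alpha fixed is an assumption made here for brevity, not the paper's exact schedule:

```python
import torch
import torch.nn.functional as F

def confidence_loss(strong_logits: torch.Tensor,
                    weak_labels: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    """Blend the weak supervisor's labels with the strong model's own
    hardened (argmax) predictions, letting the student disagree with
    weak labels it is confident are wrong. Simplified: alpha is fixed."""
    # The student's own predictions, detached and hardened into pseudo-labels.
    self_labels = strong_logits.detach().argmax(dim=-1)
    loss_weak = F.cross_entropy(strong_logits, weak_labels)
    loss_self = F.cross_entropy(strong_logits, self_labels)
    return (1.0 - alpha) * loss_weak + alpha * loss_self
```

In the fine-tuning loop sketched earlier, this loss would replace the plain cross-entropy on the weak labels.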
OpenAI’s research paper reaffirms the company’s commitment to developing tools for controlling AGI (artificial general intelligence) but does not claim that an AGI exists today. As discussions surrounding AGI intensify, the company says it is focused on deploying AI responsibly and ensuring that it aligns with human values.
In summary, OpenAI’s Superalignment team, co-led by Ilya Sutskever, is developing techniques aimed at controlling superhuman AI. Its research paper highlights the need to align future AI systems with human values and introduces a framework in which smaller AI models supervise the training of more advanced ones. Sutskever’s future role at OpenAI remains uncertain, but the team continues to make notable progress toward responsible AI deployment.