OpenAI has introduced a new technique called process supervision that promises to improve mathematical reasoning. Unlike outcome supervision, which rewards only a correct final answer, process supervision rewards each individual step of reasoning. This aligns the model more closely with human thinking by training it to produce a human-endorsed chain of thought. OpenAI believes the method reduces hallucinations and, because it rewards an aligned process rather than just a final result, makes the model's reasoning easier to interpret. The result is a transparent, human-approved process that also leads to better outcomes: in evaluations on the MATH test set, OpenAI found that process-supervised models outperform outcome-supervised ones, bringing chatbots a step closer to reliable mathematical reasoning.
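To make the distinction concrete, here is a minimal Python sketch of the two reward schemes. The function names, the step labels, and the 0/1 scoring are purely illustrative assumptions, not OpenAI's actual training code; the point is only that outcome supervision yields a single signal for the whole answer, while process supervision yields feedback for every reasoning step.

```python
# Toy illustration of outcome vs. process supervision (hypothetical names/scoring).
from typing import List


def outcome_reward(final_answer: str, correct_answer: str) -> float:
    """Outcome supervision: one reward, based only on the final answer."""
    return 1.0 if final_answer.strip() == correct_answer.strip() else 0.0


def process_rewards(steps: List[str], step_labels: List[bool]) -> List[float]:
    """Process supervision: one reward per reasoning step.

    `step_labels` stands in for human judgments of whether each step is valid.
    """
    return [1.0 if ok else 0.0 for ok, _ in zip(step_labels, steps)]


if __name__ == "__main__":
    # A short chain of thought for "What is 12 * 15?" with one arithmetic slip.
    steps = [
        "12 * 15 = 12 * 10 + 12 * 5",  # valid decomposition
        "12 * 10 = 120",               # correct
        "12 * 5 = 60",                 # correct
        "120 + 60 = 200",              # arithmetic slip (should be 180)
    ]
    step_labels = [True, True, True, False]

    print("Outcome reward:", outcome_reward("200", "180"))           # 0.0 -- no credit at all
    print("Process rewards:", process_rewards(steps, step_labels))   # [1.0, 1.0, 1.0, 0.0]
```

In the outcome-supervised case the model learns nothing about which part of its reasoning went wrong; in the process-supervised case the faulty step is pinpointed, which is what makes the resulting behaviour easier to inspect and correct.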
OpenAI was founded in 2015 with the goal of creating more general AI systems and mitigating the risks posed by artificial intelligence. Today, it is an independent research organization focused on developing safe and beneficial AI.
The new technique is the result of OpenAI's ongoing alignment research. The company believes that process supervision and related methods will improve our understanding of how these models reason and lead to better outcomes. OpenAI is continuing to develop the next generation of chatbots, with a focus on safety, transparency, and aligning models with human preferences and values, and more innovations along these lines can be expected in the near future.