OpenAI recently introduced o1-preview, the first in a new series of AI models designed to solve complex reasoning problems, particularly in science, coding, and mathematics. The model is built to spend more time thinking before it responds, refining its reasoning more carefully than earlier models. It can tackle harder tasks by trying different strategies and recognising its own mistakes, much as a human would.
Pros
The o1-preview model, available through ChatGPT and the API, represents a significant leap in AI’s reasoning capabilities. OpenAI reports that the next update in the o1 series has performed exceptionally well in its testing. That model scored on par with PhD students on challenging benchmark tasks in physics, chemistry, and biology, and significantly outperformed previous models in mathematics and coding. On a qualifying exam for the International Mathematical Olympiad, the reasoning model solved 83% of problems, compared with 13% for GPT-4o. In coding contests, it reached the 89th percentile on Codeforces. These results point to the o1 series’ potential in specialised and technically demanding areas.
Cons
However, the current o1-preview model lacks several features available in GPT-4o, such as web browsing and file and image uploads. OpenAI acknowledges that GPT-4o will remain more capable for many common use cases in the near term, but considers the o1 series a significant advance for tasks that require complex reasoning.
OpenAI has also prioritised safety in developing these models, introducing a new safety training approach that uses the o1 model’s reasoning abilities to help it adhere more closely to safety and alignment guidelines. For example, on one of OpenAI’s hardest “jailbreaking” tests (where users try to bypass safety rules), o1-preview scored 84 out of 100, compared with 22 for GPT-4o. OpenAI also says it has strengthened its internal governance and expanded its collaboration with the U.S. and U.K. AI Safety Institutes to support the models’ alignment with safety standards.
Why is this important?
The reasoning capabilities of the o1 series make it particularly appealing to professionals in fields such as healthcare, physics, and software development. For instance, healthcare researchers can use the model to annotate cell sequencing data, physicists can use it to generate the complex mathematical formulas needed in quantum optics research, and developers can use it to generate and debug intricate code or to build and execute multi-step workflows.
Alongside o1-preview, OpenAI launched o1-mini, a smaller, faster, and more cost-effective model. It is 80% cheaper than o1-preview while still offering strong reasoning, making it well suited to coding tasks that do not require broad world knowledge. Both models are available to ChatGPT Plus and Team users, and OpenAI plans to bring o1-mini to ChatGPT Free users.
Looking ahead, OpenAI plans to keep improving the o1 series, adding features such as web browsing and file uploading, while continuing to develop and release models in its GPT series.