# OpenAI Achieves Breakthrough in AI Math by Rewarding the Journey, Not Just the Destination
OpenAI has announced a significant advancement in teaching AI systems to solve mathematical problems by fundamentally changing how they reward the learning process.
Instead of only rewarding models when they arrive at the correct final answerâan approach called "outcome supervision"âthe company now rewards each correct reasoning step along the way. This method, known as "process supervision," has enabled OpenAI to achieve a new state-of-the-art performance in mathematical problem-solving.
The shift matters for two key reasons. First, it simply works better: models trained with process supervision outperform those trained with traditional outcome-based methods. Second, and perhaps more importantly, it addresses a critical AI safety concern.
By rewarding correct reasoning at each step, the model learns to show its work in ways that humans can verify and endorse. This transparency makes the AI