OpenAI’s o3 Outranks 99.8% of Coders
December 26, 2024
In 2022, ChatGPT might have struggled to tell you what day December 26, 1987, fell on. Fast forward two years, and OpenAI’s latest o3 model isn’t just answering trivia: it’s a Grandmaster in competitive coding. Outperforming 99.8% of human competitors, it now ranks among the 175 best coders on Earth.
For perspective, that kind of mastery would land a human engineer a $500K+ comp package at a top-tier tech firm—along with serious bragging rights.
And o3’s dominance doesn’t stop there. It obliterated the GPQA benchmark, a test designed for PhD-level problem-solving. While the average PhD scores 70% and OpenAI’s own o1 managed 78%, o3 soared to 87.7%. On math? It scored 96.7% on the AIME, a notoriously tough competition even elite human contenders struggle with. By comparison, o1-preview scored 56.7%, and humans average 85%.
These milestones come as OpenAI navigates internal upheaval and relentless competition from Anthropic, Google, and xAI. Yet, every Q4, they unveil breakthroughs that not only silence skeptics but leave their competitors in the dust, cementing their dominance in the AI wars.
So, what does this mean? Beyond the benchmarks and milestones, these achievements demand a shift in how we think about AI’s potential and its trajectory. Let’s dive into what’s driving these advancements—and where it all might lead.
Objectively Verifiable
Large Language Models (LLMs) are dazzling prediction engines. Give them a prompt, and they predict the most probable continuation, one token at a time, based on patterns in the data they’ve absorbed. But like all geniuses, they come with limits, bound by three key factors: model size, compute power, and, most critically, the data they’re trained on.
Earlier this year, we hit the inevitable: the data wall. Of the estimated 149 zettabytes of data in the world, only about 20 trillion words are usable text on the internet, and today’s leading models have consumed nearly all of them. What’s left to fuel the next big leap?
Enter synthetic data. When the web runs dry, we generate our own. But here’s the twist—not all data is created equal. The future hinges on producing synthetic data that’s objectively verifiable. The square root of 51? Checkable. A Picasso-esque painting? Not so much. “Good art” isn’t objective, making it useless for functional verification.
This clarity shows us where AI will thrive: domains with black-and-white correctness like math, physics, and coding.
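To make "objectively verifiable" concrete, here is a minimal Python sketch of a generate-then-verify loop for the square-root example. It is an illustration under stated assumptions, not how OpenAI or any other lab actually builds synthetic data: `propose_sqrt` is a hypothetical stand-in for a model that is sometimes wrong, and `is_correct` and `build_verified_dataset` are names invented for this example.

```python
import math
import random


def propose_sqrt(n: float, rng: random.Random) -> float:
    """Hypothetical stand-in for a model: returns sqrt(n), sometimes with an error."""
    answer = math.sqrt(n)
    if rng.random() < 0.3:  # assumption: roughly 30% of answers are corrupted
        answer += rng.uniform(0.01, 1.0)
    return answer


def is_correct(n: float, answer: float, rel_tol: float = 1e-9) -> bool:
    """Objective check: squaring the answer must give back n."""
    return math.isclose(answer * answer, n, rel_tol=rel_tol)


def build_verified_dataset(numbers, seed: int = 0):
    """Keep only the (question, answer) pairs that pass the check."""
    rng = random.Random(seed)
    kept = []
    for n in numbers:
        answer = propose_sqrt(n, rng)
        if is_correct(n, answer):
            kept.append((n, answer))
    return kept


if __name__ == "__main__":
    dataset = build_verified_dataset(range(1, 101))
    print(f"kept {len(dataset)} of 100 generated examples")
```

Swap `is_correct` for a unit-test run and you have the coding analogue. There is no equivalent function to write for "is this painting good," which is exactly why that kind of data is hard to verify.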
But OpenAI didn’t just wait for the bottleneck to solve itself. This year, they delivered another seismic shift: advanced reasoning. No longer just matching patterns, their models now deconstruct problems, weigh options, and apply context. Think of it as moving from rote memorization to critical thinking.
The results speak volumes. OpenAI’s o1 model dipped its toes into reasoning. The o3 model mastered it, delivering performance leaps that feel almost unfair. Interestingly, reasoning shines brightest in domains with clear rules and verifiable answers—the same places synthetic data excels.
Where does this leave us? AI will dominate in precision and logic, transforming fields like coding, math, and science. But when it comes to the abstract, the nuanced, and the human, AI still has miles to go.
The Future of Work
I think a lot about the future of work, especially through the lens of AI. Which jobs will be transformed beyond recognition? Which will still demand human ingenuity? And how quickly will this shift unfold?
The easiest prediction? Administrative roles. These jobs are built on structured, repetitive, rules-based tasks—exactly where AI thrives. As costs continue to drop, widespread disruption is inevitable.
Next are roles requiring basic analysis and reporting. Think junior financial analysts. These positions combine rudimentary analysis with administrative work—both areas where today’s AI excels. For now, humans remain in the loop, but it’s easy to see these roles being phased out entirely in the near future.
Then there’s the realm of knowledge-intensive fields like programming, math, and science. AI is turbocharging productivity here, excelling in tasks with clear, verifiable answers. But the greatest minds in these fields do much more—they invent, theorize, and push boundaries. AI may accelerate innovation, but true breakthroughs will require human ingenuity.
The most secure jobs today, ironically, are those rooted in judgment, intuition, taste, and emotional nuance. Roles in UI/UX, copywriting, and product marketing, or those requiring leadership, coaching, strategy, and negotiation, stand apart.
Two years into the AI revolution, the initial chaos is giving way to clarity. While uncertainties remain, we now have a stronger grasp of AI’s strengths, its limitations, and the opportunities it unlocks. Clearly, the winners of tomorrow will be those who master the delicate balance of leveraging AI’s capabilities while amplifying human ingenuity.
At least until we reach AGI. Then, we’ll be asking these questions all over again.