
    GPT-5 Series vs. Gemini 3 Pro: The Verdict from SuperGPQA

    The release of OpenAI’s GPT-5.2 Pro has reignited the race for AI supremacy, promising significant leaps in reasoning and professional capabilities. But how does it actually perform when tested against the world's hardest domain-specific questions?

    We put the leading frontier models, including Google’s Gemini 3 Pro Preview, GPT-5.2 Pro, and GPT-5.1-Thinking, to the test on SuperGPQA, our gold-standard benchmark for graduate-level knowledge. Covering 285 specialized disciplines, from Quantum Mechanics to Agronomy, SuperGPQA is designed to look past surface-level internet knowledge and evaluate deep reasoning.

    The results are in, and they signal a shift in the hierarchy of "hard science" capabilities.

    Discipline accuracy distribution by model

    The Verdict: Gemini 3 Pro Leads in Specialized Knowledge

    Contrary to the expectation that newer is always better, our data shows that Gemini 3 Pro Preview currently holds the edge in complex, high-stakes scientific domains. While the GPT-5 series demonstrates impressive reasoning, Gemini's underlying knowledge density in specialized fields appears superior.

    Gemini 3 Pro Preview outperforms GPT-5 variants in overall graduate-level accuracy on SuperGPQA

    In the Physics domain alone, which aggregates over 2,000 graduate-level questions, the performance gap is distinct. Gemini 3 Pro consistently ranks at the top, outperforming the GPT-5 series in subfields that require precise physical intuition and calculation.

    Discipline Deep Dive: Where the Models Diverge

    The aggregate scores tell only half the story. The true test of an expert model is its performance in "long-tail" disciplines—subjects that aren't just reasoning puzzles, but require deep, memorized professional knowledge.

    Hard Physics: The Reasoning Test

    Relativity is one of the most conceptually demanding subfields in our benchmark. Here, Gemini 3 Pro achieved 79.75% accuracy. In comparison, OpenAI's specialized reasoning model, GPT-5.1-Thinking, scored 74.68%, about 5 percentage points lower, while the new GPT-5.2 Pro trailed at 70.89%, nearly 9 points behind. This suggests that, on theoretical physics questions of this kind, Gemini 3 Pro is the more reliable of the three.

    Specialized Agriculture: The Knowledge Test

    In Aquaculture, a niche field often overlooked by general benchmarks, the difference is even starker. Gemini 3 Pro reached 62.50% accuracy, while GPT-5.2 Pro achieved only 48.21%, a gap of just over 14 percentage points.
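    The percentage-point gaps in the two subfield comparisons above can be recomputed directly from the quoted accuracies. The following is a minimal sketch: the scores are the ones reported in this article, while the `reported` dictionary and the `gaps_to_leader` helper are illustrative names introduced here, not part of any SuperGPQA tooling.

```python
# Accuracy figures (percent) as quoted in this article.
# The data layout and helper name below are assumptions for illustration only.
reported = {
    "Relativity": {
        "Gemini 3 Pro": 79.75,
        "GPT-5.1-Thinking": 74.68,
        "GPT-5.2 Pro": 70.89,
    },
    "Aquaculture": {
        "Gemini 3 Pro": 62.50,
        "GPT-5.2 Pro": 48.21,
    },
}


def gaps_to_leader(scores):
    """Return the top-scoring model and each other model's gap to it,
    in percentage points (rounded to two decimals)."""
    leader = max(scores, key=scores.get)
    gaps = {
        model: round(scores[leader] - score, 2)
        for model, score in scores.items()
        if model != leader
    }
    return leader, gaps


if __name__ == "__main__":
    for field, scores in reported.items():
        leader, gaps = gaps_to_leader(scores)
        print(f"{field}: leader = {leader}")
        for model, gap in gaps.items():
            print(f"  {model} trails by {gap} percentage points")
```

    Note that these are differences in percentage points, not relative improvements; the sketch says nothing about how many questions each subfield contains or whether a given gap is statistically significant.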

    Gemini 3 Pro demonstrates superior breadth, outscoring GPT-5.2 Pro across diverse scientific disciplines

    Conclusion

    For developers and enterprises choosing between these frontier models, the SuperGPQA verdict is clear:

    • GPT-5.1-Thinking is a powerful tool for logic-heavy tasks, showing strong improvements over base models in reasoning-intensive questions.

    • However, Gemini 3 Pro currently reigns supreme in domain expertise. If your application requires handling specialized, graduate-level knowledge, from theoretical physics to agricultural science, Gemini 3 Pro is the statistical leader.

    As the AI landscape evolves, SuperGPQA will continue to serve as the unbiased arena for measuring true machine intelligence.

    Explore the Full Leaderboard ->

    Learn more about SuperGPQA ->

    Designed by 2077AI Team
