Grok 4.1 Thinking Beats Gemini 3.0 Pro in Real World Test

Gemini 3.0 Pro dominates AI benchmarks. But in my real-world testing, Grok 4.1 Thinking comes out on top.

Google’s new model is ranked #1 on LMArena. It also posts top scores on a variety of AI benchmarks, like Humanity’s Last Exam.

But the best way to test a model is to ask real questions I need the answer to. This morning, I posed three challenging questions.

When I tested Grok 4.1 Thinking yesterday, it scored an A. Let’s see how Gemini stacks up…

Round #1: Researching the Hedge Fund Industry

Earlier today, I looked at a startup that sells SaaS to hedge funds. This got me wondering: how big is this market?

Gemini was significantly slower than Grok 4.1, but its response was thorough.

It made an important distinction between the number of individual funds and the number of firms. It looks like there are around 4,000 firms total.

Gemini provided a variety of sources at the end. I would have liked to see the sources in line with the facts they cite. That would have made Gemini’s claims easier to verify.

I’m giving this round an A-.

Round #2: Troubleshooting an App Problem

I love using my new Wispr Flow dictation app. But it’s been having an annoying problem on my iPhone.

When I use one of my pre-loaded snippets, it cuts off a few characters. Can Gemini help troubleshoot?

Gemini suggested looking at the characters that get cut off to see if there’s a common theme. I noticed that after each dollar sign, the next character gets cut off.

I reworked my snippet to remove dollar signs. Now it works perfectly! For whatever reason, iPhone doesn’t like that symbol.

Great job, Gemini! I’m giving this round an A.

Round #3: Finding Startups by SpaceX Alums

I recently made a small investment in a startup called Verustruct. The founder, Nick, is a former SpaceX engineer.

This got me thinking: can I find more early-stage companies founded by SpaceX alums?

Gemini gave me companies that are several years old. These companies have usually already raised a seed or even a Series A, making them too late-stage for me.

The one pre-seed company Gemini found, Hop Aero, does not appear to have any SpaceX alums among its founders.

This response was not very useful. I’m going to give this round a B-.

Wrap-Up

Gemini 3.0 Pro scores an A- in my testing.

That’s a strong result, but still puts Gemini behind Grok 4.1 Thinking. If Gemini wants to dethrone Grok, it will need to improve reasoning and sourcing.

Gemini 3.0 Pro is an impressive model. But if you want the best, you go Grok.

Have you tried Gemini 3.0 Pro?

