Gemini 3.0 Pro dominates AI benchmarks. But in my real-world testing, Grok 4.1 Thinking comes out on top.
Google’s new model is ranked #1 on LMArena. It scores off the charts on a variety of AI benchmarks, like Humanity’s Last Exam.
But the best way to test a model is to ask real questions I need answers to. This morning, I posed three challenging questions to Gemini.
When I tested Grok 4.1 Thinking yesterday, it scored an A. Let’s see how Gemini stacks up…
Round #1: Researching the Hedge Fund Industry
Earlier today, I looked at a startup that sells SaaS to hedge funds. This got me wondering: how big is this market?

Gemini was significantly slower than Grok 4.1, but its response was thorough.
It made an important distinction between the number of individual funds and the number of firms. It looks like there are around 4,000 firms total.
Gemini provided a variety of sources at the end. I would have liked to see the sources inline, next to the facts they support. That would have made Gemini’s claims easier to verify.
I’m giving this round an A-.
Round #2: Troubleshooting an App Problem
I love using my new Wispr Flow dictation app. But it’s been having an annoying problem on my iPhone.
When I use one of my pre-loaded snippets, it cuts off a few characters. Can Gemini help troubleshoot?

Gemini suggested looking at the characters that get cut off to see if there’s a common theme. I noticed that the character right after each dollar sign gets cut off.
I reworked my snippet to remove dollar signs. Now it works perfectly! For whatever reason, iPhone doesn’t like that symbol.
Great job, Gemini! I’m giving this round an A.
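If you want to run the same check without eyeballing it, here is a minimal Python sketch of the idea: compare the snippet you saved with the text that actually got typed, and list each dropped character along with the character right before it. The sample strings and the `dropped_characters` helper are my own made-up illustrations, not anything from Wispr Flow or Gemini.

```python
from difflib import SequenceMatcher


def dropped_characters(expected: str, actual: str) -> list[tuple[str, str]]:
    """Return (preceding_char, dropped_text) for each span of `expected`
    that is missing from `actual`.

    Only pure deletions are reported; spans that were replaced with
    different text are ignored in this sketch.
    """
    dropped = []
    matcher = SequenceMatcher(None, expected, actual, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            # Character just before the missing span ("" at the very start).
            before = expected[i1 - 1] if i1 > 0 else ""
            dropped.append((before, expected[i1:i2]))
    return dropped


# Hypothetical example: the saved snippet vs. what showed up on screen.
saved = "Price: $25 or $40 per month"
typed = "Price: $5 or $0 per month"

for before, missing in dropped_characters(saved, typed):
    print(f"dropped {missing!r} after {before!r}")
```

If every dropped character follows the same symbol, that symbol is the likely culprit. In my case, it was the dollar sign.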
Round #3: Finding Startups by SpaceX Alums
I recently made a small investment in a startup called Verustruct. The founder, Nick, is a former SpaceX engineer.
This got me thinking: can I find more early-stage companies founded by SpaceX alums?

Gemini gave me companies that are several years old. Most of them have already raised a seed round or even a Series A, making them too late-stage for me.
The one pre-seed company Gemini found, Hop Aero, does not appear to have any SpaceX alums among its founders.
This response was not very useful. I’m going to give this round a B-.
Wrap-Up
Gemini 3.0 Pro scores an overall A- in my testing.
That’s a strong result, but it still puts Gemini behind Grok 4.1 Thinking, which scored an A. If Gemini wants to dethrone Grok, it will need to improve its reasoning and sourcing.
Gemini 3.0 Pro is an impressive model. But if you want the best, go with Grok.
Have you tried Gemini 3.0 Pro?