This morning, I tested OpenAI’s new GPT-5.1. It still falls behind the best models from xAI, Google, and Moonshot AI (the maker of Kimi).
When I tested GPT-5 in August, it performed poorly, earning a C-. It gave me outdated data and struggled to cite sources.
OpenAI claims that GPT-5.1 is better at reasoning and easier to customize. I ran a three-round test to find out…
Round #1: Finding Fertility Startups
Low fertility is a huge problem across much of the world. Italy just registered a fertility rate of 1.1, an all-time low.
Even folks who want to have a child sometimes can’t. But what if we were able to cure infertility and help people have more babies?
I asked GPT-5.1 to find me startups working on infertility.

ChatGPT did a great job of finding some leading startups working on infertility. It cited sources for the technologies they’re developing and the amount of capital they’ve raised.
I’m giving this round an A.
Round #2: Defending Against a Robot War
We’ve seen some incredible new robotics demos recently, like 1X’s Neo. These androids will help us clean up around the house and build things in factories.
But what if androids are used against us in a war? What would be the best way to stop them?

ChatGPT comes up with some interesting ideas, like hacking the software that drives the robots. This could allow us to neutralize a large number of robots at once more easily than if we used munitions.
But ChatGPT does not cite any sources in its response. So, we don’t know how accurate it is. ChatGPT should’ve cited reports by scholars of war and experts on robotics.
I’m giving this round a C.
Round #3: The Origins of Fall Colors
Let’s move on to happier topics!
Looking out my window right now, I can see the beautiful red-yellow foliage. What exactly causes these gorgeous colors?

GPT-5.1 gives us a great response, including some beautiful leaf pictures.
It explains that a reduction in chlorophyll makes the leaves change color. The citations are good, linking to an article in the Smithsonian.
I’m giving this round an A.
Wrap-Up
Overall, I’m giving GPT-5.1 a B+. That’s up from a C- for GPT-5, but still well below the leading models.
The quality of GPT-5.1’s responses is inconsistent. Some have great sourcing and follow instructions well, while others do not.
Grok 4, Kimi K2 Thinking, and Gemini 2.5 Pro are all better than GPT-5.1. Their outputs are more consistent and their information is more reliable.
When GPT-3.5 came out in late 2022, OpenAI was way ahead. That’s no longer the case.
For all the billions in investment, OpenAI is falling behind its competitors.
But Sam Altman still has some incredible researchers on his team. I wouldn’t count out OpenAI yet.
Have a great weekend, everybody!