Meta’s market cap surged by $60 billion yesterday after the company released Muse Spark. But Meta’s new model struggles in real-world testing.
Meta claims that Muse Spark beats all other models on numerous AI benchmarks, including visual reasoning and agentic search. But I don’t put a lot of faith in these gameable benchmarks.
Real-world use is the true test. So this morning, I put Muse Spark through three rounds…
Round #1: Analyzing the Strait of Hormuz Crisis
Markets jumped Wednesday following a ceasefire between Iran and the U.S. But will it hold and will the Strait of Hormuz fully reopen?
I asked Meta to analyze the situation using its Thinking feature…

Meta gave a solid analysis. It concludes that reopening the strait will be gradual. Expect a few ships to trickle through at first, rather than a full reopening.
Meta pulled in a handful of relevant citations. But it also relied on some secondary and unreliable sources, like Wikipedia.
It’s a step up from Llama. But Meta is still way behind leaders like Grok and Gemini. I ran the same prompt through Grok Expert, and it analyzed 258 sources in seconds.
I’m giving this round a B-.
Round #2: Finding Exciting New Startups
For the next round, I asked Meta to help me find startups in three areas I discussed in a recent post:
- American-made APIs
- Robots to pick up litter
- Robochefs for the home

Meta found some interesting startups like rStream, which separates trash from recyclables. Still, that’s not really what I asked for. I wanted robots that pick up litter, not sort it.
The results were close, but not close enough to be very useful. I’m giving this round a B- as well.
Round #3: Daddy Needs a New Coat
For the final round, I did something totally different. I noticed that Meta has a shopping feature. Maybe this can help me find a deal on the parka I’ve been looking for?
I was very specific about size and color. Let’s see what Meta can do…

Meta pulled in some amazing deals. Maybe it’s finally time to buy this thing!
Not so fast. I clicked on the links, and they just go to the home page of these websites or to listings for sizes and styles I don’t want. These suspiciously cheap coats don’t exist.
Meta was a total bust on this one. I’m giving this round an F.
Wrap-Up
Meta’s new model disappointed me. I’m giving it a C- overall.
Muse Spark really is much better than Llama, Meta’s prior model. But it’s still way behind the frontier LLMs like Claude, ChatGPT, Gemini, and Grok.
It doesn’t always follow prompts correctly. It also tends to cite unreliable or irrelevant sources.
Zuckerberg has invested around $140 billion so far in AI. All he has to show for it is a mediocre model that’s way behind the leaders.
If Zuck and his team don’t come up with something better soon, Meta will be left behind.
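As a rough sanity check on the C- overall, here is a minimal sketch that averages the three round grades (B-, B-, F). It assumes the rounds are weighted equally and uses the common 4.0 grade-point scale. Neither assumption is stated in the review, and the names `GRADE_POINTS` and `overall_grade` are purely illustrative.

```python
# Common 4.0 grade-point scale (an assumption, not the review's stated method).
GRADE_POINTS = {
    "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D+": 1.3, "D": 1.0, "D-": 0.7,
    "F": 0.0,
}

def overall_grade(round_grades):
    """Average equally weighted letter grades; return (mean, nearest letter)."""
    mean = sum(GRADE_POINTS[g] for g in round_grades) / len(round_grades)
    # Pick the letter whose grade points sit closest to the mean.
    nearest = min(GRADE_POINTS, key=lambda g: abs(GRADE_POINTS[g] - mean))
    return mean, nearest

# Round grades from the review: Hormuz analysis, startup search, coat shopping.
mean, letter = overall_grade(["B-", "B-", "F"])
print(f"{mean:.2f} -> {letter}")
```

Under those assumptions the average is about 1.8 grade points, which sits closer to a C- (1.7) than to a C (2.0). A different weighting, say counting the shopping round less, would shift the result.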
More on tech:
Gemma 4: Strong on Basic Tasks, Weak on Real Research
Grok Imagine Quality Mode: Jaw-Dropping Images, But Text Still Struggles
Automating My Investing with Lovable
Save Money on Stuff I Use:
Fundrise lets me diversify my real estate investments so I’m not too exposed to any one market. I’ve invested since 2018 with great returns.
More on Fundrise in this post.
If you decide to invest in Fundrise, you can use this link to get $100 in free bonus shares!
I use Wispr Flow every single day to dictate to my computer. I’m even dictating this text with it! It’s way better than Apple’s native dictation.
My productivity is up about 25% since I started dictating rather than typing. I’m also less tired and stressed.
Get a free month of Wispr Flow Pro here!