Tag: LLM
-
Chinese AI startup MiniMax is going public this week at a $7 billion valuation. But its model flopped in my testing. I ran MiniMax through three tests with real-world questions. These are harder to game than benchmarks. Let me show you where MiniMax does well and where it struggles…
AI, Artificial Intelligence, ChatGPT, China, Finance, Investing, Large Language Model, LLM, Startups, Stock Market, Tech, Technology, Venture Capital
-
China is about to launch its first AI model IPO, Zhipu AI. Investors may be excited, but Zhipu’s product is weak. China’s first IPO of an LLM startup, Zhipu AI, will start trading Thursday. The IPO is expected to value Zhipu at nearly US$7 billion. This morning, I ran…
AI, Artificial Intelligence, ChatGPT, China, Finance, Investing, IPO, LLM, Tech, Technology, Venture Capital
-
The Allen Institute for AI recently released its most powerful model ever: Olmo 3. Olmo is cheap to train, making it perfect for anyone training their own model. Olmo 3 is 2.5 times more efficient to train than Llama 3.1 based on GPU-hours per token. Olmo is also much more…
AI, Artificial Intelligence, ChatGPT, LLM, Model Training, Open source, OSS, Tech, Technology, Venture Capital
-
I tested the top models from Grok, Gemini, and ChatGPT head-to-head this morning. Gemini won, showing incredible power at research and sourcing. The last month has seen some amazing releases from the top AI labs. xAI released Grok 4.1 Thinking, Google released Gemini 3.0 Pro, and OpenAI dropped…
AI, Artificial Intelligence, ChatGPT, Elon, Elon Musk, Entrepreneur, gemini, Google, Grok, LLM, OpenAI, Silicon Valley, Startups, Technology, xAI
-
I ran OpenAI’s new GPT-5.2 through a three-round test. It still scores below Grok and Gemini. Some of GPT-5.2’s responses are excellent. But the quality of its answers is inconsistent. Let me show you where this model excels and where it falls short… Round #1: Learning About Needle…
AI, Artificial Intelligence, ChatGPT, gemini, Google, Grok, LLM, OpenAI, Silicon Valley, Startups, Tech, Technology, Venture Capital, xAI
-
Gemini 3.0 Pro dominates AI benchmarks. But in my real-world testing, Grok 4.1 Thinking comes out on top. Google’s new model is ranked #1 on LMArena. It scores off the charts in a variety of AI benchmarks, like Humanity’s Last Exam. But the best way to test a model…
AI, Artificial Intelligence, ChatGPT, Elon, Elon Musk, gemini, GOOG, Google, Grok, Large Language Model, LLM, Technology, xAI
-
Elon just dropped Grok 4.1. I tested it this morning. This is the best model I’ve ever used. xAI claims 4.1 has fewer hallucinations than prior models. It ranks number one on LMArena, ahead of Gemini, Claude, and ChatGPT. Let’s see what this thing can do! Round #1: Are Consumers…
AI, Artificial Intelligence, ChatGPT, Elon, Elon Musk, Grok, LLM, Startups, Tech, Technology, Venture Capital, xAI
-
This morning, I tested OpenAI’s new GPT-5.1. It still falls behind the best models from xAI, Google, and Kimi. When I tested GPT-5 in August, it performed poorly, notching a C-. It gave me outdated data and struggled to cite sources. OpenAI claims that GPT-5.1 is better at reasoning and…
-
Kimi just dropped Kimi 2 Thinking. I tested it this morning. It’s as good as Grok 4, the best model I’ve ever used. In July, I ran the prior Kimi through testing. It scored a B+, solid but below Grok 4. This time, Kimi performed far better. Let me show…
AI, Artificial Intelligence, ChatGPT, China, Kimi, LLM, Open source, Startups, Technology, Venture Capital, Writing