
The AI You Dismissed Isn't the AI That's Here Now

I keep having the same conversation. A client tells me they “tried AI” and it wasn't impressive. Then I show them what Claude actually does inside their workflow, and they go quiet. Andrej Karpathy just put words to why that gap exists.

Pranav Ambwani · 10 min read

“We tried ChatGPT. It wasn't great.”

I hear some version of this on almost every discovery call. A manufacturing ops lead, a sales director, a CEO who saw their kid use it for homework. They opened ChatGPT sometime last year, asked it to write an email or summarize a document, got something mediocre, and moved on.

That experience became their mental model for “what AI can do.”

I don't blame them. If the only car you'd ever driven was a 2003 Corolla with a slipping transmission, you'd be skeptical of someone telling you that Formula 1 exists. The gap between free-tier ChatGPT from last year and a production-grade Claude deployment in 2026 is genuinely that large. And it's a gap almost nobody outside the tech industry can see.

Karpathy put it better than I could

Andrej Karpathy, one of the most respected AI researchers alive (co-founded OpenAI, led AI at Tesla, built the Eureka Labs education platform), posted something recently that crystallized exactly what I've been trying to explain to clients for months.

He described two groups of people talking past each other:

Group 1 tried the free tier of ChatGPT sometime in 2024 or 2025. They saw hallucinations, laughed at viral videos of AI fumbling simple questions, and concluded that AI is overhyped. You've probably seen the clips. OpenAI's voice mode struggling to answer whether you should drive or walk to the carwash. Stuff like that.

Group 2 pays for frontier models, uses them professionally in technical domains, and watches these models melt through problems that used to take days or weeks. Karpathy's words: the improvements in 2026 have been “nothing short of staggering.”

These two groups are living in different realities. And they're having completely different conversations about what AI means for their work.

Anthropic describes Claude as “a space to think.” That framing is closer to reality than most people realize. Image: Anthropic

Why the gap is so wide

Karpathy explains this with a technical insight that I think every business leader should understand, even if they never touch a line of code.

The areas where AI has improved most dramatically are areas with verifiable outputs. Programming, for instance. You write code, you run it, it either works or it doesn't. That binary feedback loop is gold for training AI models. The technical term is “reinforcement learning with verifiable rewards.” When the AI can be told “yes, that's correct” or “no, try again” millions of times, it gets very good very fast.
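To make that feedback loop concrete, here's a minimal sketch in Python. It isn't how any lab actually implements training; it just illustrates what “verifiable reward” means: run the model's proposed code against known test cases and hand back a binary signal. The `solve` function name and the test cases are made up for illustration.

```python
def verifiable_reward(candidate_source: str, test_cases) -> int:
    """Return 1 if the candidate code passes every test, else 0."""
    namespace = {}
    try:
        # Define the proposed function from the model's output.
        exec(candidate_source, namespace)
        fn = namespace["solve"]
        for args, expected in test_cases:
            if fn(*args) != expected:
                return 0
    except Exception:
        return 0  # crashes and bad code count as failure, too
    return 1

# A correct proposal earns reward 1; a wrong one earns 0.
good = "def solve(a, b):\n    return a + b"
bad = "def solve(a, b):\n    return a - b"
tests = [((2, 3), 5), ((0, 0), 0)]
print(verifiable_reward(good, tests))  # 1
print(verifiable_reward(bad, tests))   # 0
```

The point of the sketch: the signal is unambiguous. Repeat that check millions of times and the model has a clean gradient to climb. An essay answer has no equivalent.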

Writing a witty response to “should I drive to the carwash?” There's no clean way to verify if that answer is “right.” So it improves slowly.

The second factor: money follows value. The companies building these models deploy their best people on the problems worth the most. And right now, the biggest revenue comes from B2B technical applications, not consumer chat. So the smartest engineers at Anthropic and OpenAI are laser-focused on making Claude Code and OpenAI Codex terrifyingly capable, while the free chatbot experience gets less attention.

The result? A growing chasm. The free experience stays mediocre. The paid, frontier, professional-grade experience accelerates.

Here's what I didn't expect

Karpathy's post focuses on programming and technical work. Fair enough. That's his world. But what surprised me over the past year is that the same dynamic plays out in business operations.

Think about what makes programming amenable to AI improvement: verifiable outputs and high dollar value. Now think about a manufacturing sales team generating a machine offer.

The output is verifiable. The pricing either matches the price list or it doesn't. The spec sheet either includes the right components or it doesn't. The document either follows the company's format or it doesn't. These are all binary checks. And the dollar value? A single offset printing press offer can be worth $200,000 or more. Getting it right matters. Getting it fast matters more.
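Those binary checks can be sketched in a few lines of Python. This is purely illustrative: the SKUs, prices, and field names are invented, and no client system looks exactly like this. What it shows is that every check on a generated offer passes or fails, with nothing in between.

```python
# Hypothetical price list and required components, for illustration only.
PRICE_LIST = {"SM-102": 185_000, "feeder-unit": 12_500}
REQUIRED_COMPONENTS = {"SM-102", "feeder-unit"}

def check_offer(offer: dict) -> list[str]:
    """Return a list of failed checks; an empty list means the offer passes."""
    failures = []
    for item in offer["line_items"]:
        # Price either matches the price list or it doesn't.
        if PRICE_LIST.get(item["sku"]) != item["price"]:
            failures.append(f"price mismatch: {item['sku']}")
    # Required components are either all present or they aren't.
    if REQUIRED_COMPONENTS - {i["sku"] for i in offer["line_items"]}:
        failures.append("missing required component")
    # The document either follows the company format or it doesn't.
    if not offer.get("company_format", False):
        failures.append("wrong document format")
    return failures

offer = {
    "line_items": [
        {"sku": "SM-102", "price": 185_000},
        {"sku": "feeder-unit", "price": 12_500},
    ],
    "company_format": True,
}
print(check_offer(offer))  # []
```

Because every check is binary, a generated offer can be validated automatically before a human ever sees it — the same property that made programming the fastest-improving domain.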

I watched one of our clients go from spending two hours building each sales offer to generating one in about 15 minutes. Not because AI is magic. Because we gave Claude the right context: their price lists, their sample offer templates, their brand guidelines, their product images. With structured instructions and the right reference material, Claude doesn't hallucinate prices. It pulls them from the actual spreadsheet.

That's a different planet from “write me a marketing email.”

It really is two different products

I think the most important line in Karpathy's post is this: it's “simultaneously the case” that the free voice mode will fumble basic questions on Instagram reels, while the highest-tier model will “go off for 1 hour to coherently restructure an entire code base.”

Both things are true at the same time. And that's what makes this moment so confusing for people who aren't deeply embedded in it.

Your experience of AI depends entirely on which product you used, when you used it, and what you asked it to do. If your last interaction was asking free ChatGPT for a recipe, you're forming an opinion based on maybe 5% of what's actually possible. It's like judging the entire internet based on a 1998 dial-up connection to AltaVista.

What this means if you run a business

I'm not going to pretend this is simple. The gap Karpathy describes creates a real problem for decision-makers.

If you're a CEO who tried ChatGPT and found it unimpressive, you're probably not prioritizing AI adoption. Why would you? Your direct experience told you it wasn't ready. Meanwhile, your competitor down the street hired someone who understands frontier models, gave Claude the right instructions, and now their sales team generates offers in 15 minutes while yours takes two hours. Their customer service reps have an AI that actually understands the product catalog. Their operations team has an AI that can cross-reference production schedules with material availability.

You won't notice this gap for a while. It doesn't show up as a single dramatic event. It shows up as your competitor being slightly faster at everything, slightly more responsive, slightly more consistent. By the time it's obvious, you're a year behind.

The uncomfortable truth

I genuinely struggle with how to communicate this without sounding like I'm selling fear. I run a company that deploys Claude into businesses. Of course I'm going to say AI is important. I know how that sounds.

But the thing is, I started Settle because I saw this gap forming before Karpathy wrote about it. I was using Claude professionally, watching it get dramatically better every few months, and realizing that most businesses had no idea. They were still thinking about the chatbot they tried a year ago.

The gap isn't shrinking. Every model update, every new capability like computer use and managed agents, every improvement to tool use and structured outputs widens it. The people using frontier AI are pulling further ahead. The people who dismissed it based on an old free-tier experience are falling further behind without knowing it.

Karpathy's framing is clean: two groups, talking past each other. My worry is that for most businesses, by the time they realize which group they should have been in, the distance will be hard to close.

What I'd actually recommend

If you're reading this and you're in Group 1, here's what I'd say:

Don't trust your old experience. The AI you tried isn't the AI that exists today. Get someone who knows the frontier models to show you what they can actually do inside your specific workflow, with your actual data, against your real problems. Not a generic demo. Not a marketing video. Your workflow.

If the result isn't impressive, fair enough. Walk away. But at least you'll be making that decision based on what's actually possible in April 2026, not what was possible in the ChatGPT free tier eighteen months ago.

The gap is real. It's growing. And the only people who can't see it are the ones who stopped looking.

Pranav Ambwani

Founder of Settle. Deploys Claude AI into mid-market companies and manufacturers — structured rollouts, production-grade instructions, real results.