Three Years In

Happy New Year everyone! It’s been three years since ChatGPT was released, and what a crazy three years it’s been! 2026 promises to be just as, if not more, exciting than 2025 in the world of AI. Last year was supposedly the year of the AI Agent…but if you’ve been reading this blog, you know there was more hype than reality.

Unlike 2024, this year there was no 12 days of OpenAI. The gimmicks and marketing hype have slowed down a bit, as it’s been harder to get people excited about the latest and greatest model. Instead, amid questions of “where is the value for all this investment,” companies want to see real impact. That impact is very real in the two leading use cases – coding and customer service – and it’s starting to spread wider. It will be interesting to watch how far it goes in 2026.

Predictions for 2026

Predictions are nearly useless with how fast things are changing, but I’ll venture a few. Here is what I think we will see this year:

  • This will be the year of the AI agent…we will see them deployed in production at many companies
  • But for every production success there will be multiple failures, as people realize that the Achilles’ heel of these models – they are prediction machines, not real-world machines – greatly limits their application without very strong scaffolding surrounding and supporting their deployment
  • Because of these failures, we will enter the trough of disillusionment with AI agents
  • But companies that take a robust approach – building proper scaffolding, using advanced RAG, and building sophisticated validation pipelines – will see agentic AI success at scale
  • Scaling for pre-training levels off (if it hasn’t already), and scaling efforts shift to focus on post-training (reinforcement learning of various kinds) and inference
  • The US continues to dominate closed models; China continues to dominate open models
  • But this year the models become less important. What is built on the models is more important, as the LLM providers seek to move up the value chain and deliver apps that do things, not just models.

Competition Intensifies

Meta bought Manus, a company with an impressive general-purpose AI agent. This makes a lot of sense. Meta wasted billions on the Metaverse, which was a bust. They’re behind in AI; their Llama models are essentially irrelevant. They need a leg up, and agentic AI is the future – an area where Manus has been strong. Hopefully they make the right decisions and turn this into a boost.

NVIDIA announced Rubin, its next-generation AI chips about six months ahead of schedule (!!). They claim it reduces inference costs by 10x and reduces training costs by 4x. We’ll see if those numbers play out in the real world, but NVIDIA’s dominance is still strong, driven largely by their proprietary software (CUDA). While people tend to think of NVIDIA as a hardware company, their software makes them the clear leader for deploying massively parallel compute, and it’s difficult and costly to switch.

Applications Become the Focus

Big announcements from the big players – OpenAI, Anthropic, and Google. But none of them are about new models. That’s telling.

Google added AI to its email for all users. If you use Gmail, you now have baked-in AI capabilities to help you manage your emails (AI Overviews), draft responses (Suggested Replies, Help Me Write, and Proofread), and deal with your inbox clutter (AI Inbox).

OpenAI announced ChatGPT Health (not generally available yet, there is a waitlist). Many people are already leaning heavily on AI for medical advice, so OpenAI is simply seizing the opportunity. That said, this line was amusing:

“It is not intended for diagnosis or treatment.”

Really? Then what exactly are you SUPPOSED to use it for? “Hey ChatGPT, here are my medical records. Do you think I should ask Suzanne out on a date Friday night?”

Not wasting any time, a few days later Anthropic announced Claude for Healthcare. This is a set of tools, capabilities, and knowledge (connecting to medical data sources) to improve how Claude performs on medical-oriented tasks and increase its usefulness in healthcare.

And more apps…

Anthropic also gave an update on Claude running a vending machine business. It’s a very interesting read on what it’s like to try to use AI to do something in the real world. While this test was much more successful than the previous one, AI isn’t ready to run a business. We still need humans.

Anthropic also released Cowork, a way for Claude to do things on your computer. It’s based on Claude Code: you can give it access to a folder and ask it to perform actions on those files. It can organize them, create a list of their contents and put it in a spreadsheet (for instance, if they were expense reports), etc.


My take on why it matters, particularly for generative AI in the workplace


It’s telling that the big announcements from the big players for the past few weeks were not about improved models. They were about new and improved applications.

Yes, there will be many new model announcements this year. And there is still plenty of room for the models to get better (less so through pre-training, but certainly through better data, better techniques, better post-training, and more inference). But the fact is, the models have gotten so capable that it’s getting harder to get as excited about the latest and greatest model, the one that scores 63 points on the latest test-o-meter compared to the next best model at 62.

Google announced apps. OpenAI and Anthropic talked about verticalization. And Anthropic also talked about apps – an app for doing things on your computer, and an app for running a vending machine. This is the future. The real value isn’t in a human having a conversation with a computer. The real value is in the computer being able to get real-world work done in a specific domain in a trustworthy fashion.

Real-world work

That’s all the email stuff that Google is providing. It makes your life easier and allows you to get more done faster.

Specific domain

Generic approaches don’t work for healthcare. The human body is too complex and nuanced for “average” to work. So they’re specializing. They’re focusing on domain-specific data and verticalized tools.

Trustworthy

LLMs aren’t trustworthy (more on that in a coming post) because for all their power, they are still just prediction machines. They don’t understand the real world, so if they are asked to do something just a little outside their comfort zone, they hallucinate. Yes, the improvement that Anthropic saw with their vending machine exercise was in part due to a better model – but they emphasized that the real improvement was scaffolding. In other words, it was what they put around the model – a clear goal, sound guardrails, solid knowledge about the task – that made it more successful.
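To make the scaffolding idea concrete, here’s a minimal sketch of what “putting things around the model” can look like in code: the model only proposes an action, and explicit guardrails validate that proposal before anything touches the real world. Everything here is hypothetical – `call_model` is a stand-in for a real LLM call, and the vending-machine actions and limits are invented for illustration, not Anthropic’s actual setup.

```python
# Hypothetical scaffolding around a "prediction machine":
# the model proposes, the guardrails dispose.

ALLOWED_ACTIONS = {"restock", "adjust_price", "do_nothing"}
MAX_PRICE_CHANGE = 0.20  # guardrail: never move a price more than 20%


def call_model(state: dict) -> dict:
    """Stand-in for an LLM call; a real system would prompt a model
    with a clear goal and solid knowledge about the task."""
    if state["inventory"] < 5:
        return {"action": "restock", "quantity": 20}
    return {"action": "do_nothing"}


def validate(proposal: dict, state: dict) -> bool:
    """Guardrails: reject anything outside the allowed envelope."""
    if proposal.get("action") not in ALLOWED_ACTIONS:
        return False
    if proposal.get("action") == "adjust_price":
        change = abs(proposal["new_price"] - state["price"]) / state["price"]
        if change > MAX_PRICE_CHANGE:
            return False
    return True


def run_step(state: dict) -> dict:
    """One agent step: model proposes, scaffolding validates,
    and anything suspicious fails safe (e.g., escalate to a human)."""
    proposal = call_model(state)
    if not validate(proposal, state):
        return {"action": "do_nothing"}
    return proposal


state = {"inventory": 3, "price": 2.50}
print(run_step(state))  # -> {'action': 'restock', 'quantity': 20}
```

The point isn’t the specific rules – it’s that the goal, the allowed actions, and the validation live outside the model, so a hallucinated proposal gets caught instead of executed.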

This is exactly what we see with our customers.

AI for Health

OpenAI and Anthropic both announced expanded capabilities for medical and health topics. But their target audience is very different. Where OpenAI is focusing on consumer applications, Anthropic is focusing on business applications. This is consistent with their corporate focus; OpenAI owns the consumer LLM market while Anthropic is strongest with business use.

I should also mention that security is definitely a concern. Although OpenAI says they have taken many extra security steps, that has not been their strong suit. I think the apps on my Mac reveal corporate mentalities. ChatGPT on my Mac requires “full disk access” for no reason; I’ve never asked it to work with local files (that I haven’t uploaded to it). On the other hand, Claude on my Mac requests permission to access files, and it asks permission for every single folder, not the entire drive.


