April Fools and AI News

Lots of releases, and some fun insights into how these models work…and don’t, thanks to April Fools’ Day! Here are the main happenings:

Runway released Gen-4 of their video generation model. The big deal about this model is that it is better at maintaining character and scene consistency across longer videos (although from watching the videos, it’s still far from perfect). They also said it “understands physics,” but of course it doesn’t; it’s just better at matching the visual patterns that occur as a result of physics.

Amazon announced Nova Act, an agent that can control a computer through a web browser. This follows OpenAI’s Operator, Google’s Mariner, and Anthropic’s Computer Use.

From the AI-is-moving-too-fast department: OpenAI (Sam Altman) said they are going to release GPT-5 soon. This is a complete about-face from what he said only a month (!!) ago – back then, he said GPT-5 would be a consolidation of many models, accessible from one interface, so you wouldn’t have to think about and choose which model you wanted. He now says that’s turning out to be “harder than we thought” and GPT-5 will be “much better than we thought.” I could write a whole post about this alone, but it’s super interesting that:

  • They can’t predict the future at all; things are changing so fast that nobody – not even the de facto market leader – can guess what’s next
  • If GPT-5 really is going to be that much better, is that because they found a different approach or is there still a lot of headroom for scaling during pretraining? Consensus says “no” to big improvements from pretraining.
  • Or maybe they just can’t stand being in second place. Google’s recently released Gemini 2.5 Pro is the current “best” model. Maybe the latest fundraise and the valuation demand that they put out the best model, even if it’s only a little bit better.

If you’re interested in learning more about AI at your own pace, you’re in luck. OpenAI released the OpenAI Academy, a collection of materials (and webinars) that cover a variety of topics relating to generative AI. I haven’t checked it myself but it’s getting a lot of positive reviews.

But the most significant event of the week, particularly as it relates to making LLMs useful for getting work done (especially in business, especially in factual contexts), is yet another excellent example of why it’s super critical for companies to have and use the best-quality RAG possible on the best-quality data. It’s the same old story – LLMs don’t actually know anything, so garbage-in, garbage-out. As an April Fools’ Day joke, Timothy Gowers published this on X:

Since the problem was created as a joke on April 1, an LLM asked about the Dubnovy-Blazen math problem obviously hasn’t seen it as part of its training data. Jonathan Oppenheim wanted to know what AI would say about it. It (in this case Google’s Gemini) searches the internet using RAG…finds the post on X…and presents Grok’s success as fact:

This is another example of how:

  • LLMs don’t actually have any understanding, and despite all the fancy appearances of intelligence and reasoning, at their core they are still just stochastic parrots.
    • LLMs aren’t suspicious that the only mention of a “well-known problem” is on X
    • LLMs aren’t suspicious that all the mentions of a “well-known problem” appear in the last few days
    • LLMs aren’t suspicious that the second most relevant source (about “the General Position Problem”) isn’t relevant at all
    • LLMs aren’t suspicious that the first post was on April 1 (a toy sketch of these missing sanity checks follows below)

(thanks to Gary Marcus for sharing this)
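
To make those missing sanity checks concrete, here is a toy Python sketch of the kind of provenance filtering a retrieval pipeline could apply before handing results to an LLM. To be clear, this is not how Gemini (or any real product) works; the field names, domains, titles, and threshold are all made up for illustration.

```python
from datetime import date, timedelta

# Toy retrieved results; field names and values are made up for illustration.
results = [
    {"title": "Grok solves the Dubnovy-Blazen problem", "domain": "x.com",
     "published": date(2025, 4, 1)},
    {"title": "The General Position Problem", "domain": "example.edu",
     "published": date(2021, 6, 14)},
]

def suspicion_score(doc, query_terms, today=date(2025, 4, 2)):
    """Crude heuristics mirroring the checks a careful human would make."""
    score = 0
    if doc["domain"] in {"x.com", "twitter.com", "reddit.com"}:
        score += 1   # the only mention of a "well-known problem" is a social post
    if today - doc["published"] < timedelta(days=7):
        score += 1   # every mention appeared in the last few days
    if (doc["published"].month, doc["published"].day) == (4, 1):
        score += 1   # the post went up on April 1
    if not any(t.lower() in doc["title"].lower() for t in query_terms):
        score += 2   # the "relevant" hit doesn't mention the topic at all
    return score

query_terms = ["Dubnovy-Blazen"]
trusted = [d for d in results if suspicion_score(d, query_terms) < 2]
print(trusted)  # empty: no trustworthy source exists for a made-up problem
```

For the fake Dubnovy-Blazen problem, nothing survives these checks – which is exactly the right outcome, because there is no trustworthy source to ground an answer in.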


My take on why it matters, particularly for generative AI in the workplace


This example showcases the limits of LLMs that are frequently glossed over amid all the hype, either because people are focused on the positive, because they assume these limits will be solved with current techniques (there is no evidence of that), or because they’re selling something. It reinforces what I’ve been saying for a while: LLMs are awesome when there is no ground truth, or when approximate is good enough. But when facts are involved (which is what this blog is focused on: generative AI in enterprises):

  • LLMs cannot truly reason – they match patterns that mimic reasoning
  • LLMs do not truly understand – they match patterns that give the appearance of understanding
  • The old adage of garbage-in, garbage-out still applies
  • To get factual information from an LLM, the only hope is to ground it in the truth using Retrieval-Augmented Generation (RAG); a minimal sketch of this grounding step follows this list
  • Good responses depend on giving it good truth to start with: accurate, focused, and consistent information (you need high-quality RAG)
  • For good responses you need good source data: either sources that have been curated or sources that the LLM knows how to handle.
  • If a bad actor has the ability to inject false information, don’t expect the LLM to figure it out. LLMs are easily fooled.
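
To make the grounding bullet concrete, here is a minimal, hypothetical sketch of what “grounding with the truth” looks like at the prompt level: retrieve only curated passages and instruct the model to answer strictly from them, or admit it doesn’t know. Nothing here is any particular vendor’s API; the function name, instruction wording, and example passages are placeholders.

```python
# Minimal sketch of the grounding step in a RAG pipeline. Plug in whatever
# retriever and LLM client you actually use; only curated documents go in.

def build_grounded_prompt(question: str, curated_passages: list[str]) -> str:
    """Assemble a prompt that tells the model to answer only from vetted text."""
    context = "\n\n".join(
        f"[Source {i + 1}]\n{passage}" for i, passage in enumerate(curated_passages)
    )
    return (
        "Answer the question using ONLY the sources below. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

# Made-up passages standing in for a curated, high-quality knowledge base:
passages = [
    "Policy 12.3: Refunds are issued within 14 days of the returned item being received.",
    "Policy 12.4: Digital purchases are non-refundable once downloaded.",
]
prompt = build_grounded_prompt("How long do refunds take?", passages)
print(prompt)  # this grounded prompt is what actually gets sent to the LLM
```

The design choice that matters is that the model never gets to fill in facts from its own patterns: if the curated sources don’t cover the question, the right answer is “I don’t know.”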
