There is Soooo Much Happening in AI

Last week was a BIG, BIG week. And I’m not talking about tariffs or the stock market. Some absolutely huge announcements from some big players give a glimpse into the future. Many were announcements or pre-releases, not releases, so time will tell how fast and how far this goes, but the direction is clear, and they want progress to be quick!

Microsoft announced the expansion of Copilot into their vision for the future: an AI companion. Not just an AI companion. YOUR AI companion, emphasizing personalization to you and your needs, so that everyone’s personal AI Copilot will be different. This is very much focused on Copilots for individuals rather than for business (although we can expect a similar evolution on the business side too). Included with this announcement are MANY expanded capabilities:

  • Memory: with a strong focus on personalization, it remembers things about you to make its actions more suited to your tastes
  • Actions: as we’ve seen with others, Copilot is now capable of performing actions using a web browser
  • Vision: with your phone’s camera Copilot can see what you see and respond accordingly
  • Pages: create your own projects by collecting and organizing content into a workspace you can iterate on with AI, similar to OpenAI’s Canvas, Anthropic’s Artifacts, and Google’s NotebookLM
  • Podcasts: Like Google’s NotebookLM, Copilot can now generate podcasts based on the content you provide it
  • Shopping: Use an AI Agent to streamline shopping and alert you when prices drop
  • Deep research: just like OpenAI and Google…search + test-time compute to add reflection and reasoning capabilities
  • Search: this is the new search experience, one that is conversation-oriented instead of a list of links (provided through Bing). It’s just like Google’s AI Mode (part of Google Labs, still labeled as experimental).

Stanford released their annual AI Index report, a massive (413 pages + appendices) collection of excellent information about AI. Waaaay too much info to summarize here, but if you want a good “big picture” view of how AI is progressing, its adoption, its impact, and global trends, I suggest you follow the link to see their 12 Top Takeaways or perhaps the Report Highlights in the first few pages of the report. But I will share a few detailed observations that may be of interest here:

  • Speed: for major models, training compute doubles every five months, training dataset size every eight months, power consumption every twelve months, and hardware performance every 1.9 years (see the quick arithmetic after this list)
  • Cost: depending on the task, inference costs have fallen anywhere from 9-fold to 900-fold (a model performing at GPT-3.5’s level costs about 1/280th of what it did 18 months ago)
  • Quality: (if you believe the benchmarks) the gap between open-weight and closed-weight models is small enough to be almost irrelevant
  • Benchmarks: models are blowing through benchmarks so quickly we need new, more challenging benchmarks
  • Complex reasoning remains a problem (as does bias)
  • Agents show promise
  • Robots are coming, with more AI applications appearing there (as well as in medical devices)
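
To put those doubling times in perspective, here’s the quick arithmetic. This is my own back-of-the-envelope conversion: the doubling periods are the report’s figures, and the annualized factors simply follow from them.

```python
# Convert the AI Index doubling times into annualized growth factors:
#   growth per year = 2 ** (12 / doubling_period_in_months)

doubling_periods_months = {
    "training compute": 5,
    "training dataset size": 8,
    "power consumption": 12,
    "hardware performance": 1.9 * 12,  # 1.9 years expressed in months
}

for quantity, months in doubling_periods_months.items():
    per_year = 2 ** (12 / months)
    print(f"{quantity}: ~{per_year:.1f}x per year")

# training compute: ~5.3x per year
# training dataset size: ~2.8x per year
# power consumption: ~2.0x per year
# hardware performance: ~1.4x per year
```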

I could easily write a whole post about these findings…

Meta released the first of the models in the Llama 4 family. The usual story – new models, besting a lot of benchmarks, even beating some of the best commercial models on some of the benchmarks. Two things are notable here:

  1. These appear to be a major step forward in training but don’t have a corresponding step forward in performance. Lots of noise on the ‘net about the diminishing returns of larger models and that we’re reaching the point where more pre-training doesn’t seem to be the path to significantly better performance (this is the trend, perhaps with one exception: Sam Altman’s claim last week that they are seeing enough improvement to release a GPT-5).
  2. These are the first open-weight natively multimodal models, they’re the first Llama models that use mixture of experts (MoE), and they were trained in part by distillation from a much larger model (Behemoth) that is “still training.” Behemoth is indeed very big – about 2T parameters trained on 30T tokens, spread across 16 experts with roughly 288B parameters active at any given time (see the routing sketch after this list). The smaller of the released models (Scout) also advertises a 10M token context length, although it’s really a virtual context achieved through some fancy manipulation; the actual max context used in training was 256K tokens.
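
To make the “only some parameters are active” idea concrete, here’s a minimal, illustrative sketch of mixture-of-experts routing. The sizes and the simple top-1 routing below are made up for illustration; this is not Llama 4’s actual architecture or configuration.

```python
import numpy as np

# Tiny mixture-of-experts feed-forward layer: a router picks one expert per
# token, so only that expert's weights are used even though all experts exist.
# All sizes here are illustrative, not Llama 4's real configuration.

rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 64, 256, 16

router = rng.standard_normal((d_model, n_experts))  # gating weights
experts = [
    (rng.standard_normal((d_model, d_ff)), rng.standard_normal((d_ff, d_model)))
    for _ in range(n_experts)
]

def moe_forward(token: np.ndarray) -> np.ndarray:
    """Route a single token vector to its top-1 expert and apply it."""
    logits = token @ router
    expert_id = int(np.argmax(logits))      # choose one expert for this token
    w_in, w_out = experts[expert_id]
    hidden = np.maximum(token @ w_in, 0.0)  # simple ReLU feed-forward
    return hidden @ w_out

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (64,) -- only 1 of 16 experts was touched
```

That’s the idea behind Behemoth’s numbers: all of the experts’ parameters exist, but only a fraction of them are used for any given token, which is how a roughly 2T parameter model can have only about 288B active parameters.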

Amazon released Nova Reel 1.1, the latest in their video generation capabilities. Clearly a move forward for them, with the capability of creating 2-minute videos with more consistency and specificity than before.

OpenAI gave ChatGPT a better memory of past conversations so you don’t have to start over every time, or search for a previous conversation. It just “knows” (like what Google announced in February). They also announced the Pioneers Program to improve results in specialized domains. They want to partner with companies in specific domains to build domain-specific models as well as evaluation sets to benchmark the performance of those models. This is an alternative path for OpenAI to be valuable to businesses, one that doesn’t rely on enterprise search and RAG.

But, Google. Google, Google, and more Google…they were on a tear this week!

  • The biggest announcement is their Agent2Agent (A2A) protocol. This is intended as a standard for all agents to follow so that they can interact with one another regardless of what system they were built on. It will “allow AI agents to communicate with each other, securely exchange information, and coordinate actions on top of various enterprise platforms or applications.” (A hypothetical sketch of what such an exchange might look like follows this list.)
  • They also announced additions to Agentspace. This is their enterprise AI/agent development and deployment platform that empowers agents with enterprise data and search (RAG). It was originally launched in December, but these are major upgrades that expand its scope, and of course it fully complies with their new A2A protocol. It’s clear they’re going all in; I’m sure they desperately want to draw businesses away from Microsoft’s Office/Copilot stranglehold, the same way Bing is trying to grab internet search! The key expansions include:
    • incorporating Google’s multimodal search into Agentspace directly from Chrome
    • access to an Agent Gallery
    • the ability to build agents with Agent Designer
    • access to Google agents like Deep Research. 
  • They have incorporated agents into their backend-as-a-service development platform (for building web and mobile apps). Previously called Firebase, it’s now Firebase Studio, with agents to help you develop and manage your apps more rapidly. This is significant because they see the agents as ready for production, at least in this focused context (and definitions of agents still vary).
  • And they nicely pointed out that Google Vertex is the only one-stop-shop for models for all modalities. No other vendor has the breadth of models that Google does.
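
To make the agent-to-agent idea concrete, here’s a purely hypothetical sketch of one agent delegating a task to another with a JSON payload over HTTP. The endpoint, the field names, and the delegate_task helper are all invented for illustration; this is not the actual A2A specification.

```python
import json
import urllib.request

# Hypothetical illustration of one agent handing a task to another agent as a
# JSON message over HTTP. The endpoint and field names are invented for
# illustration only; see Google's A2A documentation for the real protocol.

AGENT_B_URL = "https://agent-b.example.com/tasks"  # placeholder endpoint

def delegate_task(description: str, context: dict) -> dict:
    """Send a task request to another agent and return its JSON reply."""
    payload = {
        "task": description,
        "context": context,
        "reply_to": "https://agent-a.example.com/results",  # placeholder
    }
    request = urllib.request.Request(
        AGENT_B_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Example: a scheduling agent asks an expense agent to file a report.
# result = delegate_task(
#     "File the travel expense report for last week's trip",
#     {"employee_id": "12345", "trip": "NYC, Apr 1-3"},
# )
```

The point of a shared protocol is that the requesting agent doesn’t need to know what platform or model the other agent was built on, only the message format and where to send it.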

Although benchmarks are questionable, they’re the best way we have to measure the performance of these models. Since different benchmarks measure different things, perhaps the AAI Index is the best “overall” measure, combining scores across 7 benchmarks…and Gemini 2.5 Pro is #1…which is quite significant because it’s also one of the least expensive (with the notable exception of o3-mini).
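
For a rough sense of how a composite index like that works, here’s a minimal sketch that simply averages per-benchmark scores assumed to be on a common 0 to 100 scale. The model names and numbers are made up; this is not the index’s actual methodology or its real scores.

```python
# Toy composite benchmark index: average each model's scores across several
# benchmarks (all assumed to be on a 0-100 scale) and rank by the average.
# Model names and scores below are made up for illustration.

benchmarks = ["bench_a", "bench_b", "bench_c"]
scores = {
    "model_x": {"bench_a": 81.0, "bench_b": 62.0, "bench_c": 55.0},
    "model_y": {"bench_a": 78.0, "bench_b": 70.0, "bench_c": 49.0},
}

def composite(model_scores: dict) -> float:
    """Average the per-benchmark scores into a single index value."""
    return sum(model_scores[b] for b in benchmarks) / len(benchmarks)

for model in sorted(scores, key=lambda m: composite(scores[m]), reverse=True):
    print(model, round(composite(scores[model]), 1))
```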

Note that the top 5 models are all “reasoning” models, which use more tokens at inference time to improve results.


My take on why it matters, particularly for generative AI in the workplace


All the vendors are pushing an agentic vision of the future

A vision where AI doesn’t just move information around but actually does stuff. That will be transformative in almost every way, for both our personal and professional lives, not to mention companies and economies and countries. But they’re not talking about the unreliability of the current technology, so near-term adoption will only be for narrow, low-risk situations. Complex, high-risk scenarios are not easily addressed with today’s approaches and bigger models won’t fix that; broad adoption will take many years unless new technology comes along.

Models are commodities

In fact, most of the features tacked onto the models (memory, search, etc.) are commodities. Why? One, there isn’t a material difference in performance. Two, there isn’t a moat – as soon as one provider has something unique, it is soon copied by the others (Google’s NotebookLM podcast feature might be the record holder for longest time without a major competitor…it took 7 months for Microsoft to copy it). That means that cost will be the key factor, and although there is a lot more to squeeze out of this tech, Google currently has the lead.

