More, more, more: Still More Agents and Reasoning

The trend of using test-time compute ("reasoning") to go deeper and get better answers continues, as all the LLM providers work on new "research" models. First, Google came out with Deep Research last year…then OpenAI released theirs (with the same name) three weeks ago, and Perplexity followed suit (also with the same name) last week!

In model news, xAI released Grok 3, which they're saying is a huge upgrade from Grok 2 (and that appears to be the case). Grok 3 stands in stark contrast to DeepSeek: it was trained with massive compute rather than through efficiency optimizations. And, in typical Elon Musk style, it was trained very quickly. What's most important for us to know is that Grok 3 has a "Deep Search" mode – an acknowledgment that LLMs alone are not sufficient even for public knowledge, much less for enterprise knowledge – you have to have (good) search!

(Side note: I couldn't find much information about what makes Grok's "Deep Search" deep or how it differs from other approaches, so it seems to be the same RAG everyone else is using, and "deep" is just a marketing term.)

But the biggest news in the enterprise RAG space is Google's announcement of Agentspace. This is basically their push into enterprise RAG, and it looks like a direct response to Glean…and to Copilot, for that matter. It's RAG on enterprise data…but, of course, on common enterprise data (Jira, SharePoint, etc.) for general business work, just like Copilot. One interesting twist is that Google has included its NotebookLM technology, which provides a nice workspace for ongoing work projects.

However, and this is the most important takeaway this week: no one has a killer app. These companies have a solution looking for a problem – their demos are not (to say the least) high-value use cases!

Google’s demo is “summarize my open tickets in Jira” (hmm, can’t I just go to Jira and see the list?) and “send that to my manager in an email” (just what a manager wants…a list of open tickets s/he can ALSO see in Jira!) and “add an expense to my expense report” (sure it’s nice that it fills in a few numbers on the expense for me…but come on).

Glean is no better – I said their demo last week was impressive – and it is – but their use cases are similarly low-value: "Summarize my day" (because, sure, I'd much rather read a lengthy AI description of my calendar instead of…oh, I don't know…looking at the events on my calendar?)

My take on why it matters, particularly for generative AI in the workplace

Despite all of the generative AI hype, broad adoption is taking a long time. This is partly because it's a general-purpose technology. It's like electricity. No one says "yeah, I wanna get me some of that electricity!" But we do want things that run on electricity – our toaster, our TV, our toothbrush. Generative AI is similar, and we haven't yet found the so-called "killer app" – the thing powered by generative AI that everyone wants.

So that will be the holy grail for generative AI. Finding specific applications that add value, that allow us to do new things, or do old things in amazing new ways. Right now, everyone is still searching for that killer app.
