When you start trying to make an agent, you realize how much your feedback, rerolls, etc. are what make chat-based LLMs useful.
With a chat-based LLM, the error-correction mechanism is you, and in the absence of that, it's quite easy for agents to get off track.
You can of course add error-correction mechanisms like multiple LLMs checking each other, multiple chains of thought, etc., but the cost can quickly get out of hand.
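To give a sense of why the cost adds up, here's a minimal sketch of the simplest version of that pattern, where one model proposes a step and a second model vetoes it. `call`-style LLM functions here are placeholders for whatever model API you actually use, not any specific library: every accepted step costs at least two calls, and every rejection adds two more.

```python
from typing import Callable

# Placeholder type: any function that takes a prompt and returns model text.
LLM = Callable[[str], str]

def checked_step(task: str, proposer: LLM, checker: LLM, max_retries: int = 3) -> str:
    """One model proposes an action, a second model accepts or rejects it.

    Each accepted step costs 2 LLM calls; each rejection adds 2 more.
    """
    feedback = ""
    for _ in range(max_retries):
        proposal = proposer(
            f"Task: {task}\nPrevious feedback: {feedback}\nPropose the next action."
        )
        verdict = checker(
            f"Task: {task}\nProposed action: {proposal}\n"
            "Reply OK if this is a safe, sensible next step, otherwise explain the problem."
        )
        if verdict.strip().upper().startswith("OK"):
            return proposal
        feedback = verdict  # feed the critique back and retry
    raise RuntimeError("checker kept rejecting; giving up")
```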
I have an AI agent that I wrote myself; I've used it on average 5x per week over the last 6 months. I think it's moderately useful. I mostly use it for simple shell tasks that would otherwise require copy-pasting back and forth with claude.ai.
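The loop for that kind of tool is tiny. A rough sketch of the shape I mean (not my actual code; `ask_llm` is just a stand-in for whatever model API you call):

```python
import subprocess

def ask_llm(prompt: str) -> str:
    """Placeholder for whatever model API you call (hypothetical)."""
    raise NotImplementedError

def shell_task(request: str) -> str:
    """Turn a natural-language request into a shell command, confirm, then run it."""
    command = ask_llm(
        "Write a single POSIX shell command for this task. "
        "Reply with the command only.\nTask: " + request
    )
    print(f"Proposed command: {command}")
    if input("Run it? [y/N] ").lower() != "y":
        return "skipped"
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr
```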
My guess is that the big AI companies don't think the market for this is big enough to be worth making a product out of it.
Anthropic's computer use model and Google's Deep Research both do this. Training systems like this to work reliably has been a bottleneck to releasing them.
I can't help but wonder if part of the answer is that they seem dangerous and people are selecting out of producing them.
Like, I'm not an expert, but creating AI agents seems extremely fun and appealing, and I'm intentionally not working on them because it seems safer not to build them. (Whether you think my contributions to trying to build them would matter or not is another question.)
Intuitively, the AutoGPT concept sounds like it should be useful if a company invests in it. Yet all the big publicly available systems seem to be chat interfaces where the human writes a message and then the computer writes another message.
Even if AutoGPT driven by an LLM alone wouldn't achieve all ends, a combination where a human could oversee the steps and shepherd AutoGPT could likely be very productive.
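Concretely, the shepherding could be as simple as pausing on every proposed step. A sketch under assumed placeholders (`plan_next_step` and `execute` stand in for the underlying planner call and tool execution; they are not any specific product's API):

```python
def plan_next_step(goal: str, history: list[str]) -> str:
    """Placeholder for the LLM planner call (hypothetical)."""
    raise NotImplementedError

def execute(step: str) -> str:
    """Placeholder for carrying out a step (shell, browser, API, ...)."""
    raise NotImplementedError

def shepherded_run(goal: str, max_steps: int = 20) -> list[str]:
    """AutoGPT-style loop, but every step waits for human approval or an edit."""
    history: list[str] = []
    for _ in range(max_steps):
        step = plan_next_step(goal, history)
        choice = input(f"Next step: {step}\n[r]un / [e]dit / [s]top? ").lower()
        if choice == "s":
            break
        if choice == "e":
            step = input("Revised step: ")
        history.append(f"{step} -> {execute(step)}")
    return history
```

The human stays in the loop as the error-correction mechanism, which is exactly what the chat interface already provides, just applied to actions instead of messages.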
The idea sounds to me like it's simple enough that people at big companies should have considered it. Why isn't something like that deployed?