All it took to prove me wrong was a single major company deciding to take a shot at a technically feasible task.
Google/YouTube were working heavily on multimodal models more than 18 months ago, including for music and other non-speech audio. Some of those have been publicly demonstrated over the last year, e.g. text-to-music. Knowing this, I fully expected Gemini to be highly multimodal (I was actually expecting more music-related capabilities than they've mentioned so far in their publicity materials).
“buy an egg whisk” actually uses your credit card to buy the egg whisk, not just add it to your shopping cart
I know of an AI company that actively considered building this internally and rapidly decided current LLMs are too unreliable for it to be a good idea. So, good prediction.
The time between the first public release of an LLM-based AI that uses tools, and one that is allowed to arbitrarily write and execute code is >12 months: 70%
You.com already has an AI agent that does this (for Python), and it works pretty well. So does OpenAI (for Mathematica). Both are sandboxed and resource-constrained, but otherwise arbitrary.
In my accounting, the word "arbitrarily" saved me here. I do think I missed the middle ground of sandboxed, limited programming environments like You.com and the current version of ChatGPT!
Fair enough, I was wondering how strongly you meant 'arbitrarily'. I work for You.com, and we thought about things like malicious or careless use early on and rapidly came to the conclusion that we needed a sandbox. We investigated and found that facilities for running untrusted code in a sandbox are already pretty widely available, both commercially and as open source, so this wasn't very challenging for our security team to implement. What's taking more time is security vetting and whitelisting Python libraries that are useful for common user use cases but don't provide turn-key capabilities for plausible malicious misuse. This is made easier by the fact that current LLMs cannot write very large amounts of bug-free code (and if they could, it's trivial to see how much code has been written).
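To make the library-whitelisting idea concrete, here is a minimal sketch of a static pre-check that lists which imports in a piece of generated Python fall outside an allowlist. This is not You.com's implementation: the `ALLOWED_MODULES` set and the `disallowed_imports` helper are hypothetical names made up for illustration, and a check like this only complements a real sandbox, since it does not catch dynamic imports such as `__import__` or `importlib`.

```python
import ast

# Hypothetical allowlist; a real one would come out of security vetting.
ALLOWED_MODULES = {"math", "statistics", "json", "datetime"}


def disallowed_imports(source: str, allowed: set[str] = ALLOWED_MODULES) -> set[str]:
    """Return top-level module names imported by `source` that are not in `allowed`."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import os.path` counts as importing the top-level package `os`.
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            if node.level > 0 or node.module is None:
                # Relative imports are flagged rather than resolved.
                found.add("<relative import>")
            else:
                found.add(node.module.split(".")[0])
    return found - allowed


if __name__ == "__main__":
    snippet = "import math\nimport os.path\nfrom subprocess import run\n"
    print(sorted(disallowed_imports(snippet)))
```

The parenthetical about code size is similarly cheap to check: something like `len(source.splitlines())` gives a rough measure of how much code the model wrote before anything is executed.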
Nine months ago I predicted trends I expected to see in AI over the course of 2023. Here's how I did (bold indicates a prediction happened, italics indicates it didn't, neither-bold-nor-italics indicates unresolved):
See the original post for evidence/justification for why these did/didn't resolve true. If you disagree with any of these, let me know. Note especially for prediction 2, about a video-processing LLM, I am NOT counting Gemini as a "production-ready" model using video inputs because Google has confirmed they “used still images and fed text prompts” to make the trailer.
Some commentary: