Saying What You Want
There is a hierarchy of useful interfaces for tools that goes something like this:

1. Figure out what you want to do, then how to use the tool to achieve that, then carry out those actions yourself (hammer, machining workshop)
2. Figure out what you want to do, then how to use the tool to achieve that, then convert that into the tool's language (washing machine, 3D printer, traditional programming)
3. Figure out what you want to do, then how to use the tool to achieve that, then specify it in a format natural to you (current-gen AI)
4. Figure out what you want to do, then specify it in a format natural to you (next-gen AI)
5. Figure out what you want to do (???)
6. The tool figures out what you want to do (???)

Each level is clearly more useful than the one before it, all else equal. All else is never equal. There is a serious question about feedback being lost as we move up the levels of the hierarchy. Tight feedback loops are, after all, key to creative expression. This is where "a format natural to you" is doing some heavy lifting: higher levels can still create specialized interfaces (the creation of those interfaces can itself be specified in natural language) with tight feedback loops and intuitive, tactile user experiences.

We're currently breaking into level 3. If AI progress continues at even a fraction of the pace of the last five years (and there is still plenty of low-hanging fruit to be picked), we will soon reach level 4. Level 5 would need to read your mind or something similar, which intuitively (fight me) seems a pretty long way off. As far as I can tell, once we reach level 5, there aren't any obvious blockers on the way to level 6, so I speculate that level 5 will probably be pretty short-lived. So we're likely to be in levels 3-4 for the near future.

You might notice that levels 3 and 4 have an unusual step in common that the others don't: "specify in a format natural to you". This doesn't necessarily mean plain English (or whatever your native language is).
My wife, who uses LLMs pretty much all day, says that Claude Opus 4.6 feels more 'mature' than 4.5.
Anthropomorphizing models is dangerous, but it's always a bit of a delight when I notice us talking about software using human personality traits. It clearly points at coherent concepts: pieces of software now have distinct 'personalities' that compare naturally to those of humans.
Anyway, I use LLMs a fair bit too, and tend to agree with my wife's assessment. Anecdotally, 4.6 is more cautious, more likely to catch itself going on a tangent, and more neutral in its stance than 4.5.
Does this match your experience?