It seems like GPT-4 is coming out soon and, so I've heard, it will be awesome. Now, we don't know anything about its architecture, its size, or how it was trained. If it were trained only on text (about 3.2T tokens) in a compute-optimal manner, it would be about 2.5X the size of Chinchilla, i.e. roughly the size of GPT-3. So to be larger than GPT-3, it would need to be multi-modal, which could present some interesting capabilities.
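For concreteness, here's the arithmetic behind that estimate as a minimal sketch, assuming the Chinchilla rule of thumb of roughly 20 training tokens per parameter (the exact ratio varies a bit, which is why the figure above is "about" 2.5X):

```python
# Back-of-the-envelope Chinchilla-optimal sizing.
# Assumptions: ~3.2T available text tokens (from the post above) and the
# approximate Chinchilla-optimal ratio of ~20 training tokens per parameter.

TOKENS = 3.2e12           # assumed available text tokens
TOKENS_PER_PARAM = 20     # approximate Chinchilla-optimal ratio
CHINCHILLA_PARAMS = 70e9  # Chinchilla's size
GPT3_PARAMS = 175e9       # GPT-3's size

optimal_params = TOKENS / TOKENS_PER_PARAM  # ~160B parameters

print(f"Optimal size:  {optimal_params / 1e9:.0f}B parameters")
print(f"vs Chinchilla: {optimal_params / CHINCHILLA_PARAMS:.1f}x")
print(f"vs GPT-3:      {optimal_params / GPT3_PARAMS:.2f}x")
```

This lands at ~160B parameters, a bit over 2x Chinchilla and just under GPT-3's scale, consistent with the rough figure above.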
So it is time to ask that question again: what's the least impressive thing that GPT-4 won't be able to do? State your assumptions clearly, e.g. "a text-and-image-generating GPT-4 in the style of X with size Y can't do Z."
Wait, actually, this is interesting, because I bet GPT-4 could convince many (most?) people to brush their own teeth.
Even with actuators, you need a compliant human subject, e.g. someone who has been convinced to have their teeth brushed by a robot. So "convincingness" is always a determining factor in the result. Convincing the person to do it themselves is then basically the same thing. You know, like an AI convincing its way out of the box.
Except in this case, unlike the box hypothetical, people universally already want their teeth to be brushed (they just don't always want to do it), and it is a quick, easy, and routine task. GPT could probably dig up incentives and have a good response to each of the person's protests ("I'm tired", "I just brushed two hours ago", etc.). And since people often skip brushing, it would be especially easy for the model to be responsible for a counterfactual tooth-brushing, one that wouldn't have happened otherwise.
This is a measurable, non-harmful metric for how convincing LLMs are, and it's making me think about the productivity and coaching benefits of LLMs (and some more sinister things).
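One simple way to score such a benchmark, as a rough sketch (the numbers and the function name are hypothetical, not an established protocol): compare brushing rates between people who talk to the LLM and a no-conversation control group, and report the uplift as the counterfactual persuasion rate.

```python
def persuasion_uplift(brushed_llm: int, n_llm: int,
                      brushed_control: int, n_control: int) -> float:
    """Counterfactual persuasion rate: extra brushings per person
    attributable to the LLM conversation, vs. a control group."""
    return brushed_llm / n_llm - brushed_control / n_control

# Hypothetical numbers: 75/100 brushed after chatting with the model,
# vs. 50/100 in the control group -> ~25 extra brushings per 100 people.
print(persuasion_uplift(75, 100, 50, 100))  # 0.25
```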