It seems like GPT-4 is coming out soon and, so I've heard, it will be awesome. We don't yet know anything about its architecture, its size, or how it was trained. If it were trained only on text (about 3.2T tokens) in a compute-optimal manner, it would be about 2.5x the size of Chinchilla, i.e. roughly the size of GPT-3. So to be larger than GPT-3, it would need to be multi-modal, which could give it some interesting capabilities.
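A rough sketch of where that figure comes from, assuming the Chinchilla paper's numbers (~70B parameters trained on ~1.4T tokens, i.e. roughly 20 tokens per parameter); the 3.2T token budget is the one quoted above:

```python
# Back-of-the-envelope Chinchilla-optimal sizing.
# Assumed reference point: Chinchilla at ~70B params on ~1.4T tokens.
tokens_per_param = 1.4e12 / 70e9   # ~20 tokens per parameter

text_tokens = 3.2e12               # assumed available text tokens
optimal_params = text_tokens / tokens_per_param  # ~160B params

chinchilla_params = 70e9
gpt3_params = 175e9

print(optimal_params / chinchilla_params)  # ~2.3x Chinchilla
print(optimal_params / gpt3_params)        # ~0.9x GPT-3
```

So a text-only, compute-optimal GPT-4 lands at roughly 160B parameters, within spitting distance of GPT-3's 175B.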
So it's time to ask that question again: what's the least impressive thing that GPT-4 won't be able to do? Please state your assumptions clearly, e.g. "a text-and-image-generating GPT-4 in the style of X with size Y won't be able to do Z."
Safely brushing a person's teeth? To state my assumptions more clearly: a text/image model of any size or training style won't be able to do basic physical tasks like brushing teeth or folding laundry, simply because it lacks actuators. You could probably hook it up to actuators in some way, but in that case it would 1) not be just GPT-4 but an extended system containing it, 2) have too high latency to be practical, and 3) not be safe/reliable (it might be safe for narrow tasks after some fine-tuning for safety, but not in an open-ended sense).
I was thinking of a willing human who just stands still.