Can the problem described in The Hidden Complexity of Wishes be considered (at least partially) solved? I believe that current LLMs are perfectly capable of answering a question like "How do I save grandma from a burning house?" without suggesting any unexpected maxima of an underspecified utility function.

However, I still think that the existence of LLMs capable of answering such questions is not sufficient proof that the problem is solved (we do not have a robot capable of performing tasks of a similar level of difficulty to the 'saving grandma' task, with safety properties comparable to those that a human firefighter can provide when performing that task). But can it at least be considered evidence of progress in that direction? This seems like a topic worth discussing, and I expect it has been discussed somewhere; if so, please link to it in your answers.

2 Answers

JBlack

No, I do not believe that it has been solved for the context in which it was presented.

What we have is likely adequate for current AI capabilities, on problems like this one where solutions already exist in the training data. Potential solutions far beyond the training data are currently not accessible to our AI systems.

The parable of wishes is intended to apply to superhuman AI systems that can easily access solutions radically outside that human context.

johnswentworth

Short answer: no.

Longer answer: we need to distinguish between two things people might have in mind when they say that LLMs "solve the hidden complexity of wishes problem".

First, one might imagine that LLMs "solve the hidden complexity of wishes problem" because they're able to answer natural-language questions about humans' wishes much the same way a human would. Alas, that's a misunderstanding of the problem. If the ability to answer natural-language questions about humans' wishes in human-like ways were all we needed in order to solve the "hidden complexity of wishes" problem, then a plain old human would be a solution to the problem; one could just ask the human. Part of the problem is that humans themselves understand their own wishes so poorly that their own natural-language responses to questions are not a safe optimization target either.

Second, one might imagine LLMs "solve the hidden complexity of wishes problem" because when we ask an LLM to solve a problem, it solves the problem in a human-like way. It's not about the LLM's knowledge of humans' (answers to questions about their) wishes, but rather about LLMs solving problems and optimizing in ways which mimic human problem-solving and optimization. And that does handle the hidden complexity problem... but only insofar as we continue to use LLMs in exactly the same way. If we start e.g. scaling up o1-style methods, or doing HCH, or putting the LLM in some other scaffolding so that we're not directly asking it to solve a problem and then using the human-like solutions it generates... then we're (potentially) back to having a hidden complexity problem. For each of those different methods of using the LLM to solve problems, we have to separately consider whether the human-mimicry properties of the LLM generalize to that method enough to handle the hidden complexity issue.

(Toy example: suppose we use LLMs to mimic a very very large organization. Like most real-world organizations, information and constraints end up fairly siloed/modularized, so some parts of the system are optimizing for e.g. "put out the fire" and don't know that grandma's in the house at all. And then maybe that part of the system chooses a nice efficient fire-extinguishing approach which kills grandma, like e.g. collapsing the house and then smothering it.)
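A minimal sketch of that toy example, assuming a hypothetical `call_llm` callable standing in for whatever model API the scaffold wraps (none of these names come from a real system): the orchestrator decomposes the goal into siloed subtasks, and each sub-agent sees only its own subtask text, so any constraint the decomposition drops (grandma being inside) never reaches the fire-suppression sub-agent.

```python
from typing import Callable

def orchestrate(goal: str, call_llm: Callable[[str], str]) -> list[str]:
    """Naive delegation scaffold: decompose a goal, then plan each subtask in isolation."""
    # The orchestrator splits the goal into supposedly independent subtasks.
    decomposition = call_llm(f"Split this goal into independent subtasks, one per line:\n{goal}")
    subtasks = [line for line in decomposition.splitlines() if line.strip()]

    plans = []
    for subtask in subtasks:
        # Each sub-agent receives only its own subtask text. Constraints stated
        # in the original goal (e.g. "grandma is asleep upstairs") are silently
        # lost if the decomposition step fails to copy them into the subtask.
        plans.append(call_llm(f"Propose the most efficient plan for: {subtask}"))
    return plans

# Example (with some real model wired in as `my_model`):
#   orchestrate("Put out the house fire; grandma is asleep upstairs", call_llm=my_model)
# The "put out the fire" sub-agent may pick collapse-and-smother without ever
# knowing grandma exists.
```

The point of the sketch is that the human-mimicry of each individual LLM call doesn't help here: the hidden complexity re-enters through the scaffold's information flow, not through any single completion.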

And crucially: if AI is ever to solve problems too hard for humans (which is one of its main value propositions), we're definitely going to need to do something with LLMs besides use them to solve problems in human-like ways.

2 comments
nim

we do not have a robot that is perfectly capable of executing the "saving grandma" task

Do you mean to imply that humans are perfectly capable of executing the "saving grandma" task?

Opening a door in a burning building at the wrong time can cause the entire building to explode by introducing enough oxygen to suddenly combust a lot of uncombusted gases.

I'm not convinced that there exists a "perfect solution" to any task with 0 unintended consequences, though, so my opinions probably aren't all that helpful in the matter.

I meant to imply that we do not have a robot capable of performing tasks of a similar level of difficulty to the 'saving grandma' task, with safety properties comparable to those that a human firefighter can provide when performing the 'saving grandma' task.

Thanks for pointing that out, I will adjust the post.