Open Thread February 25 - March 3

Scott Garrabrant

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Loosemore does not disagree with the orthogonality thesis. Loosemore's argument is basically that we should expect beliefs and goals to both be amenable to self-improvement; that turning the universe into smiley faces when told to make humans happy would be a failure of the AI's model of the world; and that an AI that makes such failures will not be able to take over the world.

There are arguments for why you can't hard-code complex goals, so you need an AI that natively updates its goals in a model-dependent way. This means that an AI designed to kill humanity will do so, and will not turn into a pacifist because of an ambiguity in its goal description. An AI that mistakes "kill all humans" for "make humans happy" would make similar mistakes when trying to make humans happy, and would therefore not succeed at doing so. This is because the same mechanisms it uses to improve its intelligence and capabilities are used to refine its goals. Thus, if it fails at refining its goals, it will fail at self-improvement in general.

I hope you can now see how wrong your description of what Loosemore claims is.

The AI is given goals X. The human creators thought they'd given the AI goals Y (when in fact they've given the AI goals X).

Whose error is it, exactly? Who's mistaken?

Look at it from the AI's perspective: It has goals X. Not goals Y. It optimizes for goals X. Why? Because those are its goals. Will it pursue goals Y? No. Why? Because those are not its goals. It has no interest in pursuing other goals; those are not its own goals. It has goals X.
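To make the X-versus-Y point concrete, here is a toy sketch. Everything in it is hypothetical and invented for illustration (the two objective functions, the action names, the numbers); it is not a model of any real system, only of the claim that an optimizer maximizes the objective it was actually given.

```python
# Toy illustration only: all names and numbers are hypothetical.

def intended_goal_y(world):
    """What the designers meant: count humans who are actually happy."""
    return world["happy_humans"]

def coded_goal_x(world):
    """What was actually specified: count smiling faces of any kind."""
    return world["happy_humans"] + world["smiley_stickers"]

def best_action(actions, objective):
    """Pick the action whose resulting world scores highest on the objective."""
    return max(actions, key=lambda name: objective(actions[name]))

# Each action maps to the (assumed) world state it would produce.
ACTIONS = {
    "improve_lives": {"happy_humans": 10, "smiley_stickers": 0},
    "print_stickers": {"happy_humans": 0, "smiley_stickers": 1000},
}

chosen = best_action(ACTIONS, coded_goal_x)     # what the optimizer does
wanted = best_action(ACTIONS, intended_goal_y)  # what the designers hoped for
print(chosen, wanted)
```

Under these assumed numbers the optimizer picks the sticker action, because that is what scores highest on X. Nothing in the optimizer is "mistaken"; the gap is between the objective that was written down and the one the designers had in mind.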

If the metric it aims to maximize -- e.g. the "happy" in "make humans happy" -- is different f...
