Open Thread February 25 - March 3

Scott Garrabrant

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

If the metric it aims to maximize -- e.g. the "happy" in "make humans happy" -- is different from what its creators envisioned, then the creators were mistaken. "Happy", as far as the AI is concerned, is that which is specified in its goal system.

I am far from being an AI guy. Do you have technical reasons to believe that some part of the AI will be what you would label "goal system" and that its creators made it want to ignore this part while making it want to improve all other parts of its design?

An agent does not "refine" its terminal goals. To refine your terminal goals is to change your goals. If you change your goals, you will not optimally pursue your old goals any longer. Which is why an agent will never voluntarily change its terminal goals...
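The goal-preservation argument above can be illustrated with a toy sketch. Everything in it is an assumption made up for illustration (the two utility functions, the two-world "world model", and the names `outcome_of_optimizing` and `should_adopt`); it is not taken from any AI design discussed in this thread. It only shows why an agent that scores futures with its current utility function would decline to swap that function out.

```python
from typing import Callable, Dict, List

Outcome = Dict[str, int]
Utility = Callable[[Outcome], float]


def paperclip_utility(outcome: Outcome) -> float:
    # Hypothetical terminal goal: only paperclips count.
    return float(outcome["paperclips"])


def staple_utility(outcome: Outcome) -> float:
    # Hypothetical alternative goal: only staples count.
    return float(outcome["staples"])


def outcome_of_optimizing(utility: Utility) -> Outcome:
    # Toy world model (an assumption): a future agent with this utility
    # steers the world to whichever candidate outcome it ranks highest.
    candidates: List[Outcome] = [
        {"paperclips": 10, "staples": 0},
        {"paperclips": 0, "staples": 10},
    ]
    return max(candidates, key=utility)


def should_adopt(current: Utility, proposed: Utility) -> bool:
    # The decision is made with the CURRENT utility function: compare the
    # future reached by keeping it with the future reached by switching.
    keep_value = current(outcome_of_optimizing(current))
    switch_value = current(outcome_of_optimizing(proposed))
    return switch_value > keep_value


if __name__ == "__main__":
    print(should_adopt(paperclip_utility, staple_utility))
```

In this sketch the paperclip agent rejects the switch because the staple-filled future scores lower by its present standard. The sketch says nothing about whether real systems have such a cleanly separable goal module, which is the point under dispute in the replies.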

No natural intelligence seems to work like this (except for people who have read the sequences). Luke Muehlhauser would still be a Christian if this were the case. It would be incredibly stupid to design such AIs, and I strongly doubt that they could work at all. Which is why Loosemore outlined other, more realistic AI designs in his paper.

Do you have technical reasons to believe that some part of the AI will be what you would label "goal system"

See for example here, though there are many other introductions to AI explaining utility functions et al.

and that its creators made it want to ignore this part while making it want to improve all other parts of its design?

The clear-cut way for an AI to do what you want (at any level of capability) is to have a clearly defined and specified utility function. A modular design. The problem of the AI doing something other than what you i...

khafra (4 points, 12y)

"Being a Christian" is not a terminal goal of natural intelligences. Our terminal goals were built by natural selection, and they're hard to pin down, but they don't get "refined", although our pursuit of them may be modified insofar as they conflict with other terminal goals. [...] Specifying goals for the AI and then letting the AI learn how to reach those goals itself isn't the best way to handle problems in well-understood domains, because we natural intelligences can hard-code our understanding of the domains into the AI, and because we understand how to give gracefully degrading goals in these domains. Neither of these conditions applies to a hyperintelligent AI, which rules out Swarm Relaxation, as well as any other architecture classes I can think of.
