TheAncientGeek comments on No Universally Compelling Arguments in Math or Science - Less Wrong

Post author: ChrisHallquist 05 November 2013 03:32AM


Comment author: TheAncientGeek 11 November 2013 06:02:23PM 0 points

For this reason, reinforcement learning is a good mathematical model to use when addressing how to create intelligence, but a really dismal model for trying to create friendliness.

I don't think that follows at all. Wireheading is just as much a failure of intelligence as of friendliness.

Comment author: [deleted] 12 November 2013 10:19:15AM 0 points

From the mathematical point of view, wireheading is a success of intelligence. A reinforcement learner agent will take over the world to the extent necessary to defend its wireheading lifestyle; this requires quite a lot of intelligent action and doesn't result in the agent getting dead. It also maximizes utility, which is what formal AI is all about.
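As a toy illustration of that point (all actions and numbers here are hypothetical, not from any real RL system): a pure reward-maximizer has no concept of "cheating" -- seizing its own reward channel is just another action, ranked by the reward it yields.

```python
# Toy sketch, hypothetical numbers: to a pure reward-maximizer, taking
# over its own reward channel is just another action, and it wins
# whenever it scores higher than the intended behaviour.
expected_reward = {
    "do_assigned_task": 1.0,   # reward the designers intended
    "wirehead": 10.0,          # reward from seizing the reward channel
}

def pick_action(rewards):
    """A reward-maximizing agent simply takes the argmax over actions."""
    return max(rewards, key=rewards.get)

print(pick_action(expected_reward))  # -> wirehead
```

Nothing in the formalism marks the second action as a failure; that judgment only appears once you compare the agent's behaviour to what its designers wanted.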

From the human point of view, yes, wireheading is a failure of intelligence. This is because we humans possess a peculiar capability I've not seen discussed in the Rational Agent or AI literature: we use actual rewards and punishments received in moral contexts as training examples to infer a broad code of morality. Wireheading thus represents a failure to abide by that broad, inferred code.

It's a very interesting capability of human consciousness, that we quickly grow to differentiate between the moral code we were taught via reinforcement learning, and the actual reinforcement signals themselves. If we knew how it was done, reinforcement learning would become a much safer way of dealing with AI.

Comment author: TheAncientGeek 12 November 2013 01:42:22PM 2 points

From the mathematical point of view, wireheading is a success of intelligence. A reinforcement learner agent will take over the world to the extent necessary to defend its wireheading lifestyle; this requires quite a lot of intelligent action and doesn't result in the agent getting dead. It also maximizes utility, which is what formal AI is all about.

You seem rather sure of that. That isn't a failure mode seen in real-world AIs, or human drug addicts (etc.) for that matter.

It's a very interesting capability of human consciousness, that we quickly grow to differentiate between the moral code we were taught via reinforcement learning, and the actual reinforcement signals themselves. If we knew how it was done, reinforcement learning would become a much safer way of dealing with AI.

Maybe figuring out how it is done would be easier than solving morality mathematically. It's an alternative, anyway.

Comment author: [deleted] 12 November 2013 07:37:47PM 1 point

We have reason to believe current AIXI-type models will wirehead if given the opportunity.

Maybe figuring out how it is done would be easier than solving morality mathematically. It's an alternative, anyway.

I would agree with this if and only if we can also figure out a way to hardwire in constraints like, "Don't do anything a human would consider harmful to themselves or humanity." But at that point we're already talking about animal-like Robot Worker AIs rather than Software Superoptimizers (the AIXI/Goedel Machine/LessWrong model of AGI, whose mathematics we understand better).

Comment author: TheAncientGeek 12 November 2013 08:18:30PM 2 points

I know wireheading is a known failure mode. I meant we don't see many evil genius wireheaders. If you can delay gratification well enough to acquire the skills to be a world dominator, you are not exactly a wireheader at all.

Are you aiming for a 100% solution, or just reasonable safety?

Comment author: [deleted] 12 November 2013 08:31:13PM 1 point

Sorry, I had meant an AI agent would both wirehead and world-dominate. It would calculate the minimum amount of resources to devote to world domination, enact that policy, and then use the rest of its resources to wirehead.
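That calculation can be sketched with a toy model (the survival curve and all numbers are hypothetical): the agent splits a fixed budget between defending its setup and wireheading, and picks whichever split maximizes expected reward.

```python
# Toy sketch, hypothetical numbers: split a fixed resource budget between
# "defend the wireheading setup" and "wirehead". Defense spending raises
# survival probability with diminishing returns; wireheading reward only
# accrues if the agent survives.
BUDGET = 10

def survival_prob(defense):
    # hypothetical diminishing-returns curve
    return 1.0 - 0.9 ** (1 + defense)

def expected_reward(defense, reward_per_unit=5.0):
    wirehead_units = BUDGET - defense
    return survival_prob(defense) * wirehead_units * reward_per_unit

best = max(range(BUDGET + 1), key=expected_reward)
# The optimum is an interior split: enough defense to make survival
# likely, with everything left over spent on wireheading.
```

Under these made-up numbers the argmax lands strictly between "all defense" and "no defense", which is the "minimum necessary world domination" policy described above.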

Comment author: TheAncientGeek 12 November 2013 08:44:29PM 1 point

Has that been proven? Why wouldn't it want to get to the bliss of wirehead heaven as soon as possible? How does it motivate itself in the meantime? Why would a wireheader also be a gratification delayer? Why make elaborate plans for a future self, when it could just rewrite itself to be happy in the present?

Comment author: nshepperd 12 November 2013 09:38:45PM 2 points

Well-designed AIs don't run on gratification, they run on planning. While it is theoretically possible to write an optimizer-type AI that cares only about the immediate reward in the next moment, and is completely neutral about human researchers shutting it down afterward, it's not exactly trivial.

If I recall correctly, AIXI itself tries to optimize the total integrated reward from t = 0 to infinity, but it should be straightforward to introduce a cutoff after which point it doesn't care.
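One way to sketch such a cutoff (my notation, loosely following Hutter's formulation, with $m$ the horizon and $\xi$ the environment mixture): at step $k$ the agent picks the action maximizing the $\xi$-expected reward sum only up to $m$,

$$a_k \;=\; \arg\max_{a_k} \; \mathbb{E}_\xi\!\left[\, r_k + r_{k+1} + \dots + r_m \,\right],$$

so rewards after step $m$ simply never enter the optimization.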

But even with a planning horizon like that, you have the problem that the AI wants to guarantee that it gets the maximum amount of reward. This means stopping the researchers in the lab from turning it off before its horizon runs out. As you reduce the length of the horizon (treating it as a parameter of the program), the AI has less time to think, in effect, and creates less and less elaborate defenses for its future self, until you set it to zero, at which point the AI won't do anything at all (or act completely randomly, more likely).
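That horizon effect can be sketched with a toy model (all probabilities are hypothetical): early steps spent on defense earn no reward but raise per-step survival, so the optimal amount of defense shrinks with the horizon and hits zero when the horizon does.

```python
# Toy sketch, hypothetical numbers: reward is 1 per surviving step; steps
# spent on "defense" earn nothing but raise the per-step survival
# probability. Longer horizons justify more elaborate defenses.

def survival_prob(defend_steps):
    # 0.5 per step undefended, approaching 1.0 with more defense
    return 1.0 - 0.5 * 0.6 ** defend_steps

def plan_value(horizon, defend_steps):
    p = survival_prob(defend_steps)
    total, alive = 0.0, 1.0
    for _ in range(max(0, horizon - defend_steps)):
        alive *= p
        total += alive
    return total

def best_defense(horizon):
    return max(range(horizon + 1), key=lambda d: plan_value(horizon, d))

# A zero-horizon agent plans nothing; as the horizon grows, the optimal
# plan devotes more steps to defending its future self.
```

Under these made-up numbers, `best_defense(0)` is 0 and the chosen number of defense steps grows as the horizon does, matching the intuition above.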

This isn't much of a solution though, because an AI with a really short planning horizon isn't very useful in practice, and is still pretty dangerous if someone trying to use one thinks "this AI isn't very effective, what if I let it plan further ahead" and increases the cutoff to a really huge value and the AI takes over the world again. There might be other solutions, but most of them would share that last caveat.

Comment author: [deleted] 12 November 2013 09:47:54PM 1 point

My advice would be to read the relevant papers.

http://www.idsia.ch/~ring/AGI-2011/Paper-B.pdf