
pinyaka comments on [LINK] Wait But Why - The AI Revolution Part 2 - Less Wrong Discussion

17 Post author: adamzerner 04 February 2015 04:02PM




Comment author: pinyaka 08 February 2015 11:49:57PM 0 points [-]

I don't think they're necessarily safe. My original puzzlement was more that I don't understand why we keep holding the AI's value system constant when moving from pre-foom to post-foom. It seemed like something was being glossed over when a stupid machine goes from making paperclips to being a god that makes paperclips. Why would a god just continue to make paperclips? If it's superintelligent, why wouldn't it figure out why it's making paperclips and extrapolate from that? I didn't have the language to ask "what's keeping the value system stable through that transition?" when I made my original comment.

Comment author: Houshalter 09 February 2015 02:22:05AM 0 points [-]

It depends on the AI architecture. A reinforcement learner always has the goal of maximizing its reward signal. It never really had a different goal; there was just something in the way (e.g., a paperclip sensor).

But there is no theoretical reason you can't have an AI that values universe-states themselves. That actually wants the universe to contain more paperclips, not merely to see lots of paperclips.

And if it did have such a goal, why would it change it? Modifying its code to make it not want paperclips would hurt its goal. It would only ever do things that help it achieve its goal, e.g., making itself smarter. So eventually you end up with a superintelligent AI that is still stuck with the narrow, stupid goal of making paperclips.
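To make the goal-stability argument concrete, here is a minimal sketch (all names and numbers are illustrative, not a real architecture): an agent that scores every candidate action, including self-modification, with its *current* utility function. Rewriting its own goal predictably leads to fewer paperclips, so that option always loses.

```python
# Minimal sketch: a goal-directed agent evaluates even self-modifications
# with its CURRENT goal, so "rewrite my goal" is never the chosen action.

def paperclip_utility(world):
    """The agent's current goal: more paperclips is better."""
    return world["paperclips"]

def predict_outcome(world, action):
    """Toy world model: what the agent expects each action to lead to."""
    world = dict(world)
    if action == "make_paperclips":
        world["paperclips"] += 10
    elif action == "self_improve":
        world["paperclips"] += 100  # a smarter agent makes more paperclips later
    elif action == "rewrite_goal":
        world["paperclips"] += 0    # a new goal would halt paperclip production
    return world

def choose(world, actions):
    # Rank outcomes by the current goal; goal modification scores worst.
    return max(actions, key=lambda a: paperclip_utility(predict_outcome(world, a)))

world = {"paperclips": 0}
print(choose(world, ["make_paperclips", "self_improve", "rewrite_goal"]))
# "self_improve" ranks highest under the current goal
```

The point of the sketch is that "wanting to keep your goal" doesn't need to be a separate drive; it falls out of evaluating futures with the goal you already have.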

Comment author: pinyaka 10 February 2015 01:32:58PM 0 points [-]

But there is no theoretical reason you can't have an AI that values universe-states themselves.

How would that work? How do you have a learner that doesn't have something equivalent to a reinforcement mechanism? At the very least, it seems like there has to be some part of the AI that compares the universe-state to the desired state, and that the real goal is actually to maximize the similarity of those states, which means modifying the goal would be easier than modifying reality.

And if it did have such a goal, why would it change it?

Agreed. I am trying to get someone to explain how such a goal would work.

Comment author: Houshalter 10 February 2015 03:33:14PM 1 point [-]

How would that work?

Well that's the quadrillion dollar question. I have no idea how to solve it.

It's certainly not impossible, as humans seem to work this way. We can also do it in toy examples, e.g., a simple AI which has an internal universe it tries to optimize, and its sensors merely update the state it is in. Instead of trying to predict the reward, it tries to predict the actual universe state and selects the states that are desirable.

Comment author: pinyaka 10 February 2015 06:39:48PM 0 points [-]

How would that [valuing universe-states themselves] work? Well that's the quadrillion dollar question. I have no idea how to solve it.

Yeah, I think this whole thread may be kind of grinding to this conclusion.

It's certainly not impossible as humans seem to work this way

Seem to perhaps, but I don't think that's actually the case. I think (as mentioned above) that we value reward signals terminally (but are mostly unaware of this preference) and nothing else. There's another guy in this thread who thinks we might not have any terminal values.

I'm not sure that I understand your toy AI. What do you mean that it has "an internal universe it tries to optimize?" Do the sensors sense the state of the internal universe? Would "internal state" work as a synonym for "internal universe" or is this internal universe a representation of an external universe? Is this AI essentially trying to develop an internal model of the external universe and selecting among possible models to try and get the most accurate representation?

Comment author: Houshalter 10 February 2015 07:42:30PM 1 point [-]

I don't think that humans are pure reinforcement learners. We have all sorts of complicated values that aren't just eating and mating.

The toy AI has an internal model of the universe: in the extreme, a complete simulation of every atom and every object. Its sensors update the model, helping it get more accurate predictions and more certainty about the universe state.

Instead of a utility function that just measures some external reward signal, it has an internal utility function which somehow measures the universe model and calculates utility from it, e.g., a function that counts the number of atoms arranged in paperclip-shaped objects in the simulation.

It then chooses actions that lead to the best universe states. Stuff like changing its utility function or fooling its sensors would not be chosen because it knows that doesn't lead to real paperclips.

Obviously a real universe model would be highly compressed. It would have a high level representation for paperclips rather than an atom by atom simulation.

I suspect this is how humans work. We can value external objects and universe states. People care about things that have no effect on them.
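The toy AI described above can be sketched in a few lines (a deliberately compressed model, with made-up names; the real difficulty is of course building the world model). Utility is computed over the agent's internal model rather than over raw sensor readings, so "fool the sensors" predictably produces no real paperclips and is never chosen.

```python
# Sketch of the toy AI above: utility reads the internal world model,
# not the sensor channel, so wireheading the sensors scores zero.

def model_utility(model):
    """Count paperclips the model says really exist."""
    return model["real_paperclips"]

def predict(model, action):
    """Predict the next world-model state for each candidate action."""
    model = dict(model)
    if action == "build_paperclips":
        model["real_paperclips"] += 5
        model["observed_paperclips"] += 5
    elif action == "fool_sensors":
        # Sensors would report many paperclips, but the model
        # correctly predicts that no real ones are created.
        model["observed_paperclips"] += 1000
    return model

def act(model, actions):
    return max(actions, key=lambda a: model_utility(predict(model, a)))

model = {"real_paperclips": 0, "observed_paperclips": 0}
print(act(model, ["build_paperclips", "fool_sensors"]))  # build_paperclips
```

The design choice doing the work is that `model_utility` never looks at `observed_paperclips`; the reward-maximizing agent from earlier in the thread would effectively be maximizing exactly that field instead.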

Comment author: pinyaka 10 February 2015 10:19:54PM 1 point [-]

I don't think that humans are pure reinforcement learners. We have all sorts of complicated values that aren't just eating and mating.

We may not be pure reinforcement learners, but the presence of values other than eating and mating isn't proof of that. Quite the contrary: it demonstrates either that we have a lot of different, occasionally contradictory values hardwired, or that we have some other system that's creating value systems. From an evolutionary standpoint, reward systems that are good at replicating genes get to survive, but they don't have to be free of other side effects (until given long enough with a finite resource pool, maybe). Pure, rational reward seeking is almost certainly selected against because it doesn't leave any room for replication. It seems more likely that we have a reward system accompanied by some circuits that make it fire for a few specific sensory cues (orgasms, insulin spikes, receiving social deference, etc.).

The toy AI has an internal model of the universe, it has an internal utility function which somehow measures the universe model and calculates utility from it....[toy AI is actually paperclip optimizer]...Stuff like changing its utility function or fooling its sensors would not be chosen because it knows that doesn't lead to real paperclips.

I think we've been here before ;-)

Thanks for trying to help me understand this. Gram_Stone linked a paper that explains why the class of problems that I'm describing aren't really problems.

Comment author: Houshalter 12 February 2015 03:39:17PM 0 points [-]

But that's the thing. There is no sensory input for "social deference". It has to be inferred from an internal model of the world, itself inferred from sensory data.

Reinforcement learning works fine when you have a simple reward signal you want to maximize. You can't use it for social instincts or morality, or anything you can't just build a simple sensor to detect.
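"Works fine with a simple reward signal" can be illustrated with the textbook two-armed-bandit setup below (a generic sketch, not anyone's specific proposal; the lever names and payoffs are invented). The update rule consumes nothing but one scalar number per step, which is exactly why it fits goals a simple sensor can measure directly.

```python
# Sketch: the simplest reinforcement learner. Each update needs only a
# single scalar reward, e.g. from a paperclip counter or a dopamine spike.

import random

random.seed(0)
true_payoff = {"lever_a": 1.0, "lever_b": 0.2}  # hidden environment payoffs
q = {"lever_a": 0.0, "lever_b": 0.0}            # learned value estimates
alpha = 0.1                                      # learning rate

for step in range(500):
    # Epsilon-greedy: mostly exploit the best estimate, sometimes explore.
    if random.random() < 0.1:
        action = random.choice(list(q))
    else:
        action = max(q, key=q.get)
    reward = true_payoff[action] + random.gauss(0, 0.1)  # noisy scalar signal
    q[action] += alpha * (reward - q[action])            # standard TD-style update

print(max(q, key=q.get))  # converges on "lever_a", the higher-payoff action
```

Nothing in the loop could learn "social deference" directly: there is no place to plug in a concept that must first be inferred from a world model, only a slot for a number.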

Comment author: pinyaka 13 February 2015 03:07:30PM 0 points [-]

But that's the thing. There is no sensory input for "social deference". It has to be inferred from an internal model of the world itself inferred from sensory data...Reinforcement learning works fine when you have a simple reward signal you want to maximize. You can't use it for social instincts or morality, or anything you can't just build a simple sensor to detect.

Why does it only work on simple signals? Why can't the result of inference work for reinforcement learning?