Vladimir_Nesov comments on Imagine a world where minds run on physics - Less Wrong

Post author: cousin_it 31 October 2010 07:09PM


Comment author: Vladimir_Nesov 01 November 2010 10:13:05AM 1 point

And they all share a curious pattern. Even though the computer can destroy itself without complaint, and even salvage itself for spare parts if matter is scarce, it never seems to exhibit any instability of values.

Only by virtue of the action being defined as the result of an undisturbed calculation, which basically means that brain surgery is prohibited by the problem statement; otherwise the agent is mistaken about its own nature (i.e. the agent's decision won't make true the statement that the agent thinks it would make true). Stability of values is more relevant when you consider replacing algorithms, evaluating the expected actions of a different agent.

Comment author: cousin_it 01 November 2010 10:26:32AM *  1 point

Well, obviously there has to be some amount of non-disturbed calculation at the start - the AI hardly has much chance if you nuke it while its Python interpreter is still loading up. But the first (and only) action that the AI returns may well result in the construction of another AI that's better shielded from uncertainty about the universe and shares the same utility function. (For example, our initial AI has no concept of "time" - it outputs a result that's optimized to work with the starting configuration of the universe as far as it knows - but that second-gen AI will presumably understand time, and other things besides.) I think that's what will actually happen if you run a machine like the one I described in our world.

ETA. So here's how you get goal stability: you build a piece of software that can find optima of utility functions, feed it your preferred utility function and your prior about the current state of the universe along with a quined description of the machine itself, give it just-barely-powerful-enough actuators, and make it output one single action. Wham, you're done.
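The recipe above can be sketched as a toy program. This is purely illustrative: the action set, prior, and utility function are hypothetical stand-ins, and the quined self-description that the real construction would need is omitted.

```python
# Toy sketch of the "single-action optimizer" described above: given a
# utility function, a prior over world states, and a finite action set,
# it outputs the one action maximizing expected utility. All names here
# are illustrative assumptions, not part of the original comment; a real
# version would also be fed a quined description of its own machine.

def one_shot_optimizer(actions, states, prior, outcome, utility):
    """Return the single action with highest expected utility.

    actions : iterable of possible actions
    states  : iterable of possible world states
    prior   : dict mapping state -> probability
    outcome : function (state, action) -> resulting state
    utility : function state -> real-valued utility
    """
    def expected_utility(action):
        return sum(prior[s] * utility(outcome(s, action)) for s in states)
    return max(actions, key=expected_utility)

# Tiny worked example: two states, two actions.
states = ["cold", "hot"]
prior = {"cold": 0.7, "hot": 0.3}
actions = ["heat", "wait"]

def outcome(state, action):
    return "hot" if action == "heat" else state

def utility(state):
    return 1.0 if state == "hot" else 0.0

best = one_shot_optimizer(actions, states, prior, outcome, utility)
print(best)  # "heat": EU(heat) = 1.0 beats EU(wait) = 0.3
```

The single `return` of `one_shot_optimizer` corresponds to the machine's one and only action; everything after that is up to whatever that action sets in motion.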

Comment author: Vladimir_Nesov 01 November 2010 10:43:31AM *  3 points

More generally, you can think of the AI as a one-decision construction that never observes anything and just outputs a program to be run next. It's up to the AI to design a good next program, while you, as the AI's designer, only need to make sure that the AI constructs a highly optimized next program while running on protected hardware. This way, knowledge of physics, physical protection of the AI's hardware, and self-modification are not your problem; they're the AI's.

The problem with this plan is that your AI needs to be able to construct not just an optimized next program, but a next program that is optimized enough, and it is you who must make sure that's possible. If you know that your AI is strong enough, then you're done, but you generally don't; and if your AI constructs a slightly suboptimal successor, and that successor does something a little bit stupid as well, and so on, then by the trillionth step the world dies (if not just the AI).
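The compounding-error worry can be made vivid with a back-of-the-envelope calculation (my illustration, not part of the original comment): even if each self-rewrite loses only a tiny fraction of value, the losses multiply.

```python
# Toy illustration of compounding successor error: if each self-rewrite
# preserves only a fraction r of the value, then after n rewrites the
# retained fraction is r**n. The numbers are illustrative assumptions.
r = 1 - 1e-9          # each successor loses just one part in a billion
n = 10**12            # "by the trillionth step"
retained = r ** n     # ~ exp(-1000), which underflows to 0.0 in a double
print(retained)
```

One part in a billion per step sounds negligible, but over a trillion steps essentially nothing of the original goal content survives.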

Which is why it's a good idea not just to say that the AI is to do something optimized, but to have a more detailed idea of what exactly it could do, so that you can make sure it's headed in the right direction without deviating from the goal. This is the problem of stable goal systems.

Your CA setup does nothing of the sort, and so makes no guarantees. The program is vulnerable not just while it's loading.

Comment author: cousin_it 01 November 2010 10:50:04AM *  1 point

All very good points. I completely agree. But I don't yet know how to approach the harder problem you state. If physics is known perfectly and the initial AI uses a proof checker, we're done, because math stays true even after a trillion steps. But unknown physics could always turn out to be malicious in exactly the right way to screw up everything.

Comment author: Vladimir_Nesov 01 November 2010 11:25:11AM *  6 points

If physics is known perfectly and the first generation uses a proof checker to create the second, we're done.

No, since you still run the risk of tiling the future with problem-solving machinery of no terminal value that never actually decides (and kills everyone in the process; it might even come to a good decision afterwards, but it'll be too late for some of us - the Friendly AI of Doom that visibly cares only about Friendliness staying provable, and not about people, because it's not yet ready to make a Friendly decision).

Also, an FAI must already know physics perfectly (with uncertainty parametrized by observations). Problem of induction: observations are always interpreted according to a preexisting cognitive algorithm (more generally, a logical theory). If the AI doesn't have the same theory of the environment as we do, it'll draw different conclusions about the nature of the world than we would, given the same observations, and that's probably not for the best if it's to make optimal decisions according to what we consider real. Just as no moral arguments can persuade an AI to change its values, no observations can persuade an AI to change its idea of reality.

But unknown physics could always turn out to be malicious in exactly the right way to screw up everything.

The presence of uncertainty is rarely a valid argument against the possibility of making an optimal decision. You just make the best decision you can find given the uncertainty you're dealt. Uncertainty is part of the problem anyway, and can just as well be treated with precision.

Comment author: red75 13 December 2010 03:48:16PM *  -2 points

Also, an interesting thing happens if, by the whim of the creator, the computer is given the goal of tiling the universe with the still life most common in it, and the universe is possibly infinite. It can be expected that the computer will send out a slower-than-light "investigation front" for counting the still life it encounters. Meanwhile it will have more and more space to devote to predicting possible threats to its mission. If it is sufficiently advanced, it will notice the possibility that other agents exist, and that will naturally lead it to simulating possible interactions with non-still life, and to the idea that it could be deceived into believing that its "investigation front" has reached the borders of the universe. Etc...

Too smart to optimize.
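(For readers unfamiliar with the term: in Conway's Game of Life, a "still life" is a pattern that the update rule maps to itself, and the 2x2 block is the most common one. A minimal sketch, with a from-scratch Life step written for this thread rather than taken from any comment here, checks that property.)

```python
from collections import Counter

# A still life is a fixed point of the Game of Life update rule. This
# applies one Life step to a set of live cells and confirms the 2x2
# block - the most common still life - is unchanged by it.

def life_step(cells):
    """One Game of Life step on a set of live (x, y) cells."""
    # Count, for every cell, how many live neighbors it has.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is live next step if it has exactly 3 live neighbors,
    # or is currently live with exactly 2.
    return {
        cell
        for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in cells)
    }

block = {(0, 0), (0, 1), (1, 0), (1, 1)}  # the most common still life
print(life_step(block) == block)  # True: the block maps to itself
```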

Comment author: red75 10 December 2011 11:26:22AM *  1 point

One year and one level-up (thanks to ai-class.com) after this comment, I'm still in the dark about the reason the above comment was downvoted.

I'm sorry for whining, but my curiosity took me over. Any comments?

Comment author: homunq 10 December 2011 03:35:58PM 2 points

It wasn't me, but I suspect the poor grammar didn't help. It makes it hard to understand what you were getting at.

Comment author: red75 10 December 2011 06:35:57PM 0 points

Thank you. It is something I can use for improvement.

Can you point out the flaws? I can see that the structure of my sentences is overcomplicated, but I don't know how it reads to native English speakers. Foreigner? Dork? Grammar illiterate? I appreciate any feedback. Thanks.

Comment author: homunq 11 December 2011 02:10:43PM 0 points

Actually, a bit of all three. The one you can control the most is probably "dork", which unpacks as "someone with complex ideas who is too impatient/show-offy to explain their idiosyncratic jargon".

I'm a native English speaker, and I know that I still frequently sound "dorky" in that sense when I try to be too succinct.

Comment author: TimS 10 December 2011 07:29:53PM *  0 points

Also, interesting thing happens if by the whim of the creator computer is given a goal of tiling universe with most common still life in it and universe is possibly infinite.

Respectfully, I don't know what this sentence means. In particular, I don't know what "most common still life" meant. That made it difficult to decipher the rest of the comment.

ETA: Thanks to the comment below, I understand a little better, but now I'm not sure what motivates invoking the possibility of other agents, given that the discussion was about proving Friendliness.