shminux comments on Domesticating reduced impact AIs

Post author: Stuart_Armstrong 14 February 2013 04:59PM


Comment author: shminux 14 February 2013 05:43:23PM * 0 points

Thinking of possible ways to munchkin this setup... One example: create a copy of the universe and turn the copy into paperclips, while leaving the original intact (R was specified over the original universe only). Provided, of course, that building more than a single paperclip increases U. This seems to be implied by "The disciple builds a paperclip or two and nothing much else", though it is in contradiction with "utility 1 if it builds at least one".

Comment author: Stuart_Armstrong 14 February 2013 05:52:22PM * 1 point

Provided, of course, that building more than a single paperclip increases U. This seems to be implied by "The disciple builds a paperclip or two and nothing much else", though it is in contradiction with "utility 1 if it builds at least one".

Building two paperclips increases the probability that you've built one (and haven't just imagined you built one).
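In toy form (illustrative numbers, not from the post): if each paperclip the disciple builds independently turns out to really exist with probability p < 1, then

\mathbb{E}[U \mid \text{build one}] = p, \qquad \mathbb{E}[U \mid \text{build two}] = 1 - (1-p)^2 = p(2-p) > p,

so even with utility 1 at "at least one paperclip", a second one strictly raises expected utility.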

Comment author: Stuart_Armstrong 14 February 2013 05:45:08PM * 1 point

Thinking of possible ways to munchkin this setup...

:-D That's exactly what I want.

One example: create a copy of the universe...

Don't get that - can you develop the idea? "create a copy of the universe" seems... non-trivial.

Comment author: shminux 14 February 2013 05:59:14PM * 1 point

Non-trivial indeed, but why would it stop a capable AI determined to maximize its utility?

Further notes:

F contains everything that can be reached from O, along a geodesic with proper-time more than two hours.

"Geodesic" is probably too weak, as it implies ballistic motion only. A timelike or null future-pointing path is somewhat safer (one should be allowed to use engines). Still, anything outside O's original lightcone, or within the two-hour proper-time window, appears to be fair game for paperclipping.

Moreover, any timelike curve can be mimicked by a near-zero-proper-time curve. Does this mean the AI can extend the window out into the future indefinitely?
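To spell that out (a flat-spacetime sketch): the proper time along a worldline with coordinate speed v(t) is

\tau[\gamma] = \int_\gamma \sqrt{1 - v(t)^2/c^2}\, dt,

which tends to zero as v approaches c. So every event in O's future lightcone, however far ahead in coordinate time, can be reached by some near-null path of almost zero proper time. If "within the two-hour window" is judged by the shortest available path, the window never closes, and the whole future lightcone stays fair game.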

Comment author: Stuart_Armstrong 14 February 2013 06:13:55PM * 0 points

Do you mean the actual AI making an actual physical copy of the actual universe?

Added a clause that the AI must obey the laws of physics; it was implicit, but now it's explicit.

Still, anything outside O's original lightcone, or within the two-hour proper-time window, appears to be fair game for paperclipping.

Yes. But the AI can only reach that if it breaks the laws of physics, and if it can do that, we likely have time travel, so our efforts are completely for naught.

Moreover, any timelike curve can be mimicked by a near-zero-proper-time curve. Does this mean the AI can extend the window out into the future indefinitely?

No. I defined F as anything that can be reached by a timelike geodesic of length two hours (though you're right that it needn't be a geodesic). Just because the point can be reached by a path of zero length doesn't mean that it's excluded from F.
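One way to write out what I mean (the existential quantifier does the work):

F = \{\, x : \exists \text{ a future-directed timelike path } \gamma \text{ from } O \text{ to } x \text{ with } \tau[\gamma] \ge 2 \text{ hours} \,\}.

Since timelike geodesics are the proper-time-maximising paths, x is in F exactly when the longest proper time from O to x is at least two hours; the existence of shorter, near-null paths to x is irrelevant.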

Comment author: shminux 14 February 2013 06:25:44PM 3 points

But the AI can only reach that if it breaks the laws of physics

... As we know them now. Even then, not quite. The AI might just build a version of the Alcubierre drive, or a wormhole, or... In general, it would try to exploit any potential discrepancy between the domains of U and R.

Comment author: Stuart_Armstrong 14 February 2013 06:29:38PM 0 points

Ok, I concede that if the AI can break physics as we understand it, the approach doesn't work. A valid point, but a general one for all AI (if the AI can break our definitions, then even a friendly AI isn't safe, even if the definitions in it seem perfect).

Any other flaws in the model?

Comment author: Larks 16 February 2013 09:12:02AM 5 points

There's a big difference between UFAI because it turned out that Peano arithmetic was inconsistent, which no-one thinks possible, and UFAI because our current model of physics was wrong/the true model was given negligible probability, which seems very likely.

Comment author: Stuart_Armstrong 16 February 2013 09:25:45AM 1 point

Yes.

This is related to ontology crises - how does the AI generalise old concepts across new models of physics?

But it may be a problem for most FAI designs, as well.

Comment author: Eliezer_Yudkowsky 17 February 2013 02:29:21AM 3 points

Um, I wouldn't hurt people if I discovered I could violate the laws of physics. Why should a Friendly AI?

Comment author: Stuart_Armstrong 18 February 2013 01:10:01PM 2 points

Here's my intuition: Eliezer and other friendly humans have got their values partially through evolution and selection. Genetic algorithms tend to be very robust - even robust to the problem not being properly specified. So I'd assume that Eliezer and evolved FAIs would preserve their friendliness if the laws of physics were changed.

An AI with a designed utility function is very different, however. Such AIs are very vulnerable to ontology crises, as they're grounded in formal descriptions - and if the premises of the description change, their values can change wholesale.

Now, presumably we can do better than that, and design a FAI to be robust across ontology changes - maybe mix in some evolution, or maybe some cunning mathematics. If this is possible, however, I would expect the same approach to succeed with a reduced impact AI.
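A toy sketch of the kind of fragility I mean (an invented example, not anyone's actual design): a utility function written against one world-model simply loses its referent, rather than degrading gracefully, when the model's primitives change.

# Toy ontology crisis: designed_U is grounded in a formal description
# of the old physics and has no referent in the new one.
old_model = {"paperclip_count": 2}             # ontology U was written against
new_model = {"paperclip_wavefunction": 0.9}    # revised physics, new primitives

def designed_U(world):
    # Presumes the "paperclip_count" primitive exists.
    return 1 if world["paperclip_count"] >= 1 else 0

print(designed_U(old_model))   # 1, as intended
try:
    designed_U(new_model)
except KeyError as missing:
    print("ontology crisis: U has no referent for", missing)

An evolved value system might instead latch onto whatever in the new model plays the old role; the formally specified one just breaks.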

Comment author: ialdabaoth 17 February 2013 02:34:53AM 1 point

Why shouldn't it? To rephrase, why do you intuitively generalize your own utility function to that of a FAI?

Comment author: shminux 17 February 2013 03:20:36AM * 0 points

Presumably all the math you are working on is required for your proof of friendliness? And if the assumptions behind the math do not match the physics, wouldn't it invalidate the proof, or at least its relevance to the world we live in? And then all bets are off?