shminux comments on Domesticating reduced impact AIs - All
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (104)
Thinking of possible ways to munchkin this setup... One example: create a copy of the universe and turn it into paperclips, while leaving the original intact (R was originally specified over the original universe only). Provided, of course, that building more than a single paperclip increases U. This seems to be implied by "The disciple builds a paperclip or two and nothing much else", though it is in contradiction with "utility 1 if it builds at least one".
Building two paperclips increases the probability that you've built one (and haven't just imagined you built one).
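A toy expected-utility calculation makes this concrete; the per-paperclip success probability p below is an illustrative assumption, not something from the post. Suppose U = 1 if at least one paperclip exists and 0 otherwise, and each paperclip the disciple attempts independently turns out to be a real paperclip with probability p. Then attempting n paperclips gives

E[U \mid n] = 1 - (1 - p)^{n},

which is strictly increasing in n: with p = 0.99, attempting one paperclip yields expected utility 0.99 and attempting two yields 0.9999. So even the "utility 1 if it builds at least one" reading rewards building a second paperclip, as long as the disciple isn't certain the first one exists.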
:-D That's exactly what I want.
I don't get that - can you develop the idea? "create a copy of the universe" seems... non-trivial.
Non-trivial indeed, but why would it stop a capable AI determined to maximize its utility?
Further notes:
Geodesic is probably too weak, as it implies ballistic motion only. A timelike or null future-pointing path is somewhat safer (one should be allowed to use engines). Still, anything outside O's original lightcone, or within a two-hour proper-time window, appears to be fair game for paperclipping.
Moreover, any timelike curve can be mimicked by a near-zero-proper-time curve. Does this mean the AI can extend the window out into the future indefinitely?
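A quick sketch of why this is a worry, assuming flat spacetime and units with c = 1 (simplifications for illustration, not stated in the post): the proper time along a timelike path parametrized by coordinate time t with velocity v(t) is

\tau[\gamma] = \int \sqrt{1 - |v(t)|^{2}} \, dt,

and a path that zigzags at speeds arbitrarily close to c between two events can make this integral arbitrarily small, however far apart the events are in coordinate time. So a two-hour bound on proper time along the path puts no bound on how far into the coordinate-time future the path can reach.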
Do you mean the actual AI making an actual physical copy of the actual universe?
Added a clause that the AI must obey the laws of physics; it was implicit, but now it's explicit.
Yes. But the AI can only reach that if it breaks the laws of physics, and if it can do that, we likely have time travel so our efforts are completely for naught.
No. I defined F as anything that can be reached by a timelike geodesic of length two hours (though you're right that it needn't be a geodesic). Just because the point can be reached by a path of zero length doesn't mean that it's excluded from F.
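A minimal formalization of that definition, assuming flat (Minkowski) spacetime with c = 1 and treating O as a single event (both simplifications, not stated in the post): a timelike geodesic from O to an event x at coordinate separation (\Delta t, \Delta x) has proper length \sqrt{\Delta t^{2} - |\Delta x|^{2}}, so

F = \{ x : \Delta t > |\Delta x| \ \text{and} \ \sqrt{\Delta t^{2} - |\Delta x|^{2}} \le 2 \ \text{hours} \}.

On this reading, membership in F depends only on the interval between O and x, not on which path is actually taken - which is why reachability by a near-zero-proper-time path does not, by itself, exclude a point from F.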
... As we know them now. Even then, not quite. The AI might just build a version of the Alcubierre drive, or a wormhole, or... In general, it would try to exploit any potential discrepancy between the domains of U and R.
Ok, I concede that if the AI can break physics as we understand it, the approach doesn't work. A valid point, but a general one for all AI (if the AI can break our definitions, then even a friendly AI isn't safe, even if the definitions in it seem perfect).
Any other flaws in the model?
There's a big difference between UFAI because it turned out that Peano arithmetic was inconsistent, which no one thinks possible, and UFAI because our current model of physics was wrong/the true model was given negligible probability, which seems very likely.
Yes.
This is related to ontology crises - how does the AI generalise old concepts across new models of physics?
But it may be a problem for most FAI designs, as well.
Um, I wouldn't hurt people if I discovered I could violate the laws of physics. Why should a Friendly AI?
Here's my intuition: Eliezer and other friendly humans have got their values partially through evolution and selection. Genetic algorithms tend to be very robust - even robust to the problem not being properly specified. So I'd assume that Eliezer and evolved FAIs would preserve their friendliness if the laws of physics were changed.
An AI with a designed utility function is very different, however. These are very vulnerable to ontology crises, as they're grounded in formal descriptions - and if the premises of the description change, their whole values change.
Now, presumably we can do better than that, and design a FAI to be robust across ontology changes - maybe mix in some evolution, or maybe some cunning mathematics. If this is possible, however, I would expect the same approach to succeed with a reduced impact AI.
Why shouldn't it? To rephrase, why do you intuitively generalize your own utility function to that of a FAI?
Presumably all the math you are working on is required for your proof of friendliness? And if the assumptions behind the math do not match the physics, wouldn't it invalidate the proof, or at least its relevance to the world we live in? And then all bets are off?