shminux comments on Domesticating reduced impact AIs - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (104)
Even invalidating a proof doesn't automatically mean the outcome is the opposite of the proof. The key question is whether there's a cognitive search process actively looking for a way to exploit the flaws in a cage. An FAI isn't looking for ways to stop being Friendly, quite the opposite. More to the point, it's not actively looking for a way to make its servers or any other accessed machinery disobey the previously modeled laws of physics in a way that modifies its preferences despite the proof system. Any time you have a system which sets that up as an instrumental goal you must've done the Wrong Thing from an FAI perspective. In other words, there's no super-clever being doing a cognitive search for a way to force an invalidating behavior - that's the key difference.
Hmm, I did not mean "actively looking". I imagined something along the lines of being unable to tell whether something that is a good thing (say, in a CEV sense) in a model universe is good or bad in the actual universe. Presumably if you weren't expecting this to be an issue, you would not be spending your time on non-standard numbers and other esoteric mathematical models not usually observed in the wild. Again, I must be missing something in my presumptions.
The model theory is just for understanding logic in general and things like Lob's theorem, and possibly being able to reason about universes using second-order logic. What you're talking about is the ontological shift problem which is a separate set of issues.