Kawoomba comments on Welcome to Less Wrong! (July 2012) - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (843)
If that were the case (and it may very well be), there goes provably friendly AI, for to guarantee a property under all circumstances, it must be upheld from the bottom layer upwards.
I think it's possible that any leaky abstraction used in designing FAI might doom the enterprise. But if that's not true, we can use this "qualia translation function" to make a leaky abstractions in a FAI context a tiny bit safer(?).
E.g., if we're designing an AGI with a reward signal, my intuition is we should either (1) align our reward signal with actual pleasurable qualia (so if our abstractions leak it matters less, since the AGI is drawn to maximize what we want it to maximize anyway); (2) implement the AGI in an architecture/substrate which produces as little emotional qualia as possible, so there's little incentive for behavior to drift.
My thoughts here are terribly laden with assumptions and could be complete crap. Just thinking out loud.