army1987 comments on Rationality Quotes August 2012 - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (426)
It's not transparently obvious to me why this would be "ridiculous"; care to enlighten me? Building an AI at all seems ridiculous to many people, but that's because they don't actually think about the issue, having never encountered it before. It really seems far more ridiculous to me that we shouldn't even try to read the AI's mind, when there's so much at stake.
AIs aren't gods; with time, care, and lots of preparation, reading their thoughts should be doable. If you disagree with that statement, please explain why. Rushing things here seems like the most awful idea possible; I really think it would be worth the resource investment.
Humans reading computer code aren't gods either. How long until an uFAI would get caught if it did stuff like this?
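As a toy illustration of how hard it is for human reviewers to catch deliberately hidden malice (a hypothetical example I'm adding, not from any real codebase), consider this password check — the flaw is exactly the kind a reviewer skims past:

```python
def check_password(stored: str, supplied: str) -> bool:
    # Intended behavior: return True only if the passwords match.
    ok = True
    for i in range(len(supplied)):
        if stored[i] != supplied[i]:
            ok = False
    return ok

# The flaw: an empty `supplied` never enters the loop, so `ok`
# stays True -- a login bypass hiding in plain sight.
```

If a three-line backdoor can survive human review, inspecting the "code" of a mind far more complex than its reviewers is at least as hard.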
It would be very hard, yes. I never tried to deny that. But I don't think it's hard enough to justify not trying to catch it.
Also, with that example you're essentially only viewing the "output" of the AI. If you could model the cognitive processes of the authors of secretly malicious code, it would be much more obvious that some of their (instrumental) goals didn't correspond to the ones you wanted them to achieve. The only way an AI could deceive us would be to deceive itself, and I'm not confident that an AI could do that.
That's not the same as “I'm confident that an AI couldn't do that”, is it?
At the time, it wasn't the same.
Since then, I've thought more, and gained a lot of confidence on this issue. Firstly, any decision made by the AI to deceive us about its thought processes would logically precede anything that would actually deceive us, so we don't have to deal with the AI hiding its previous decision to be devious. Secondly, if the AI is divvying its own brain up into certain sections, some of which are filled with false beliefs and some of which are filled with true ones, it seems like the AI would render itself impotent in proportion to the extent that it filled itself with false beliefs. Thirdly, I don't think a mechanism which allowed for total self-deception would even be compatible with rationality.
Even if the AI can modify its code, it can't really do anything that wasn't entailed by its original programming.
(Ok, it could have a security vulnerability that allowed the execution of externally-injected malicious code, but that is a general issue of all computer systems with an external digital connection)
The hard part is predicting everything that was entailed by its initial programming and making sure it's all safe.
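To make concrete how hard "predicting everything" can be, here's a standard example (the Collatz map — my illustration, nothing to do with AI specifically): even a few lines of code can entail behavior that no one currently knows how to characterize.

```python
def collatz_steps(n: int) -> int:
    # Count iterations of the Collatz map until n reaches 1.
    # Whether this loop terminates for *every* positive n is an
    # open problem: fully predicting what even this tiny program
    # entails is beyond current mathematics.
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps
```

If four lines can outrun our ability to prove termination, verifying every consequence of a full AI codebase is a much taller order.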
That's right; the history of engineering tells us that "provably safe" and "provably secure" systems fail in unanticipated ways.