hairyfigment comments on An overall schema for the friendly AI problems: self-referential convergence criteria - Less Wrong
While I do think that real AIs won't make decisions in this fashion, that aside: as I had understood Stuart's article, the point was not to address decision theory, which is a mathematical subject. Rather, he hypothesized a scenario in which "the AI" was used to forecast possible future events, with humans in the loop doing the actual evaluation based on simulations realized in high detail. The future-world simulation would be as thorough as a film might be today, at which point it could appeal to people on a gut level and bypass their rational faculties, but also have a bunch of other extra-scary features above and beyond other scenarios of people being irrational, just because.
The "But also..." part is the bit I actually object to.
I'm puzzled. Are you sure that's your main objection? Because,

1. you make a different objection (I think) in your response to the sibling, and
2. it seems to me that since any simulation of this kind will be incomplete, and I assume the AI will seek the most efficient way to achieve its programmed goals, the scenario you describe is in fact horribly dangerous; the AI has an incentive to deceive us. (And somewhat like Wei Dai, I thought we were really talking about an AI goal system that talks about extrapolating human responses to various futures.)
It would be completely unfair of me to focus on the line "as thorough as a film might be today". But since it's funny, I give you Cracked.com on Independence Day.
To be honest, I was assuming we're not talking about a "contained" UFAI, since that's, you know, trivially unsafe.