9eB1 comments on Indifferent vs false-friendly AIs - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (12)
Alternatively, suppose that there is a parliament of all IAIs vs. a parliament of all FFAIs. It could be that the false-friendly AIs, each protecting some aspect of humanity, end up doing an ok job. Not optimal, but the AI that wants humans to be safe, the AI that wants humans to have fun, and the AI that wants humans to spread across the galaxy could all together lead to something not awful. The IAI parliament, on the other hand, just leads to the humans getting turned into resources for whichever of them can most conveniently make use of our matter.
Notice that adding IAIs to the FFAIs does nothing (under many ways of resolving disagreements) but reduce the share of resources humanity gets.
But counting on a parliament of FFAIs to be finely balanced to get FAI out of it, without solving FAI along the way... seems a tad optimistic. You're thinking of "this FFAI values human safety, this one values human freedom, they will compromise on safety AND freedom". I'm thinking they will compromise on some lobotomy-bunker version of safety while running some tiny part of the brains to make certain repeated choices that technically count as "freedom" according to the freedom-FFAI's utility.
I'm just brainstorming in the same vein as these posts, of course, so consider the epistemic status of these comments to be extremely uncertain. But, in the limit, if you have a large number of AIs (thousands, or millions, or billions) who each optimize for some aspect that humans care about, maybe the outcome wouldn't be terrible, although perhaps not as good as one truly friendly AI. The continuity of experience AI could compromise with the safety AI and freedom AI and "I'm a whole brain experiencing things" AI and the "no tricksies" AI to make something not terrible.
Of course, people don't actually care about all of these aspects equally, so if each aspect got equal weight, maybe the most likely failure mode is that something people care about only a tiny amount (e.g. not stepping on cracks in the sidewalk) gets the same weight as something people care about a lot (e.g. experiencing genuine love for another human), and everything ends up pretty crappy. On the other hand, maybe many of these things can be satisfied simultaneously, so you end up living in a world with no sidewalk cracks where you are also immediately matched with plausible loves of your life, and while that may not be optimal, it may still be better than what we've got going on now.
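To make the weighting worry concrete, here is a toy sketch (every number, aspect name, and outcome here is invented purely for illustration, not a claim about how such a parliament would actually work): if the parliament picks the outcome maximizing an equally weighted sum of per-aspect scores, it can prefer an outcome that perfects a trivial aspect over one that protects what people care about most.

```python
# Toy parliament of single-aspect AIs (all numbers and names made up).
# Each outcome gets a per-aspect score in [0, 1]; the parliament picks
# the outcome with the highest weighted sum of aspect scores.

ASPECTS = ["no_sidewalk_cracks", "genuine_love"]

# Hypothetical weights reflecting how much humans actually care.
human_weights = {"no_sidewalk_cracks": 0.01, "genuine_love": 0.99}

# Equal weights: every aspect-AI gets the same say in the parliament.
equal_weights = {a: 1.0 / len(ASPECTS) for a in ASPECTS}

outcomes = {
    # Paves over everything: cracks fully solved, love badly served.
    "crack_free_world": {"no_sidewalk_cracks": 1.0, "genuine_love": 0.2},
    # Tolerates some cracks but preserves what people value most.
    "love_first_world": {"no_sidewalk_cracks": 0.2, "genuine_love": 0.9},
}

def best(weights):
    # Weighted-sum aggregation over aspects; pick the argmax outcome.
    score = lambda o: sum(weights[a] * outcomes[o][a] for a in ASPECTS)
    return max(outcomes, key=score)

print(best(equal_weights))   # the equal-weight parliament's choice
print(best(human_weights))   # the choice under humans' actual weights
```

With these invented numbers the two weightings pick different worlds, which is the failure mode above: equal weighting lets a trivially valued aspect swing the outcome away from the heavily valued one.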
I'll think about it. I don't think it will work, but there might be an insight there we can use.