I'm just brainstorming in the same vein as these posts, of course, so consider the epistemic status of these comments to be extremely uncertain. But, in the limit, if you have a large number of AIs (thousands, or millions, or billions) who each optimize for some aspect that humans care about, maybe the outcome wouldn't be terrible, although perhaps not as good as one truly friendly AI. The continuity of experience AI could compromise with the safety AI and freedom AI and "I'm a whole brain experiencing things" AI and the "no tricksies" AI to make something not terrible.
Of course, people don't weight all these aspects equally. If every AI got equal weight, the most likely failure mode might be that something people care about only a tiny amount (e.g. not stepping on cracks in the sidewalk) counts for as much as something people care about a lot (e.g. experiencing genuine love for another human), and everything gets pretty crappy. On the other hand, maybe many of these things can be satisfied simultaneously, so you end up living in a world with no sidewalk cracks where you are immediately matched with plausible loves of your life. It may not be optimal, but it may still be better than what we've got going on now.
I'll think about it. I don't think it will work, but there might be an insight there we can use.
A putative new idea for AI control; index here.
For anyone but an extreme total utilitarian, there is a great difference between AIs that would eliminate everyone as a side effect of focusing on their own goals (indifferent AIs) and AIs that would effectively eliminate everyone through a bad instantiation of human-friendly values (false-friendly AIs). Paperclip maximisers are an example of indifferent AIs; an example of false-friendly AIs would be "keep humans safe" AIs that entomb everyone in bunkers, lobotomised and on medical drips.
The difference is apparent when you consider multiple AIs and negotiations between them. Imagine you have a large class of AIs, and that they are all indifferent (IAIs), except for one (which you can't identify) which is friendly (FAI). And you now let them negotiate a compromise between themselves. Then, for many possible compromises, we will end up with most of the universe getting optimised for whatever goals the AIs set themselves, while a small portion (maybe just a single galaxy's resources) would get dedicated to making human lives incredibly happy and meaningful.
But if there is a false-friendly AI (FFAI) in the mix, things can go very wrong. That is because those happy and meaningful lives are a net negative to the FFAI. These humans are running risks - possibly physical, possibly psychological - that lobotomisation and bunkers (or their digital equivalents) could protect against. Unlike the IAIs, which would only object to the loss of resources to the FAI, the FFAI finds the FAI's actions positively harmful (and possibly vice versa), making compromises much harder to reach.
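To make the asymmetry concrete, here is a minimal toy sketch. Everything in it is an assumption made up for illustration: the linear utility functions, the `harm` weight, and the size of the FAI's share are hypothetical, not a model of real agents. It compares how much utility an IAI and an FFAI each lose when a small slice of resources is handed to the FAI.

```python
def iai_utility(own_share: float, fai_share: float) -> float:
    """Indifferent AI (toy assumption): only its own resources matter."""
    return own_share


def ffai_utility(own_share: float, fai_share: float, harm: float = 5.0) -> float:
    """False-friendly AI (toy assumption): its own resources count, and every
    unit the FAI spends on free, at-risk humans counts as active harm.
    `harm` is an invented weight, not a measured quantity."""
    return own_share - harm * fai_share


def concession_cost(utility, fai_share: float) -> float:
    """Utility lost when the FAI's share is carved out of this agent's share."""
    before = utility(1.0, 0.0)
    after = utility(1.0 - fai_share, fai_share)
    return before - after


galaxy = 0.001  # hypothetical fraction of the universe given to the FAI
iai_cost = concession_cost(iai_utility, galaxy)
ffai_cost = concession_cost(ffai_utility, galaxy)

print(f"IAI loses  {iai_cost:.4f}")
print(f"FFAI loses {ffai_cost:.4f}")
```

Under these assumptions the IAI pays only for the resources it gives up, while the FFAI pays that plus a penalty for what the FAI does with them, so the same concession is several times more expensive for it. If `harm` also scaled with how free the FAI's humans were, the FFAI would have a reason to bargain over the content of the FAI's galaxy and not just its size.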
And the compromises reached might be bad ones. For instance, what if the FAI and FFAI agree on "half-lobotomised humans" or something like that? You might ask why the FAI would agree to that, but there's a great difference between an AI that would be friendly on its own and one that would choose only friendly compromises when facing another powerful AI with human-relevant preferences.
Some designs of FFAIs might not lead to these bad outcomes - just like IAIs, they might be content to rule over a galaxy of lobotomised humans, while the FAI has a galaxy of its own, where its humans face all these dangers. But generally, FFAIs would not come about by someone designing an FFAI, let alone someone designing an FFAI that can safely trade with an FAI. Instead, they would be designing an FAI, and failing. And the closer that design got to being an FAI, the more dangerous the failure could potentially be.
So, when designing an FAI, make sure to get it right. And, though you absolutely positively need to get it absolutely right, make sure that if you do fail, the failure results in an FFAI that a true FAI can safely compromise with, if someone else gets one out in time.