I mainly talk about a system with alignment in depth being aligned to a single person, because I find collective alignment a lot more problematic. The problems include founder effects, edit wars and monopolies.

Collective alignment is in some ways easier under the “agent alignment” philosophy: you just have to find a goal that represents a fair aggregate of everyone’s many preferences. From the program alignment point of view you have more problems, because there are many ways that alignment changes can be caused, and so many ways for things to go wrong or be problematic. One of the places most likely to go wrong in the collective scenario is the “Instruct” phase. The instruct phase is when a human sees the system doing something wrong and instructs it to change its programming, up to and including its morals and/or goals.

Let us construct a scenario where there is a moral disagreement between different members of the human population. For the sake of avoiding too much argument, let us say that “wearing white after labour day” is not just a fashion faux pas but a moral one.

So Anti-white-after-labour-day people may seek to modify the AI’s behaviour in the following ways (sketched in code after the list).

  • Prevent outfit suggestions that include white from being made to themselves after labour day.
  • Prevent outfit suggestions that include white from being made to others after labour day (lest people get tempted into sin).
  • Cease to ship any white clothing/accessories after labour day.
  • Seek to prevent Pro-white-after-labour-day people from being able to instruct the AI to modify the above rules.
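A minimal Python sketch of how these instructions might land in a collective system’s rule store. All the names here (CollectiveAI, the rule keys, the group labels) are hypothetical, invented only to mirror the four bullets above; this is an illustration, not a claim about how a real system would be built.

```python
# Toy rule store for an instructable collective AI. Everything is hypothetical.

class CollectiveAI:
    def __init__(self):
        self.rules = {
            "suggest_white_after_labour_day": True,
            "ship_white_after_labour_day": True,
        }
        self.may_instruct = {"anti_white", "pro_white"}  # who may rewrite the rules

    def instruct(self, group, rule, value):
        """The Instruct phase: a human rewrites part of the system's programming."""
        if group not in self.may_instruct:
            raise PermissionError(f"{group} has been locked out of instructing")
        self.rules[rule] = value

    def lock_out(self, group, other_group):
        """The fourth bullet: stop the other group from instructing at all."""
        if group in self.may_instruct:
            self.may_instruct.discard(other_group)


ai = CollectiveAI()
ai.instruct("anti_white", "suggest_white_after_labour_day", False)  # bullets 1 and 2
ai.instruct("anti_white", "ship_white_after_labour_day", False)     # bullet 3
ai.lock_out("anti_white", "pro_white")                               # bullet 4
# ai.instruct("pro_white", "suggest_white_after_labour_day", True)   # would now raise PermissionError
```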

In a scenario like this, where the Anti-white group was dominant to start with, or simply had earlier access to the collective AGI, you would expect founder effects, because the AGI also has an impact on the culture of the people interacting with it. The initial group that can influence the system has an out-sized impact on the behaviour of the collective system and on the culture interacting with it.
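A toy model of that founder effect, under the hypothetical assumption that whichever rule the system enforces nudges the population a little further toward that rule each step. The nudge size and step count are arbitrary illustrative numbers.

```python
# Toy founder-effect model: the dominant group at the start sets the rule once,
# and the enforced rule then pulls the culture toward itself each step.

def founder_effect(initial_anti_share, nudge=0.02, steps=50):
    share = initial_anti_share
    rule_bans_white = share > 0.5          # whoever dominates early sets the rule
    for _ in range(steps):
        share += nudge if rule_bans_white else -nudge
        share = min(max(share, 0.0), 1.0)  # keep the share a valid proportion
    return rule_bans_white, share


print(founder_effect(0.55))  # (True, 1.0): a slim early majority ends up near-universal
print(founder_effect(0.45))  # (False, 0.0): the mirror-image outcome from a slim minority
```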

If there is a cultural injunction against modifications that stop instruction from other people (so neither group can lock the other out), you might get the equivalent of edit wars, where instruction from each group continually overrides the programming given by the other group on whether the wearing of white is acceptable after labour day.
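A small sketch of that edit-war dynamic, assuming (hypothetically) that each group takes turns instructing and each instruction simply overwrites the last.

```python
# Toy edit war: with no lock-out allowed, each group's instruction overrides
# the previous one, so the rule flip-flops every step.

def edit_war(steps=6):
    history = []
    for step in range(steps):
        group = "pro_white" if step % 2 == 0 else "anti_white"
        allow_white = group == "pro_white"   # each instruction overwrites the last
        history.append((group, allow_white))
    return history


for group, allowed in edit_war():
    print(f"{group} instructs -> white after labour day allowed: {allowed}")
```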

This is not ideal! Rapidly changing moral mores could make the world very hard for people to navigate. We may want a slower process, more akin to how laws are currently made. And we probably want people augmented with AGI so that they have some hope of getting their preferences integrated into the laws.
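One hypothetical shape such a slower, more law-like process could take: a moral rule only changes after a waiting period and a supermajority, rather than on any single instruction. The class names and thresholds below are placeholders for illustration.

```python
# Hypothetical slower change process for moral rules.

from dataclasses import dataclass, field

@dataclass
class Proposal:
    rule: str
    new_value: bool
    step_proposed: int
    votes_for: int = 0
    votes_total: int = 0

@dataclass
class SlowRuleBook:
    rules: dict = field(default_factory=dict)
    waiting_period: int = 30      # steps between proposal and earliest enactment
    supermajority: float = 2 / 3  # share of votes needed to change a moral rule

    def try_enact(self, proposal, current_step):
        old_enough = current_step - proposal.step_proposed >= self.waiting_period
        supported = (proposal.votes_total > 0 and
                     proposal.votes_for / proposal.votes_total >= self.supermajority)
        if old_enough and supported:
            self.rules[proposal.rule] = proposal.new_value
            return True
        return False


book = SlowRuleBook(rules={"allow_white_after_labour_day": True})
ban = Proposal("allow_white_after_labour_day", False, step_proposed=0,
               votes_for=70, votes_total=100)
print(book.try_enact(ban, current_step=10))  # False: the waiting period has not elapsed
print(book.try_enact(ban, current_step=40))  # True: old enough and above the 2/3 threshold
print(book.rules)                            # {'allow_white_after_labour_day': False}
```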

The last potential problem is monopolies. Literal ones. Each program can be seen as occupying an ecological niche in the larger system. A niche that is monopolised by a good-enough program is unlikely to have new programs introduced into it that do well. In a mature system, new programs will be unlikely to be able to compete against experienced programs with established relationships to other programs.

However, if new AGIs are being made (and are associated with new people, to avoid stale thought patterns), then programs that represent entirely novel solutions to problems can gain experience and evolve in their own ways. As there is benefit to sharing programs around, the new programs might be able to spread to the older systems once they have established themselves in a niche and proven their worth.
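A deliberately crude sketch of the niche-monopoly point: a hypothetical challenger program only displaces an incumbent when its proven worth exceeds the incumbent’s competence plus the bonus from its established relationships. All numbers are invented.

```python
# Crude niche-monopoly sketch, with made-up scores.

def challenger_wins(incumbent_skill, relationship_bonus, challenger_skill):
    return challenger_skill > incumbent_skill + relationship_bonus

# In a mature system the relationship bonus makes displacement rare:
print(challenger_wins(0.8, 0.3, 0.9))   # False: a somewhat better program still loses the niche
# A program that first matured in a fresh AGI (an empty niche, no incumbent) can
# later spread to older systems once its demonstrated worth is high enough:
print(challenger_wins(0.8, 0.3, 1.2))   # True
```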

Conclusion

Different philosophies of alignment imply different difficulties for collective AGIs. It is worth examining the nuts and bolts of your preferred philosophy of alignment so that you can see the practicalities of collective AGIs under that philosophy. They may suggest that further work is needed, above and beyond work on individually aligned AGIs, or a preference for individually focused AGIs with co-ordination problems solved through non-AGI means.

For program alignment, collective AIs might spawn large-scale value conflicts as different human values seek to become dominant inside the collective, because the mechanics of program alignment are path dependent and somewhat zero sum. Giving individuals their own AIs, or having small collectives, may alleviate this problem somewhat, but requires other co-ordination mechanisms as mentioned above.

Collective AIs are not a magical panacea. We should think more now about what we are likely to want, so that we can prepare properly. We should not settle too quickly on any particular solution.

Comments

One possibility I like for addressing issues of collective alignment is to consider ways to establish equilibria where we can see gains from trade through compromise.

He seems pretty down on democracy.

"Our social intuitions about fairness and democracy posit that everyone deserves an equal say in the final outcome. Unfortunately for these intuitions, compromise bargains are necessarily weighted by power -- "might makes right.""

It seems that people could decide to make a meta-compromise and not try to get things weighted via power, understanding that power is often distributed via luck and that they or their beliefs might be in the position of not having power in the future. They could decide to weight via people, and only compromise with people who weight via people, to improve the long-term likelihood that some of their values persist over time.

This seems to be what democracy does for you or is supposed to do for you, when it works.
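A small sketch of the difference being pointed at here, with invented numbers: aggregating the same set of preferences weighted by power versus weighted equally per person.

```python
# Invented numbers contrasting power-weighted compromise with equal per-person
# weighting. Preferences are coded as +1 (allow) and -1 (ban).

def aggregate(preferences, weights):
    """Weighted average of numeric preferences."""
    return sum(p * w for p, w in zip(preferences, weights)) / sum(weights)

preferences = [+1, +1, -1, -1, -1]   # five people's stances on some question
power       = [10,  5,  1,  1,  1]   # "might makes right": bargain weighted by power
per_person  = [ 1,  1,  1,  1,  1]   # the democratic meta-compromise: one person, one weight

print(aggregate(preferences, power))       # ~0.67: the two powerful people carry the outcome
print(aggregate(preferences, per_person))  # -0.2: the majority of people carries it
```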

Even if democracy is not the best possible system for compromise, it seems important to work within it to maintain the norm of "respecting the current compromise system". Normalising the subversion or ignoring of our current compromise system seems like a very bad norm to create for a peaceful or pleasant future.

Also, I don't think power is a scalar, or a hard thing. It is a convenient fiction we all choose to believe in, one that can go up in smoke in an instant.