Crossposted from my personal blog.

A fundamental problem in AI alignment, as well as in many of the social sciences, is preference aggregation: given a number of actors, each with their own preferences, what is a consistent way of making decisions that is fair and that, ideally, leaves all of the agents about as happy as they can be [1]? In AI alignment, this is essentially the outer alignment problem which, in layman’s terms, asks: assuming we have an AI superintelligence which makes all decisions according to a utility function, what should that utility function be?
Ideally, the utility function pursued by the aligned superintelligence, or by the democratic state in political science, in some sense closely and fairly represents the utility functions, interests, and values of the constituent stakeholders: in the case of a (direct) democratic state this is all citizens with voting rights; for an aligned superintelligence ideally this would be some form of humanity in general.
For the case of democracies we have a long history of political science which has coalesced around various forms of voting: each stakeholder gets some number of votes and can either vote for their chosen policy or candidate or rank them, and the highest-ranking candidate or policy is selected for implementation. Voting often works reasonably well in practice and generates decisions that are usually seen as fair, even though direct voting is often significantly watered down, usually to prevent tyrannies of the majority. This watering-down happens through mechanisms such as constitutions which define democratically inalienable rights, independent judiciaries which can strike down democratic laws, and, most importantly, representative democracy, where voters do not vote directly on policies but instead for representatives who in theory should vote for their electorate’s desired policies but in practice often do other things. For the more theoretical questions of alignment, most discussion centers around the vaguely defined notion of coherent extrapolated volition (CEV).
The fundamental question is: how do we combine the preferences of the constituents into a coherent utility function, or choose actions that respect those preferences?
My answer is that we already have an extremely well-developed mathematical machinery for dealing with questions which, I claim, are essentially isomorphic to preference aggregation: Bayesian inference and Bayesian decision theory.
To understand this isomorphism, we have to think about things the other way around. Instead of treating the constituents’ preferences as fundamental, let’s assume instead that there is indeed some “optimal” preference distribution over outcomes/policies U∗. Our goal is to figure out what U∗ is, and we can think of the votes or other preference expressions of our constituents as data which inform us about the optimal U∗. Let’s assume we have a set of ‘votes’ V=[V1…VN] where each vote is a normalized preference distribution over outcomes. The rest is standard Bayesian inference. Specifically, we can use Bayes’ rule to obtain the distribution over “true preferences” as,
$$p(U^\ast \mid V) \propto p(V \mid U^\ast)\, p(U^\ast)$$
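To make this concrete, here is a minimal Python sketch of this update over a small discrete set of candidate ‘true preference’ distributions. Everything here – the candidate distributions, the votes, and the Dirichlet-shaped likelihood – is an illustrative assumption rather than anything prescribed by the framework:

```python
import numpy as np
from scipy.stats import dirichlet

# Candidate "true preference" distributions U* over three outcomes (rows sum to 1).
# These candidates, the votes, and the concentration parameter are toy assumptions.
candidates = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [1 / 3, 1 / 3, 1 / 3],
])
prior = np.array([1 / 3, 1 / 3, 1 / 3])  # p(U*): uniform over the candidates

# Each vote V_i is itself a normalized preference distribution over the outcomes.
votes = np.array([
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
])

def log_likelihood(vote, u_star, concentration=10.0):
    """p(V_i | U*): one possible noise model, a Dirichlet centred on U*."""
    return dirichlet.logpdf(vote, concentration * u_star)

# p(U* | V) ∝ p(U*) * prod_i p(V_i | U*), computed in log space for stability.
log_post = np.log(prior) + np.array(
    [sum(log_likelihood(v, u) for v in votes) for u in candidates]
)
posterior = np.exp(log_post - log_post.max())
posterior /= posterior.sum()
print(posterior)  # posterior belief over which candidate U* the votes point to
```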
For this to work in practice, we need two key quantities. First, there is the noise/likelihood model p(V|U∗), which specifies, given the optimal true preference, how we expect people to vote. To make things simpler, let’s make the extremely common assumption that each vote is independent of the others given the optimal preference, i.e. p(V|U∗) = ∏_i p(V_i|U∗) [2]. For a super naive model, we could just assume that there is some fixed optimal U∗ and that the constituents’ votes are simply noisy reflections of this optimum – i.e. V = U∗ + ϵ, where ϵ is some noise distribution. There are a number of noise models you could pick here, but many simple cases such as Bernoulli or Gaussian noise result in what are essentially majority-vote schemes – i.e. the optimal / maximum-likelihood (MLE) choice is the one that receives the most ‘votes’. I found this extremely interesting paper which derives a number of well-known voting schemes such as majority vote, single transferable vote, and so on as maximum-likelihood Bayesian inference under different noise models.
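As a toy illustration of the majority-vote case (my own sketch, not taken from that paper): if each voter independently reports the true optimum of two options with probability 1−ε and the other option with probability ε, the maximum-likelihood ‘true’ option is simply whichever one received more votes:

```python
import numpy as np

rng = np.random.default_rng(0)

def mle_choice(votes, eps=0.2):
    """Binary votes (0 or 1). Under a symmetric noise model where each voter reports
    the true optimum with probability 1-eps, the log-likelihood of 'true option = c' is
    n_c*log(1-eps) + (N-n_c)*log(eps), which is maximised by the majority choice."""
    votes = np.asarray(votes)
    loglik = [
        np.sum(votes == c) * np.log(1 - eps) + np.sum(votes != c) * np.log(eps)
        for c in (0, 1)
    ]
    return int(np.argmax(loglik))

votes = rng.choice([0, 1], size=101, p=[0.45, 0.55])  # synthetic electorate
assert mle_choice(votes) == int(np.sum(votes == 1) > np.sum(votes == 0))
print("MLE choice:", mle_choice(votes), "| majority choice:", int(votes.mean() > 0.5))
```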
However, it seems prudent to go beyond simple noise models like this. A huge advantage of the Bayesian framework is that it lets us write down very complicated models and then mathematically deduce their consequences in a principled way. For instance, one very natural extension would be to suppose that there is not just one ‘optimal’ preference distribution, but instead a number of different value clusters. We could represent this by introducing an additional latent variable z which represents the identity of the value cluster that a given voter belongs to. Then we can apply Bayes’ theorem in the usual way:

$$p(U^\ast, z \mid V) \propto p(V \mid U^\ast, z)\, p(U^\ast \mid z)\, p(z)$$

$$p(U^\ast \mid V) = \int p(U^\ast, z \mid V)\, dz$$
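Here is a minimal sketch of the cluster-membership piece of such a model: the per-voter posterior over which value cluster a voter belongs to, assuming (purely for illustration) two known cluster preference profiles and a simple Gaussian-style noise model. A full treatment would also place a prior and posterior over the cluster profiles themselves:

```python
import numpy as np

# Hypothetical value clusters: each row is a cluster's characteristic preference
# distribution over three outcomes (illustrative numbers only).
clusters = np.array([
    [0.7, 0.2, 0.1],   # cluster 0
    [0.1, 0.2, 0.7],   # cluster 1
])
p_z = np.array([0.5, 0.5])  # prior p(z) over cluster membership

votes = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.1, 0.7],
    [0.5, 0.3, 0.2],
])

# Gaussian-style noise model: log p(V_i | z) ∝ -||V_i - cluster_z||^2 / (2 sigma^2)
sigma = 0.2
log_lik = -((votes[:, None, :] - clusters[None, :, :]) ** 2).sum(-1) / (2 * sigma**2)

# Per-voter posterior over cluster membership: p(z | V_i) ∝ p(V_i | z) p(z)
log_post = log_lik + np.log(p_z)
resp = np.exp(log_post - log_post.max(axis=1, keepdims=True))
resp /= resp.sum(axis=1, keepdims=True)
print(resp)  # rows: voters, columns: responsibility of each value cluster
```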
We can continue to add different and more complex structures as we wish until we are happy that our noise model and hence our value posterior is good. Of course more complex models have more expensive and complex inference requirements but if we are serious about making good preference aggregation decisions, this does not seem to be that bad of a cost. Additionally, given the inevitable uncertainty about what the ‘true’ likelihood model is for voters, we can utilize the standard Bayesian model selection machinery to compare hypotheses and ultimately converge on a good model.
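For instance, here is a rough sketch of comparing two candidate noise models by their log marginal likelihoods, estimated by naive Monte Carlo over the prior. The scalar ‘votes’, the uniform prior, and the two Gaussian noise models are all toy assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
votes = rng.normal(loc=0.6, scale=0.15, size=50)  # toy scalar "votes"

def log_marginal_likelihood(votes, noise_scale, n_samples=5000):
    """log p(V | model) = log E_{U* ~ prior}[ prod_i p(V_i | U*) ],
    estimated by Monte Carlo with a uniform prior over U* in [0, 1]."""
    u_samples = rng.uniform(0.0, 1.0, size=n_samples)
    # Gaussian noise model: V_i = U* + eps, eps ~ N(0, noise_scale^2)
    log_lik = (
        -0.5 * ((votes[None, :] - u_samples[:, None]) / noise_scale) ** 2
        - np.log(noise_scale * np.sqrt(2 * np.pi))
    )
    per_sample = log_lik.sum(axis=1)
    return np.logaddexp.reduce(per_sample) - np.log(n_samples)

# Compare a "tight" and a "loose" noise model; the Bayes factor between them tells us
# which hypothesis about voter noise explains the observed votes better.
for scale in (0.15, 0.5):
    print(scale, log_marginal_likelihood(votes, scale))
```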
Secondly, the Bayesian perspective requires a prior over ‘optimal preferences’ p(U∗). This is very interesting since it gives us a mathematically transparent and principled way to incorporate various other desiderata into our voting system. For instance, we might want to be conservative in our inference and favor a fairly high-entropy posterior distribution over preferences. We might want to install basic preferences such as ‘respect for some set of rights’ into the prior by making it extremely unlikely that a valid preference distribution would run counter to these ‘rights’. We might want to encode various forms of symmetry or Rawlsian-style ‘veils of ignorance’ into our preference prior [3].
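As a sketch of what baking such desiderata into the prior might look like (a toy construction, not a canonical one): give near-zero prior mass to candidate preference distributions that put less than some floor of weight on a ‘protected’ outcome, and a mild bonus to higher-entropy distributions:

```python
import numpy as np

def log_prior(u_star, protected_index=0, floor=0.05, entropy_weight=1.0):
    """Toy unnormalised log-prior over candidate preference distributions u_star.
    - Near-zero mass if the 'protected' outcome gets less than `floor` weight
      (a crude stand-in for an inviolable right).
    - Higher mass for higher-entropy (more conservative) distributions."""
    u_star = np.asarray(u_star, dtype=float)
    if u_star[protected_index] < floor:
        return -1e9  # effectively zero prior probability
    entropy = -np.sum(u_star * np.log(u_star + 1e-12))
    return entropy_weight * entropy

print(log_prior([0.01, 0.59, 0.40]))  # violates the 'right' -> huge penalty
print(log_prior([0.20, 0.40, 0.40]))  # right respected, fairly flat -> favoured
```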
Another useful aspect of this mathematical framework is that it provides a principled way to incorporate hard constraints and additional regularization into the problem. For instance, suppose we had a set of rights that we wanted to always be respected; this could be encoded as a hard constraint on the posterior distribution, and we could optimize for the constrained solution via the method of Lagrange multipliers. Similarly, we might also want to explicitly regularize the solution, for instance to also maximize the entropy of our posterior distribution over preferences. Mathematically, this turns the problem into a variational optimization – finding the probability distribution that minimizes a loss functional. If we define our approximate / constrained posterior as q(U∗), then our posterior is the solution to the following variational optimization problem,

$$q^\ast(U^\ast) = \operatorname{argmin}_{q}\; D\big(q(U^\ast)\,\|\,p(U^\ast \mid V)\big) + \alpha \cdot \mathrm{reg}\big(q(U^\ast)\big) + \lambda \cdot \mathrm{constraint}\big(q(U^\ast)\big)$$
where reg is some regularizing function, λ is a Lagrange multiplier enforcing the hard constraint constraint(⋅), and D is some divergence measure such as the KL divergence.
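Here is a small numerical sketch of this kind of optimization over a discrete three-option posterior, using a softmax parameterization to stay on the simplex and a penalty term standing in for the exact Lagrange-multiplier treatment; the numbers and the particular constraint are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

p = np.array([0.65, 0.30, 0.05])   # unconstrained posterior p(U* | V) over three options
alpha, lam = 0.1, 50.0             # entropy-regularisation weight and constraint weight
floor = 0.15                       # toy constraint: option 2 must keep >= 15% mass

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def objective(theta):
    q = softmax(theta)                                 # keeps q on the simplex
    kl = np.sum(q * (np.log(q + 1e-12) - np.log(p)))   # D(q || p)
    neg_entropy = np.sum(q * np.log(q + 1e-12))        # reg(q) = -H(q): pushes q flatter
    violation = max(0.0, floor - q[2])                 # constraint(q) as a penalty
    return kl + alpha * neg_entropy + lam * violation

res = minimize(objective, x0=np.zeros(3), method="Nelder-Mead")
print(softmax(res.x))  # constrained, smoothed version of the posterior
```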
Finally, once we have the preference posterior p(U∗|V), the question remains how to translate this posterior into actual decisions about policies or actions. This is the realm of Bayesian decision theory, where we choose actions that minimize some loss function (or, equivalently, maximize expected utility) which depends on this preference posterior. There are a large number of options possible here, but a few are fairly straightforward (a small code sketch follows the list):
a.) Maximum a posteriori selection – i.e. choose the option that has the highest posterior probability. This is essentially what standard voting procedures do when they choose the option with the most votes.
b.) Thompson sampling, where we explicitly sample from the posterior over actions [4].
c.) Some kind of truncated sampling where e.g. we truncate to the top-k options and then sample.
d.) Some other method that minimizes a loss function capturing things we care about but which we cannot fold into the posterior.
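Here is the promised sketch of options (a)–(c), assuming we have already reduced the preference posterior to a discrete distribution over candidate policies (the policies and probabilities are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
policies = ["A", "B", "C", "D"]
posterior = np.array([0.45, 0.30, 0.15, 0.10])  # p(policy is optimal | votes), toy values

# (a) Maximum a posteriori: pick the single most probable policy.
map_choice = policies[int(np.argmax(posterior))]

# (b) Thompson-style sampling: sample a policy in proportion to its posterior mass.
sampled_choice = rng.choice(policies, p=posterior)

# (c) Truncated sampling: keep only the top-k policies, renormalise, then sample.
k = 2
top_k = np.argsort(posterior)[::-1][:k]
trunc_p = posterior[top_k] / posterior[top_k].sum()
trunc_choice = rng.choice(np.array(policies)[top_k], p=trunc_p)

print(map_choice, sampled_choice, trunc_choice)
```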
Ultimately, through this framework, we can come up with a mathematically precise and rigorous formulation of the preference aggregation and decision problem, where we understand what assumptions we are making and what our voting mechanism is actually doing. It also sheds some light on the fundamental components that any preference aggregation and action-selection mechanism must have: an implicit or explicit likelihood/noise model, an implicit or explicit preference prior, an inference procedure (perhaps with regularization or constraints), and a decision procedure to map from the inferred value posterior to actions.
[1] In technical language this is Pareto optimality, but it is a very strong condition.

[2] While this is clearly false in general, every existing voting scheme implicitly assumes it is true, and facts that break this assumption – such as vote-buying – are commonly considered ‘hacks’ of the voting scheme.
[3] In a way, this is one way to think of alternative voting schemes like quadratic voting, where each voter has a set of ‘voting points’ to assign between candidates and the cost of casting multiple votes for a specific candidate increases quadratically – i.e. to vote once costs one point, to vote twice costs four points, and so on. In general, there is nothing special about the quadratic cost function – any convex, strictly increasing cost function such as an exponential works similarly – and we can represent a generalized ‘quadratic voting’ algorithm as v = f(vp), where vp is the number of voting points spent on a candidate (subject to a total budget of voting points), v is the resulting effective vote, and f is an increasing function (the inverse of the cost function; f(vp) = √vp in the quadratic case). These voting schemes can usefully be seen from a Bayesian perspective as adding additional constraints onto the posterior. For instance, the increasing quadratic cost of additional votes for a single outcome has the effect of heavily penalizing low-entropy preference distributions and hence tends towards higher-entropy ‘broad’ outcomes.
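As a sketch of the generalized scheme described in this footnote: each voter allocates points under a budget, and the effective vote for a candidate is f applied to the points spent on it, with f = √ recovering standard quadratic voting. All names and numbers below are illustrative:

```python
import numpy as np

def effective_votes(points_spent, f=np.sqrt):
    """Generalised 'quadratic voting': a voter spends points_spent[c] points on each
    candidate c and casts f(points) effective votes for it. f = sqrt recovers the
    standard quadratic scheme (casting v votes costs v^2 points)."""
    return {c: f(p) for c, p in points_spent.items()}

budget = 100
allocation = {"candidate_A": 81, "candidate_B": 16, "candidate_C": 3}  # toy allocation
assert sum(allocation.values()) <= budget

print(effective_votes(allocation))
# {'candidate_A': 9.0, 'candidate_B': 4.0, 'candidate_C': 1.73...}
# Concentrating all 100 points on one candidate would yield only 10 effective votes,
# so sharply peaked (low-entropy) allocations are penalised relative to spread-out ones.
```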
[4] This relates to something I have been wondering about for a while: why does no political system use stochastic policy selection? The idea is that you count up the votes for different options, normalize them to form a probability distribution, and then sample the policy from this distribution. For instance, suppose there is a vote in a parliament on a bill and it receives 51 votes against 49. The stochastic voting system would then generate a random number, and the bill passes if the number is <= 0.51 and fails otherwise. This seems a generally fairer way to make decisions than pure majority-vote-wins-everything. At the same time, decisions can still be made decisively, which does not happen in more complex systems with vetoes and supermajorities designed to prevent majority dictatorship; minority opinions are represented more fairly – i.e. if you are 10% of the legislature your bills win 10% of the time – and it introduces a probably helpful level of stochasticity into the political and governing process. The main issue would be preventing people from simply hacking the process by repeatedly introducing bills (even very unpopular ones) until they pass by chance, which could be handled by assigning a fixed ‘proposal budget’ per voter or per party.
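A sketch of this stochastic selection rule, together with a toy ‘proposal budget’ guard of the kind suggested above (the party names and budget sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

def stochastic_vote(votes_for, votes_against):
    """Pass the bill with probability equal to its vote share."""
    share = votes_for / (votes_for + votes_against)
    return rng.random() <= share

# A 51-49 vote passes roughly 51% of the time; a 10-90 vote still passes ~10% of the time.
print(stochastic_vote(51, 49))

# Toy guard against repeatedly re-introducing unpopular bills: each party gets a
# fixed proposal budget per session (purely illustrative).
proposal_budget = {"party_A": 5, "party_B": 5}

def propose(party, votes_for, votes_against):
    if proposal_budget[party] <= 0:
        raise ValueError(f"{party} has no proposals left this session")
    proposal_budget[party] -= 1
    return stochastic_vote(votes_for, votes_against)

print(propose("party_A", 10, 90))
```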
Excellent post, a great starting point, but we must go deeper :) For instance:
Voting is a very low-bandwidth signaling scheme. There's room for arbitrary expressions of preferences and plans.
Most implementations of voting are also cast as irreversible. We'd want room for dynamic discovery of the aggregate preference by the individuals.
The "collective" won't always have a coherent preference; for instance, if the individuals are locked into a zero-sum game. (Let alone if the individuals' preferences are incoherent to start with!) I'd like a theory that would output "there is no coherent collective here, you should just go your own ways and agree on a transactional relationship instead".