Consider the following degenerate case: there is only one decision to be made, and your competing theories assess it as follows.
And suppose you find theory 2 just slightly more probable than theory 1.
Then it seems like any parliamentary model is going to say that theory 2 wins, and you choose option A. That seems like a bad outcome.
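To see the worry numerically (the original assessment table isn't reproduced here, so these credences and preferences are invented): a winner-take-all parliament follows theory 2 every single time, whereas Bostrom's proportional-chances idea would not:

```python
import random

# Hedged illustration; the credences and preferences below are invented,
# not taken from the original example.
credences = {"theory_1": 0.49, "theory_2": 0.51}
preferred = {"theory_1": "B", "theory_2": "A"}

# Winner-take-all: the slightly-more-probable theory dictates the outcome.
winner = max(credences, key=credences.get)
print("winner-take-all:", preferred[winner])  # always A

# Proportional chances: each theory's influence matches its credence,
# so theory 1 still gets its way 49% of the time.
sampled = random.choices(list(credences), weights=list(credences.values()))[0]
print("proportional chances:", preferred[sampled])
```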
Accordingly, I suggest that to arrive at a workable parliamentary model we need to do at least one of the following:
As you might gather, I find the last option the most promising.
Yes, I think we need something like this veil of ignorance approach.
In a paper (preprint) with Ord and MacAskill we prove that for similar procedures, you end up with cyclical preferences across choice situations if you try to decide after you know the choice situation. The parliamentary model isn't quite within the scope of the proof, but I think more or less the same proof works. I'll try to sketch it.
Suppose:
Then in a decision between A and B there is no scope for negotiation, so since two of the theories prefer A, the parliament will too. Similarly, in a choice between B and C the parliament will prefer B, and in a choice between C and A the parliament will prefer C, giving a cycle.
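This is easy to check mechanically. Assuming the standard Condorcet profile of three equally-weighted theories (my assumption, since the "Suppose:" list isn't reproduced above), a few lines verify the cycle:

```python
from itertools import combinations

# Assumed preference profile (most- to least-preferred): the standard
# Condorcet profile, since the original "Suppose:" list is elided.
theories = {
    "T1": ["A", "B", "C"],
    "T2": ["B", "C", "A"],
    "T3": ["C", "A", "B"],
}

def pairwise_winner(x, y):
    """Majority vote between x and y, each theory voting its ranking."""
    votes_x = sum(ranking.index(x) < ranking.index(y)
                  for ranking in theories.values())
    return x if votes_x > len(theories) / 2 else y

for x, y in combinations("ABC", 2):
    print(f"{x} vs {y}: parliament prefers {pairwise_winner(x, y)}")
# Prints: A vs B -> A, A vs C -> C, B vs C -> B, i.e. a cycle.
```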
My reading of the problem is that a satisfactory Parliamentary Model should:
Since bargaining in good faith appears to be the core feature, my mind immediately goes to models of bargaining under complete information rather than voting. What are the pros and cons of starting with the Nash bargaining solution as implemented by an alternating offer game?
The two obvious issues are how to translate delegates' preferences into utilities and what the disagreement point is. Assuming a utility function is a fairly mild requirement if the delegate has preferences over lotteries. Plus, there's no utility comparison problem even though you need cardinal utilities. The lack of a natural disagreement point is trickier. What intuitions might be lost going this route?
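As a starting point for discussion, here's a minimal sketch of the Nash bargaining solution for two delegates choosing a lottery between two options; the utilities and the disagreement point are stand-ins, since as noted there's no natural disagreement point:

```python
# Minimal Nash-bargaining sketch for two delegates choosing a lottery
# p over options A and B. Utilities and the disagreement point d are
# stand-in assumptions.
u1 = {"A": 1.0, "B": 0.0}   # delegate 1 prefers A
u2 = {"A": 0.2, "B": 1.0}   # delegate 2 prefers B, and by a wider margin
d = (0.0, 0.0)              # assumed disagreement utilities

def nash_product(p):
    """Product of gains over disagreement for the lottery pA + (1-p)B."""
    g1 = p * u1["A"] + (1 - p) * u1["B"] - d[0]
    g2 = p * u2["A"] + (1 - p) * u2["B"] - d[1]
    return g1 * g2

# Grid search for the lottery maximizing the Nash product.
best_p = max((i / 1000 for i in range(1001)), key=nash_product)
print(f"Nash bargaining lottery: P(A) = {best_p:.3f}")  # 0.625
```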
In order to get a better handle on the problem, I’d like to try walking through the mechanics of how a vote by the moral parliament might work. I don’t claim to be doing anything new here, I just want to describe the parliament in more detail to make sure I understand it, and so that it’s easier to reason about.
Here's the setup I have in mind:
Each MP wants to maximize the utility of the results according to their own scores, and they can engage in negotiation before the voting starts to accomplish this.
Does this seem to others like a reasonable description of how the parliamentary vote might work? Any suggestions for improvements to the description?
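To make that concrete, here's a minimal sketch of the pre-negotiation vote; since the setup list above isn't reproduced, the particulars (two theories, 100 seats, these scores and credences) are assumptions of mine:

```python
from collections import Counter

# Toy vote under the setup sketched above; all numbers are assumptions.
scores = {
    "T1": {"A": 10, "B": 4, "C": 0},
    "T2": {"A": 0, "B": 10, "C": 6},
}
credences = {"T1": 0.6, "T2": 0.4}
N_SEATS = 100

# Seats in proportion to credence; absent negotiation, each MP votes
# for its theory's top-scored option.
votes = Counter()
for theory, credence in credences.items():
    top_option = max(scores[theory], key=scores[theory].get)
    votes[top_option] += round(credence * N_SEATS)

print(votes.most_common())  # [('A', 60), ('B', 40)]
```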
In Ideal Advisor Theories and Personal CEV, my co-author and I describe a particular (but still imprecisely specified) version of the parliamentary approach:
we determine the personal CEV of an agent by simulating multiple versions of them, extrapolated from various starting times and along different developmental paths. Some of these versions are then assigned to a parliament where they vote on various choices and make trades with one another.
We then very briefly argue that this kind of approach can overcome some objections to parliamentary models (and similar theories) made by philosopher David Sobel.
The paper is short and non-technical, but still manages to summarize some concerns that we'll likely want a formalized parliamentary model to overcome or sidestep.
...It seems that specifying the delegates' informational situation creates a dilemma.
As you write above, to avoid giving the majority bloc absolute power, we should take the delegates to think that the Parliament's decision is a stochastic variable, with the probability of the Parliament taking action A proportional to the fraction of votes for A.
However, your suggestion generates its own problems (as long as we take the parliament to go with the option with the most votes):
Suppose an issue the Parliament votes on involves options A1, A2, ..., An
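Although the comment is cut off above, the stochastic reading itself is easy to make concrete; a minimal sketch, with invented vote counts:

```python
import random

# Sketch of the stochastic reading: the Parliament's choice among
# A1, ..., An is sampled in proportion to votes (vote counts invented).
votes = {"A1": 50, "A2": 30, "A3": 20}

def parliament_decision(votes):
    options = list(votes)
    return random.choices(options, weights=[votes[o] for o in options])[0]

# A 50-vote bloc now gets its way half the time, not all of the time.
print(parliament_decision(votes))
```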
We discussed this issue at the two MIRIx Boston workshops. A big problem with parliamentary models, which we were unable to solve, is what we've been calling ensemble stability. The issue is this: suppose your AI, whose value system is built from a collection of value systems combined in a voting-like system, is constructing a more powerful successor AI, and is considering constructing the successor so that it represents only a subset of the original value systems. Each value system which is represented will be in favor; each value system which is not represented will be opposed.
It seems to me that if we're going to be formalizing the idea of the relative "moral importance" of various courses of action to different moral theories, we'll end up having to use something like utility functions. It's unfortunate, then, that deontological rules (which are pretty common) can't be specified with finite utility functions because of the timelessness issue (i.e., a deontologist who doesn't lie won't lie even if doing so would prevent them from being forced to tell ten lies in the future).
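To see the obstacle in one line (this formalization is mine, not the commenter's): suppose each lie carries a finite additive disutility $d > 0$. Representing "one lie now is worse than ten forced lies later" then requires

$$u(\text{one lie now}) < u(\text{ten lies later}) \iff -d < -10d,$$

which is false for every finite $d > 0$. Capturing "never lie" seems to need lexicographic priority or unbounded weights rather than a finite utility function.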
...To me it looks like the main issues are in configuring the "delegates" so that they don't "negotiate" quite like real agents - for example, there's no delegate that will threaten to adopt an extremely negative policy in order to gain negotiating leverage over other delegates.
The part where we talk about these negotiations seems to me like the main pressure point on the moral theory qua moral theory - can we point to a form of negotiation that is isomorphic to the "right answer", rather than just being an aw...
One route towards analysing this would be to identify a unit of currency which was held in roughly equal value by all delegates (at least at the margin), so that we can analyse how much they value other things in terms of this unit of currency -- this could lead to market prices for things (?).
Perhaps a natural choice for a currency unit would be something like 'unit of total say in the parliament'. So for example a 1% chance that things go the way of your theory, applied before whatever else would happen.
I'm not sure if this could even work, just throwing it out there.
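For what it's worth, here's what the pricing step might look like; all the numbers, and the idea of measuring marginal utility per percent of say, are just one way to cash this out:

```python
# Sketch of the 'unit of say' currency idea: price each outcome by how
# many percentage points of parliamentary say a delegate would give up
# for it. All numbers are invented for illustration.
utility_of_say = {"T1": 2.0, "T2": 2.1}   # roughly equal at the margin
utility_of = {
    "T1": {"ban_X": 8.0, "fund_Y": 1.0},
    "T2": {"ban_X": 1.0, "fund_Y": 6.0},
}

# Price of each outcome to each delegate, in '% of say' units.
for delegate, utils in utility_of.items():
    prices = {x: u / utility_of_say[delegate] for x, u in utils.items()}
    print(delegate, {x: f"{p:.1f}% of say" for x, p in prices.items()})
```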
Is there some way to rephrase this without bothering with the parliament analogy at all? For example, how about just having each moral theory assign the available actions a "goodness number" (basically expected utility). Normalize the goodness numbers somehow, then just take the weighted average across moral theories to decide what to do.
If we normalize by dividing each moral theory's answers by its biggest-magnitude answer (only closed sets of actions allowed :) ), I think this regenerates the described behavior, though I'm not sure. Obviously this cuts out the "human-ish" behavior of parliament members, but I think that's a feature, since they don't exist.
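A minimal sketch of that normalization, with invented goodness numbers and credences:

```python
# Divide each theory's goodness numbers by its biggest-magnitude answer,
# then take the credence-weighted average across theories.
goodness = {
    "T1": {"A": 10.0, "B": -2.0, "C": 0.0},
    "T2": {"A": -1.0, "B": 3.0, "C": 2.0},
}
credence = {"T1": 0.7, "T2": 0.3}

def normalize(scores):
    biggest = max(abs(v) for v in scores.values())
    return {a: v / biggest for a, v in scores.items()}

normalized = {t: normalize(s) for t, s in goodness.items()}
overall = {
    a: sum(credence[t] * normalized[t][a] for t in goodness)
    for a in goodness["T1"]
}
print(max(overall, key=overall.get))  # the action to take: 'A'
```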
Any parliamentary model will involve voting.
When voting, Arrow's impossibility theorem is going to impose constraints that can't be avoided: http://en.m.wikipedia.org/wiki/Arrow's_impossibility_theorem
In particular, it is impossible for a ranked voting system (with three or more options) to satisfy all of the following:
- If every voter prefers alternative X over alternative Y, then the group prefers X over Y.
- If every voter's preference between X and Y remains unchanged, then the group's preference between X and Y will also remain unchanged (even if voters' preferences between other pairs like X and Z, Y and Z, or Z and W change).
- There is no dictator: no single voter's preferences always determine the group's preference.
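The second condition (independence of irrelevant alternatives) is the one most voting rules break; a concrete violation under Borda count, for instance:

```python
# IIA violation under Borda count: voters' A-vs-B preferences are
# identical in both profiles, yet the group winner flips when only
# the ranking of C changes.
def borda(profile):
    scores = {}
    for ranking in profile:
        for points, option in enumerate(reversed(ranking)):
            scores[option] = scores.get(option, 0) + points
    return max(scores, key=scores.get)

profile_1 = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2
profile_2 = [["A", "C", "B"]] * 3 + [["B", "C", "A"]] * 2  # only C moved

print(borda(profile_1), borda(profile_2))  # B A
```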
I was thinking last night about how vote trading would work in a completely rational parliamentary system. To simplify things a bit, let's assume that each issue is binary, each delegate holds a position on every issue, and that position can be normalized to a 0.0 - 1.0 ranking. (E.g., if I have a 60% belief that I will gain 10 utility from this issue being approved, it may have a normalized score of 0.6; if it is a 100% belief that I will gain 10 utility, it may be a 0.7; while a 40% chance of -1000 utility may be a 0.1.) The mapping function doesn't really matter...
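Here's a toy version of how such a trade might clear, assuming the normalized positions above and a deliberately crude trade rule (both the numbers and the rule are my simplifications):

```python
# Toy vote-trading sketch: binary issues, positions normalized to
# [0, 1] with 0.5 = indifferent. Two delegates swap votes when each
# barely cares about the issue the other cares strongly about.
positions = {
    "D1": {"issue_1": 0.9, "issue_2": 0.45},   # cares a lot about issue 1
    "D2": {"issue_1": 0.55, "issue_2": 0.1},   # cares a lot about issue 2
}

def intensity(p):
    """Distance from indifference at 0.5."""
    return abs(p - 0.5)

def vote(p):
    return "yes" if p > 0.5 else "no"

d1, d2 = positions["D1"], positions["D2"]
# Trade when each delegate's strong issue is the other's weak issue.
if intensity(d1["issue_1"]) > intensity(d1["issue_2"]) and \
   intensity(d2["issue_2"]) > intensity(d2["issue_1"]):
    print("D2 votes", vote(d1["issue_1"]), "on issue_1 (traded)")
    print("D1 votes", vote(d2["issue_2"]), "on issue_2 (traded)")
```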
Can MPs have unknown utility functions? For example, I might have a relatively low confidence in all explicitly formulated moral theories, and want to give a number of MPs to System 1 - but I don't know in advance how System 1 will vote. Is that problem outside the scope of the parliamentary model (i.e., I can't nominate MPs who don't "know" how they will vote)?
Can MPs have undecidable preference orderings (or sub-orderings)? E.g., such an MP might have some moral axioms that provide orderings for some bills but not others.
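One way to make this question concrete: model such an MP as a comparison function over bills that returns None for pairs its axioms don't order. The axioms below are invented placeholders:

```python
# An MP whose moral axioms order some pairs of bills but not others.
# compare(a, b) returns the preferred bill, or None when the pair is
# undecidable under the MP's axioms.
AXIOM_ORDERINGS = {
    ("ban_torture", "fund_opera"): "ban_torture",  # axiom covers this pair
}

def compare(a, b):
    for (x, y), winner in AXIOM_ORDERINGS.items():
        if {a, b} == {x, y}:
            return winner
    return None  # the MP's axioms don't order this pair

print(compare("ban_torture", "fund_opera"))  # ban_torture
print(compare("fund_opera", "fund_museum"))  # None: abstain? defer?
```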
Thanks to ESrogs, Stefan_Schubert, and the Effective Altruism summit for the discussion that led to this post!
This post is to test out Polymath-style collaboration on LW. The problem we've chosen to try is formalizing and analyzing Bostrom and Ord's "Parliamentary Model" for dealing with moral uncertainty.
I'll first review the Parliamentary Model, then give some of Polymath's style suggestions, and finally suggest some directions that the conversation could take.
The Parliamentary Model
The Parliamentary Model is an under-specified method of dealing with moral uncertainty, proposed in 2009 by Nick Bostrom and Toby Ord. Reposting Nick's summary from Overcoming Bias:
In a comment, Bostrom continues:
It's an interesting idea, but clearly there are a lot of details to work out. Can we formally specify the kinds of negotiation that delegates can engage in? What about blackmail or prisoners' dilemmas between delegates? In what ways does this proposed method outperform other ways of dealing with moral uncertainty?
I was discussing this with ESRogs and Stefan_Schubert at the Effective Altruism summit, and we thought it might be fun to throw the question open to LessWrong. In particular, we thought it'd be a good test problem for a Polymath-project-style approach.
How to Polymath
The Polymath comment style suggestions are not so different from LW's, but numbers 5 and 6 are particularly important. In essence, they point out that the idea of a Polymath project is to split up the work into minimal chunks among participants, and to get most of the thinking to occur in comment threads. This is as opposed to a process in which one community member goes off for a week, meditates deeply on the problem, and produces a complete solution by themselves. Polymath rules 5 and 6 are instructive:
It seems to us as well that an important part of the Polymath style is to have fun together and to use the principle of charity liberally, so as to create a space in which people can safely be wrong, point out flaws, and build up a better picture together.
Our test project
If you're still reading, then I hope you're interested in giving this a try. The overall goal is to clarify and formalize the Parliamentary Model, and to analyze its strengths and weaknesses relative to other ways of dealing with moral uncertainty. Here are the three most promising questions we came up with:
The original OB post had a couple of comments that I thought were worth reproducing here, in case they spark discussion, so I've posted them.
Finally, if you have meta-level comments on the project as a whole instead of Polymath-style comments that aim to clarify or solve the problem, please reply in the meta-comments thread.