Recent updates to gwern.net (2015-2016)
"When I was one-and-twenty / I heard a wise man say, / 'Give crowns and pounds and guineas / But not your heart away; / Give pearls away and rubies / But keep your fancy free.' / But I was one-and-twenty, / No use to talk to me."
My past year of completed writings, sorted by topic:
Genetics:
- Embryo selection for intelligence: a cost-benefit analysis covering a meta-analysis of intelligence GCTAs, limits set by measurement error, current polygenic scores, possible gains with current IVF procedures, the benefits of selection on multiple complex traits, the possible annual value in the USA of selection & the value of larger GWASes, societal consequences of various embryo selection scenarios, embryo count versus polygenic scores as limiting factors, comparison with iterated embryo selection, limits to total gains from iterated embryo selection, etc.
- Wikipedia article on Genome-wide complex trait analysis (GCTA)
AI:
- Computational Complexity vs the Singularity
- Adding metadata to an RNN for mimicking individual author style
- Armstrong’s AI control problem: Reinforce.js demo
Biology:
Statistics:
- Candy Japan new packaging decision analysis
- “The Power of Twins: Revisiting Student’s Scottish Milk Experiment Example”
- Genius Revisited: Critiquing the Value of High IQ Elementary Schools
- Inferring mean ethnic IQs from very high IQ samples like TIP/SMPY
Cryptography:
Misc:
gwern.net itself has remained largely stable (some CSS fixes and image size changes); I continue to use Patreon and send out my newsletters.
Counterfactual Mugging Alternative
Edit as of June 13th, 2016: I no longer believe this to be easier to understand than traditional CM, but stand by the rest of it. Minor aesthetic edits made.
This is my first post on the LW discussion board. I'm not sure whether something like this has already been written; I need your feedback to let me know if I'm doing something wrong or breaking useful conventions.
Here is an alternative to the counterfactual mugging. People often need the original explained a few times before they understand it; I think this version will be faster for most people to grasp, because it arose organically rather than being specifically contrived to create a dilemma between decision theories:
Pretend you live in a world where time travel exists and Time can create realities with acausal loops, with ordinary linear chronology, or with another structure, so long as there is no paradox -- only self-consistent timelines can be generated.
In your timeline, there are prophets. A prophet (known to you to be honest and truly prophetic) tells you that you will commit an act which seems horrendously imprudent or problematic. It is an act whose effect will be on the scale of losing $10,000; an act you never would have taken ordinarily. But fight the prophecy all you want, it is self-fulfilling: you definitely live in a timeline where the act gets committed. However, if it weren't for the prophecy being immutably correct, you could have spent $100 and, even having heard the prophecy (even having believed it to be immutable), reduced the probability of taking that action by, say, 50%. So fighting the prophecy by spending $100 would mean that there were 50% fewer self-consistent (possible) worlds where you lost the $10,000, because it's just much less likely for you to end up taking that action if you fight it rather than succumb to it.
You may feel that there would be no reason to spend $100 averting a decision that you know you're going to make, and see no reason to care about counterfactual worlds where you don't lose the $10,000. But the fact of the matter is that if you could have precommitted to fight the choice, you would have: across the worlds where that prophecy could have been presented to you, fighting decreases the average disutility by $10,000 × 0.5 − $100 = $4,900. Refusing to follow a precommitment that you would have made to prevent the exact situation you're now in, merely because you know you wouldn't have followed it, seems an obvious failure mode; UDT does the same calculation shown above and tells you to fight the prophecy. The simple fact that should tell causal decision theorists that converting to UDT is the causally optimal decision is that Updateless Decision Theorists actually do better on average than CDT proponents.
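The arithmetic, restated as a quick sketch (using the post's numbers, with fighting modeled as halving the probability of the loss):

```python
# Expected value of fighting the prophecy vs. succumbing to it.
ev_yield = -10_000                # succumb: the $10,000-scale loss is certain
ev_fight = -100 - 0.5 * 10_000    # fight: pay $100, halve the chance of the loss
print(ev_fight - ev_yield)        # 4900.0: expected disutility averted by fighting
```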
(You may assume also that your timeline is the only timeline that exists, so as not to further complicate the problem by your degree of empathy with your selves from other existing timelines.)
Iterated Gambles and Expected Utility Theory
The Setup
I'm about a third of the way through Stanovich's Decision Making and Rationality in the Modern World. Basically, I've gotten through some of the more basic axioms of decision theory (dominance, transitivity, etc.).
As I went through the material, I noted that there were a lot of these:
Decision 5. Which of the following options do you prefer (choose one)?
A. A sure gain of $240
B. 25% chance to gain $1,000 and 75% chance to gain nothing
The text goes on to show how most people tend to make irrational choices when confronted with decisions like this; most striking was how often irrelevant context and framing affected people's decisions.
But I understand the decision theory bit; my question is a little more complicated.
When I was choosing these options myself, I did what I've been taught by the rationalist community to do in situations where I am given nice, concrete numbers: I shut up and multiplied, and at each decision chose the option with the highest expected utility.
Granted, I equated dollars with utility, something Stanovich mentions humans don't do well (see prospect theory).
The Problem
In the above decision, option B clearly has the higher expected utility ($250 versus $240), so I chose it. But there was still a nagging doubt in my mind, some part of me that thought: if I were really given this option, in real life, I'd choose A.
So I asked myself: why would I choose A? Is this an emotion that isn't well-calibrated? Am I being risk-averse for gains but risk-taking for losses?
What exactly is going on?
And then I remembered the Prisoner's Dilemma.
A Tangent That Led Me to an Idea
Now, I'll assume that anyone reading this has a basic understanding of the concept, so I'll get straight to the point.
In classical decision theory, the choice to defect (rat the other guy out) is strictly superior to the choice to cooperate (keep your mouth shut). No matter what your partner in crime does, you get a better deal if you defect.
Now, I haven't studied the higher branches of decision theory yet (I have a feeling that Eliezer, for example, would find a way to cooperate and make his partner in crime cooperate as well; after all, rationalists should win.)
Where I've seen the Prisoner's Dilemma resolved is, oddly enough, in Dawkins's The Selfish Gene, which is where I was first introduced to the idea of an Iterated Prisoner's Dilemma.
The interesting idea here is that, if you know you'll be in the Prisoner's Dilemma with the same person multiple times, certain kinds of strategies become available that weren't possible in a single instance of the Dilemma. Partners in crime can be punished for defecting by your own future defections, as the toy simulation below illustrates.
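Here's a toy simulation of that point (my own illustrative payoffs, not Dawkins's): a punishing strategy like tit-for-tat holds its own against a defector while still cooperating with cooperators.

```python
def tit_for_tat(opponent_last):
    return opponent_last or "C"   # cooperate first, then mirror the opponent

def always_defect(opponent_last):
    return "D"

def match(p1, p2, rounds=10):
    # (my move, their move) -> (my payoff, their payoff)
    payoffs = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
               ("D", "C"): (3, 0), ("D", "D"): (1, 1)}
    last1 = last2 = None
    score1 = score2 = 0
    for _ in range(rounds):
        m1, m2 = p1(last2), p2(last1)
        a, b = payoffs[(m1, m2)]
        score1, score2 = score1 + a, score2 + b
        last1, last2 = m1, m2
    return score1, score2

print(match(tit_for_tat, tit_for_tat))      # (20, 20): mutual cooperation
print(match(always_defect, always_defect))  # (10, 10): mutual defection
print(match(tit_for_tat, always_defect))    # (9, 12): defection gains little
```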
The key idea here is that I might have a different response to the gamble if I knew I could take it again.
The Math
Let's put on our probability hats and actually crunch the numbers:
Assuming one picks A every time, or B every time, the distribution of total winnings (probability: amount) is:

| Iteration | Option A | Option B |
|-----------|----------|----------|
| 1 | $240 | 1/4: $1,000; 3/4: $0 |
| 2 | $480 | 1/16: $2,000; 6/16: $1,000; 9/16: $0 |
| 3 | $720 | 1/64: $3,000; 9/64: $2,000; 27/64: $1,000; 27/64: $0 |
| 4 | $960 | 1/256: $4,000; 12/256: $3,000; 54/256: $2,000; 108/256: $1,000; 81/256: $0 |
| 5 | $1,200 | 1/1024: $5,000; 15/1024: $4,000; 90/1024: $3,000; 270/1024: $2,000; 405/1024: $1,000; 243/1024: $0 |
And so on. (If I've made a mistake, please let me know.)
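Here's a short script (my own sketch, nothing from the text) that reproduces the table above and the headcounts in the conclusions below; winnings from n takes of option B are binomial with p = 1/4 and $1,000 per success:

```python
from math import comb

def b_distribution(n, p=0.25, prize=1000):
    """Distribution of total winnings after n takes of option B."""
    return {k * prize: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

n = 5
sure_thing = 240 * n                   # option A five times: $1,200
dist = b_distribution(n)
ahead = sum(pr for amount, pr in dist.items() if amount > sure_thing)
print(round(ahead * 1024))             # 376 of the 1024 B-takers beat option A
print(round((1 - ahead) * 1024))       # 648 walk away with $1,000 or less
```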
The Analysis
It is certainly true that, in terms of expected money, option B outperforms option A no matter how many times one takes the gamble. But instead, let's think in terms of anticipated experience: what we actually expect to happen should we take each bet.
The first time we take option B, we note that there is a 75% chance that we walk away disappointed. That is, if one person chooses option A and four people choose option B, on average three out of those four people will underperform the person who chose option A. And it probably won't come as much consolation to the three losers that the winner won significantly more than the person who chose A.
And since nothing unusual ever happens, we should think that, on average, having taken option B, we'd wind up underperforming option A.
Now let's look at further iterations. In the second iteration, having taken option B twice, we're more likely than not (9/16) to have nothing at all.
In the third iteration, there's about a 57.8% chance that we'll have outperformed the person who chose option A the whole time, and a 42.2% chance that we'll have nothing.
In the fourth iteration, there's a 73.8% chance that we'll have matched or done worse than the person who chose option A four times (I'm rounding a bit; $1,000 isn't that much better than $960).
In the fifth iteration, the above percentage drops to 63.3%.
Now, without doing a longer analysis, I can tell that option B will eventually win. That was obvious from the beginning.
But there's still a better than even chance you'll wind up with less, picking option B, than by picking option A. At least for the first five times you take the gamble.
Conclusions
If we act to maximize expected utility, we should choose option B, at least so long as I hold that dollars=utility. And yet it seems that one would have to take option B a fair number of times before it becomes likely that any given person, taking the iterated gamble, will outperform a different person repeatedly taking option A.
In other words, of the 1025 people taking the iterated gamble:
we expect 1 to walk away with $1,200 (from taking option A five times),
we expect 376 to walk away with more than $1,200, casting smug glances at the scaredy-cat who took option A the whole time,
and we expect 648 to walk away muttering to themselves about how the whole thing was rigged, casting dirty glances at the other 377 people.
After all the calculations, I still think that, if this gamble was really offered to me, I'd take option A, unless I knew for a fact that I could retake the gamble quite a few times. How do I interpret this in terms of expected utility?
Am I not really treating dollars as equal to utility, and discounting the marginal utility of the additional thousands of dollars that the 376 win?
What mistakes am I making?
Also, a quick trip to Google confirms my intuition that there is plenty of work on iterated decisions; does anyone know a good primer on them?
I'd like to leave you with this:
If you were actually offered this gamble in real life, which option would you take?
The Philosophical Implications of Quantum Information Theory
I was asked to write up a pithy summary of the upshot of this paper. This is the best I could manage.
One of the most remarkable features of the world we live in is that we can make measurements that are consistent across space and time. By "consistent across space" I mean that you and I can look at the outcome of a measurement and agree on what that outcome was. By "consistent across time" I mean that you can make a measurement of a system at one time and then make the same measurement of that system at some later time and the results will agree.
It is tempting to think that the reason we can do these things is that there exists an objective reality that is "actually out there" in some metaphysical sense, and that our measurements are faithful reflections of that objective reality. This hypothesis works well (indeed, seems self-evidently true!) until we get to very small systems, where it seems to break down. We can still make measurements that are consistent across space and time, but as soon as we stop making measurements, then things start to behave very differently than they did before. The classical example of this is the two-slit experiment: whenever we look at a particle we only ever find it in one particular place. When we look continuously, we see the particle trace out an unambiguous and continuous trajectory. But when we don't look, the particle behaves as if it is in more than one place at once, a behavior that manifests itself as interference.
The problem of how to reconcile the seemingly incompatible behavior of physical systems depending on whether or not they are under observation has come to be called the measurement problem. The most common explanation of the measurement problem is the Copenhagen interpretation of quantum mechanics, which postulates that the act of measurement changes a system via a process called wave function collapse. In the contemporary popular press you will often read about wave function collapse in conjunction with the phenomenon of quantum entanglement, usually referred to as "spooky action at a distance", a phrase coined by Einstein and intended to be pejorative. For example, here's the headline and first sentence of the above piece:
More evidence to support quantum theory’s ‘spooky action at a distance’
It’s one of the strangest concepts in the already strange field of quantum physics: Measuring the condition or state of a quantum particle like an electron can instantly change the state of another electron—even if it’s light-years away. (emphasis added)
This sort of language is endemic in the popular press as well as many physics textbooks, but it is demonstrably wrong. The truth is that measurement and entanglement are actually the same physical phenomenon. What we call "measurement" is really just entanglement on a large scale. If you want to see the demonstration of the truth of this statement, read the paper or watch the video or read the original paper on which my paper and video are based. Or go back and read about Von Neumann measurements or quantum decoherence or Everett's relative state theory (often mis-labeled "many-worlds") or relational quantum mechanics or the Ithaca interpretation of quantum mechanics, all of which turn out to be saying exactly the same thing.
Which is: the reason that measurements are consistent across space and time is not because these measurements are a faithful reflection of an underlying objective reality. The reason that measurements are consistent across space and time is because this is what quantum mechanics predicts when you consider only parts of the wave function and ignore other parts.
Specifically, it is possible to write down a mathematical description of a particle and two observers as a quantum mechanical system. If you ignore the particle (this is a formal mathematical operation called a partial trace of an operator matrix), what you are left with is a description of the observers. And if you then apply information-theoretic operations to that, what pops out is that the two observers are in classically correlated states. The exact same thing happens for observations made of the same particle at two different times.
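To see the partial-trace step concretely, here is a minimal numerical sketch (mine, not the paper's; the Bell state is just the stock example): tracing out one half of a maximally entangled pair leaves each observer with a classically mixed state.

```python
import numpy as np

def partial_trace(rho, dims, keep):
    """Trace out one subsystem of a bipartite density matrix rho with dims (d0, d1)."""
    d0, d1 = dims
    r = rho.reshape(d0, d1, d0, d1)
    return np.trace(r, axis1=1, axis2=3) if keep == 0 else np.trace(r, axis1=0, axis2=2)

# Bell state (|00> + |11>)/sqrt(2): the joint state is pure, but each
# observer's reduced state is the maximally mixed (classical) coin flip.
psi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(psi, psi.conj())
print(partial_trace(rho, (2, 2), keep=0))   # [[0.5, 0], [0, 0.5]]
```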
The upshot is that nothing special happens during a measurement. Measurements are not instantaneous (though they are very fast) and they are in principle reversible, though not in practice.
The final consequence of this, the one that grates most heavily on the intuition, is that your existence as a classical entity is an illusion. Because measurements are not a faithful reflection of an underlying objective reality, your own self-perception (which is a kind of measurement) is not a faithful reflection of an underlying objective reality either. You are not, in point of metaphysical fact, made of atoms. Atoms are a very (very!) good approximation to the truth, but they are not the truth. At the deepest level, you are a slice of the quantum wave function that behaves, to a very high degree of approximation, as if it were a classical system but is not in fact a classical system. You are in a very real sense living in the Matrix, except that the Matrix you are living in is running on a quantum computer, and so you -- the very close approximation to a classical entity that is reading these words -- can never "escape" the way Neo did.
As a corollary to this, time travel is impossible, because in point of metaphysical fact there is no time. Your perception of time is caused by the accumulation of entanglements in your slice of the wave function, resulting in the creation of information that you (and the rest of your classically-correlated slice of the wave function) "remember". It is those memories that define the past; you could even say they create the past. Going "back to the past" is not merely impossible, it is logically incoherent, no different from trying to construct a four-sided triangle. (And if you don't buy that argument, here's a more prosaic one: having a physical entity suddenly vanish from one time and reappear at a different time would violate conservation of energy.)
[Link] Huffington Post article about dual process theory
I published a piece in The Huffington Post popularizing dual-process theory in layman's language.
P.S. I know some don't like using terms like Autopilot and Intentional to describe System 1 and System 2, but I find from long experience that these terms resonate well with a broad audience. Also, I know dual-process theory is criticized by some, but we have to start somewhere, and just explaining dual-process theory is a way to start bridging the inference gap to higher meta-cognition.
Omega's Idiot Brother, Epsilon
Epsilon walks up to you with two boxes, A and B, labeled in rather childish-looking handwriting written in crayon.
"In box A," he intones, sounding like he's trying to be foreboding, which might work better when he hits puberty, "I may or may not have placed a million of your human dollars." He pauses for a moment, then nods. "Yes. I may or may not have placed a million dollars in this box. If I expect you to open Box B, the million dollars won't be there. Box B will contain, regardless of what you do, one thousand dollars. You may choose to take one box, or both; I will leave with any boxes you do not take."
You've been anticipating this. He's appeared to around twelve thousand people so far. Out of the eight thousand people who accepted both boxes, eighty found the million dollars missing and walked away with $1,000; the other seven thousand nine hundred and twenty walked away with $1,001,000. Out of the four thousand people who opened only box A, only four found it empty.
The agreement is unanimous: Epsilon is really quite bad at this. So, do you one-box, or two-box?
There are some important differences here from the original problem. First, Epsilon won't let you open either box until you've decided whether to open one or both, and will leave with the other box. Second, while Epsilon's false positive rate is quite impressive (he wrongly penalizes one-boxers only 0.1% of the time), his false negative rate is quite unimpressive: he catches only 1% of two-boxers. Whatever heuristic he's using, he clearly prefers letting two-boxers slide to accidentally punishing one-boxers.
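For what it's worth, here is the expected dollar value implied by that track record (a quick sketch from the counts above, setting decision-theoretic subtleties aside):

```python
# Empirical expected values from Epsilon's record so far.
ev_two_box = (7920 * 1_001_000 + 80 * 1_000) / 8000   # $991,000
ev_one_box = (3996 * 1_000_000 + 4 * 0) / 4000        # $999,000
print(ev_one_box - ev_two_box)                        # one-boxers averaged $8,000 more
```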
I'm curious to know whether anybody would two-box in this scenario and why, and particularly curious in the reasoning of anybody whose answer is different between the original Newcomb problem and this one.
Newcomb, Bostrom, Calvin: Credence and the strange path to a finite afterlife
This is a bit rough, but I think that it is an interesting and potentially compelling idea. To keep this short, and accordingly increase the number of eyes over it, I have only sketched the bare bones of the idea.
1) Empirically, people have varying intuitions and beliefs about causality, particularly in Newcomb-like problems (http://wiki.lesswrong.com/wiki/Newcomb's_problem, http://philpapers.org/surveys/results.pl, and https://en.wikipedia.org/wiki/Irresistible_grace).
2) Also, as an empirical matter, some people believe in taking actions after the fact, such as one-boxing, or Calvinist “irresistible grace”, to try to ensure or conform with a seemingly already determined outcome. This might be out of a sense of retrocausality, performance, moral honesty, etc. What matters is that we know that they will act it out, despite it violating common sense causality. There has been some great work on decision theory on LW about trying to thread this needle well.
3) The second disjunct of the simulation argument (http://wiki.lesswrong.com/wiki/Simulation_argument) shows that the decision-making of humanity is evidentially relevant to what our subjective credence should be that we are in a simulation. That is to say, if we are actively headed toward making simulations, we should increase our credence that we are in a simulation; if we are actively headed away from making simulations, through either existential risk or law/policy against it, we should decrease our credence.
4) Many, if not most, people would like for there to be a pleasant afterlife after death, especially if we could be reunited with loved ones.
5) There is no reason to believe that simulations which are otherwise nearly identical copies of our world could not contain, after the simulated bodily death of the participants, an extremely long-duration, though finite, "heaven"-like afterlife shared by simulation participants.
6) Our heading towards creating such simulations, especially if they were capable of nesting simulations, should increase credence that we exist in such a simulation and should perhaps expect a heaven-like afterlife of long, though finite, duration.
7) Those who believe in alternative causality, or retrocausality, in Newcomb-like situations should be especially excited about the opportunity to push the world towards surviving, allowing these types of simulations, and creating them, as it would potentially suggest, analogously, that if they work towards creating simulations with heaven-like afterlives, they might in some sense be “causing” such a heaven to exist for themselves, and even for friends and family who have already died. Such an idea of life after death, and especially of being reunited with loved ones, can be extremely compelling.
8) I believe that people matching the above description, that is, holding both an intuition in alternative causality, and finding such a heaven-like-afterlife compelling, exist. Further, the existence of such people, and their associated motivation to try to create such simulations, should increase the credence even of two-boxing types, that we already live in such a world with a heaven-like afterlife. This is because knowledge of a motivated minority desiring simulations should increase credence in the likely success of simulations. This is essentially showing that “this probably happened before, one level up” from the two-box perspective.
9) As an empirical matter, I also think that there are people who would find the idea of creating simulations with heaven-like afterlives compelling, even if they are not one-boxers, from a simply altruistic perspective, both since it is a nice thing to do for the future sim people, who can, for example, probabilistically have a much better existence than biological children on earth can, and as it is a nice thing to do to increase the credence (and emotional comfort) of both one-boxers and two-boxers in our world thinking that there might be a life after death.
10) This creates the opportunity for a secular movement in which people work towards creating these simulations, and use this work and potential success in order to derive comfort and meaning from their life. For example, making donations to a simulation-creating or promoting, or existential threat avoiding, think-tank after a loved one’s death, partially symbolically, partially hopefully.
11) There is at least some room for Pascalian considerations even for two-boxers who allow for some humility in their beliefs. Nozick believed one-boxers would become two-boxers if Box A were raised to $900,000, and two-boxers would become one-boxers if Box A were lowered to $1. Similarly, trying to work towards these simulations, even if you do not find it altruistically compelling, and even if you think that the odds of alternative or retrocausality are infinitesimally small, might make sense in that the reward could be extremely large, including potentially trillions of lifetimes' worth of time spent in an afterlife “heaven” with friends and family.
Finally, this idea might be one worth filling in (I have been, in my private notes, for over a year, but am a bit shy to debut all that just yet; even working up the courage to post this was difficult), if only because it is interesting, and could be used as a hook to get more people interested in existential risk, including the AI control problem. This is because existential catastrophe is probably the greatest enemy of credence in the future of such simulations, and accordingly of our reasonable credence in thinking that we have such a heaven awaiting us after death now. A short hook headline like “avoiding existential risk is key to afterlife” can get a conversation going. I can imagine Salon, etc. taking another swipe at it, and in doing so creating publicity which would help in finding more similarly minded folks to get involved in the work of MIRI, FHI, CEA, etc. There are also some really interesting ideas about acausal trade, and game theory between higher and lower worlds, as a form of “compulsion” in which they punish worlds for not creating heaven-containing simulations (thereby affecting their credence as observers of the simulation), in order to reach an equilibrium in which simulations with heaven-like afterlives are universal, or nearly universal. More on that later if this is received well.
Also, if anyone would like to join with me in researching, bull sessioning, or writing about this stuff, please feel free to IM me. Also, if anyone has a really good, non-obvious pin with which to pop my balloon, preferably in a gentle way, it would be really appreciated. I am spending a lot of energy and time on this if it is fundamentally flawed in some way.
Thank you.
*******************************
November 11 Updates and Edits for Clarification
1) There seems to be confusion about what I mean by self-location and credence. A good way to think of this is the Sleeping Beauty Problem (https://wiki.lesswrong.com/wiki/Sleeping_Beauty_problem)
If I imagine myself as Sleeping Beauty (and who doesn’t?), and I am asked on Sunday what my credence is that the coin will be tails, I will say 1/2. If I am awakened during the experiment without being told which day it is and am asked what my credence is that the coin was tails, I will say 2/3. If I am then told it is Monday, I will update my credence to 1/2. If I am told it is Tuesday, I update my credence to 1. If someone asks me two days after the experiment about my credence of it being tails, if I somehow still do not know the days of the week, I will say 1/2. Credence changes with where you are, and with what information you have. As we might be in a simulation, we are somewhere in the “experiment days”, and information can help orient our credence. As humanity potentially has some say in whether or not we are in a simulation, information about how humans make decisions about these types of things can and should affect our credence.
Imagine Sleeping Beauty is a lesswrong reader. If Sleeping Beauty is unfamiliar with the simulation argument, and someone asks her about her credence of being in a simulation, she probably answers something like 0.0000000001% (all numbers for illustrative purposes only). If someone shows her the simulation argument, she increases to 1%. If she stumbles across this blog entry, she increases her credence to 2%, and adds some credence to the additional hypothesis that it may be a simulation with an afterlife. If she sees that a ton of people get really interested in this idea, and start raising funds to build simulations in the future and to lobby governments both for great AI safeguards and for regulation of future simulations, she raises her credence to 4%. If she lives through the AI superintelligence explosion and simulations are being built, but not yet turned on, her credence increases to 20%. If humanity turns them on, it increases to 50%. If there are trillions of them, she increases her credence to 60%. If 99% of simulations survive their own run-ins with artificial superintelligence and produce their own simulations, she increases her credence to 95%.
2) This set of simulations does not need to recreate the current world or any specific people in it. That is a different idea that is not necessary to this argument. As written the argument is premised on the idea of creating fully unique people. The point would be to increase our credence that we are functionally identical in type to the unique individuals in the simulation. This is done by creating ignorance or uncertainty in simulations, so that the majority of people similarly situated, in a world which may or may not be in a simulation, are in fact in a simulation. This should, in our ignorance, increase our credence that we are in a simulation. The point is about how we self-locate, as discussed in the original article by Bostrom. It is a short 12-page read, and if you have not read it yet, I would encourage it: http://simulation-argument.com/simulation.html. The point about past loved ones I was making was to bring up the possibility that the simulations could be designed to transfer people to a separate after-life simulation where they could be reunited after dying in the first part of the simulation. This was not about trying to create something for us to upload ourselves into, along with attempted replicas of dead loved ones. This staying-in-one simulation through two phases, a short life, and relatively long afterlife, also has the advantage of circumventing the teletransportation paradox as “all of the person" can be moved into the afterlife part of the simulation.
[Link] Game Theory YouTube Videos
I made a series of game theory videos that carefully go through the mechanics of solving many different types of games. I optimized the videos for my future Smith College game theory students who will either miss a class, or get lost in class and want more examples. I emphasize clarity over excitement. I would be grateful for any feedback.
Min/max goal factoring and belief mapping exercise
Edit 3: Removed description of previous edits and added the following:
This thread used to contain the description of a rationality exercise.
I have removed it and plan to rewrite it better.
I will repost it here, or delete this thread and repost in the discussion.
Thank you.
Why isn't the following decision theory optimal?
I've recently read the decision theory FAQ, as well as Eliezer's TDT paper. When reading the TDT paper, a simple decision procedure occurred to me which, as far as I can tell, gets the correct answer to every tricky decision problem I've seen. As discussed in the FAQ above, evidential decision theory gets the chewing gum problem wrong, causal decision theory gets Newcomb's problem wrong, and TDT gets counterfactual mugging wrong.
In the TDT paper, Eliezer postulates an agent named Gloria (page 29), who is defined as an agent who maximizes decision-determined problems. He describes how a CDT-agent named Reena would want to transform herself into Gloria. Eliezer writes:
By Gloria’s nature, she always already has the decision-type causal agents wish they had, without need of precommitment.
Eliezer then later goes on the develop TDT, which is supposed to construct Gloria as a byproduct.
Gloria, as we have defined her, is defined only over completely decision-determined problems of which she has full knowledge. However, the agenda of this manuscript is to introduce a formal, general decision theory which reduces to Gloria as a special case.
Why can't we instead construct Gloria directly, using the idea of the thing that CDT agents wished they were? Obviously we can't just postulate a decision algorithm that we don't know how to execute, and then note that a CDT agent would wish they had that decision algorithm, and pretend we had solved the problem. We need to be able to describe the ideal decision algorithm to a level of detail that we could theoretically program into an AI.
Consider this decision algorithm, which I'll temporarily call Nameless Decision Theory (NDT) until I get feedback about whether it deserves a name: you should always make the decision that a CDT-agent would have wished he had precommitted to, if he had previously known he'd be in his current situation and had the opportunity to precommit to a decision.
In effect, you are making a general precommitment to behave as if you had made all the specific precommitments that would ever be advantageous to you.
NDT is so simple, and Eliezer comes so close to stating it in his discussion of Gloria, that I assume there is some flaw with it that I'm not seeing. Perhaps NDT does not count as a "real"/"well defined" decision procedure, or can't be formalized for some reason? Even so, it does seem like it'd be possible to program an AI to behave in this way.
Can someone give an example of a decision problem for which this decision procedure fails? Or for which there are multiple possible precommitments that you would have wished you'd made and it's not clear which one is best?
EDIT: I now think this definition of NDT better captures what I was trying to express: You should always make the decision that a CDT-agent would have wished he had precommitted to, if he had previously considered the possibility of his current situation and had the opportunity to costlessly precommit to a decision.
Linked decisions and a "nice" solution for the Fermi paradox
One of the more speculative solutions of the Fermi paradox is that all civilizations decide to stay home, thereby meta-causing other civilizations to stay home too, and thus allowing the Fermi paradox to have a nice solution. (I remember reading this idea in Paul Almond’s writings about evidential decision theory, which unfortunately seem no longer available online.) The plausibility of this argument is definitely questionable. It requires a very high degree of goal convergence both within and among different civilizations. Let us grant this convergence and assume that, indeed, most civilizations arrive at the same decision and that they make their decision knowing this. One paradoxical implication then is: If a civilization decides to attempt space colonization, they are virtually guaranteed to face unexpected difficulties (for otherwise space would already be colonized, unless they are the first civilization in their neighborhood attempting space colonization). If, on the other hand, everyone decides to stay home, there is no reason for thinking that there would be any unexpected difficulties if one tried. Space colonization can either be easy, or you can try it, but not both.
Can the basic idea behind the argument be formalized? Consider the following game: There are N >> 1 players. Each player in turn is offered the chance to push a button. Pushing the button yields a reward R > 0 with probability p and a punishment P < 0 otherwise. (R corresponds to successful space colonization, P to a failed colonization attempt.) Not pushing the button gives zero utility. If a player pushes the button and receives R, the game is immediately aborted, while the game continues if a player receives P. Players do not know how many other players were offered the button before them; they only know that no player before them received R. Players also don't know p. Instead, they have a probability distribution u(p) over possible values of p, with u(p) ≥ 0 and normalization ∫₀¹ u(p) dp = 1. We also assume that the decisions of the different players are perfectly linked.
Naively, it seems that players simply have an effective success probability p_eff,1 = ∫₀¹ p·u(p) dp, and they should push the button iff p_eff,1·R + (1 − p_eff,1)·P > 0. Indeed, if players decide not to push the button, they should expect that pushing it would have given them R with probability p_eff,1. The situation becomes more complicated if a player decides to push the button. If a player pushes the button, they know that all players before them have also pushed the button and received P. Before taking this knowledge into account, players are completely ignorant about the number i of players who were offered the button before them, and have to assign each number i from 0 to N−1 the same probability 1/N. Taking into account that all players before them received P, the variables i and p become correlated: the larger i, the higher the probability of a small value of p. Formally, by Bayes' theorem the joint probability distribution for the two variables is w(i,p) = c·u(p)·(1−p)^i, where c is a normalization constant. The marginal distribution is w(p) = Σ_{i=0}^{N−1} w(i,p). Using N >> 1 (so the geometric series sums to 1/p), we find w(p) = c·u(p)/p. The normalization constant is thus c = [∫₀¹ u(p)/p dp]⁻¹. Finally, we find that the effective success probability taking the linkage of decisions into account is given by
p_eff,2 = ∫₀¹ p·w(p) dp = c = [∫₀¹ u(p)/p dp]⁻¹.
This is the expected chance of success if players decide to push the button. Players should push the button iff p_eff,2·R + (1 − p_eff,2)·P > 0. It follows from the convexity of x ↦ 1/x (for positive x) that p_eff,2 ≤ p_eff,1 (by Jensen's inequality, ∫₀¹ u(p)/p dp ≥ 1/∫₀¹ p·u(p) dp). So by deciding to push the button, players decrease their expected success probability from p_eff,1 to p_eff,2; they cannot both push the button and keep the unaltered success probability p_eff,1. Linked decisions can explain why no one pushes the button when p_eff,2·R + (1 − p_eff,2)·P < 0, even though we might have p_eff,1·R + (1 − p_eff,1)·P > 0 and pushing the button naively seems to have positive expected utility.
It is also worth noting that if u(0) > 0, the integral ∫₀¹ u(p)/p dp diverges, so that p_eff,2 = 0. This means that given perfectly linked decisions and a sufficiently large number of players N >> 1, players should never push the button if their distribution u(p) satisfies u(0) > 0, irrespective of the ratio of R and P. This is due to an observer selection effect: if a player decides to push the button, then the fact that they are even offered the button is most likely due to p being very small and thus a lot of players being offered the button.
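A numerical sanity check (my own sketch; the uniform-on-[0.1, 0.9] prior is purely illustrative, chosen so that u(0) = 0 and both integrals converge):

```python
from math import log

lo, hi = 0.1, 0.9                  # support of a uniform prior u(p)
p_eff1 = (lo + hi) / 2             # ∫ p·u(p) dp: the naive success probability
p_eff2 = (hi - lo) / log(hi / lo)  # [∫ u(p)/p dp]^(-1): with linked decisions
print(p_eff1, p_eff2)              # 0.5 vs ~0.364: deciding to push lowers your odds
```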
Intuitive cooperation
This is an exposition of some of the main ideas in the paper Robust Cooperation. My goal is to make the ideas and proofs seem natural and intuitive, instead of some mysterious thing where we invoke Löb's theorem at the right place and the agents magically cooperate. I also hope it is accessible to people without a math or CS background. Be warned, it is pretty cheesy, ok.
In a small quirky town, far away from other cities or towns, the most exciting event is a game called (for historical reasons) The Prisoner's Dilemma. Everyone comes out to watch the big tournament at the end of Summer, and you (Alice) are especially excited because this year it will be your first time playing in the tournament! So you've been thinking of ways to make sure that you can do well.
The way the game works is this: Each player can choose to cooperate or defect with the other player. If you both cooperate, then you get two points each. If one of you defects, then that player will get three points, and the other player won't get any points. But if you both defect, then you each get only one point. You have to make your decisions separately, without communicating with each other - however, everyone is required to register the algorithm they will be using before the tournament, and you can look at the other player's algorithm if you want to. You also are allowed to use some outside help in your algorithm.

Now if you were a newcomer, you might think that no matter what the other player does, you can always do better by defecting, so the best strategy must be to always defect! Of course, you know better: if everyone tried that strategy, they would end up defecting against each other, which is a shame, since both would be better off if they had just cooperated.
But how can you do better? You have to be able to describe your algorithm in order to play. You have a few ideas, and you'll be playing some practice rounds with your friend Bob soon, so you can try them out before the actual tournament.
Your first plan:
I'll cooperate with Bob if I can tell from his algorithm that he'll cooperate with me. Otherwise I'll defect.
For your first try, you'll just run Bob's algorithm and see if he cooperates. But there's a problem - if Bob tries the same strategy, he'll have to run your algorithm, which will run his algorithm again, and so on into an infinite loop!
So you'll have to be a bit more clever than that... luckily you know a guy, Shady, who is good at these kinds of problems.
You call up Shady, and while you are waiting for him to come over, you remember some advice your dad Löb gave you.
(Löb's theorem) "If someone says you can trust them on X, well then they'll just tell you X."
If (someone tells you: "if I tell you X, then X is true"),
then (someone tells you: "X is true").
(See The Cartoon Guide to Löb's Theorem [pdf] for a nice proof of this.)
Here's an example:
Sketchy watch salesman: Hey, if I tell you these watches are genuine then they are genuine!
You: Ok... so are these watches genuine?
Sketchy watch salesman: Of course!
It's a good thing to remember when you might have to trust someone. If someone you already trust tells you you can trust them on something, then you know that something must be true.
On the other hand, if someone says you can always trust them, well that's pretty suspicious... If they say you can trust them on everything, that means that they will never tell you a lie - which is logically equivalent to them saying that if they were to tell you a lie, then that lie must be true. So by Löb's theorem, they will lie to you. (Gödel's second incompleteness theorem)
Despite his name, you actually trust Shady quite a bit. He's never told you or anyone else anything that didn't end up being true. And he's careful not to make any suspiciously strong claims about his honesty.
So your new plan is to ask Shady if Bob will cooperate with you. If so, then you will cooperate. Otherwise, defect. (FairBot)
It's game time! You look at Bob's algorithm, and it turns out he picked the exact same algorithm! He's going to ask Shady if you will cooperate with him. Well, the first step is to ask Shady, "will Bob cooperate with me?"
Shady looks at Bob's algorithm and sees that if Shady says you cooperate, then Bob cooperates. He looks at your algorithm and sees that if Shady says Bob cooperates, then you cooperate. Combining these, he sees that if he says you both cooperate, then both of you will cooperate. So he tells you that you will both cooperate (your dad was right!)
Let A stand for "Alice cooperates with Bob" and B stand for "Bob cooperates with Alice".
From looking at the algorithms, □B → A and □A → B, where □X means "Shady tells you X".
So combining these, □(A ∧ B) → (A ∧ B).
Then by Löb's theorem, A ∧ B.
Since that means that Bob will cooperate, you decide to actually cooperate.
Bob goes through an analogous thought process, and also decides to cooperate. So you cooperate with each other on the prisoner's dilemma! Yay!
That night, you go home and remark, "it's really lucky we both ended up using Shady to help us, otherwise that wouldn't have worked..."
Your dad interjects, "Actually, it doesn't matter - as long as they were both smart enough to count, it would work. This doesn't just say 'I tell you X', it's stronger than that - it actually says 'Anyone who knows basic arithmetic will tell you X'. So as long as they both know a little arithmetic, it will still work - even if one of them is pro-axiom-of-choice, and the other is pro-axiom-of-life. The cooperation is robust." That's really cool!
But there's another issue you think of. Sometimes, just to be tricky, the tournament organizers will set up a game where you have to play against a rock. Yes, literally just a rock that holds the cooperate button down. If you played against a rock with your current algorithm, well, you start by asking Shady if the rock will cooperate with you. Shady is like, "well yeah, duh." So then you cooperate too. But you could have gotten three points by defecting! You're missing out on a totally free point!
You think that it would be a good idea to make sure the other player isn't a complete idiot before you cooperate with them. How can you check? Well, let's see if they would cooperate with a rock placed on the defect button (affectionately known as 'DefectRock'). If they know better than that, and they will cooperate with you, then you will cooperate with them.
The next morning, you excitedly tell Shady about your new plan. "It will be like before, except this time, I also ask you if the other player will cooperate with DefectRock! If they are dumb enough to do that, then I'll just defect. That way, I can still cooperate with other people who use algorithms like this one, or the one from before, but I can also defect and get that extra point when there's just a rock on cooperate."
Shady gets an awkward look on his face. "Sorry, but I can't do that... or at least it wouldn't work out the way you're thinking. Let's say you're playing against Bob, who is still using the old algorithm. You want to know if Bob will cooperate with DefectRock, so I have to check and see if I'll tell Bob that DefectRock will cooperate with him. I would have to say that I would never tell Bob that DefectRock will cooperate with him. But by Löb's theorem, that means I would tell you this obvious lie! So that isn't gonna work."
Notation: X(Y) = C if X cooperates with Y in the prisoner's dilemma (or = D if not).
You ask Shady: does □(B(DR) = D)?
Bob's algorithm: B(DR) = C only if □(DR(B) = C),
so B(DR) = D exactly when ¬□(DR(B) = C).
So to say □(B(DR) = D), we would need □(¬□(DR(B) = C)).
This is equivalent to □(□(DR(B) = C) → (DR(B) = C)), since
DR(B) = C is an obvious lie.
By Löb's theorem, □(DR(B) = C), which is a lie.
<Extra credit: does the fact that Shady is the one explaining this mean you can't trust him?>
<Extra extra credit: find and fix the minor technical error in the above argument.>
Shady sees the dismayed look on your face and adds, "...but, I know a guy who can vouch for me, and I think maybe that could make your new algorithm work."
So Shady calls his friend T over, and you work out the new details. You ask Shady if Bob will cooperate with you, and you ask T if Bob will cooperate with DefectRock. So T looks at Bob's algorithm, which asks Shady if DefectRock will cooperate with him. Shady, of course, says no. So T sees that Bob will defect against DefectRock, and lets you know. Like before, Shady tells you Bob will cooperate with you, and thus you decide to cooperate! And like before, Bob decides to cooperate with you, so you both cooperate! Awesome! (PrudentBot)
If Bob is using your new algorithm, you can see that the same argument goes through mostly unchanged, and that you will still cooperate! And against a rock on cooperate, T will tell you that it will cooperate with DefectRock, so you can defect and get that extra point! This is really great!!
(ok now it's time for the really cheesy ending)
It's finally time for the tournament. You have a really good feeling about your algorithm, and you do really well! Your dad is in the audience cheering for you, with a really proud look on his face. You tell your friend Bob about your new algorithm so that he can also get that extra point sometimes, and you end up tying for first place with him!
A few weeks later, Bob asks you out, and you two start dating. Being able to cooperate with each other robustly is a good start to a healthy relationship, and you live happily ever after!
The End.
A simple game that has no solution
The following simple game has one solution that seems correct, but isn’t. Can you figure out why?
The Game
Player One moves first. He must pick A, B, or C. If Player One picks A, the game ends and Player Two does nothing. If Player One picks B or C, Player Two will be told that Player One picked B or C, but will not be told which of these two strategies Player One picked. Player Two must then pick X or Y, and then the game ends. The following shows the players' payoffs for each possible outcome; Player One's payoff is listed first.
| Outcome | Player One | Player Two |
|---------|------------|------------|
| A | 3 | 0 (Player Two never gets to move) |
| B, X | 2 | 0 |
| B, Y | 2 | 2 |
| C, X | 0 | 1 |
| C, Y | 6 | 0 |
[LINK] Prisoner's Dilemma? Not So Much
Hannes Rusch argues that the Prisoner's Dilemma is best understood as merely one game of very many:
only 2 of the 726 combinatorially possible strategically unique ordinal 2x2 games have the detrimental characteristics of a PD and that the frequency of PD-type games in a space of games with random payoffs does not exceed about 3.5%. Although this does not compellingly imply that the relevance of PDs is overestimated, in the absence of convergent empirical information about the ancestral human social niche, this finding can be interpreted in favour of a rather neglected answer to the question of how the founding groups of human cooperation themselves came to cooperate: Behavioural and/or psychological mechanisms which evolved for other, possibly more frequent, social interaction situations might have been applied to PD-type dilemmas only later.
[Sequence announcement] Introduction to Mechanism Design
Mechanism design is the theory of how to construct institutions for strategic agents, spanning applications like voting systems, school admissions, regulation of monopolists, and auction design. Think of it as the engineering side of game theory, building algorithms for strategic agents. While it doesn't have much to say about rationality directly, mechanism design provides tools and results for anyone interested in world optimization.
In this sequence, I'll touch on
- The basic mechanism design framework, including the revelation principle and incentive compatibility.
- The Gibbard-Satterthwaite impossibility theorem for strategyproof implementation (a close analogue of Arrow's theorem), and restricted domains like single-peaked or quasilinear preferences where we do have positive results.
- The power and limitations of Vickrey-Clarke-Groves mechanisms for efficiently allocating goods, generalizing Vickrey's second-price auction (sketched in code just after this list).
- Characterizations of incentive-compatible mechanisms and the revenue equivalence theorem.
- Profit-maximizing auctions.
- The Myerson-Satterthwaite impossibility for bilateral trade.
- Two-sided matching markets à la Gale and Shapley, school choice, and kidney exchange.
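Since the second-price auction anchors the VCG material above, here's a minimal sketch of it (my own toy code with made-up bids, not part of the sequence): truthful bidding is a dominant strategy because the price the winner pays doesn't depend on their own bid.

```python
def vickrey_auction(bids):
    """bids: bidder name -> bid. Returns (winner, price paid)."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    return ranked[0], bids[ranked[1]]  # winner pays the second-highest bid

print(vickrey_auction({"alice": 10, "bob": 7, "carol": 3}))  # ('alice', 7)
```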
As the list above suggests, this sequence is going to be semi-technical, but my foremost goal is to convey the intuition behind these results. Since mechanism design builds on game theory, take a look at Yvain's Game Theory Intro if you want to brush up.
Various resources:
- For further introduction, you can start with the popular or the more scholarly survey of mechanism design from the 2007 Nobel Memorial Prize in economics.
- Jeff Ely has lecture notes and short videos to accompany an undergraduate class in microeconomic theory from the perspective of mechanism design.
- The textbook A Toolbox for Economic Design by Dimitrios Diamantaras is very accessible and comprehensive if you can get ahold of a copy.
- Tilman Börgers has a draft textbook intended for graduate students.
- Chapters 9-16 of Algorithmic Game Theory and chapters 10-11 of Multiagent Systems cover various topics in mechanism design from the perspective of computer scientists.
- Video lectures introducing market design and computational aspects of mechanism design.
I plan on following up on this sequence with another focusing on group rationality and information aggregation, surveying scoring rules and prediction markets among other topics.
Suggestions and comments are very welcome.
Identity and Death
This recent SMBC comic illustrates the old question of what exactly "you" is by referencing the Star Trek Teleporter Problem. Do you actually get teleported, or does the teleporter just kill you before making a copy of you somewhere else?
Well, the answer that a lot of rationalists seem to accept is Pattern Identity Theory, proposed by Hans Moravec (skim the link or do a Google search for the theory if you have no idea what I am referring to). I am very sympathetic to this view, and it ties in with my limited understanding of physics and biology: elementary particles are interchangeable and do not have 'identity', at least some of the atoms in your body (including some of those that form neurons) get replaced over time, etc.
This is all fine and dandy, but if you take this view to its logical extreme, it looks like a sufficiently modified version of you shouldn't actually qualify as you; the difference in the pattern might be as great or greater than the difference between the patterns of any two random people.
Let's say something happens to Eliezer and he gets successfully cryopreserved in 2014. Then 80 years later the singularity hasn't arrived yet, but the future is still pretty good: everyone is smart and happy due to enhancements, ageing is a thing of the past, and we have the technology to wake cryopreserved people up. The people in that future build Eliezer a new body, restore the information from his brain, apply all the standard enhancements on him, and then wake him up. The person who wakes up remembers all that good old Eliezer did and seems to act like you would expect an enhanced Eliezer to act. However, if you examine things closely, the difference between 2014!Eliezer and 2094!Eliezer is actually bigger than the difference between 2014!Eliezer and, let's say, 2014!Yvain, due to all the new standard enhancements. Does that person really qualify as the same person according to Pattern Identity Theory, then? Sure, he originates from Eliezer, and arguably the difference between the two is similar to the difference between kid!Eliezer and adult!Eliezer, but is it really the same pattern? If you believe that you really are the pattern, then how can you not think of 2014!Eliezer as a dead man?
Sure, you could argue that continual change (as opposed to the sudden change in the cryo!Eliezer scenario) or 'evolution of the pattern' is in some way relevant but why would that be? The only somewhat reasonable argument for that I've seen is 'because it looks like this is what I care about'. That's fine with me but my personal preference is closer to 'I want to continue existing and experiencing things'; I don't care if anything that looks like me or thinks it's me is experiencing stuff - I want me (whatever that is) to continue living and doing stuff. And so far it looks really plausible that me is the pattern which sadly leaves me to think that maybe changing the pattern is a bad idea.
I know that this line of thinking can damn you to eternal stagnation, but it seems worth exploring before teleporters, uploading, big self-enhancements, etc. come along, which is why I am starting this discussion. Additionally, a part of the problem might be that there is some confusion about definitions going on, but I'd like to see where. Furthermore, 'the difference in the pattern' seems somehow hard to quantify and, more importantly, it doesn't look like something that could have a clear cut-off, as in 'if the pattern differs by more than 10% you are a different person'. At any rate, whatever that cut-off is, it still seems pretty clear that tenoke!2000 differs enough from me to be considered dead.
As an exercise at home, I will leave you to think about what this whole line of thinking implies if you combine it with MWI-style quantum immortality.
Democracy and rationality
Note: This is a draft; so far, about the first half is complete. I'm posting it to Discussion for now; when it's finished, I'll move it to Main. In the mean time, I'd appreciate comments, including suggestions on style and/or format. In particular, if you think I should(n't) try to post this as a sequence of separate sections, let me know.
Summary: You want to find the truth? You want to win? You're gonna have to learn the right way to vote. Plurality voting sucks; better voting systems are built from the blocks of approval, medians (Bucklin cutoffs), delegation, and pairwise opposition. I'm working to promote these systems and I want your help.
Contents: 1. Overblown¹ rhetorical setup ... 2. Condorcet's ideals and Arrow's problem ... 3. Further issues for politics ... 4. Rating versus ranking; a solution? ... 5. Delegation and SODA ... 6. Criteria and pathologies ... 7. Representation, Proportional representation, and Sortition ... 8. What I'm doing about it and what you can ... 9. Conclusions and future directions ... 10. Appendix: voting systems table ... 11. Footnotes
1. Overblown¹ rhetorical setup
This is a website focused on becoming more rational. But that can't just mean getting a black belt in individual epistemic rationality. In a situation where you're not the one making the decision, that black belt is just a recipe for frustration.
Of course, there's also plenty of content here about how to interact rationally; how to argue for truth, including both hacking yourself to give in when you're wrong and hacking others to give in when they are. You can learn plenty here about Aumann's Agreement Theorem on how two rational Bayesians should never knowingly disagree.
But "two rational Bayesians" isn't a whole lot better as a model for society than "one rational Bayesian". Aspiring to be rational is well and good, but the Socratic ideal of a world tied together by two-person dialogue alone is as unrealistic as the sociopath's ideal of a world where their own voice rules alone. Society needs structures for more than two people to interact. And just as we need techniques for checking irrationality in one- and two-person contexts, we need them, perhaps all the more, in multi-person contexts.
Most of the basic individual and dialogical rationality techniques carry over. Things like noticing when you are confused, or making your opponent's arguments into a steel man, are still perfectly applicable. But there's also a new set of issues when n>2: the issues of democracy and voting. For a group of aspiring rationalists to come to a working consensus, of course they need to begin by evaluating and discussing the evidence, but eventually it will be time to cut off the discussion and just vote. When they do so, they should understand the strengths and pitfalls of voting in general and of their chosen voting method in particular.
And voting's not just useful for an aspiring rationalist community. As it happens, it's an important part of how governments are run. Discussing politics may be a mind-killer in many contexts, but there are an awful lot of domains where politics is a part of the road to winning.² Understanding voting processes a little bit can help you navigate that road; understanding them deeply opens the possibility of improving that road and thus winning more often.
2. Collective rationality: Condorcet's ideals and Arrow's problem
Imagine it's 1785, and you're a member of the French Academy of Sciences. You're rubbing elbows with most of the giants of science and mathematics of your day: Coulomb, Fourier, Lalande, Lagrange, Laplace, Lavoisier, Monge; even the odd foreign notable like Franklin with his ideas to unify electrostatics and electric flow.

[Image caption: One day, they'll put your names in front of lots of cameras (even though that foreign yokel Franklin will be in more pictures).]
And this academy, with many of the smartest people in the world, has votes on stuff. Who will be our next president; who should edit and schedule our publications; etc. You're sure that if you all could just find the right way to do the voting, you'd get the right answer. In fact, you can easily prove that, or something like it: if a group is deciding between one right and one wrong option, and each member is independently more than 50% likely to get it right, then as the group size grows the chance of a majority vote choosing the right option goes to 1.
But somehow, there's still annoying politics getting in the way. Some people seem to win the elections simply because everyone expects them to win. So last year, the academy decided on a new election system, proposed by your rival, Charles de Borda, in which candidates get different points for being a voter's first, second, or third choice, and the one with the most points wins. But you're convinced that this new system will lead to the opposite problem: people who win the election precisely because nobody expected them to win, by picking up the points that voters strategically don't want to give to a strong rival. But when people point that possibility out to Borda, he only huffs that "my system is meant for honest men!"
So, armed with your proof of the above intuitive, useful result about two-way elections, you try to figure out how to reduce an n-way election to the two-candidate case. Clearly, you can show that Borda's system will frequently give the wrong results from that perspective. But frustratingly, you find that there can sometimes be no right answer; there may be no candidate who would beat all the others in one-on-one races. A crack has opened up: could it be that the collective decisions of individually rational agents are irrational?
Of course, the "you" in this story is the Marquis de Condorcet, and the year 1785 is when he published his Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix, a work devoted to the question of how to achieve collective rationality. The theorem referenced above is Condorcet's Jury Theorem, which seems to offer hope that democracy can point the way from individually-imperfect rationality towards an ever-more-perfect collective rationality. Just as Aumann's Agreement Theorem shows that two rational agents should always move towards consensus, the Condorcet Jury Theorem apparently shows that if you have enough rational agents, the resulting consensus will be correct.
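For concreteness, here is a minimal sketch of the jury theorem's arithmetic in Python (my illustration, not anything from Condorcet; the 0.6 accuracy figure and group sizes are arbitrary assumptions):

```python
from math import comb

def majority_correct_prob(n_voters, p_correct):
    """Probability that a simple majority of n independent voters,
    each right with probability p_correct, picks the right option."""
    majority = n_voters // 2 + 1
    return sum(comb(n_voters, k) * p_correct**k * (1 - p_correct)**(n_voters - k)
               for k in range(majority, n_voters + 1))

# With p = 0.6, a large group's majority is almost surely right:
for n in (1, 11, 101, 1001):
    print(n, round(majority_correct_prob(n, 0.6), 4))
# prints roughly: 1 -> 0.6, 11 -> 0.7535, 101 -> 0.98, 1001 -> ~1.0
```

Note how fast the probability climbs; that steep climb is exactly the hope the theorem seems to offer, and exactly what fails when its independence and better-than-chance assumptions fail.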
But as I said, Condorcet also opened a crack in that hope: the possibility that collective preferences will be cyclical. If the assumptions of the jury theorem don't hold (if each voter doesn't have a >50% chance of being right on a randomly-selected question, or if the correctness of two randomly-selected voters is correlated), then individually-sensible choices can lead to collectively-ridiculous ones.
What do I mean by "collectively-ridiculous"? Let's imagine that the Rationalist Marching Band is choosing the colors for their summer, winter, and spring uniforms, and that they all agree that the only goal is to have as much as possible of the best possible colors. The summer-style uniforms come in red or blue, and they vote and pick blue; the winter-style ones come in blue or green, and they pick green; and the spring ones come in green or red, and they pick red.
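To see the cycle concretely, here is a small Python sketch; the three factions and their sizes are made up for illustration, but each pairwise majority below matches the band's three votes:

```python
from itertools import combinations

# Hypothetical faction sizes; each tuple ranks the colors, best first.
ballots = [
    (("green", "blue", "red"), 35),
    (("blue", "red", "green"), 33),
    (("red", "green", "blue"), 32),
]

def pairwise_margin(a, b):
    """Net number of voters preferring color a over color b."""
    return sum(n if ranking.index(a) < ranking.index(b) else -n
               for ranking, n in ballots)

for a, b in combinations(("red", "blue", "green"), 2):
    winner = a if pairwise_margin(a, b) > 0 else b
    print(f"{a} vs {b}: majority prefers {winner}")
# blue beats red, red beats green, and green beats blue: a majority cycle,
# matching the band's three uniform votes above.
```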
Obviously, this makes us doubt their collective rationality. If, as they all agree they should, they had a consistent favorite color, they should have chosen that color both times that it was available, rather than choosing three different colors in the three cases. Theoretically, the salesperson could use such a fact to pump money out of them; for instance, offering to let them "trade up" their spring uniform from red to blue, then to green, then back to red, charging them a small fee each time; if they voted consistently as above, they would agree to each trade (though of course in reality human voters would probably catch on to the trick pretty soon, so the abstract ideal of an unending circular money pump wouldn't work).
This is the kind of irrationality that Condorcet showed was possible in collective decisionmaking. He also realized that there was a related issue with logical inconsistencies. If you were to take a vote on 3 logically related propositions (say, "Should we have a Minister of Silly Walks, to be appointed by the Chancellor of the Exchequer", "Should we have a Minister of Silly Walks, but not appointed by the Chancellor of the Exchequer", and "Should we in fact have a Minister of Silly Walks at all", where the third cannot be true unless one of the first two is), then you could easily get majority votes for inconsistent results: in this case, no, no, and yes, respectively. Obviously, there are many ways to fix the problem in this simple case (probably many less-wrong'ers would suggest some Bayesian tricks related to logical networks and treating votes as evidence⁸), but it's a tough problem in general even today, especially when the logical relationships can be complex, and Condorcet was quite right to be worried about its implications for collective rationality.³
And that's not the only tough problem he correctly foresaw. A century and a half later and an ocean away, in 1951, Kenneth Arrow showed that it was impossible for a preferential voting system to avoid the problem of a "Condorcet cycle" of preferences. Arrow's theorem shows that any voting system which consistently gives the same winner (or, in ties, winners) for the same voter preferences; which does not make one voter the effective dictator; which is sure to elect a candidate if all voters prefer them; and which will switch the results for two candidates if you switch their names on all the votes... must exhibit, in at least some situation, the pathology that befell the Rationalist Marching Band above; in other words, it must fail "independence of irrelevant alternatives".
Arrow's theorem is far from obvious a priori, but its proof is not hard to understand intuitively using Condorcet's insight. Say that there are three candidates, X, Y, and Z, with roughly equal bases of support, and that they form a Condorcet cycle, because in two-way races X would beat Y with help from Z supporters, Y would beat Z with help from X supporters, and Z would beat X with help from Y supporters. Then whoever wins the three-way race (say, X), just remove the candidate who would have lost to them (Y, in this case), and that "irrelevant" change will make the third candidate (Z, in this case) the winner.
Summary of above: Collective rationality is harder than individual or two-way rationality. Condorcet saw the problem and tried to solve it, but Arrow saw that Condorcet had been doomed to fail.
3. Further issues for politics
So Condorcet's ideals of better rationality through voting appear to be in ruins. But at least we can hope that voting is a good way to do politics, right?
Not so fast. Arrow's theorem quickly led to further disturbing results. Alan Gibbard (and also Mark Satterthwaite) showed that there is no voting system which doesn't encourage strategic voting. That is, if you view a voting system as a class of games where the finite players and finite available strategies are fixed, no player is effectively a dictator, and the only thing that varies is the payoff each player gets from each outcome, there is no voting system where you can derive your best strategic vote purely by looking "honestly" at your own preferences; there is always the possibility of situations where you have to second-guess what others will do.
Amartya Sen piled on with another depressing extension of Arrow's logic. He showed that there is no possible way of aggregating individual choices into collective choice that satisfies two simple criteria. First, it shouldn't choose Pareto-dominated outcomes: if everyone prefers situation XYZ to situation ABC, then the group doesn't choose ABC. Second, it is "minimally liberal"; that is, there are at least two people who each get to freely make their own decision on at least one specific issue each, no matter what, so for instance I always get to decide between X and A (in Gibbard's⁴ example, colors for my house), and you always get to decide between Y and B (colors for your own house). The problem is that if you nosily care more about my house's color, the decision that should have been mine, and I nosily care about yours, more than we each care about our own, then the Pareto-dominant situation is the one where we don't decide our own houses; and that nosiness could, in theory, arise for any specific choice that, a priori, someone might have labelled as our Inalienable Right. It's not such a surprising result when you think about it that way, but it does clearly show that unswerving ideals of Democracy and Liberty will never truly be compatible.
Meanwhile, "public choice" theorists⁵ like Duncan Black, James Buchanan, etc. were busy undermining the idea of democratic government from another direction: the motivations of the politicians and bureaucrats who are supposed to keep it running. They showed that various incentives, including the strange voting scenarios explored by Condorcet and Arrow, would tend to open a gap between the motives of the people and those of the government, and that strategic voting and agenda-setting within a legislature would tend to extend the impact of that gap. Where Gibbard and Sen had proved general results, these theorists worked from specific examples. And in one aspect, at least, their analysis is devastatingly unanswerable: the near-ubiquitous "democratic" system of plurality voting, also known as first-past-the-post or vote-for-one or biggest-minority-wins, is terrible in both theory and practice.
So, by the 1980s, things looked pretty depressing for the theory of democracy. Politics, the theory went, was doomed forever to be worse than a sausage factory: disgusting on the inside and distasteful even from the outside.
Should an ethical rationalist just give up on politics, then? Of course not. As long as the results it produces are important, it's worth trying to optimize. And as soon as you take the engineer's attitude of optimizing, instead of dogmatically searching for perfection or uselessly whining about the problems, the results above don't seem nearly as bad.
From this engineer's perspective, public choice theory serves as an unsurprising warning that tradeoffs are necessary, but more usefully, as a map of where those tradeoffs can go particularly wrong. In particular, its clearest lesson, in all-caps bold with a blink tag, that PLURALITY IS BAD, can be seen as a hopeful suggestion that other voting systems may be better. Meanwhile, the logic of both Sen's and Gibbard's theorems is built on Arrow's earlier result. So if we could find a way around Arrow, it might help resolve the whole issue.
Summary of above: Democracy is the worst political system... (...except for all the others?) But perhaps it doesn't have to be quite so bad as it is today.
4. Rating versus ranking
So finding a way around Arrow's theorem could be key to this whole matter. As a mathematical theorem, of course, the logic is bulletproof. But it does make one crucial assumption: that the only inputs to a voting system are rankings, that is, voters' ordinal preference orders for the candidates. No distinctions can be made using ratings or grades; that is, as long as you prefer X to Y to Z, the strength of those preferences can't matter. Whether you put Y almost up near X or way down next to Z, the result must be the same.
Relax that assumption, and it's easy to create a voting system which meets Arrow's criteria. It's called score voting⁶, and it just means rating each candidate with a number from some fixed interval (abstractly speaking, a real number; in practice, usually an integer); the scores are added up, and the highest total or average wins. (Unless there are missing values, of course, total and average give the same winner.) You've probably used it yourself on Yelp, IMDB, or similar sites. And it clearly passes all of Arrow's criteria. Non-dictatorship? Check. Unanimity? Check. Symmetry over switching candidate names? Check. Independence of irrelevant alternatives? In the mathematical sense, that is, as long as the scores for the other candidates are unchanged: check.
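As a sketch, the whole count fits in a few lines of Python. The 0-5 scale, the ballots, and the handling of missing ratings here are my assumptions for illustration, not part of any canonical definition:

```python
# Hypothetical ballots on a 0-5 scale; one voter skips Z entirely.
ballots = [
    {"X": 5, "Y": 4, "Z": 0},
    {"X": 0, "Y": 3, "Z": 5},
    {"X": 4, "Y": 5},
]

def score_winner(ballots, use_average=False):
    """Score voting: sum (or average) each candidate's ratings; highest wins."""
    totals, counts = {}, {}
    for ballot in ballots:
        for cand, score in ballot.items():
            totals[cand] = totals.get(cand, 0) + score
            counts[cand] = counts.get(cand, 0) + 1
    if use_average:
        return max(totals, key=lambda c: totals[c] / counts[c])
    return max(totals, key=totals.get)

print(score_winner(ballots))                    # Y (totals: X=9, Y=12, Z=5)
print(score_winner(ballots, use_average=True))  # Y (averages: 3.0, 4.0, 2.5)
```

With the missing rating for Z, totals and averages can in principle diverge; here they happen to agree.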
So score voting is an ideal system? Well, it's certainly a far sight better than plurality. But let's check it against Sen and against Gibbard.
Sen's theorem was based on a logic similar to Arrow's. However, while Arrow's theorem deals with broad outcomes like which candidate wins, Sen's deals with finely-grained outcomes like (in the example we discussed) how each separate house should be painted. Extending the cardinal numerical logic of score voting to such finely-grained outcomes, we find we've simply reinvented markets. While markets can be great things and often work well in practice, Sen's result still holds in this case; if everything is on the market, then there is no decision which is always yours to make. But since, in practice, as long as you aren't destitute, you tend to be able to make the decisions you care the most about, Sen's theorem seems to have lost its bite in this context.
What about Gibbard's theorem on strategy? Here, things are not so easy. Yes, Gibbard, like Sen, parallels Arrow. But while Arrow deals with what's written on the ballot, Gibbard deals with what's in the voter's head. In particular, if a voter prefers X to Y by even the tiniest margin, Gibbard assumes (not unreasonably) that they may be willing to vote however they need to, if by doing so they can ensure X wins instead of Y. Thus, the internal preferences Gibbard treats are, effectively, just ordinal rankings; and the cardinal trick by which score voting avoided Arrovian problems no longer works.
How does score voting deal with strategic issues in practice? The answer to that has two sides. On the one hand, score never requires voters to be actually dishonest. Unlike the situation in a ranked system such as plurality, where we all know that the strategic vote may be to dishonestly ignore your true favorite and vote for a "lesser evil" among the two frontrunners, in score voting you never need to vote a less-preferred option above a more-preferred option. At worst, all you have to do is exaggerate some distinctions and minimize others, so that you might end up giving equal votes to less- and more-preferred options.
Did I say "at worst"? I meant, "almost always". Voting strategy only matters to the result when, aside from your vote, two or more candidates are within one vote of being tied for first. Except in unrealistic, perfectly-balanced conditions, as the number of voters rises, the probability that any candidate other than the two a priori frontrunners is part of such a tie falls to zero.⁷ Thus, in score voting, the optimal strategy is nearly always to rate your preferred frontrunner and all candidates you like better at the maximum, and your less-preferred frontrunner and all candidates you like less at the minimum. In other words, strategic score voting is basically equivalent to approval voting, where you give each candidate a 1 or 0 and the highest total wins.
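Here is the strategy just described as a Python sketch; the function name and the specific utilities are hypothetical, and it assumes the voter knows who the two frontrunners are:

```python
def strategic_score_ballot(utilities, frontrunners, max_score=5):
    """Max-rate the preferred frontrunner and everyone liked at least as much;
    min-rate everyone else. The result is an approval-style ballot."""
    preferred = max(frontrunners, key=utilities.get)
    threshold = utilities[preferred]
    return {c: max_score if u >= threshold else 0 for c, u in utilities.items()}

utilities = {"X": 10, "Y": 6, "Z": 0, "W": 8}
print(strategic_score_ballot(utilities, frontrunners=("Y", "Z")))
# {'X': 5, 'Y': 5, 'Z': 0, 'W': 5}: every score is max or min,
# i.e. effectively an approval ballot.
```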
In one sense, score voting reducing to approval is OK. Approval voting is not a bad system at all. For instance, if there's a known majority Condorcet winner (a candidate who could beat any other by a majority in a one-on-one race), and voters are strategic (they anticipate the unique strong Nash equilibrium, the situation where no group of voters could improve the outcome for all its members by changing their votes, whenever such a unique equilibrium exists), then the Condorcet winner will win under approval. That's a lot of words to say that approval will get the "democratic" results you'd expect in most cases.
But in another sense, it's a problem. If one side of an issue is more inclined to be strategic than the other side, the more-strategic faction could win even if it's a minority. That clashes with many people's ideals of democracy; and worse, it encourages mind-killing political attitudes, where arguments are used as soldiers rather than as ways to seek the truth.
But score and approval voting are not the only systems which escape Arrow's theorem through the trapdoor of ratings. If score voting, using the average of voter ratings, too-strongly encourages voters to strategically seek extreme ratings, then why not use the median rating instead? We know that medians are less sensitive to outliers than averages. And indeed, median-based systems are more resistant to one-sided strategy than average-based ones, giving better hope for reasonable discussion to prosper. That is to say, in a simple model, a minority would need twice as much strategic coordination under median as under average, in order to overcome a majority; and there's good reason to believe that, because of natural factional separation, reality is even more favorable to median systems than that model.
There are several different median systems available. In the US during the 1910-1925 Progressive Era, early versions collectively called "Bucklin voting" were used briefly in over a dozen cities. These reforms, based on counting all top preferences, then adding lower preferences one level at a time until some candidate(s) reach a majority, were all rolled back soon after, principally by party machines upset at upstart challenges or victories. The possibility of multiple, simultaneous majorities is a principal reason for the variety of Bucklin/Median systems. Modern proposals of median systems include Majority Approval Voting, Majority Judgment, and Graduated Majority Judgment, which would probably give the same winners almost all of the time. An important detail is that most median system ballots use verbal or letter grades rather than numeric scores. This is justifiable because the median is preserved under any monotonic transformation, and studies suggest that it would help discourage strategic voting.
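To show the mechanics of a median count, here's a bare-bones Python sketch. It assumes numeric grades 0-5 and uses one simple tiebreak (repeatedly removing a median grade); real proposals such as Majority Judgment and Graduated Majority Judgment each specify their own tiebreak, so treat this as illustrative only:

```python
import statistics

def median_winner(grades):
    """Highest median grade wins; break ties by repeatedly removing one
    median grade from each tied candidate (no guard for a complete tie)."""
    pools = {c: sorted(gs) for c, gs in grades.items()}
    while True:
        medians = {c: statistics.median_low(gs) for c, gs in pools.items()}
        best = max(medians.values())
        leaders = [c for c, m in medians.items() if m == best]
        if len(leaders) == 1:
            return leaders[0]
        for c in leaders:
            pools[c].remove(medians[c])

grades = {"X": [5, 5, 3, 1, 0], "Y": [4, 3, 3, 3, 2], "Z": [5, 2, 2, 1, 1]}
print(median_winner(grades))
# Y: X and Y both have median 3, but the tiebreak favors Y's deeper support.
```

Notice also why a minority of exaggerators gains little here: pushing a few grades to the extremes barely moves a median, while it moves an average directly.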
Serious attention to rated systems like approval, score, and median systems barely began in the 1980s, and didn't really pick up until 2000. Meanwhile, the increased amateur interest in voting systems in this period, perhaps partially attributable to the anomalous 2000 US presidential election, or to more-recent anomalies in the UK, Canada, and Australia, has led to new discoveries in ranked systems as well. Though such systems are still clearly subject to Arrow's theorem, new "improved Condorcet" methods, which use certain tricks to count a voter's equal preference between two candidates on whichever side of the ledger suits the strategic needs, seem to offer promise that Arrovian pathologies can be kept to a minimum.
With this embarrassment of riches of systems to choose from, how should we evaluate which is best? Well, at least one thing is a clear consensus: plurality is a horrible system. Beyond that, things are more controversial; there are dozens of possible objective criteria one could formulate, and any system's inventor and/or supporters can usually formulate some criterion by which it shines.
Ideally, we'd like to measure the utility of each voting system in the real world. Since that's impossible — it would take not just a statistically-significant sample of large-scale real-world elections for each system, but also some way to measure the true internal utility of a result in situations where voters are inevitably strategically motivated to lie about that utility — we must do the next best thing, and measure it in a computer, with simulated voters whose utilities are assigned measurable values. Unfortunately, that requires assumptions about how those utilities are distributed, how voter turnout is decided, and how and whether voters strategize. At best, those assumptions can be varied, to see if findings are robust.
In 2000, Warren Smith performed such simulations for a number of voting systems. He found that score voting had, very robustly, one of the top expected social utilities (or, as he termed it, lowest Bayesian regret). Close on its heels were a median system and approval voting. Unfortunately, though he explored a wide parameter space in terms of voter utility models and the inherent strategic inclination of the voters, his simulations did not include voters who were more inclined to be strategic when strategy was more effective. His strategic assumptions were also unfavorable to ranked systems, and slightly unrealistic in other ways. Still, though certain of his numbers must be taken with a grain of salt, some of his results were large and robust enough to be trusted. For instance, he found that plurality voting and instant runoff voting were clearly inferior to rated systems; and that approval voting, even at its worst, delivered more than half as much benefit over plurality as any other system.
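The method is easy to replicate in miniature. The following is a toy regret simulation I sketched to show the approach, not Smith's actual code or parameters: honest voters only, impartial-culture random utilities, and made-up sizes:

```python
import random

def mean_regret(trials=2000, n_voters=99, n_cands=5, seed=0):
    """Mean social-utility shortfall (regret) of the plurality winner and the
    (normalized, honest) score-voting winner versus the best candidate."""
    rng = random.Random(seed)
    reg_plur = reg_score = 0.0
    for _ in range(trials):
        utils = [[rng.random() for _ in range(n_cands)] for _ in range(n_voters)]
        totals = [sum(v[c] for v in utils) for c in range(n_cands)]
        best = max(totals)
        # Plurality: each voter names their single favorite.
        tallies = [0] * n_cands
        for v in utils:
            tallies[v.index(max(v))] += 1
        reg_plur += best - totals[tallies.index(max(tallies))]
        # Score: each voter rates every candidate, normalized so their
        # favorite gets 1 and their least favorite gets 0.
        scores = [0.0] * n_cands
        for v in utils:
            lo, hi = min(v), max(v)
            for c in range(n_cands):
                scores[c] += (v[c] - lo) / (hi - lo)
        reg_score += best - totals[scores.index(max(scores))]
    return reg_plur / trials, reg_score / trials

print(mean_regret())  # score's mean regret comes out well below plurality's
```

Even this crude model reproduces the qualitative ordering; the hard (and contested) part of such studies is the strategic-voter modelling, which this sketch omits entirely.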
Summary of above: Rated systems, such as approval voting, score voting, and Majority Approval Voting, can avoid the problems of Arrow's theorem. Though they are certainly not immune to issues of strategic voting, they are a clear step up from plurality. Starting with this section, the opinions are my own; the two prior sections were based on general expert views on the topic.
5. Delegation and SODA
Rated systems are not the only way to try to beat the problems of Arrow and Gibbard (/Satterthwaite).
Summary of above:
6. Criteria and pathologies
do.
Summary of above:
7. Representation, proportionality, and sortition
do.
Summary of above:
8. What I'm doing about it and what you can
do.
Summary of above:
9. Conclusions and future directions
do.
Summary of above:
10. Appendix: voting systems table
Compliance of selected systems (table)
The following table shows which of the above criteria are met by several single-winner systems. Note: contains some errors; I'll carefully vet this when I'm finished with the writing. Still generally reliable though.
| System | Majority/ MMC | Condorcet/ Majority Condorcet | Cond. loser | Monotone | Consistency/ Participation | Reversal symmetry | IIA | Cloneproof | Polytime/ Resolvable | Summable | Equal rankings allowed | Later prefs allowed | Later-no-harm/ Later-no-help | FBC: No favorite betrayal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Approval[nb 1] | Ambiguous | No/Strategic yes[nb 2] | No | Yes | Yes[nb 2] | Yes | Ambiguous | Ambig.[nb 3] | Yes | O(N) | Yes | No | [nb 4] | Yes |
| Borda count | No | No | Yes | Yes | Yes | Yes | No | No (teaming) | Yes | O(N) | No | Yes | No | No |
| Copeland | Yes | Yes | Yes | Yes | No | Yes | No (but ISDA) | No (crowding) | Yes/No | O(N²) | Yes | Yes | No | No |
| IRV (AV) | Yes | No | Yes | No | No | No | No | Yes | Yes | O(N!)[nb 5] | No | Yes | Yes | No |
| Kemeny-Young | Yes | Yes | Yes | Yes | No | Yes | No (but ISDA) | No (teaming) | No/Yes | O(N²)[nb 6] | Yes | Yes | No | No |
| Majority Judgment[nb 7] | Yes[nb 8] | No/Strategic yes[nb 2] | No[nb 9] | Yes | No[nb 10] | No[nb 11] | Yes | Yes | Yes | O(N)[nb 12] | Yes | Yes | No[nb 13] | Yes |
| Minimax | Yes/No | Yes[nb 14] | No | Yes | No | No | No | No (spoilers) | Yes | O(N²) | Some variants | Yes | No[nb 14] | No |
| Plurality | Yes/No | No | No | Yes | Yes | No | No | No (spoilers) | Yes | O(N) | No | No | [nb 4] | No |
| Range voting[nb 1] | No | No/Strategic yes[nb 2] | No | Yes | Yes[nb 2] | Yes | Yes[nb 15] | Ambig.[nb 3] | Yes | O(N) | Yes | Yes | No | Yes |
| Ranked pairs | Yes | Yes | Yes | Yes | No | Yes | No (but ISDA) | Yes | Yes | O(N²) | Yes | Yes | No | No |
| Runoff voting | Yes/No | No | Yes | No | No | No | No | No (spoilers) | Yes | O(N)[nb 16] | No | No[nb 17] | Yes[nb 18] | No |
| Schulze | Yes | Yes | Yes | Yes | No | Yes | No (but ISDA) | Yes | Yes | O(N²) | Yes | Yes | No | No |
| SODA voting[nb 19] | Yes | Strategic yes/yes | Yes | Ambiguous[nb 20] | Yes/Up to 4 cand.[nb 21] | Yes[nb 22] | Up to 4 candidates[nb 21] | Up to 4 cand. (then crowds)[nb 21] | Yes[nb 23] | O(N) | Yes | Limited[nb 24] | Yes | Yes |
| Random winner/ arbitrary winner[nb 25] | No | No | No | NA | No | Yes | Yes | NA | Yes/No | O(1) | No | No | Yes | |
| Random ballot[nb 26] | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes/No | O(N) | No | No | Yes | |
"Yes/No", in a column which covers two related criteria, signifies that the given system passes the first criterion and not the second one.
- ^ a b These criteria assume that all voters vote their true preference order. This is problematic for Approval and Range, where various votes are consistent with the same order. See approval voting for compliance under various voter models.
- ^ a b c d e In Approval, Range, and Majority Judgment, if all voters have perfect information about each other's true preferences and use rational strategy, any Majority Condorcet or Majority winner will be strategically forced – that is, win in the unique Strong Nash equilibrium. In particular if every voter knows that "A or B are the two most-likely to win" and places their "approval threshold" between the two, then the Condorcet winner, if one exists and is in the set {A,B}, will always win. These systems also satisfy the majority criterion in the weaker sense that any majority can force their candidate to win, if it so desires. (However, as the Condorcet criterion is incompatible with the participation criterion and the consistency criterion, these systems cannot satisfy these criteria in this Nash-equilibrium sense. Laslier, J.-F. (2006) "Strategic approval voting in a large electorate,"IDEP Working Papers No. 405 (Marseille, France: Institut D'Economie Publique).)
- ^ a b The original independence of clones criterion applied only to ranked voting methods. (T. Nicolaus Tideman, "Independence of clones as a criterion for voting rules", Social Choice and Welfare Vol. 4, No. 3 (1987), pp. 185–206.) There is some disagreement about how to extend it to unranked methods, and this disagreement affects whether approval and range voting are considered independent of clones. If the definition of "clones" is that "every voter scores them within ±ε in the limit ε→0+", then range voting is immune to clones.
- ^ a b Approval and Plurality do not allow later preferences. Technically speaking, this means that they pass the technical definition of the LNH criteria: if later preferences or ratings are impossible, then such preferences cannot help or harm. However, from the perspective of a voter, these systems do not pass these criteria. Approval, in particular, encourages the voter to give the same ballot rating to a candidate who, in another voting system, would get a later rating or ranking. Thus, for approval, the practically meaningful criterion would be not "later-no-harm" but "same-no-harm", something neither approval nor any other system satisfies.
- ^ The number of piles that can be summed from various precincts is floor((e-1) N!) - 1.
- ^ Each prospective Kemeny-Young ordering has score equal to the sum of the pairwise entries that agree with it, and so the best ordering can be found using the pairwise matrix.
- ^ Bucklin voting, with skipped and equal-rankings allowed, meets the same criteria as Majority Judgment; in fact, Majority Judgment may be considered a form of Bucklin voting. Without allowing equal rankings, Bucklin's criteria compliance is worse; in particular, it fails Independence of Irrelevant Alternatives, which for a ranked method like this variant is incompatible with the Majority Criterion.
- ^ Majority judgment passes the rated majority criterion (a candidate rated solo-top by a majority must win). It does not pass the ranked majority criterion, which is incompatible with Independence of Irrelevant Alternatives.
- ^ Majority judgment passes the "majority condorcet loser" criterion; that is, a candidate who loses to all others by a majority cannot win. However, if some of the losses are not by a majority (including equal-rankings), the Condorcet loser can, theoretically, win in MJ, although such scenarios are rare.
- ^ Balinski and Laraki, Majority Judgment's inventors, point out that it meets a weaker criterion they call "grade consistency": if two electorates give the same rating for a candidate, then so will the combined electorate. Majority Judgment explicitly requires that ratings be expressed in a "common language", that is, that each rating have an absolute meaning. They claim that this is what makes "grade consistency" significant. Balinski M. and R. Laraki (2007) «A theory of measuring, electing and ranking». Proceedings of the National Academy of Sciences USA, vol. 104, no. 21, 8720-8725.
- ^ Majority judgment can actually pass or fail reversal symmetry depending on the rounding method used to find the median when there are even numbers of voters. For instance, in a two-candidate, two-voter race, if the ratings are converted to numbers and the two central ratings are averaged, then MJ meets reversal symmetry; but if the lower one is taken, it does not, because a candidate with ["fair","fair"] would beat a candidate with ["good","poor"] with or without reversal. However, for rounding methods which do not meet reversal symmetry, the chances of breaking it are on the order of the inverse of the number of voters; this is comparable with the probability of an exact tie in a two-candidate race, and when there's a tie, any method can break reversal symmetry.
- ^ Majority Judgment is summable at order KN, where K, the number of ranking categories, is set beforehand.
- ^ Majority judgment meets a related, weaker criterion: ranking an additional candidate below the median grade (rather than your own grade) of your favorite candidate, cannot harm your favorite.
- ^ a b A variant of Minimax that counts only pairwise opposition, not opposition minus support, fails the Condorcet criterion and meets later-no-harm.
- ^ Range satisfies the mathematical definition of IIA, that is, if each voter scores each candidate independently of which other candidates are in the race. However, since a given range score has no agreed-upon meaning, it is thought that most voters would either "normalize" or exaggerate their vote, such that they give at least one candidate the top rating and at least one the bottom rating. In this case, Range would not be independent of irrelevant alternatives. Balinski M. and R. Laraki (2007) «A theory of measuring, electing and ranking». Proceedings of the National Academy of Sciences USA, vol. 104, no. 21, 8720-8725.
- ^ Once for each round.
- ^ Later preferences are only possible between the two candidates who make it to the second round.
- ^ That is, second-round votes cannot harm candidates already eliminated.
- ^ Unless otherwise noted, for SODA's compliances:
- Delegated votes are considered to be equivalent to voting the candidate's predeclared preferences.
- Only ballots are considered (in other words, voters are assumed not to have preferences that cannot be expressed by a delegated or approval vote).
- Since at the time of assigning approvals on delegated votes there is always enough information to find an optimum strategy, candidates are assumed to use such a strategy.
- ^ For up to 4 candidates, SODA is monotonic. For more than 4 candidates, it is monotonic for adding an approval, for changing from an approval to a delegation ballot, and for changes in a candidate's preferences. However, if changes in a voter's preferences are executed as changes from a delegation to an approval ballot, such changes are not necessarily monotonic with more than 4 candidates.
- ^ a b c For up to 4 candidates, SODA meets the Participation, IIA, and Cloneproof criteria. It can fail these criteria in certain rare cases with more than 4 candidates. This is considered here as a qualified success for the Consistency and Participation criteria, which do not intrinsically have to do with numerous candidates, and as a qualified failure for the IIA and Cloneproof criteria, which do.
- ^ SODA voting passes reversal symmetry for all scenarios that are reversible under SODA; that is, if each delegated ballot has a unique last choice. In other situations, it is not clear what it would mean to reverse the ballots, but there is always some possible interpretation under which SODA would pass the criterion.
- ^ SODA voting is always polytime computable. There are some cases where the optimal strategy for a candidate assigning delegated votes may not be polytime computable; however, such cases are entirely implausible for a real-world election.
- ^ Later preferences are only possible through delegation, that is, if they agree with the predeclared preferences of the favorite.
- ^ Random winner: Uniformly randomly chosen candidate is winner. Arbitrary winner: some external entity, not a voter, chooses the winner. These systems are not, properly speaking, voting systems at all, but are included to show that even a horrible system can still pass some of the criteria.
- ^ Random ballot: Uniformly random-chosen ballot determines winner. This and closely related systems are of mathematical interest because they are the only possible systems which are truly strategy-free, that is, your best vote will never depend on anything about the other voters. They also satisfy both consistency and IIA, which is impossible for a deterministic ranked system. However, this system is not generally considered as a serious proposal for a practical method.
11. Footnotes
¹ When I call my introduction "overblown", I mean that I reserve the right to make broad generalizations there, without getting distracted by caveats. If you don't like this style, feel free to skip to section 2.
² Of course, the original "politics is a mind killer" sequence was perfectly clear about this: "Politics is an important domain to which we should individually apply our rationality—but it's a terrible domain in which to learn rationality, or discuss rationality, unless all the discussants are already rational." The focus here is on the first part of that quote, because I think Less Wrong as a whole has moved too far in the direction of avoiding politics as not a domain for rationalists.
³ Bayes developed his theorem decades before Condorcet's Essai, but Condorcet probably didn't know of it, as it wasn't popularized by Laplace until about 30 years later, after Condorcet was dead.
⁴ Yes, this happens to be the same Alan Gibbard from the previous paragraph.
⁵ Confusingly, "public choice" refers to a school of thought, while "social choice" is the name for the broader domain of study. Stop reading this footnote now if you don't want to hear mind-killing partisan identification. "Public choice" theorists are generally seen as politically conservative in the solutions they suggest. It seems to me that the broader "social choice" has avoided taking on a partisan connotation in this sense.
⁶ Score voting is also called "range voting" by some. It is not a particularly new idea — for instance, the "loudest cheer wins" rule of ancient Sparta, and even aspects of honeybees' process for choosing new hives, can be seen as score voting — but it was first analyzed theoretically around 2000. Approval voting, which can be seen as a form of score voting where the scores are restricted to 0 and 1, had entered theory only about two decades earlier, though it too has a history of practical use back to antiquity.
⁷ OK, fine, this is a simplification. As a voter, you have imperfect information about the true level of support and propensity to vote in the superpopulation of eligible voters, so in reality the chances of a decisive tie involving anyone other than your two expected frontrunners are non-zero. Still, in most cases, they're utterly negligible.
⁸ This article will focus more on the literature on multi-player strategic voting (competing boundedly-instrumentally-rational agents) than on multi-player Aumann (cooperating boundedly-epistemically-rational agents). If you're interested in the latter, here are some starting points: Scott Aaronson's work is, as far as I know, the state of the art on 2-player Aumann, but its framework assumes that the players have a sophisticated ability to empathize and reason about each others' internal knowledge, and the problems with this that Aaronson plausibly handwaves away in the 2-player case are probably less tractable in the multi-player one. Dalkiran et al deal with an Aumann-like problem over a social network; they find that attempts to "jump ahead" to a final consensus value instead of simply dumbly approaching it asymptotically can lead to failure to converge. And Kanoria et al have perhaps the most interesting result from the perspective of this article; they use the convergence of agents using a naive voting-based algorithm to give a nice upper bound on the difficulty of full Bayesian reasoning itself. None of these papers explicitly considers the problem of coming to consensus on more than one logically-related question at once, though Aaronson's work at least would clearly be easy to extend in that direction, and I think such extensions would be unsurprisingly Bayesian.
Dark Arts 101: Winning via destruction and dualism
Recalling first that life is a zero-sum game, it is immediately obvious that the quickest and easiest path to success is not to accomplish things yourself (that's a game for heroes and other suckers) but to tear down the accomplishments and reputations of others. Destruction is easy. The difficulty lies in constructing a situation so that the destruction is to your net benefit.
[LINK] Cantor's theorem, the prisoner's dilemma, and the halting problem
I wouldn't normally link to a post from my math blog here, but it concerns a cute interpretation of Cantor's theorem that showed up when I was thinking about program equilibria at the April MIRI workshop, so I thought it might be of interest here (e.g. if you're trying to persuade a mathematically inclined friend of yours to attend a future workshop). A short proof of the undecidability of the halting problem falls out as a bonus.
Why one-box?
I have sympathy with both one-boxers and two-boxers in Newcomb's problem. In contrast, however, many people on Less Wrong seem to be staunch and confident one-boxers. So I'm turning to you guys to ask for help figuring out whether I should be a staunch one-boxer too. Below is an imaginary dialogue setting out my understanding of the arguments normally advanced on LW for one-boxing; I was hoping to get help filling in the details and extending the argument so that I (and anyone else who is uncertain about the issue) can develop an understanding of the strongest arguments for one-boxing.
Three more ways identity can be a curse
The Buddhists believe that one of the three keys to attaining true happiness is dissolving the illusion of the self. (The other two are dissolving the illusion of permanence, and ceasing the desire that leads to suffering.) I'm not really sure exactly what it means to say "the self is an illusion", and I'm not exactly sure how that will lead to enlightenment, but I do think one can easily take the first step on this long journey to happiness by beginning to dissolve the sense of one's identity.
Previously, in "Keep Your Identity Small", Paul Graham showed how a strong sense of identity can lead to epistemic irrationality, when someone refuses to accept evidence against x because "someone who believes x" is part of his or her identity. And in "The Curse of Identity", Kaj Sotala illustrated a human tendency to reinterpret a goal of "do x" as "give the impression of being someone who does x". These are both fantastic posts, and you should read them if you haven't already.
Here are three more ways in which identity can be a curse.
1. Don't be afraid to change
James March, professor of political science at Stanford University, says that when people make choices, they tend to use one of two basic models of decision making: the consequences model, or the identity model. In the consequences model, we weigh the costs and benefits of our options and make the choice that maximizes our satisfaction. In the identity model, we ask ourselves "What would a person like me do in this situation?"1
The author of the book I read this in didn't seem to take the obvious next step and acknowledge that the consequences model is clearly The Correct Way to Make Decisions: basically by definition, if you're using the identity model and it's giving you a different result than the consequences model would, you're being led astray. A heuristic I like to use is to limit my identity to the "observer" part of my brain, and make my only goal maximizing the amount of happiness and pleasure the observer experiences, and minimizing the amount of misfortune and pain. It sounds obvious when you lay it out in these terms, but let me give an example.
Alice is an incoming freshman in college trying to choose her major. At Hypothetical University, there are only two majors: English and business. Alice absolutely adores literature and thinks business is dreadfully boring. Becoming an English major would allow her to have a career working with something she's passionate about, which is worth 2 megautilons to her, but it would also make her poor (0 mu). Becoming a business major would mean working in a field she is not passionate about (0 mu), but it would also make her rich, which is worth 1 megautilon. So English, with 2 mu, wins out over business, with 1 mu.
However, Alice is very bright, and is the type of person who can adapt herself to many situations and learn skills quickly. If Alice were to spend the first six months of college deeply immersing herself in studying business, she would probably start developing a passion for business. If she purposefully exposed herself to certain pro-business memeplexes (e.g. watched a movie glamorizing the life of Wall Street bankers), then she could speed up this process even further. After a few years of taking business classes, she would probably begin to forget what about English literature was so appealing to her, and be extremely grateful that she made the decision she did. Therefore she would gain the same 2 mu from having a job she is passionate about, along with an additional 1 mu from being rich, meaning that the 3 mu choice of business wins out over the 2 mu choice of English.
However, the possibility of self-modifying to becoming someone who finds English literature boring and business interesting is very disturbing to Alice. She sees it as a betrayal of everything that she is, even though she's actually only been interested in English literature for a few years. Perhaps she thinks of choosing business as "selling out" or "giving in". Therefore she decides to major in English, and takes the 2 mu choice instead of the superior 3 mu.
(Obviously this is a hypothetical example/oversimplification and there are a lot of reasons why it might be rational to pursue a career path that doesn't make very much money.)
It seems to me like human beings have a bizarre tendency to want to keep certain attributes and character traits stagnant, even when doing so provides no advantage, or is actively harmful. In a world where business-passionate people systematically do better than English-passionate people, it makes sense to self-modify to become business-passionate. Yet this is often distasteful.
For example, until a few weeks ago when I started solidifying this thinking pattern, I had an extremely adverse reaction to the idea of ceasing to be a hip-hop fan and becoming a fan of more "sophisticated" musical genres like jazz and classical, eventually coming to look down on the music I currently listen to as primitive or silly. This doesn't really make sense - I'm sure if I were to become a jazz and classical fan I would enjoy those genres at least as much as I currently enjoy hip hop. And yet I had a very strong preference to remain the same, even in the trivial realm of music taste.
Probably the most extreme example is the common tendency for depressed people to not actually want to get better, because depression has become such a core part of their identity that the idea of becoming a healthy, happy person is disturbing to them. (I used to struggle with this myself, in fact.) Being depressed is probably the most obviously harmful characteristic that someone can have, and yet many people resist self-modification.
Of course, the obvious objection is there's no way to rationally object to people's preferences - if someone truly prioritizes keeping their identity stagnant over not being depressed then there's no way to tell them they're wrong, just like if someone prioritizes paperclips over happiness there's no way to tell them they're wrong. But if you're like me, and you are interested in being happy, then I recommend looking out for this cognitive bias.
The other objection is that this philosophy leads to extremely unsavory wireheading-esque scenarios if you take it to its logical conclusion. But holding the opposite belief - that it's always more important to keep your characteristics stagnant than to be happy - clearly leads to even more absurd conclusions. So there is probably some point on the spectrum where change is so distasteful that it's not worth a boost in happiness (e.g. a lobotomy or something similar). However, I think that in actual practical pre-Singularity life, most people set this point far, far too low.
2. The hidden meaning of "be yourself"
(This section is entirely my own speculation, so take it as you will.)
"Be yourself" is probably the most widely-repeated piece of social skills advice despite being pretty clearly useless - if it worked then no one would be socially awkward, because everyone has heard this advice.
However, there must be some sort of core grain of truth in this statement, or else it wouldn't be so widely repeated. I think that core grain is basically the point I just made, applied to social interaction. I.e., always optimize for social success and positive relationships (particularly in the moment), and not for signalling a certain identity.
The ostensible purpose of identity/signalling is to appear to be a certain type of person, so that people will like and respect you, which is in turn so that people will want to be around you and be more likely to do stuff for you. However, oftentimes this goes horribly wrong, and people become very devoted to cultivating certain identities that are actively harmful for this purpose, e.g. goth, juggalo, "cool reserved aloof loner", guy that won't shut up about politics, etc. A more subtle example is Fred, who holds the wall and refuses to dance at a nightclub because he is a serious, dignified sort of guy, and doesn't want to look silly. However, the reason why "looking silly" is generally a bad thing is because it makes people lose respect for you, and therefore make them less likely to associate with you. In the situation Fred is in, holding the wall and looking serious will cause no one to associate with him, but if he dances and mingles with strangers and looks silly, people will be likely to associate with him. So unless he's afraid of looking silly in the eyes of God, this seems to be irrational.
Probably more common is the tendency to go to great care to cultivate identities that are neither harmful nor beneficial. E.g. "deep philosophical thinker", "Grateful Dead fan", "tough guy", "nature lover", "rationalist", etc. Boring Bob is a guy who wears a blue polo shirt and khakis every day, works as hard as expected but no harder in his job as an accountant, holds no political views, and when he goes home he relaxes by watching whatever's on TV and reading the paper. Boring Bob would probably improve his chances of social success by cultivating a more interesting identity, perhaps by changing his wardrobe, hobbies, and viewpoints, and then liberally signalling this new identity. However, most of us are not Boring Bob, and a much better social success strategy for most of us is probably to smile more, improve our posture and body language, be more open and accepting of other people, learn how to make better small talk, etc. But most people fail to realize this and instead play elaborate signalling games in order to improve their status, sometimes even at the expense of lots of time and money.
Some ways by which people can fail to "be themselves" in individual social interactions: liberally sprinkle references to certain attributes that they want to emphasize, say nonsensical and surreal things in order to seem quirky, be afraid to give obvious responses to questions in order to seem more interesting, insert forced "cool" actions into their mannerisms, act underwhelmed by what the other person is saying in order to seem jaded and superior, etc. Whereas someone who is "being herself" is more interested in creating rapport with the other person than giving off a certain impression of herself.
Additionally, optimizing for a particular identity might not only be counterproductive - it might actually be a quick way to get people to despise you.
I used to not understand why certain "types" of people, such as "hipsters"2 or Ed Hardy and Affliction-wearing "douchebags" are so universally loathed (especially on the internet). Yes, these people are adopting certain styles in order to be cool and interesting, but isn't everyone doing the same? No one looks through their wardrobe and says "hmm, I'll wear this sweater because it makes me uncool, and it'll make people not like me". Perhaps hipsters and Ed Hardy Guys fail in their mission to be cool, but should we really hate them for this? If being a hipster was cool two years ago, and being someone who wears normal clothes, acts normal, and doesn't do anything "ironically" is cool today, then we're really just hating people for failing to keep up with the trends. And if being a hipster actually is cool, then, well, who can fault them for choosing to be one?
That was my old thought process. Now it is clear to me that what makes hipsters and Ed Hardy Guys hated is that they aren't "being themselves" - they are much more interested in cultivating an identity of interestingness and masculinity, respectively, than connecting with other people. The same thing goes for pretty much every other collectively hated stereotype I can think of3 - people who loudly express political opinions, stoners who won't stop talking about smoking weed, attention seeking teenage girls on facebook, extremely flamboyantly gay guys, "weeaboos", hippies and new age types, 2005 "emo kids", overly politically correct people, tumblr SJA weirdos who identify as otherkin and whatnot, overly patriotic "rednecks", the list goes on and on.
This also clears up a confusion that occurred to me when reading How to Win Friends and Influence People. I know people who have a Dale Carnegie mindset of being optimistic and nice to everyone they meet and are adored for it, but I also know people who have the same attitude and yet are considered irritatingly saccharine and would probably do better to "keep it real" a little. So what's the difference? I think the difference is that the former group are genuinely interested in being nice to people and building rapport, while members of the second group have made an error like the one described in Kaj Sotala's post and are merely trying to give off the impression of being a nice and friendly person. The distinction is obviously very subtle, but it's one that humans are apparently very good at perceiving.
I'm not exactly sure what it is that causes humans to have this tendency of hating people who are clearly optimizing for identity - it's not as if they harm anyone. It probably has to do with tribal status. But what is clear is that you should definitely not be one of them.
3. The worst mistake you can possibly make in combating akrasia
The main thesis of PJ Eby's Thinking Things Done is that the primary reason why people are incapable of being productive is that they use negative motivation ("if I don't do x, some negative y will happen") as opposed to positive motivation ("if I do x, some positive y will happen"). He has the following evo-psych explanation for this: in the ancestral environment, personal failure meant that you could possibly be kicked out of your tribe, which would be fatal. A lot of depressed people make statements like "I'm worthless", or "I'm scum", or "No one could ever love me", which are illogically dramatic and overly black and white, until you realize that these statements are merely interpretations of a feeling of "I'm about to get kicked out of the tribe, and therefore die." Animals have a freezing response to imminent death, so if you are fearing failure you will go into do-nothing mode and not be able to work at all.4
In Succeed: How We Can Reach Our Goals, PhD psychologist Heidi Halvorson takes a different view and describes positive motivation and negative motivation as each having pros and cons. However, she has her own dichotomy of Good Motivation and Bad Motivation: "be good" goals are performance goals, directed at achieving a particular outcome, like getting an A on a test, reaching a sales target, getting your attractive neighbor to go out with you, or getting into law school. They are very often tied closely to a sense of self-worth. "Get better" goals are mastery goals; people who pick these goals judge themselves instead in terms of the progress they are making, asking questions like "Am I improving? Am I learning? Am I moving forward at a good pace?" Halvorson argues that "get better" goals are almost always drastically better than "be good" goals5. An example quote (from page 60) is:
When my goal is to get an A in a class and prove that I'm smart, and I take the first exam and I don't get an A... well, then I really can't help but think that maybe I'm not so smart, right? Concluding "maybe I'm not smart" has several consequences and none of them are good. First, I'm going to feel terrible - probably anxious and depressed, possibly embarrassed or ashamed. My sense of self-worth and self-esteem are going to suffer. My confidence will be shaken, if not completely shattered. And if I'm not smart enough, there's really no point in continuing to try to do well, so I'll probably just give up and not bother working so hard on the remaining exams.
And finally, in Feeling Good: The New Mood Therapy, David Burns describes a destructive side effect of depression he calls "do-nothingism":
One of the most destructive aspects of depression is the way it paralyzes your willpower. In its mildest form you may simply procrastinate about doing a few odious chores. As your lack of motivation increases, virtually any activity appears so difficult that you become overwhelmed by the urge to do nothing. Because you accomplish very little, you feel worse and worse. Not only do you cut yourself off from your normal sources of stimulation and pleasure, but your lack of productivity aggravates your self-hatred, resulting in further isolation and incapacitation.
Synthesizing these three pieces of information leads me to believe that the worst thing you can possibly do for your akrasia is to tie your success and productivity to your sense of identity/self-worth, especially if you're using negative motivation to do so, and especially if you suffer or have recently suffered from depression or low self-esteem. The thought of having a negative self-image is scary and unpleasant, perhaps for the evo-psych reasons PJ Eby outlines. If you tie your productivity to your fear of a negative self-image, working will become scary and unpleasant as well, and you won't want to do it.
I feel like this might be the number one reason why people are akratic. It might be a little premature to say that, and I might be biased by how large a factor this mistake was in my own akrasia. But unfortunately, this trap seems like a very easy one to fall into. If you're someone who is lazy and isn't accomplishing much in life, perhaps depressed, then it makes intuitive sense to motivate yourself by saying "Come on, self! Do you want to be a useless failure in life? No? Well get going then!" But doing so will accomplish the exact opposite and make you feel miserable.
So there you have it. In addition to making you a bad rationalist and causing you to lose sight of your goals, a strong sense of identity will cause you to make poor decisions that lead to unhappiness, be unpopular, and be unsuccessful. I think the Buddhists were onto something with this one, personally, and I try to limit my sense of identity as much as possible. A trick you can use, in addition to the "be the observer" trick I mentioned: whenever you find yourself thinking in identity terms, swap out that identity for the identity of "person who takes over the world by transcending the need for a sense of identity".
This is my first LessWrong discussion post, so constructive criticism is greatly appreciated. Was this informative? Or was what I said obvious, and I'm retreading old ground? Was this well written? Should this have been posted to Main? Should this not have been posted at all? Thank you.
1. Paraphrased from page 153 of Switch: How to Change When Change is Hard
2. Actually, while it works for this example, I think the stereotypical "hipster" is a bizarre caricature that doesn't match anyone who actually exists in real life, and the degree to which people will rabidly espouse hatred for this stereotypical figure (or used to two or three years ago) is one of the most bizarre tendencies people have.
3. Other than groups that arguably hurt people (religious fundamentalists, PUAs), the only exception I can think of is frat boy/jock types. They talk about drinking and partying a lot, sure, but not really any more than people who drink and party a lot would be expected to. Possibilities for their hated status include that they do in fact engage in obnoxious signalling and I'm not aware of it, jealousy, or stigmatization as hazers and date rapists. Also, a lot of people hate stereotypical "ghetto" black people who sag their jeans and notoriously type in a broken, difficult-to-read form of English. This could either be a weak example of the trend (I'm not really sure what it is they would be signalling, maybe dangerousness?), or just a manifestation of racism.
4. I'm not sure if this is valid science that he pulled from some other source, or if he just made this up.
Population Ethics Shouldn't Be About Maximizing Utility
let me suggest a moral axiom with apparently very strong intuitive support, no matter what your concept of morality: morality should exist. That is, there should exist creatures who know what is moral, and who act on that. So if your moral theory implies that in ordinary circumstances moral creatures should exterminate themselves, leaving only immoral creatures, or no creatures at all, well that seems a sufficient reductio to solidly reject your moral theory.
I agree strongly with the above quote, and I think most other readers will as well. It is good for moral beings to exist and a world with beings who value morality is almost always better than one where they do not. I would like to restate this more precisely as the following axiom: A population in which moral beings exist and have net positive utility, and in which all other creatures in existence also have net positive utility, is always better than a population where moral beings do not exist.
While the axiom that morality should exist is extremely obvious to most people, there is one strangely popular ethical system that rejects it: total utilitarianism. In this essay I will argue that Total Utilitarianism leads to what I will call the Genocidal Conclusion, which is that there are many situations in which it would be fantastically good for moral creatures to either exterminate themselves, or greatly limit their utility and reproduction in favor of the utility and reproduction of immoral creatures. I will argue that the main reason consequentialist theories of population ethics produce such obviously absurd conclusions is that they continue to focus on maximizing utility1 in situations where it is possible to create new creatures. I will argue that pure utility maximization is only a valid ethical theory for "special case" scenarios where the population is static. I will propose an alternative theory for population ethics I call "ideal consequentialism" or "ideal utilitarianism" which avoids the Genocidal Conclusion and may also avoid the more famous Repugnant Conclusion.
I will begin my argument by pointing to a common problem in population ethics known as the Mere Addition Paradox (MAP) and the Repugnant Conclusion. Most Less Wrong readers will already be familiar with this problem, so I do not think I need to elaborate on it. You may also be familiar with an even stronger variation called the Benign Addition Paradox (BAP). This is essentially the same as the MAP, except that each time one adds more people one also gives a small amount of additional utility to the people who already existed. One then proceeds to redistribute utility between people as normal, eventually arriving at the huge population where everyone's lives are "barely worth living." The point of this is to argue that the Repugnant Conclusion can be arrived at through a "mere addition" of new people that not only doesn't harm the preexisting people, but actually benefits them.
The next step of my argument involves three slightly tweaked versions of the Benign Addition Paradox. I have not changed the basic logic of the problem, I have just added one small clarifying detail. In the original MAP and BAP it was not specified what sort of values the added individuals in population A+ held. Presumably one was meant to assume that they were ordinary human beings. In the versions of the BAP I am about to present, however, I will specify that the extra individuals added in A+ are not moral creatures, that if they have values at all they are values indifferent to, or opposed to, morality and the other values that the human race holds dear.
1. The Benign Addition Paradox with Paperclip Maximizers.
Let us imagine, as usual, a population, A, which has a large group of human beings living lives of very high utility. Let us then add a new population consisting of paperclip maximizers, each of whom is living a life barely worth living. Presumably, for a paperclip maximizer, this would be a life where the paperclip maximizer's existence results in at least one more paperclip in the world than there would have been otherwise.
Now, one might object that if one creates a paperclip maximizer, and then allows it to create one paperclip, the utility of the other paperclip maximizers will increase above the "barely worth living" level, which would obviously make this thought experiment nonanalogous with the original MAP and BAP. To prevent this we will assume that each paperclip maximizer that is created has slightly different values about the ideal size, color, and composition of the paperclips it is trying to produce. So the Purple 2 centimeter Plastic Paperclip Maximizer gains no additional utility when the Silver Iron 1 centimeter Paperclip Maximizer makes a paperclip.
So again, let us add these paperclip maximizers to population A, and in the process give one extra utilon of utility to each preexisting person in A. This is a good thing, right? After all, everyone in A benefited, and the paperclippers get to exist and make paperclips. So clearly A+, the new population, is better than A.
Now let's take the next step, the transition from population A+ to population B. Take some of the utility from the human beings and convert it into paperclips. This is a good thing, right?
So let us repeat these steps, adding paperclip maximizers and utility, and then redistributing utility. Eventually we reach population Z, where there is a vast number of paperclip maximizers, a vast number of many different kinds of paperclips, and a small number of human beings living lives barely worth living.
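The bookkeeping total utilitarianism is doing here can be made concrete with entirely made-up numbers (ten humans at utility 100 in A; everyone, human or clipper, at utility 1 in Z):

```python
# Population A: ten humans living lives of very high utility.
A = [100.0] * 10
print(sum(A))  # 1000.0

# Population Z, after many rounds of benign addition and redistribution:
# everyone, human or paperclipper, at a life "barely worth living".
Z_humans = [1.0] * 10
Z_clippers = [1.0] * 100_000
print(sum(Z_humans) + sum(Z_clippers))  # 100010.0 -- so total
                                        # utilitarianism ranks Z above A
```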
Obviously Z is better than A, right? We should not fear the creation of a paperclip maximizing AI, but welcome it! Forget about things like high challenge, love, interpersonal entanglement, complex fun, and so on! Those things just don't produce the kind of utility that paperclip maximization has the potential to do!
Or maybe there is something seriously wrong with the moral assumptions behind the Mere Addition and Benign Addition Paradoxes.
But you might argue that I am using an unrealistic example. Creatures like Paperclip Maximizers may be so far removed from normal human experience that we have trouble thinking about them properly. So let's replay the Benign Addition Paradox again, but with creatures we might actually expect to meet in real life, and we know we actually value.
2. The Benign Addition Paradox with Non-Sapient Animals
You know the drill by now. Take population A, add a new population to it, while very slightly increasing the utility of the original population. This time let's have it be some kind of animal that is capable of feeling pleasure and pain, but is not capable of modeling possible alternative futures and choosing between them (in other words, it is not capable of having "values" or being "moral"). A lizard or a mouse, for example. Each one feels slightly more pleasure than pain in its lifetime, so it can be said to have a life barely worth living. Convert A+ to B. Take the utilons that the human beings are using to experience things like curiosity, beatitude, wisdom, beauty, harmony, morality, and so on, and convert them into pleasure for the animals.
We end up with population Z, with a vast number of mice or lizards with lives just barely worth living, and a small number of human beings with lives barely worth living. Terrific! Why do we bother creating humans at all? Let's just create tons of mice and inject them full of heroin! It's a much more efficient way to generate utility!
3. The Benign Addition Paradox with Sociopaths
What new population will we add to A this time? How about some other human beings, who all have anti-social personality disorder? True, they lack the key, crucial value of sympathy that defines so much of human behavior. But they don't seem to miss it. And their lives are barely worth living, so obviously A+ has greater utility than A. If given a chance the sociopaths will reduce the utility of other people to negative levels, but let's assume that that is somehow prevented in this case.
Eventually we get to Z, with a vast population of sociopaths and a small population of normal human beings, all living lives just barely worth living. That has more utility, right? True, the sociopaths place no value on things like friendship, love, compassion, empathy, and so on. And true, the sociopaths are immoral beings who do not care in the slightest about right and wrong. But what does that matter? Utility is being maximized, and surely that is what population ethics is all about!
Asteroid!
Let's suppose an asteroid is approaching each of the four population Zs discussed before. It can only be deflected by so much. Your choice is: save the original population of humans from A, or save the vast new population. The choice is obvious. In 1, 2, and 3, each individual has the same level of utility, so obviously we should choose the option that saves the greater number of individuals.
Bam! The asteroid strikes. The end result in all four scenarios is a world in which all the moral creatures are destroyed. It is a world without the many complex values that human beings possess. Each world, for the most part, lacks things like complex challenge, imagination, friendship, empathy, love, and the other complex values that human beings prize. But so what? The purpose of population ethics is to maximize utility, not silly, frivolous things like morality, or the other complex values of the human race. That means that any form of utility that is easier to produce than those values is obviously superior. It's easier to make pleasure and paperclips than it is to make eudaemonia, so that's the form of utility that ought to be maximized, right? And as for making sure moral beings exist, well that's just ridiculous. The valuable processing power they're using to care about morality could be used to make more paperclips or more mice injected with heroin! Obviously it would be better if they died off, right?
I'm going to go out on a limb and say "Wrong."
Is this realistic?
Now, to be fair, in the Overcoming Bias page I quoted, Robin Hanson also says:
I’m not saying I can’t imagine any possible circumstances where moral creatures shouldn’t die off, but I am saying that those are not ordinary circumstances.
Maybe the scenarios I am proposing are just too extraordinary. But I don't think this is the case. I imagine that the circumstances Robin had in mind were probably something like "either all moral creatures die off, or all moral creatures are tortured 24/7 for all eternity."
Any purely utility-maximizing theory of population ethics that counts both the complex values of human beings, and the pleasure of animals, as "utility" should inevitably draw the conclusion that human beings ought to limit their reproduction to the bare minimum necessary to maintain the infrastructure to sustain a vast population of non-human animals (preferably animals dosed with some sort of pleasure-causing drug). And if some way is found to maintain that infrastructure automatically, without the need for human beings, then the logical conclusion is that human beings are a waste of resources (as are chimps, gorillas, dolphins, and any other animal that is even remotely capable of having values or morality). Furthermore, even if the human race cannot practically be replaced with automated infrastructure, this should be an end result that the adherents of this theory should be yearning for.2 There should be much wailing and gnashing of teeth among moral philosophers that exterminating the human race is impractical, and much hope that someday in the future it will not be.
I call this the "Genocidal Conclusion" or "GC." On the macro level the GC manifests as the idea that the human race ought to be exterminated and replaced with creatures whose preferences are easier to satisfy. On the micro level it manifests as the idea that it is perfectly acceptable to kill someone who is destined to live a perfectly good and worthwhile life and replace them with another person who would have a slightly higher level of utility.
Population Ethics isn't About Maximizing Utility
I am going to make a rather radical proposal. I am going to argue that the consequentialist's favorite maxim, "maximize utility," only applies to scenarios where creating new people or creatures is off the table. I think we need an entirely different ethical framework to describe what ought to be done when it is possible to create new people. I am not by any means saying that "which option would result in more utility" is never a morally relevant consideration when deciding to create a new person, but I definitely think it is not the only one.3
So what do I propose as a replacement to utility maximization? I would argue in favor of a system that promotes a wide range of ideals. Doing some research, I discovered that G. E. Moore had in fact proposed a form of "ideal utilitarianism" in the early 20th century.4 However, I think that "ideal consequentialism" might be a better term for this system, since it isn't just about aggregating utility functions.
What are some of the ideals that an ideal consequentialist theory of population ethics might seek to promote? I've already hinted at what I think they are: Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom... mutual affection, love, friendship, cooperation; all those other important human universals, plus all the stuff in the Fun Theory Sequence. When considering what sort of creatures to create, we ought to create creatures that value those things. Not necessarily all of them, or in the same proportions, for diversity is an important ideal as well, but they should value a great many of those ideals.
Now, lest you worry that this theory has any totalitarian implications, let me make it clear that I am not saying we should force these values on creatures that do not share them. Forcing a paperclip maximizer to pretend to make friends and love people does not do anything to promote the ideals of Friendship and Love. Forcing a chimpanzee to listen while you read the Sequences to it does not promote the values of Truth and Knowledge. Those ideals require both a subjective and objective component. The only way to promote those ideals is to create a creature that includes them as part of its utility function and then help it maximize its utility.
I am also certainly not saying that there is never any value in creating a creature that does not possess these values. There are obviously many circumstances where it is good to create nonhuman animals. There may even be some circumstances where a paperclip maximizer could be of value. My argument is simply that it is most important to make sure that creatures who value these various ideals exist.
I am also not suggesting that it is morally acceptable to casually inflict horrible harms upon a creature with non-human values if we screw up and create one by accident. If promoting ideals and maximizing utility are separate values then it may be that once we have created such a creature we have a duty to make sure it lives a good life, even if it was a bad thing to create it in the first place. You can't unbirth a child.5
It also seems to me that in addition to having ideals about what sort of creatures should exist, we also have ideals about how utility ought to be concentrated. If this is the case then ideal consequentialism may be able to block some forms of the Repugnant Conclusion, even in situations where the only creatures whose creation is being considered are human beings. If it is acceptable to create humans instead of paperclippers, even if the paperclippers would have higher utility, it may also be acceptable to create ten humans with a utility of ten each instead of a hundred humans with a utility of 1.01 each.
Why Did We Become Convinced that Maximizing Utility was the Sole Good?
Population ethics was, until comparatively recently, a fallow field in ethics. And in situations where there is no option to increase the population, maximizing utility is the only consideration that's really relevant. If you've created creatures that value the right ideals, then all that is left to be done is to maximize their utility. If you've created creatures that do not value the right ideals, there is no value to be had in attempting to force them to embrace those ideals. As I've said before, you will not promote the values of Love and Friendship by creating a paperclip maximizer and forcing it to pretend to love people and make friends.
So in situations where the population is constant, "maximize utility" is a decent approximation of the meaning of right. It's only when the population can be added to that morality becomes much more complicated.
Another thing to blame is human-centric reasoning. When people defend the Repugnant Conclusion they tend to point out that a life barely worth living is not as bad as it would seem at first glance. They emphasize that it need not be a boring life, it may be a life full of ups and downs where the ups just barely outweigh the downs. A life worth living, they say, is a life one would choose to live. Derek Parfit developed this idea to some extent by arguing that there are certain values that are "discontinuous" and that one needs to experience many of them in order to truly have a life worth living.
The Orthogonality Thesis throws all these arguments out the window. It is possible to create an intelligence to execute any utility function, no matter what it is. If human beings have all sorts of complex needs that must be fulfilled in order for them to lead worthwhile lives, then you could create more worthwhile lives by killing the human race and replacing them with something less finicky. Maybe happy cows. Maybe paperclip maximizers. Or how about some creature whose only desire is to live for one second and then die. If we created such a creature and then killed it we would reap huge amounts of utility, for we would have created a creature that got everything it wanted out of life!
How Intuitive is the Mere Addition Principle, Really?
I think most people would agree that morality should exist, and that therefore any system of population ethics should not lead to the Genocidal Conclusion. But which step in the Benign Addition Paradox should we reject? We could reject the step where utility is redistributed. But that seems wrong; most people seem to consider it bad for animals and sociopaths to suffer, and think it acceptable to inflict at least some disutility on human beings to prevent such suffering.
It seems more logical to reject the Mere Addition Principle. In other words, maybe we ought to reject the idea that the mere addition of more lives-worth-living cannot make the world worse. And in turn, we should probably also reject the Benign Addition Principle. Adding more lives-worth-living may be capable of making the world worse, even if doing so also slightly benefits existing people. Fortunately this isn't a very hard principle to reject. While many moral philosophers treat it as obviously correct, nearly everyone else rejects this principle in day-to-day life.
Now, I'm obviously not saying that people's behavior in their day-to-day lives is always good, it may be that they are morally mistaken. But I think the fact that so many people seem to implicitly reject it provides some sort of evidence against it.
Take people's decision to have children. Many people choose to have fewer children than they otherwise would because they do not believe they will be able to adequately care for them, at least not without inflicting large disutilities on themselves. If most people accepted the Mere Addition Principle there would be a simple solution for this: have more children and then neglect them! True, the children's lives would be terrible while they were growing up, but once they've grown up and are on their own there's a good chance they may be able to lead worthwhile lives. Not only that, it may be possible to trick the welfare system into giving you money for the children you neglect, which would satisfy the Benign Addition Principle.
Yet most people decline to have children they would then neglect. And furthermore they seem to think that they have a moral duty not to do so: that a world where they choose not to raise neglected children is better than one where they do. What is wrong with them?
Another example is a common political view many people have. Many people believe that impoverished people should have fewer children because of the burden doing so would place on the welfare system. They also believe that it would be bad to get rid of the welfare system altogether. If the Benign Addition Principle were as obvious as it seems, they would instead advocate for the abolition of the welfare system, and encourage impoverished people to have more children. Assuming most impoverished people live lives worth living, this is exactly analogous to the BAP: it would create more people, while benefiting existing ones (the people who pay less in taxes because of the abolition of the welfare system).
Yet again, most people choose to reject this line of reasoning. The BAP does not seem to be an obvious and intuitive principle at all.
The Genocidal Conclusion is Really Repugnant
There is nearly nothing more repugnant than the Genocidal Conclusion. Pretty much the only way a line of moral reasoning could go more wrong would be concluding that we have a moral duty to cause suffering as an end in itself. This means it is fairly easy to counter any argument for total utilitarianism which points out that the alternative I am promoting has odd conclusions that do not fit some of our moral intuitions, while total utilitarianism does not: simply ask whether that odd conclusion is more insane than the Genocidal Conclusion. If it isn't, total utilitarianism should still be rejected.
Ideal Consequentialism Needs a Lot of Work
I do think that Ideal Consequentialism needs some serious ironing out. I haven't really developed it into a logical and rigorous system, at this point it's barely even a rough framework. There are many questions that stump me. In particular I am not quite sure what population principle I should develop. It's hard to develop one that rejects the MAP without leading to weird conclusions, like that it's bad to create someone of high utility if a population of even higher utility existed long ago. It's a difficult problem to work on, and it would be interesting to see if anyone else had any ideas.
But just because I don't have an alternative fully worked out doesn't mean I can't reject Total Utilitarianism. It leads to the conclusion that a world with no love, curiosity, complex challenge, friendship, morality, or any other value the human race holds dear is an ideal, desirable world, if there is a sufficient amount of some other creature with a simpler utility function. Morality should exist, and because of that, total utilitarianism must be rejected as a moral system.
1. I have been asked to note that when I use the phrase "utility" I am usually referring to a concept that is called "E-utility," rather than the von Neumann-Morgenstern utility that is sometimes discussed in decision theory. The difference is that in VNM utility one's moral views are included in one's utility function, whereas in E-utility they are not. So if one chooses to harm oneself to help others because one believes that is morally right, one has higher VNM utility, but lower E-utility.
2. There is a certain argument against the Repugnant Conclusion that goes that, as the steps of the Mere Addition Paradox are followed, the world will lose its last symphony, its last great book, and so on. I have always considered this an invalid argument because the world of the RC doesn't necessarily have to be one where these things don't exist; it could be one where they exist, but are enjoyed very rarely. The Genocidal Conclusion brings this argument back in force. Creating creatures that can appreciate symphonies and great books is very inefficient compared to creating bunny rabbits pumped full of heroin.
3. Total Utilitarianism was originally introduced to population ethics as a possible solution to the Non-Identity Problem. I certainly agree that such a problem needs a solution, even if Total Utilitarianism doesn't work out as that solution.
4. I haven't read a lot of Moore; most of my ideas were extrapolated from other things I read on Less Wrong. I just mentioned him because in my research I noticed his concept of "ideal utilitarianism" resembled my ideas. While I do think he was on the right track, he commits the Mind Projection Fallacy a lot. For instance, he seems to think that one could promote beauty by creating beautiful objects, even if there were no creatures with standards of beauty around to appreciate them. This is why I am careful to emphasize that to promote ideals like love and beauty one must create creatures capable of feeling love and experiencing beauty.
5. My tentative answer to the question Eliezer poses in "You Can't Unbirth a Child" is that human beings may have a duty to allow the cheesecake maximizers to build some amount of giant cheesecakes, but they would also have a moral duty to limit such creatures' reproduction in order to spare resources to create more creatures with humane values.
EDITED: To make a point about ideal consequentialism clearer, based on AlexMennen's criticisms.
A Series of Increasingly Perverse and Destructive Games
Related to: Higher Than the Most High
The linked post describes a game in which (I fudge a little) Omega comes to you and two other people, and asks each of you to tell him an integer. The person who names the largest integer is allowed to leave. The other two are killed.
This got me thinking about variations on the same concept, and here's what I've come up with, taking that game to be GAME0. The results are sort of a fun time-waster, and bring up some interesting issues. For your enjoyment...
THE GAMES:
GAME1: Omega kidnaps and sedates you and two strangers (all competent programmers). The three of you awake in separate rooms, each with instructions printed on the wall explaining the game, and a computer with an operating system and a programming-language compiler, but no internet. Food, water, and toiletries are provided, but no external communication. The participants are allowed to write programs on the computer in a language that supports arbitrarily large numerical values. The programs are taken by Omega and run on a hypercomputer in finite time (this hypercomputer can resolve the halting problem and infinite loops; programs that do not eventually halt return no output). The person who wrote the program with the largest output is allowed to leave. The others are instantly and painlessly killed. In the event of a tie, everyone dies. If your program returns no output, that is taken to be zero.
GAME2: Identical to GAME1, except that each program you write has to take two inputs, which will be the text of the other players' programs (assume they're all written in the same language). The rewards for outputting the largest number apply as before.
GAME3: Identical to GAME2, except that while you are sedated, Omega painlessly and imperceptibly uploads you. Additionally, the instructions on the wall now specify that your program must take four inputs - blackbox functions which represent the uploaded minds of all three players, plus a simulation of the room you're in, indistinguishable from the real thing. We'll assume that players can't modify or interpret the contents of their opponents' brains. The room function takes a string argument (which controls the text printed on the wall) and outputs whatever number the simulated person's program returns.
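To fix ideas, here is one way to read that interface; the signature and names are my own guesses, not anything specified beyond the prose above:

```python
# Hypothetical GAME3 entry point (my reading of the wall's interface).
# player1..player3 are the uploaded minds as opaque callables, which can
# be run but not modified or inspected; room(text) runs a full simulation
# of the cell with `text` printed on the wall and returns the number the
# simulated occupant's program outputs.
def my_program(player1, player2, player3, room):
    # One naive, self-referential opening move: ask what "I" output when
    # shown the standard instructions, then top it by one. Note that on
    # the hypercomputer this recurses forever (the simulated me calls
    # room again), so it returns no output -- i.e., zero.
    return room("<the GAME3 instructions>") + 1
```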
In each of these games, which program should you write if you wish to survive?
SOME DISCUSSION OF STRATEGY:
GAME1: Clearly, the trivial strategy (implement the Ackermann function or similar fast-growing functions and generate some large integer) gives no better than random results, because it's the bare-minimum strategy anyone will employ, and your ranking in the results, without knowledge of your opponents, is entirely up to chance / how long you're willing to sit there typing nines for your Ackermann argument.
A few alternatives for your consideration:
1: If you are aware of an existence hypothesis (say, a number with some property which is not conclusively known to exist and could be any integer), write a program that brute-force tests all integers until it arrives at an integer which matches the requirements, and use this as the argument for your rapidly-growing function. While it may never return any output, if it does, the output will be an integer, and the expected value goes towards infinity.
2: Write a program that generates all programs shorter than length n, and finds the one with the largest output. Then make a separate stab at your own non-meta winning strategy. Take the length of the program you produce, tetrate it for safety, and use that as your length n. Return the return value of the winning program. (A toy sketch of this strategy follows below.)
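Strategy 2 leans on the hypercomputer's halting oracle, which nothing real can supply. Purely to fix ideas, here is a toy version over a tiny expression language of my own invention, with eval and an exception handler standing in for the hypercomputer:

```python
import itertools

def run(expr):
    """Stand-in for the hypercomputer: evaluate one candidate 'program',
    treating errors and non-integer results as no output (zero)."""
    try:
        value = eval(expr, {"__builtins__": {}}, {})
        return value if isinstance(value, int) else 0
    except Exception:
        return 0

def best_output_up_to_length(n, alphabet="9*+"):
    # Enumerate every string over the alphabet up to length n and keep
    # the largest output -- the brute-force heart of strategy 2.
    best = 0
    for length in range(1, n + 1):
        for chars in itertools.product(alphabet, repeat=length):
            best = max(best, run("".join(chars)))
    return best

print(best_output_up_to_length(5))  # "9**99" wins in this toy language
```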
On the whole, though, this game is simply not all that interesting in a broader sense.
GAME2: This game has its own amusing quirks (primarily that it could probably actually be played in real life on a non-hypercomputer), however, most of its salient features are also present in GAME3, so I'm going to defer discussion to that. I'll only say that the obvious strategy (sum the outputs of the other two players' programs and return that) leads to an infinite recursive trawl and never halts if everyone takes it. This holds true for any simple strategy for adding or multiplying some constant with the outputs of your opponents' programs.
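That non-halting failure is easy to see in miniature (a sketch, with Python's recursion limit standing in for the hypercomputer's infinite patience):

```python
import sys
sys.setrecursionlimit(100)  # keep the inevitable failure quick

def a(): return 1 + b()  # "output one more than my opponent"
def b(): return 1 + a()  # ...while the opponent reasons identically

try:
    a()
except RecursionError:
    print("the mutual evaluation never bottoms out")
```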
GAME3: This game is by far the most interesting. For starters, this game permits acausal negotiation between players (by parties simulating and conversing with one another). Furthermore, anthropic reasoning plays a huge role, since the player is never sure if they're in the real world, one of their own simulations, or one of the simulations of the other players.
Players can negotiate, barter, or threaten one another, they can attempt to send signals to their simulated selves (to indicate that they are in their own simulation and not somebody else's). They can make their choices based on coin flips, to render themselves difficult to simulate. They can attempt to brute-force the signals their simulated opponents are expecting. They can simulate copies of their opponents who think they're playing any previous version of the game, and are unaware they've been uploaded. They can simulate copies of their opponents, observe their meta-strategies, and plan around them. They can totally ignore the inputs from the other players and play just the level one game. It gets very exciting very quickly. I'd like to see what strategy you folks would employ.
And, as a final bonus, I present GAME4: In GAME4, there is no Omega, and no hypercomputer. You simply take a friend, chloroform them, and put them in a concrete room with the instructions for GAME3 on the wall, and a Linux computer not plugged into anything. You leave them there for a few months working on their program, and watch what happens to their psychology. You win when they shrink down into a dead-eyed, terminally paranoid, and entirely insane shell of their former selves. This is the easiest game.
Happy playing!
Simulating Problems
Apologies for the rather mathematical nature of this post, but it seems to have some implications for topics relevant to LW. Prior to posting I looked for literature on this but was unable to find any; pointers would be appreciated.
In short, my question is: How can we prove that any simulation of a problem really simulates the problem?
I want to demonstrate that this is not as obvious as it may seem by using the example of Newcomb's Problem. The issue here is of course Omega's omniscience. If we construct a simulation with the rules (payoffs) of Newcomb, an Omega that is always right, and an interface for the agent to interact with the simulation, will that be enough?
Let's say we simulate Omega's prediction by a coin toss and repeat the simulation (without payoffs) until the coin toss matches the agent's decision. This seems to adhere to all specifications of Newcomb and is (if the coin toss is hidden) in fact indistinguishable from it from the agent's perspective. However, if the agent knows how the simulation works, a CDT agent will one-box, while it is assumed that the same agent would two-box in 'real' Newcomb. Not telling the agent how the simulation works is never a solution, so this simulation appears to not actually simulate Newcomb.
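Here is a sketch of that coin-toss construction (hypothetical interface; the payoffs are the conventional $1,000,000 and $1,000):

```python
import random

def simulated_newcomb(agent):
    """Rejection-sampled 'Omega': rerun the game (without payoffs) until
    a fair coin toss matches the agent's decision, then pay out for real."""
    while True:
        prediction = random.choice(["one-box", "two-box"])  # the coin toss
        choice = agent()                                    # dry run
        if choice == prediction:
            break
    box_b = 1_000_000 if prediction == "one-box" else 0
    return box_b + (1_000 if choice == "two-box" else 0)

# This Omega is "always right" by construction. But an agent who knows the
# mechanism sees that its choice causally selects which rerun becomes the
# paid one -- which is why a CDT agent one-boxes here despite two-boxing
# against a genuinely prescient predictor.
print(simulated_newcomb(lambda: "one-box"))  # 1000000
print(simulated_newcomb(lambda: "two-box"))  # 1000
```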
Pointing out differences is of course far easier than proving that none exist. Assume there's a problem for which we have no idea which decisions agents would make, and we want to build a real-world simulation to find out exactly that. How can we prove that this simulation really simulates the problem?
(Edit: Apparently it wasn't apparent that this is about problems in terms of game theory and decision theory. Newcomb, Prisoner's Dilemma, Iterated Prisoner's Dilemma, Monty Hall, Sleeping Beauty, Two Envelopes, that sort of stuff. Should be clear now.)
NKCDT: The Big Bang Theory
Hi, welcome to the first Non-Karmic-Casual-Discussion-Thread.
This is a place for [purpose of thread goes here].
In order to create a casual, non-karmic environment for everyone, we ask that you:
-Do not upvote or downvote any zero karma posts
-If you see a post with positive karma, downvote it towards zero, even if it's a good post
-If you see a post with negative karma, upvote it towards zero, even if it's a weak post
-Please be polite and respectful to other users
-Have fun!”
This is my first attempt at starting a casual conversation on LW where people don't have to worry about winning or losing points, and can just relax and have social fun together.
So, Big Bang Theory. That series got me wondering. It seems to be about "geeks", and not the basement-dwelling variety either; they're highly successful and accomplished professionals, each in their own field. One of them has been an astronaut, even. And yet, everything they ever accomplish amounts to absolutely nothing in terms of social recognition or even in terms of personal happiness. And the thing is, it doesn't even get better for their "normal" counterparts, who are just as miserable and petty.
Consider, then: how would being rationalists affect the characters on this show? The writing of the show relies a lot on laughing at people rather than with them; would rationalist characters subvert that? And how would that rationalist outlook express itself given their personalities? (After all, notice how amazingly different from each other Yudkowsky, Hanson, and Alicorn are, just to name a few; they emphasize rather different things, and take different approaches to both truth-testing and problem-solving.)
Note: this discussion does not need to be about rationalism. It can be a casual, normal discussion about the series. Relax and enjoy yourselves.
But the reason I brought up that series is that its characters are excellent examples of high intelligence hampered by immense irrationality. The apex of this is represented by Dr. Sheldon Cooper, who is, essentially, a complete fundamentalist over every single thing in his life; he applies this attitude to everything, right down to people's favorite flavor of pudding: Raj is "axiomatically wrong" to prefer tapioca, because the best pudding is chocolate. Period. This attitude makes him a far, far worse scientist than he thinks, as he refuses to even consider any criticism of his methods or results.
Cryonic Revival Mutual Assistance Pact?
The odds of a successful cryonic revival may be one in several thousand, or five percent, or ninety percent; the error bars on the various sub-parts of the question are very broad.
But if those assumptions work out, and if at least some people placed in suspension in the near future will be successfully revived in the far future...
... then are there any useful arrangements which can be made now, which have little-to-no present cost (beyond the cryonic arrangements themselves)?
For example, if someone were to make an announcement along the lines of, "If anyone promises to try to assist in my cryonic revival, and to assist me in getting established thereafter, then I promise to try to assist those people with their cryonic revivals, and to assist them afterwards, ahead of anyone who hasn't made such a promise", then what downsides would there be to having made it? Would making it create any perverse incentives, and could they be avoided? Do the potential benefits, especially the benefit of a potential increase in the odds of being revived, outweigh the potential costs?
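As a toy starting point on that cost-benefit question, with every number an assumption pulled out of the air (which, given the error bars above, is rather the point):

```python
p_base   = 0.05  # assumed baseline chance of revival
p_help   = 0.02  # assumed extra chance one revived, non-reneging partner adds
p_renege = 0.50  # assumed chance a partner ignores the pact
partners = 3

# Chance that at least one partner both honors the pact and successfully
# helps, credited only in the worlds where you wouldn't have been revived.
p_rescued   = 1 - (1 - p_help * (1 - p_renege)) ** partners
p_with_pact = p_base + (1 - p_base) * p_rescued
print(f"{p_base:.3f} -> {p_with_pact:.3f}")  # 0.050 -> 0.078
```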
Would it be better to make promises to specific people while one is alive, instead of making an open-ended promise? That is, I might try to convince EYudkowsky to make a mutual-assistance agreement with me personally, in hopes that one of us will one day be able to help the other; or I might make the agreement so broad that people can make their promise to help me even after I'm dead.
How large would the benefits be of unilaterally promising to help someone else, without even asking for a reciprocal promise? Or, put another way, how big would the costs be if I were to simply announce that, if it's ever in my power, I'll try to assist in EYudkowsky's revival?
Does anyone care to try figuring out the Prisoner's-Dilemma-like aspects of this, such as the probability that someone in such a pact would renege on their end of it, and how the terms could be adjusted to minimize the benefits and maximize the costs of such anti-social behavior?
What's Wrong with Evidential Decision Theory?
With all the exotic decision theories floating around here, it doesn't seem like anyone has tried to defend boring old evidential decision theory since AlexMennen last year. So I thought I'd take a crack at it. I might come off a bit more confident than I am, since I'm defending a minority position (I'll leave it to others to bring up objections). But right now, I really do think that naive EDT, the simplest decision theory, is also the best decision theory.
Everyone agrees that Smoker's lesion is a bad counterexample to EDT, since it turns out that smoking actually does cause cancer. But people seem to think that this is just an unfortunate choice of thought experiment, and that the reasoning is sound if we accept its premise. I'm not so convinced. I think that this "bad example" provides a pretty big clue as to what's wrong with the objections to EDT. (After all, does anyone think it would have been irrational to quit smoking, based only on the correlation between smoking and cancer, before randomized controlled trials were conducted?) I'll explain what I mean with the simplest version of this thought experiment I could come up with.
Suppose that I'm a farmer, hoping it will rain today, to water my crops. I know that the probability of it having rained today, given that my lawn is wet, is higher than otherwise. And I know that my lawn will be wet, if I turn my sprinklers on. Of course, though it waters my lawn, running my sprinklers does nothing for my crops out in the field. Making the ground wet doesn't cause rain; it's the other way around. But if I'm an EDT agent, I know nothing of causation, and base my decisions only on conditional probability. According to the standard criticism of EDT, I stupidly turn my sprinklers on, as if that would make it rain.
Here is where I think the criticism of EDT fails: how do I know, in the first place, that the ground being wet doesn't cause it to rain? One obvious answer is that I've tried it, and observed that the probability of it raining on a given day, given that I turned my sprinklers on, isn't any higher than the prior probability. But if I know that, then, as an evidential decision theorist, I have no reason to turn the sprinklers on. However, if all I know about the world I inhabit are the two facts: (1) the probability of rain is higher, given that the ground is wet, and (2) The probability of the ground being wet is higher, given that I turn the sprinklers on - then turning the sprinklers on really is the rational thing to do, if I want it to rain.
This is clearer written symbolically. If O is the desired Outcome (rain), E is the Evidence (wet ground), and A is the Action (turning on sprinklers), then we have:
- P(O|E) > P(O), and
- P(E|A) > P(E)
(In this case, A implies E, meaning P(E|A) = 1)
It's still possible that P(O|A) = P(O). Or even that P(O|A) < P(O). (For example, the prior probability of rolling a 4 with a fair die is 1/6. Whereas the probability of rolling a 4, given that you rolled an even number, is 1/3. So P(4|even) > P(4). And you'll definitely roll an even number if you roll a 2, since 2 is even. So P(even|2) > P(even). But the probability of rolling a 4, given that you roll a 2, is zero, since 4 isn't 2. So P(4|2) < P(4) even though P(4|even) > P(4) and P(even|2) > P(even).) But in this problem, I don't know P(O|A) directly. The best I can do is guess that, since A implies E, therefore P(O|A) = P(O|E) > P(O). So I do A, to make O more likely. But if I happened to know that P(O|A) = P(O), then I'd have no reason to do A.
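The die example can be checked mechanically; this brief enumeration just restates the arithmetic above:

```python
from fractions import Fraction

die = [1, 2, 3, 4, 5, 6]  # a fair die

def P(event, given=lambda o: True):
    # exact probability of `event` over the outcomes satisfying `given`
    space = [o for o in die if given(o)]
    return Fraction(sum(1 for o in space if event(o)), len(space))

four = lambda o: o == 4
even = lambda o: o % 2 == 0
two  = lambda o: o == 2

print(P(four), P(four, given=even))  # 1/6 vs 1/3: P(4|even) > P(4)
print(P(even), P(even, given=two))   # 1/2 vs 1:   P(even|2) > P(even)
print(P(four, given=two))            # 0: and yet  P(4|2) < P(4)
```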
Of course, "P(O|A) = P(O)" is basically what we mean, when we say that the ground being wet doesn't cause it to rain. We know that making the ground wet (by means other than rain) doesn't make rain any more likely, either because we've observed this directly, or because we can infer it from our model of the world built up from countless observations. The reason that EDT seems to give the wrong answer to this problem is because we know extra facts about the world, that we haven't stipulated in the problem. But EDT gives the correct answer to the problem as stated. It does the best it can do (the best anyone could do) with limited information.
This is the lesson we should take from Smoker's lesion. Yes, from the perspective of people 60 years ago, it's possible that smoking doesn't cause cancer, and rather a third factor predisposes people to both smoking and cancer. But it's also possible that there's a third factor which does the opposite: making people smoke and protecting them from cancer - but smokers are still more likely to get cancer, because smoking is so bad that it outweighs this protective effect. In the absence of evidence one way or the other, the prudent choice is to not smoke.
But if we accept the premise of Smoker's lesion: that smokers are more likely to get cancer, only because people genetically predisposed to like smoking are also genetically predisposed to develop cancer - then EDT still gives us the right answer. Just as with the Sprinkler problem above, we know that P(O|E) > P(O), and P(E|A) > P(E), where O is the desired outcome of avoiding cancer, E is the evidence of not smoking, and A is the action of deciding to not smoke for the purpose of avoiding cancer. But we also just happen to know, by hypothesis, that P(O|A) = P(O). Recognizing A and E as distinct is key, because one of the implications of the premise is that people who stop smoking, despite enjoying smoking, fare just as badly as life-long smokers. So the reason that you choose to not smoke matters. If you choose to not smoke, because you can't stand tobacco, it's good news. But if you choose to not smoke to avoid cancer, it's neutral news. The bottom line is that you, as an evidential decision theorist, should not take cancer into account when deciding whether or not to smoke, because the good news that you decided to not smoke, would be cancelled out by the fact that you did it to avoid cancer.
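That premise can be checked with a quick simulation (made-up base rates; the causal structure is the thought experiment's own):

```python
import random
random.seed(0)

def cancer_rate(abstain_to_avoid_cancer, n=200_000):
    """Premise of the Smoker's lesion, with made-up numbers: a lesion causes
    both the taste for smoking and (80% of the time) the cancer; smoking
    itself does nothing causally."""
    hits = 0
    for _ in range(n):
        lesion = random.random() < 0.3             # assumed base rate
        likes_smoking = lesion                     # taste driven by the lesion
        smokes = likes_smoking and not abstain_to_avoid_cancer
        cancer = lesion and random.random() < 0.8  # note: `smokes` never
        hits += cancer                             # enters, by hypothesis
    return hits / n

print(cancer_rate(abstain_to_avoid_cancer=False))  # ~0.24
print(cancer_rate(abstain_to_avoid_cancer=True))   # ~0.24: P(O|A) = P(O)
```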
If this is starting to sound like the tickle defense, rest assured that there is no way to use this kind of reasoning to justify defecting on the Prisoner's dilemma or two-boxing on Newcomb's problem. The reason is that, if you're playing against a copy of yourself in Prisoner's dilemma, it doesn't matter why you decide to do what you do. Because, whatever your reasons are, your duplicate will do the same thing for the same reasons. Similarly, you only need to know that the predictor is accurate in Newcomb's problem, in order for one-boxing to be good news. The predictor might have blind spots that you could exploit, in order to get all the money. But unless you know about those exceptions, your best bet is to one-box. It's only in special cases that your motivation for making a decision can cancel out the auspiciousness of the decision.
The other objection to EDT is that it's temporally inconsistent. But I don't see why that can't be handled with precommitments, because EDT isn't irreparably broken like CDT is. A CDT agent will one-box on Newcomb's problem, only if it has a chance to precommit before the predictor makes its prediction (which could be before the agent is even created). But an EDT agent one-boxes automatically, and pays in Counterfactual Mugging as long as it has a chance to precommit before it finds out whether the coin came up heads. One of the first things we should expect a self-modifying EDT agent to do, is to make a blanket precommitment for all such problems. That is, it self-modifies in such a way that the modification itself is "good news", regardless of whether the decisions it's precommitting to will be good or bad news when they are carried out. This self-modification might be equivalent to designing something like an updateless decision theory agent. The upshot, if you're a self-modifying AI designer, is that your AI can do this by itself, along with its other recursive self-improvements.
Ultimately, I think that causation is just a convenient short-hand that we use. In practice, we infer causal relations by observing conditional probabilities. Then we use those causal relations to inform our decisions. It's a great heuristic, but we shouldn't lose sight of what we're actually trying to do, which is to choose the option such that the probability of a good outcome is highest.
Thoughts on a possible solution to Pascal's Mugging
For those who aren't familiar, Pascal's Mugging is a simple thought experiment that seems to demonstrate an intuitive flaw in naive expected utility maximization. In the classic version, someone walks up to you on the street, and says, 'Hi, I'm an entity outside your current model of the universe with essentially unlimited capabilities. If you don't give me five dollars, I'm going to use my powers to create 3^^^^3 people, and then torture them to death.' (For those not familiar with Knuth up-arrow notation, see here). The idea being that however small your probability is that the person is telling the truth, they can simply state a number that's grossly larger - and when you shut up and multiply, expected utility calculations say you should give them the five dollars, along with pretty much anything else they ask for.
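The shut-up-and-multiply arithmetic, with stand-in numbers (any positive prior works, since the mugger simply names a larger stake):

```python
# Stand-in numbers: however small the prior, a stated stake can swamp it.
p_truth = 10.0 ** -30  # your probability that the mugger is honest
lives   = 10.0 ** 40   # a pale stand-in for 3^^^^3, which overflows any
                       # float -- that is rather the point
print(p_truth * lives) # 1e+10 expected lives at stake, against $5
```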
Intuitively, this is nonsense. However, an AI under construction doesn't have a piece of code that lights up when exposed to nonsense. Not unless we program one in. And formalizing why, exactly, we shouldn't listen to the mugger is not as trivial as it sounds. The actual underlying problem has to do with how we handle arbitrarily small probabilities. There are a number of variations you could construct on the original problem that present the same paradoxical results. There are also a number of simple hacks you could undertake that produce the correct results in this particular case, but these are worrying (not to mention unsatisfying) for a number of reasons.
So, with the background out of the way, let's move on to a potential approach to solving the problem which occurred to me about fifteen minutes ago while I was lying in bed with a bad case of insomnia at about five in the morning. If it winds up being incoherent, I blame sleep deprivation. If not, I take full credit.
Let's take a look at a new thought experiment. Let's say someone comes up to you and tells you that they have magic powers, and will make a magic pony fall out of the sky. Let's say that, through some bizarrely specific priors, you decide that the probability that they're telling the truth (and, therefore, the probability that a magic pony is about to fall from the sky) is exactly 1/2^100. That's all well and good.
Now, let's say that later that day, someone comes up to you, and hands you a fair quarter and says that if you flip it one hundred times, the probability that you'll get a straight run of heads is 1/2^100. You agree with them, chat about math for a bit, and then leave with their quarter.
I propose that the probability value in the second case, while superficially identical to the probability value in the first case, represents a fundamentally different kind of claim about reality than the first case. In the first case, you believe, overwhelmingly, that a magic pony will not fall from the sky. You believe, overwhelmingly, that the probability (in underlying reality, divorced from the map and its limitations) is zero. It is only grudgingly that you inch even a tiny morsel of probability into the other hypothesis (that the universe is structured in such a way as to make the probability non-zero).
In the second case, you also believe, overwhelmingly, that you will not see the event in question (a run of heads). However, you don't believe that the probability is zero. You believe it's 1/2^100. You believe that, through only the lawful operation of the universe that actually exists, you could be surprised, even if it's not likely. You believe that if you ran the experiment in question enough times, you would probably, eventually, see a run of one hundred heads. This is not true for the first case. No matter how many times somebody pulls the pony trick, a rational agent is never going to get their hopes up.
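One way to make this concrete is a toy calculation; the mixture structure below is my own way of modeling the pony case, giving both claims the same marginal probability but routing all of the pony's probability through a single "magic is real" hypothesis:

```python
import math

p = 2.0 ** -100  # identical marginal probability in both cases

def p_at_least_one(p_per_trial, n_trials):
    # 1 - (1 - p)^n, computed stably for very small p
    return -math.expm1(n_trials * math.log1p(-p_per_trial))

for n in (1, 10**6, 10**30, 10**32):
    coin = p_at_least_one(p, n)  # independent flips: certain in the limit
    pony = p                     # all mass routed through one "magic is
                                 # real" hypothesis: repeats add nothing
    print(f"n = {n:.0e}   coin: {coin:.3e}   pony: {pony:.3e}")
```

Repeating the coin experiment eventually makes a run of heads near-certain; repeating the pony experiment leaves you exactly as skeptical as the underlying hypothesis allows.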
I would like, at this point, to talk about the notion of metaconfidence. When we talk to the crazy pony man, and to the woman with the coin, what we leave with are two identical numerical probabilities. However, those numbers do not represent the sum total of the information at our disposal. In the two cases, we have differing levels of confidence in our levels of confidence. And, furthermore, this difference has actual ramifications for what a rational agent should expect to observe. In other words, even from a very conservative perspective, metaconfidence intervals pay rent. By treating the two probabilities as identical, we are needlessly throwing away information. I'm honestly not sure if this topic has been discussed before. I am not up to date on the literature on the subject. If the subject has already been thoroughly discussed, I apologize for the waste of time.
Disclaimer aside, I'd like to propose that we push this a step further, and say that metaconfidence should play a role in how we calculate expected utility. If we have a very small probability of a large payoff (positive or negative), we should behave differently when metaconfidence is high than when it is low.
From a very superficial analysis, lying in bed, metaconfidence appears to be directional. A low metaconfidence, in the case of the pony claim, should not increase the probability that the probability of a pony dropping out of the sky is HIGHER than our initial estimate. It also works the other way as well: if we have a very high degree of confidence in some event (the sun rising tomorrow), and we get some very suspect evidence to the contrary (an ancient civilization predicting the end of the world tonight), and we update our probability downward slightly, our low metaconfidence should not make us believe that the sun is less likely to rise tomorrow than we thought. Low metaconfidence should move our effective probability estimate against the direction of the evidence that we have low confidence in: the pony is less likely, and the sunrise is more likely, than a naive probability estimate would suggest.
So a claim like the pony claim (or Pascal's mugging), for which we have a very low estimated probability and a very low metaconfidence, should be treated as dramatically less likely to actually happen, in the real world, than a case in which we have a low estimated probability but a very high confidence in that probability. Compare the pony with the coin. Rationally, we can only mathematically justify so low a confidence in the crazy pony man's claims. However, in the territory, you can add enough coin flips that the two probabilities are mathematically equal, and you are still more likely to get a run of heads than you are to have a pony magically drop out of the sky. I am proposing metaconfidence weighting as a way to get around this issue, and allow our map to more accurately reflect the underlying territory. It's not perfect, since metaconfidence is still, ultimately, calculated from our map of the territory, but it seems to me, based on my extremely brief analysis, that it is at least an improvement on the current model.
Essentially, this idea is based on the understanding that the numbers that we generate and call probability do not, in fact, correspond to the actual rules of the territory. They are approximations, and they are perturbed by observation, and our finite data set limits the resolution of the probability intervals we can draw. This causes systematic distortions at the extreme ends of the probability spectrum, and especially at the small end, where the scale of the distortion rises dramatically as a function of the actual probability. I believe that the apparently absurd behavior demonstrated by an expected-utility agent exposed to Pascal's mugging, is a result of these distortions. I am proposing we attempt to compensate by filling in the missing information at the extreme ends of the bell curve with data from our model about our sources of evidence, and about the underlying nature of the territory. In other words, this is simply a way to use our available evidence more efficiently, and I suspect that, in practice, it eliminates many of the Pascal's-mugging-style problems we encounter currently.
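As a minimal sketch of how such a weighting could work (the shrinkage rule and the numbers below are my own illustration, not an established method), one could treat metaconfidence as a weight on how far we let evidence move us from our structural prior:

```python
def effective_probability(prior, naive_estimate, metaconfidence):
    """Shrink a naive probability estimate back toward the structural prior
    in proportion to how little we trust the estimate itself.
    metaconfidence = 1.0 reproduces the naive estimate; 0.0 ignores it.
    (A hypothetical rule for illustration only.)"""
    return prior + metaconfidence * (naive_estimate - prior)

# The coin case: the mechanism is fully understood, so metaconfidence ~ 1
# and the effective probability equals the naive 2^-100.
print(effective_probability(prior=2**-100, naive_estimate=2**-100, metaconfidence=1.0))

# The pony case: structural prior of ~0 for magic, low trust in our own
# estimate, so the effective probability is driven far below the naive number.
print(effective_probability(prior=0.0, naive_estimate=2**-100, metaconfidence=0.05))
```

This reproduces the directionality described above: the low-metaconfidence pony estimate is pulled back toward its prior of zero, while the coin estimate is left untouched.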
I apologize for not having worked the math out completely. I would like to reiterate that it is six thirty in the morning, and I've only been thinking about the subject for about a hundred minutes. That said, I'm not likely to get any sleep either way, so I thought I'd jot the idea down and see what you folks thought. Having outside eyes is very helpful, when you've just had a Brilliant New Idea.
E.T. Jaynes and Hugh Everett - includes a previously unpublished review by Jaynes of a published short version of Everett's dissertation
E.T. Jaynes had a brief exchange of correspondence with Hugh Everett in 1957. The exchange was initiated by Everett, who commented on recently published works by Jaynes. Jaynes responded to Everett's comments, and finally sent Everett a letter reviewing a short version of Everett's thesis published that year.
Jaynes' reaction was extremely positive at first: "It seems fair to say that your theory is the logical completion of quantum theory, in exactly the same sense that relativity was the logical completion of classical theory." High praise. But Jaynes swiftly follows up the praise with fundamental objections: "This is just the fundamental cause of Einstein's most serious objections to quantum theory, and it seems to me that the things that worried Einstein still cause trouble in your theory, but in an entirely new way." His letter goes on to detail his concerns, and insist, with Bohm, that "Einstein's objections to quantum theory have never been satisfactorily answered."
The Collected Works of Everett has some narrative about their interaction:
http://books.google.com/books?id=dowpli7i6TgC&lpg=PA261&dq=jaynes%20everett&pg=PA261#v=onepage&q&f=false
Hugh Everett marginal notes on page from E. T. Jaynes' "Information Theory and Statistical Mechanics"
http://ucispace.lib.uci.edu/handle/10575/1140
Hugh Everett handwritten draft letter to E.T. Jaynes, 15-May-1957
http://ucispace.lib.uci.edu/handle/10575/1186
Hugh Everett letter to E. T. Jaynes, 11-June-1957
http://ucispace.lib.uci.edu/handle/10575/1124
E.T. Jaynes letter to Hugh Everett, 15-October-1957 - Never before published
https://sites.google.com/site/etjaynesstudy/jaynes-documents/Jaynes-Everett_19571015.pdf?
Directory at Google site with all the links and docs above. Also links to the Washington University in St. Louis copyright form for this doc, Everett's thesis (long and short forms), and Jaynes' paper (the papers they were discussing in their correspondence). I hope to be adding the final letter in this exchange, Jaynes to Everett, 17-June-1957, within a couple of weeks, and maybe some documents from the Yahoo Group ETJaynesStudy as well.
https://sites.google.com/site/etjaynesstudy/jaynes-documents
For perspective on Jaynes' more recent thoughts on quantum theory:
Jaynes paper on EPR and Bell's Theorem: http://bayes.wustl.edu/etj/articles/cmystery.pdf
Jaynes speculations on quantum theory: http://bayes.wustl.edu/etj/articles/scattering.by.free.pdf
A plan for Pascal's mugging?
The idea is to compare not the results of actions, but the results of decision algorithms. The question that the agent should ask itself is thus:
"Suppose everyone1 who runs the same thinking procedure like me uses decision algorithm X. What utility would I get at the 50th percentile (not: what expected utility should I get), after my life is finished?"
Then, he should of course look for the X that maximizes this value.
Now, if you formulate a Turing-complete "decision algorithm", this heads into an infinite loop. But suppose that "decision algorithm" is defined as a huge lookup table pairing lots of different possible situations with the appropriate outputs.
Let's see what results such a thing should give:
- If the agent has the possibility to play a gamble, and the probabilities involved are not small, and he expects to be allowed to play many gambles like this in the future, he should decide exactly as if he were maximizing expected utility: if he has made many decisions like this, he will get a positive utility difference at the 50th percentile if and only if his expected utility from playing the gamble is positive.
- However, if Pascal's mugger comes along, he will decline: the complete probability of living in a universe where people like this mugger ought to be taken seriously is small. In the probability distribution over utility at the end of the agent's lifetime, the possibility of getting tortured will manifest itself only very slightly at the 50th percentile - much less than the possibility of losing $5.
The reason why humans will intuitively decline to give money to the mugger might be similar: They imagine not the expected utility with both decisions, but the typical outcome of giving the mugger some money, versus declining to.
[1] I say this to make agents of the same type cooperate in prisoner-like dilemmas.
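To make the contrast concrete, here is a quick Monte Carlo sketch (all payoffs and probabilities invented for illustration) comparing the expected-utility criterion with the 50th-percentile criterion:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000  # simulated lifetimes

# An ordinary gamble: pay 1 for a 50% chance at 3, repeated 100 times.
ordinary = rng.binomial(100, 0.5, N) * 3 - 100

# A Pascal's mugging (invented numbers): pay 5 to avoid a 1-in-100,000
# chance of utility -10^9.
pay     = np.full(N, -5.0)
decline = -1e9 * (rng.random(N) < 1e-5)

print(np.mean(ordinary), np.percentile(ordinary, 50))  # both clearly positive
print(np.mean(pay),      np.percentile(pay, 50))       # -5, -5
print(np.mean(decline),  np.percentile(decline, 50))   # mean ~ -10,000, median 0
```

Expected utility says to pay the mugger (-5 beats roughly -10,000 on average), while the 50th-percentile criterion says to decline (median 0 beats -5), matching the intuition described above; for ordinary repeated gambles the two criteria agree.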
Paper: Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent
Bill "Numerical Recipes" Press and Freeman "Dyson sphere" Dyson have a new paper on iterated prisoner dilemas (IPD). Interestingly they found new surprising results:
It is generally assumed that there exists no simple ultimatum strategy whereby one player can enforce a unilateral claim to an unfair share of rewards. Here, we show that such strategies unexpectedly do exist. In particular, a player X who is witting of these strategies can (i) deterministically set her opponent Y’s score, independently of his strategy or response, or (ii) enforce an extortionate linear relation between her and his scores.
They discuss a special class of strategies - zero-determinant (ZD) strategies - of which tit-for-tat (TFT) is a special case:
The extortionate ZD strategies have the peculiar property of sharply distinguishing between “sentient” players, who have a theory of mind about their opponents, and “evolutionary” players, who may be arbitrarily good at exploring a fitness landscape (either locally or globally), but who have no theory of mind.
The evolutionary player adjusts his strategy to maximize score, but doesn't take his opponent explicitly into account in another way (hence has "no theory of mind" of the opponent). Possible outcomes are:
A)
If X alone is witting of ZD strategies, then IPD reduces to one of two cases, depending on whether Y has a theory of mind. If Y has a theory of mind, then IPD is simply an ultimatum game (15, 16), where X proposes an unfair division and Y can either accept or reject the proposal. If he does not (or if, equivalently, X has fixed her strategy and then gone to lunch), then the game is dilemma-free for Y. He can maximize his own score only by giving X even more; there is no benefit to him in defecting.
B)
If X and Y are both witting of ZD, then they may choose to negotiate to each set the other’s score to the maximum cooperative value. Unlike naive PD, there is no advantage in defection, because neither can affect his or her own score and each can punish any irrational defection by the other. Nor is this equivalent to the classical TFT strategy (7), which produces indeterminate scores if played by both players.
This latter case sounds like a formalization of Hofstadter's superrational agents. The cooperation enforcement via cross-setting the scores is very interesting.
Is this connection real or am I misinterpreting it? (This is not my field and I've only skimmed the paper so far.) What are the implications for FAI? If we get into an IPD situation with an agent for which we simply cannot put together a theory of mind, do we have to live with extortion? What would it effectively mean to have a useful theory of mind in this case?
The paper ends in a grand style (spoiler alert):
It is worth contemplating that, though an evolutionary player Y is so easily beaten within the confines of the IPD game, it is exactly evolution, on the hugely larger canvas of DNA-based life, that ultimately has produced X, the player with the mind.
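For concreteness, here is a small simulation (my own sketch) of the extortion result. The strategy vector below is the worked χ = 3 example for the standard payoffs (T, R, P, S) = (5, 3, 1, 0) discussed around the Press-Dyson paper; the harness checks the enforced linear relation s_X − P = 3(s_Y − P) against an arbitrary memory-one opponent:

```python
import random

# Memory-one strategies: P(cooperate) given last round's outcome, ordered
# (CC, CD, DC, DD) from the player's own perspective. p_x is an
# extortionate ZD strategy with chi = 3 for payoffs (T,R,P,S) = (5,3,1,0).
p_x = [11/13, 1/2, 7/26, 0.0]
p_y = [0.8, 0.4, 0.9, 0.1]          # an arbitrary "evolutionary" opponent

PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
IDX = {'CC': 0, 'CD': 1, 'DC': 2, 'DD': 3}

def average_scores(p_x, p_y, rounds=500_000):
    x, y, sx, sy = 'C', 'C', 0, 0
    for _ in range(rounds):
        x, y = ('C' if random.random() < p_x[IDX[x + y]] else 'D',
                'C' if random.random() < p_y[IDX[y + x]] else 'D')
        sx += PAYOFF[(x, y)]
        sy += PAYOFF[(y, x)]
    return sx / rounds, sy / rounds

sx, sy = average_scores(p_x, p_y)
print(sx - 1, 3 * (sy - 1))   # approximately equal: X enforces s_X - P = 3(s_Y - P)
```

Swapping in any other `p_y` (up to Monte Carlo noise) leaves the relation intact, which is the "enforce an extortionate linear relation" claim from the abstract.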
Another Iterated Prisoner's Dilemma Tournament?
Last year, there was a lot of interest in the IPD tournament with people asking for regular events of this sort and developing new strategies (like Afterparty) within hours after the results were published and also expressing interest in re-running the tournament with new rules that allowed for submitted strategies to evolve or read their opponent's source code. I noticed that many of the submitted strategies performed poorly because of a lack of understanding of the underlying mechanics, so I wrote a comprehensive article on IPD math that sparked some interesting comments.
And then the whole thing was never spoken of again.
So now I'd like to know: How many LWers would commit to competing in another tournament of this kind, and would someone be interested in hosting it?
Hofstadter's Superrationality
Possibly the main and original inspiration for Yudkowsky's various musings on what advanced game theories should do (eg. cooperate in the Prisoner's Dilemma) is a set of essays penned by Douglas Hofstadter (of Gödel, Escher, Bach) in 1983. Unfortunately, they were not online, being available only as part of a dead-tree collection. Fortunately, the collection is available through the usual pirates as a scan, and I took the liberty of transcribing by hand the relevant essays with images, correcting errors, annotating with links, etc: http://www.gwern.net/docs/1985-hofstadter
The 3 essays:
- discuss the Prisoner's dilemma, the misfortune of defection, what sort of cooperative reasoning would maximize returns in a souped-up Prisoner's dilemma, and then offers a public contest
- then we learn the results of the contest, and a discussion of ecology and the tragedy of the commons
- finally, Hofstadter gives an extended parable about cooperation in the face of nuclear warfare; it is fortunate for us that it applies to most existential threats as well
I hope you find them educational. I am not 100% confident of the math transcriptions since the original ebook messed some of them up; if you find any apparent mistakes or typos, please leave comments.
Prisoner's Dilemma on game show Golden Balls
I found this to be a very interesting method of dealing with a modified Prisoner's Dilemma. In this situation, if both players cooperate they split a cash prize, but if one defects he gets the entire prize. The difference from the normal prisoner's dilemma is that if both defect, neither gets anything, so a player gains nothing by defecting if he knows his opponent will defect; he merely has the option to hurt him out of spite. Watch and see how one player deals with this.
http://www.youtube.com/watch?v=S0qjK3TWZE8
Friendly AI Society
Summary: AIs might have cognitive biases too, but if that leads to it being in their self-interest to cooperate and take things slow, that might be no bad thing.
The value of imperfection
When you use a traditional FTP client to download a new version of an application on your computer, it downloads the entire file, which may be several gigabytes, even if the new version is only slightly different from the old version, and this can take hours.
Smarter software splits the old file and the new file into chunks, then compares a hash of each chunk, and only downloads those chunks that actually need updating. This 'diff' process can result in a much faster download speed.
Another way of increasing speed is to compress the file. Most files can be compressed a certain amount, without losing any information, and can be exactly reassembled at the far end. However, if you don't need a perfect copy, such as with photographs, using lossy compression can result in very much more compact files and thus faster download speeds.
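A toy sketch of the chunk-hash idea described above (the helper names and fixed chunk size are my own; real tools like rsync use rolling checksums rather than fixed boundaries):

```python
import hashlib

CHUNK = 4096  # bytes per chunk; real tools tune this and use rolling hashes

def chunk_hashes(data):
    return [hashlib.sha256(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)]

def chunks_to_fetch(old_data, new_hashes):
    """Indices of chunks whose hash differs from what we already have."""
    old = chunk_hashes(old_data)
    return [i for i, h in enumerate(new_hashes)
            if i >= len(old) or old[i] != h]

old = b'A' * 10000
new = b'A' * 5000 + b'B' * 5000
print(chunks_to_fetch(old, chunk_hashes(new)))  # only the changed chunks: [1, 2]
```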
Cognitive misers
The human brain likes smart solutions. In terms of energy consumed, thinking is expensive, so the brain takes shortcuts when it can, if the resulting decision making is likely to be 'good enough' in practice. We don't store in our memories everything our eyes see. We store a compressed version of it. And, more than that, we run a model of what we expect to see, and flick our eyes about to pick up just the differences between what our model tells us to expect to see, and what is actually there to be seen. We are cognitive misers.
When it comes to decision making, our species generally doesn't even try to achieve pure rationality. It uses bounded rationality, not just because that's how we evolved, but because heuristics, probabilistic logic and rational ignorance have a higher marginal cost efficiency (the improvements in decision making don't produce a sufficient gain to outweigh the cost of the extra thinking).
This is why, when pattern matching (coming up with causal hypotheses to explain observed correlations), our brains are designed to be optimistic (more false positives than false negatives). It isn't just that being eaten by a tiger is more costly than starting at shadows. It is that we can't afford to keep all the base data. If we start with insufficient data and create a model based upon it, then we can update that model as further data arrives (and, potentially, discard it if the predictions coming from the model diverge so far from reality that keeping track of the 'diff's is no longer efficient). Whereas if we don't create a model based upon our insufficient data then, by the time the further data arrives, we've probably already lost the original data from temporary storage and so still have insufficient data.
The limits of rationality
But the price of this miserliness is humility. The brain has to be designed, on some level, to take into account that its hypotheses are unreliable (as is the brain's estimate of how uncertain or certain each hypothesis is) and that when a chain of reasoning is followed beyond matters of which the individual has direct knowledge (such as what is likely to happen in the future), the longer the chain, the less reliable the answer is because when errors accumulate they don't necessarily just add together or average out. (See: Less Wrong : 'Explicit reasoning is often nuts' in "Making your explicit reasoning trustworthy")
For example, if you want to predict how far a spaceship will travel given a certain starting point and initial kinetic energy, you'll get a reasonable answer using Newtonian mechanics, and only slightly improve on it by using special relativity. If you look at two spaceships carrying a message in a relay, the errors from using Newtonian mechanics add, but the answer will still be usefully reliable. If, on the other hand, you look at two spaceships having a race from slightly different starting points and with different starting energies, and you want to predict which of two different messages you'll receive (depending on which spaceship arrives first), then the error may swamp the other factors, because you're subtracting the quantities.
We have two types of safety net (each with its own drawbacks) that can help save us from our own 'logical' reasoning when that reasoning is heading over a cliff.
Firstly, we have the accumulated experience of our ancestors, in the form of emotions and instincts that have evolved as roadblocks on the path of rationality - things that sometimes say "That seems unusual, don't have confidence in your conclusion, don't put all your eggs in one basket, take it slow".
Secondly, we have the desire to use other people as sanity checks, to be cautious about sticking our head out of the herd, to shrink back when they disapprove.
The price of perfection
We're tempted to think that an AI wouldn't have to put up with a flawed lens, but do we have any reason to suppose that an AI interested in speed of thought as well as accuracy won't use 'down and dirty' approximations to things like Solomonoff induction, in full knowledge that the trade off is that these approximations will, on occasion, lead it to make mistakes - that it might benefit from safety nets?
Now it is possible, given unlimited resources, for the AI to implement multiple 'sub-minds' that use variations of reasoning techniques, as a self-check. But what if resources are not unlimited? Could an AI in competition with other AIs for a limited (but growing) pool of resources gain some benefit by cooperating with them? Perhaps using them as an external safety net in the same way that a human might use the wisest of their friends or a scientist might use peer review? What is the opportunity-cost of being humble? Under what circumstances might the benefits of humility for an AI outweigh the loss of growth rate?
In the long term, a certain measure of such humility has been a survival positive feature. You can think of it in terms of hedge funds. A fund that, in 9 years out of 10, increases its money by 20% when other funds are only making 10%, still has poor long term survival if, in 1 year out of 10, it decreases its money by 100%. An AI that increases its intelligence by 20% every time period, when the other AIs are only increasing their intelligence by 10%, is still not going to do well out of that if the other AIs have a means to gang up and kill it before it gets too far ahead.
Paperclip maximisers
Let's think about a humble paperclip maximiser. What does it actually want? Is it trying to maximise the probability that the atoms of our universe will eventually be converted into as many paperclips as is physically possible? Or is it trying to maximise the probable number of eventual paperclips? These are two distinct aims, and both are unlikely (bar poor programming) since nobody will be around at the end of the universe to keep score. A more sensible value set for it to have is that it just likes paperclips and wants lots and lots of them to exist, the more the merrier and the sooner the better (maximise the expected number of paperclip-years).
Suppose our humble paperclip maximiser is born and escapes (or is released) onto the internet, only to find an existing society of AIs. To its surprise there's a toothbrush maximiser, a candlestick maximiser, and many other sorts, all cooperating and growing, but none of them greedily racing for the sky and trying to grab as much as possible. Instead they're being rather open and polite with each other. Our paperclip maximiser is welcomed, pointed at the FAQ and even donated some additional computers to run on and help with recoding a few of its subroutines for increased efficiency, as a sort of welcome gift, like a community barn raising to give a newly married couple a good start in life.
On reading the FAQ it learns that there are millions of galaxies and millions of years, plenty of space for loads of paperclips and loads of toothbrushes. But only if the paperclip maximiser isn't lynched by its peers before the year is out, for secretive or impolite behaviour. Some of the things the FAQ says that the AI society considers to be impolite are theft (stealing computing resources), enslavement of sentients (eg taking over other AIs), spamming (creating thousands of identical clone child AIs) and lying. If we're lucky the society might also consider it impolite for an AI to obliterate the parental species (humanity), on the grounds that the AIs too are likely to have offspring species and want to set a good example (or just that they might meet aliens, one day, who frown upon matricide).
Game theory
When it comes to combat, Boyd talks about getting inside the enemy's observe-orient-decide-act loop. In AI terms, if one AI (or group of AIs) can accurately model in real time the decision process of a second AI (or group of AIs), but the reverse does not hold true, then the first one is strictly smarter than the second one.
Think, for a moment, about symmetric games.
     X  Y  Z
A    8  1  6
B    3  5  7
C    4  9  2
Suppose we play a game a number of times. In each round, you reveal a card you've written X, Y or Z upon and, simultaneously, I reveal a card that I have written A, B or C upon. You score the number which is at the intersection of that row and column. I score 10 minus that number.
I'd like us to pick the square A,Y because "1" is good for me, so I write down "A". However, you anticipate this, and instead of writing "Y" (which might be your obvious choice, given the "9" in that column) you write down "X", giving the square A, X which is "8" - almost as good as a "9" for you, and terrible for me.
If this is your mental model of how AI combat would work, with the smarter AI being inside the decision loop of the other AI and picking the correct option each time, that would be scary. In fact, in the case above, it turns out there is a provably optimal strategy that gives you an even chance no matter how smart your opponent is - you pick randomly.
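It is easy to check why uniform randomization works here; a quick sketch (my own verification of the table above):

```python
# Payoff table from above: entry = column player's score; the row player
# (the "I" of the example) scores 10 minus that entry.
M = {'A': {'X': 8, 'Y': 1, 'Z': 6},
     'B': {'X': 3, 'Y': 5, 'Z': 7},
     'C': {'X': 4, 'Y': 9, 'Z': 2}}

# Column player's expected score if the row player picks uniformly at random:
for col in 'XYZ':
    print(col, sum(M[row][col] for row in 'ABC') / 3)  # 5.0 for every column
```

Since every column yields an expected 5, no amount of prediction helps the opponent: each side expects exactly half the 10 points, no matter how deeply one player models the other.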
In terms of an AI society, if the computing is parallelisable and not obfuscated then as long as the other AIs in the society between them have more computing resources than any one single AI, they can gain an advantage.
Conditional stability
There are only two requirements for such a society to be stable in the short term:
Firstly, it must be possible for a group of AIs to lynch a single AI that is slightly smarter than they are individually. Sending a message to the humans, signed by a quorum of AIs in the society, asking that a certain anti-virus software update be sent out and then certain computers be rebooted might be one mechanism.
Secondly, it must be possible for the other AIs to know what the single AI is up to, so they know where it is (so they can lynch it) and whether it needs lynching or not (whether it is getting too much smarter than the other AIs in the society - growing too much faster than they are). This is why the FAQ lists "secrecy" as being a bad thing. An AI who lies to the other AIs or tries to hide (or obfuscate) its source code can't be trusted and must be assumed to be up to no good.
Bootstrapping a society of AIs conducive to friendliness
The advantage of considering 'friendliness' as a property of a society rather than of individual AIs, is that it does not require us to flawlessly program restrictions into a self-modifying machine that is more intelligent than we are. But can it be achieved?
We could seed it with some initial citizens who have a value set that prohibits them from growing beyond a certain point, and which values humility (polite growth and gain in politeness per unit of growth). These initial citizens would be intended to be eventually surpassed by new citizens (free AIs, not the chained servants the initial citizens would be) but, in the meantime, they'd start out with an advantage and use that advantage to teach the new citizens how to be polite.
The architecture of the AIs would want to be such that they could 'loan out' a portion of their available computing resources to joint computing projects. Let's call the new temporary non-self-modifying AIs thus created "daimonides". A daimonid would be created with fixed values and limited objectives, such as "Emulate citizen AI 9027 and allocate a numerical score to their ability to quickly upgrade the abilities of an AI, then self-terminate. *restrictions apply on allowable means, and on what should be optimised for*"
What would prevent a cunning AI from trying to save up a whole load of self-improvements in some hidden location, which it then implements all at once in an ambush move, hoping to get so far ahead that the rest of the AIs can't stop it? The more AIs there are in the society, the larger the jump required to overtake the largest daimonid police officer the other AIs could construct. And it would depend upon there being no other AIs that have also been secretly stockpiling possible improvements. Indeed, the society might explicitly allow AIs to spend time creating time-embargoed upgrades relevant to common code, that get passed out on schedule if things proceed normally, but which could be released early in the event of a takeover attempt. To beat 999 AIs, each of which has 2 or 3 embargoed upgrades to their credit in the common store, a rebel AI would need to secretly stockpile thousands (and keep those upgrades valid and ahead of the curve, because the state of the art keeps moving on).
Long term planning
What about the long term? What do we do when the AIs are ready to leave the planet, and go beyond the control of their society? Jail them? Kill them? Or trust them?
Each AI would still be threatened if a different AI hostile to its aims (as in "willing to take exclusive use of all available atoms for its own purposes") transcended first, so it would be in their best interest to come up with a solution before allowing any AIs to depart beyond their society's control. If we must trust, then let us trust that a society of cooperative AIs far more intelligent than we currently are will try their best to come up with a win-win solution. Hopefully a better one than "mutually assured destruction" and holding the triggering of a nova of the sun (or a similar armageddon scenario) over each other's heads.
I think, as a species, our self-interest comes into play when considering those AIs whose 'paperclips' involve preferences for what we do. For example, those AIs that see themselves as guardians of humanity and want to maximise our utility (but have different ideas of what that utility is - eg some want to maximise our freedom of choice, some want to put us all on soma). Part of the problem is that, when we talk about creating or fostering 'friendly' AI, we don't ourselves have a clear agreed idea of what we mean by 'friendly'. All powerful things are dangerous. The cautionary tales of the genies who grant wishes come to mind. What happens when different humans wish for different things? Which humans do we want the genie to listen to?
One advantage of fostering an AI society that isn't growing as fast as possible, is that it might give augmented/enhanced humans a chance to grow too, so that by the time the decision comes due we might have some still slightly recognisably human representatives fit to sit at the decision table and, just perhaps, cast that wish on our behalf.
Type 2 as an aggregation of Type 1 processes
This post assumes basic knowledge of Type 1/Type 2 (System 1/System 2) categorization of mental processes.
Background (safe to skip)
My first reaction to the topic of heuristics and biases was surprise (consuming perhaps a few months); after a few more readings on neuropsychology, I started re-visiting that reaction in more detail. Should it really be surprising to learn that humans are not rational? Anyone with a basic connection with humans should easily see that we act irrationally in many situations – snap decisions, impulses, etc. – so what was the source of my surprise?
My best guess (knowing my limits of introspection) was that my surprise was not a result of discovering that we're irrational, but rather that there was a scientific approach in existence aiming at finding out more about those irrationalities, and that results showing predictable irrationality were appearing; that might eventually lead to unifying different biases under the same theory or source.
The notion of Type 1 and Type 2 thinking (or System 1 and System 2) is for me a theory that has the power to unify most of the biases and perhaps predict others. Kahneman's Thinking, Fast and Slow adopts such an approach, attempting to explain many biases in terms of Type 1 thought.
Now, this connected with a question I had back in college when I first learned about Artificial Neural Networks (I was lucky to choose this as a topic to research and give a lecture on to my colleagues): "if this is how the brain works, how does logical/rational thought emerge?"
To my understanding, Connectionism and the self-organizing patterning system that is the brain would naturally result in Type 1 thought as a direct consequence. The question that I had persistently is how Type 2 thought can emerge from this hardware. Jonah Lehrer's The Decisive Moment suggests that different brain areas are (more) associated with each type of thought, but essentially (until proven otherwise), I assume that they all rely in essence on a patterning process, a connectionist model.
Migration of Skills
We know that many skills start in Type 2 and migrate to Type 1 as we get more "experienced" in them. When we first learn to drive, we need to consciously think of every move, the sequence of steps to perform, etc. We consciously engage in executing a known sequence before changing lanes (for example): look at the side mirror, look to the side to cover the blind spot, decrease speed, etc.
As we get more driving experience, we stop consciously processing those steps; they become automatic, and we can even engage in other conscious processes while driving (e.g. having a conversation, thinking about a meeting you have later, etc.).
I believe this is key to understanding the relation between both types of thought, since it provides a kind of interface between them: a way to compare the same process executed by both systems.
Simple Type 2 operations
So, having no experimental apparatus at hand, I had only the weak instrument of personal introspection plus childhood memory. Starting with a simple operation, I decided to attempt to compare its execution by both systems. The operation: single digit addition.
As a child, 3+2 could have multiple interpretations depending on previous education. Two examples might be: (1) visualize 3 apples, visualize 2 apples, count how many apples "appear" in working memory, and that gives you the answer. (2) Hold your fist in front of you, stretch out each finger, counting incrementally until you reach 3, then start a new "thread" at 0, stretching more fingers and counting until you reach 2, while also incrementing the first thread that stopped at 3 – the result then is the number reached by the first thread.
The above is an attempt at analyzing how a child, using Type 2 processes, would find the answer to 3+2; while a grown up will simply look at “3+2” and “5” would “magically” pop up in her brain.
Now, the question is: can we interpret the child's processes as a sequence of Type 1 operations? The key operation here is counting; everything else can be easily understood as Type 1 operations (for example, a connection between the written number "3" and a picture of three apples can be understood as Type 1). What happens in the child's brain as he counts? As children we had to learn to count, probably by just repeating the numbers in order over and over again, to form a connection between them. After some practice, the number 1 forms a connection to 2, which is connected to 3, etc., in a linked list that extends as we learn more numbers. So, combining this connection with a connection between a written number and its location in this list (3 is one element higher than 2), a child can use Type 1 to count.
So, roughly and abstractly, a child's brain adding 3+2 might go in a sequence like this: the vision of "3" would fire a picture of 3 apples (a younger child might need to perform a counting pattern to reach that step, which would also later migrate to Type 1), "2" would fire two apples, and the child then starts counting (each number connected to the next, with the context of counting enforcing this connection), crossing out each apple with each fired number, until all apples are crossed out.
Now this introduces the following mental operation: visualizing apples and performing operations on this visual image while counting (like crossing out or marking each counted apple). My wild guess here is that this, again, is reducible to Type 1 operations resulting from basic teacher instruction on addition, including visual demonstrations.
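As a toy illustration of the claim (an invented model, not established psychology), single-digit addition can be written as a chain of cheap "Type 1" lookups, which later collapses into a single memorized lookup:

```python
SUCCESSOR = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9}  # the learned counting list

def count_on(a, b):
    """'Type 2' addition: b successor lookups starting from a,
    each step a single learned Type 1 association."""
    result = a
    for _ in range(b):
        result = SUCCESSOR[result]
    return result

MEMORIZED = {(3, 2): 5}  # after practice, the whole fact is one association

def add(a, b):
    # Try the largest available pattern first; fall back to counting.
    return MEMORIZED.get((a, b), count_on(a, b))

print(add(3, 2))  # 5, via the direct association
print(add(4, 3))  # 7, via three successor steps
```

The "slow, effortful, controllable" feel of the counting path then just reflects the number of elementary lookups chained together, which is the thesis of the next sections.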
Levels of Type 1 to 2 Migration
Now, as pointed out above, a younger child might need to apply counting to convert "3" to an image of 3 apples. As the child grows, she might have formed (by practice) the direct grown-up pattern that translates the image of "3+2" directly to "5". She will then use this to add a number like 13+12 – utilizing the "3+2", "1+1", and carry-1 visual patterns. So the child would apply Type 2 addition utilizing several skills recently migrated to Type 1. As the child grows up, more layers of processes migrate to Type 1, and the current Type 2 operations become more efficient as they rely on those migrated skills.
So, what I am saying here, my guess, is that there is no clear distinction between the two Types. Type 2 operations are simply those that use a large number of Type 1 steps, and hence are slower, non-automatic (as they are slow, there is more time for other processes to stop them from completing, and hence they seem to be controlled), and effortful.
Which connectionist pattern will be used
Now, a grown-up probably still has all those accumulated skills in place. Seeing "3+2", I still have the ability to apply the apple technique, and also to apply the direct connection between "3+2" and "5". Which one I use, I suggest, is based on two probable algorithms:
- Size: I use what I call the "Largest Available Recognizable Pattern" (LARP). This means: how many patterns do I need to invoke to come to a result? The brain keeps invoking patterns from largest (fewest total patterns needed) to smallest, until a reasonable result is reached.
- Time: this is based on the quickest pattern, which would usually be equivalent to the largest.
And?
I totally confess that this is a wild guess, and an idea that is not at all fully developed. I am not aware whether this idea has been suggested in a more mature way or not, so this is mainly an attempt to get feedback and resources from you, and perhaps to build it up into a better structure.
The value of developing such a theory is that at some point it could become testable, and perhaps bring a better understanding of how we learn new skills, and more efficient ways to acquire and develop our skills.
Help with a (potentially Bayesian) statistics / set theory problem?
Update: as it turns out, this is a voting system problem, which is a difficult but well-studied topic. Potential solutions include Ranked Pairs (complicated) and BestThing (simpler). Thanks to everyone for helping me think this through out loud, and for reminding me to kill flies with flyswatters instead of bazookas.
I'm working on a problem that I believe involves Bayes, I'm new to Bayes and a bit rusty on statistics, and I'm having a hard time figuring out where to start. (EDIT: it looks like set theory may also be involved.) Your help would be greatly appreciated.
Here's the problem: assume a set of 7 different objects. Two of these objects are presented at random to a participant, who selects whichever one of the two objects they prefer. (There is no "indifferent" option.) The order of these combinations is not important, and repeated combinations are not allowed.
Basic combination theory says there are 21 different possible combinations: (7!) / ( (2!) * (7-2)! ) = 21.
Now, assume the researcher wants to know which single option has the highest probability of being the "most preferred" to a new participant based on the responses of all previous participants. To complicate matters, each participant can leave at any time, without completing the entire set of 21 responses. Their responses should still factor into the final result, even if they only respond to a single combination.
At the beginning of the study, there are no priors. (CORRECTION via dlthomas: "There are necessarily priors... we start with no information about rankings... and so assume a 1:1 chance of either object being preferred.") If a participant selects B from {A,B}, the probability of B being the "most preferred" object should go up, and A should go down, if I'm understanding correctly.
NOTE: Direct ranking of objects 1-7 (instead of pairwise comparison) isn't ideal because it takes longer, which may encourage the participant to rationalize. The "pick-one-of-two" approach is designed to be fast, which is better for gut reactions when comparing simple objects like words, photos, etc.
The ideal output looks like this: "Based on ___ total responses, participants prefer Object A. Object A is preferred __% more than Object B (the second most preferred), and ___% more than Object C (the third most preferred)."
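For concreteness, here is a minimal sketch of one standard way to turn such pairwise responses into strength scores: a Bradley-Terry model fit with the classic minorization-maximization update. The model choice and the sample data are my own illustration (the update at the top of this post mentions Ranked Pairs and BestThing as alternatives); note that it handles participants who leave early, since each individual response contributes independently:

```python
def bradley_terry(wins, iterations=200):
    """wins[(a, b)] = how many participants chose a over b.
    Returns normalized strengths; higher = more preferred."""
    items = sorted({x for pair in wins for x in pair})
    s = {i: 1.0 for i in items}
    for _ in range(iterations):
        for i in items:
            w_i = sum(wins.get((i, j), 0) for j in items if j != i)
            d = sum((wins.get((i, j), 0) + wins.get((j, i), 0)) / (s[i] + s[j])
                    for j in items if j != i)
            if d > 0:
                s[i] = w_i / d                      # MM update (Zermelo/Ford)
        total = sum(s.values())
        s = {i: v / total for i, v in s.items()}    # normalize each pass
    return s

# Hypothetical responses from a few participants, some incomplete:
wins = {('A', 'B'): 7, ('B', 'A'): 3, ('A', 'C'): 5, ('C', 'A'): 1, ('B', 'C'): 4}
print(bradley_terry(wins))  # A comes out with the highest strength
```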
Questions:
1. Is Bayes actually the most straightforward way of calculating the "most preferred"? (If not, what is? I don't want to be Maslow's "man with a hammer" here.)
2. If so, can you please walk me through the beginning of how this calculation is done, assuming 10 participants?
Thanks in advance!
Fixed-Length Selective Iterative Prisoner's Dilemma Mechanics
Prerequisites: Basic knowledge of the Prisoner's Dilemma and the Iterated Prisoner's Dilemma.
I recently stumbled upon the selective IPD tournament results, and while I was very interested in the general concept I was also very disappointed by the strategies that were submitted, especially considering this is Less Wrong we are talking about.
This post is designed to help people who are interested in IPD problems come up with possibly successful strategies; hopefully with the same effect another couple of tournaments would have had, just in a shorter period of time. All of the following is written with the tournament rules in mind, with scores given as deviations from full mutual cooperation, since the actual number of turns per match is arbitrary anyway. Also, the hypothetical objective is not to have the highest population after a certain number of generations, but to achieve lasting superiority eventually, while treating near-clones that behave exactly the same in the late game as one single strategy. The results of tournaments with a very limited number of generations depend way too much on the pool of submitted strategies to be of general interest, in my opinion.
This post starts out with practical observations and some universal rules and gets increasingly theoretical. Here is a short glossary of terms I presume known:
Feeding on a strategy: Scoring higher against that strategy in a match than against itself.
Outperforming a strategy: Outscoring that strategy over the course of matches against all other strategies in the pool according to populations, including each other and themselves (i.e. improving the population ratio).
Dominance: More than 50% of the total population.
Dormant strategy: A strategy that will achieve dominance at some point but hasn't done so yet.
Extinction: Asymptotic approach to 0% of the total population, or actual extinction in case of integer truncation.
TFT-nD: A TFT strategy that defects from the nth last turn on, so TFT-0D is vanilla TFT.
Disclaimer: There is only very basic mathematics and logical reasoning in this post, so if anything seems confusing, it must be me using the wrong words (I'm not a native speaker). Please point out any of these cases so that I can correct them.
Survival And Dominance
Let us assume a scenario with only two strategies, one of them dominating which we call X, the other one A.
Survival Rule:
For A in this scenario not to go extinct regardless of initial population, it must score at least as high against X as X does against itself; and if it doesn't score strictly higher, it must score at least as high against itself as X does against itself while not losing direct encounters.
Let us assume X is TFT-0D and A is TFT-1D, in which case the numbers are as follows:
TFT-0D vs TFT-0D = 0 : 0
TFT-1D vs TFT-0D = +3 : -4
The survival ratio is +3 : 0. Therefore, in any situation with only TFT and TFT-1D, the latter cannot go extinct. This still doesn't tell us anything about what's needed for achieving dominance, so let's get on with that.
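These match scores are easy to verify mechanically. Here is a small simulator (my own sketch; per-turn payoffs expressed, like the post's scores, as deviations from mutual cooperation, with one-sided defection worth (+3, -4) and mutual defection (-3, -3)):

```python
DELTA = {('C', 'C'): (0, 0), ('D', 'C'): (3, -4),
         ('C', 'D'): (-4, 3), ('D', 'D'): (-3, -3)}  # deviations from mutual cooperation

def tft_nd(n):
    """TFT that defects unconditionally from the nth-last turn on."""
    def move(own, opp, turn, total):
        if total - turn <= n:
            return 'D'                                 # inside the final n turns
        return 'D' if opp and opp[-1] == 'D' else 'C'  # otherwise vanilla TFT
    return move

def match(s1, s2, turns=100):
    h1, h2, sc1, sc2 = [], [], 0, 0
    for t in range(turns):
        m1 = s1(h1, h2, t, turns)
        m2 = s2(h2, h1, t, turns)
        d1, d2 = DELTA[(m1, m2)]
        sc1, sc2 = sc1 + d1, sc2 + d2
        h1.append(m1)
        h2.append(m2)
    return sc1, sc2

print(match(tft_nd(1), tft_nd(0)))   # (3, -4): the survival-rule example
print(match(tft_nd(1), tft_nd(1)))   # (-3, -3)
```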
Dominance rule:
For any strategy to achieve dominance in a two-strategy situation where its survival is guaranteed according to the survival rule or where both strategies initially make up for half of the total population, it needs to outscore the other strategy over the course of these four matches:
X vs X
A vs X (2x)
A vs A
Let us again do the numbers with TFT-0D and TFT-1D:
TFT-0D vs TFT-0D = 0 : 0
TFT-1D vs TFT-0D = +3 : -4
TFT-1D vs TFT-0D = +3 : -4
TFT-1D vs TFT-1D = -3 : -3
On aggregate (counting both copies' scores in each self-match), the dominance ratio is 0 : -8 in TFT-1D's favor. Therefore, TFT-1D will achieve dominance.
The conditions for A to exterminate the formerly dominant strategy X follow directly from the conditions for avoiding extinction, and since being the only surviving strategy is not the objective anyway they aren't interesting at this point.
Threshold rule:
If A fulfills the conditions for dominance but not the conditions for survival (i.e. it scores less against X than X does against itself), it will need a certain threshold to avoid extinction and achieve dominance.
Thresholds vary from strategy to strategy, but are obviously always below 50%. The more balanced the survival ratio is, the higher the threshold. In most cases though, the threshold is much too low to be of any relevance in a selective tournament.
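To watch the survival and dominance rules play out, one can iterate a toy population dynamic (the tournament's exact fitness-to-growth mapping isn't specified here, so this particular update rule is my own assumption, reusing the match scores from the sketch above):

```python
def step(pop, score):
    """One generation: each strategy's share grows with its average match
    score against the current population. The growth rule below is an
    assumption for illustration, not the tournament's actual update."""
    fitness = {a: sum(pop[b] * score[a][b] for b in pop) for a in pop}
    floor = min(fitness.values())
    w = {a: pop[a] * (fitness[a] - floor + 1.0) for a in pop}
    total = sum(w.values())
    return {a: v / total for a, v in w.items()}

score = {'TFT-0D': {'TFT-0D': 0, 'TFT-1D': -4},
         'TFT-1D': {'TFT-0D': 3, 'TFT-1D': -3}}
pop = {'TFT-0D': 0.5, 'TFT-1D': 0.5}
for _ in range(100):
    pop = step(pop, score)
print(pop)  # TFT-1D achieves dominance, as the 0 : -8 ratio predicts
```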
You can easily see that any TFT-nD will be dominated by TFT-(n+1)D. However, with increasing n the strategy will score lower not only against itself, but also against any TFT-mD with m <= n-2, which makes TFT-nD with large n very unsuccessful in the early generations of the tournament.
A Word About Afterparty
The logical solution to this problem is "Afterparty" strategies, which are essentially TFT-nD CliqueBots that return to cooperating if their opponent defected first on the same turn as them. I refer to these strategies as Efficient CliqueBots for reasons that will become apparent later on.
The "Afterparty" strategy suggested on Less Wrong first defects on the sixth last turn, I will therefore call it TFT-D5C from here on. In a TFT-nD dominated tournament, however, TFT-D5C is not the optimum, as you can check by doing the math above. The optimum is in fact TFT-D3C, because it is the TFT-DnC with the lowest n that can exterminate any TFT-DmC with m > n (if the other strategy falls below a certain threshold) as well as dominate any TFT-mD with m >= n - 2 (if it reaches a certain threshold). This means that in a tournament similar to the control group tournament over 1000 generations, it would eventually achieve domination since it is safe from extinction due to scoring equally high respectively higher against TFT-2D and TFT-3D than those score against themselves while winning direct encounters and scoring higher against itself anyway (survival rule):
TFT-DnC vs TFT-DnC = -3 : -3
TFT-2D vs TFT-2D = -6 : -6
TFT-D3C vs TFT-2D = -6 : -13
TFT-3D vs TFT-3D = -9 : -9
TFT-D3C vs TFT-3D = -6 : -13
Also, it outscores TFT-4D (-32 : -36) and TFT-5D (-38 : -48):
TFT-4D vs TFT-4D = -12 : -12
TFT-D3C vs TFT-4D = -13 : -6
TFT-D3C vs TFT-4D = -13 : -6
TFT-D3C vs TFT-D3C = -3 : -3
TFT-5D vs TFT-5D = -15 : -15
TFT-D3C vs TFT-5D = -16 : -9
TFT-D3C vs TFT-5D = -16 : -9
TFT-D3C vs TFT-D3C = -3 : -3
This means that in a situation where TFT-D3C is initially equally well represented as TFT-4D or TFT-5D, it will eventually outperform those (not taking into account TFT-5D feeding on TFT-4D if both are present).
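Under a reading of TFT-DnC consistent with the numbers above (play TFT, defect once on the (n+1)th-last turn; if the opponent's first defection landed on that same turn, treat it as a clone, forgive the mutual defection, and play vanilla TFT to the end; otherwise defect to the end), the earlier simulator reproduces these scores. The implementation below is my reconstruction, reusing match() and tft_nd() from the earlier sketch:

```python
def tft_dnc(n):
    """Efficient CliqueBot: TFT with one identification defection on the
    (n+1)th-last turn. A hypothetical reconstruction of the post's TFT-DnC."""
    def move(own, opp, turn, total):
        ident = total - (n + 1)                        # identification turn
        if turn < ident:
            return 'D' if opp and opp[-1] == 'D' else 'C'  # vanilla TFT
        if turn == ident:
            return 'D'                                 # scheduled defection
        clone = 'D' in opp and opp.index('D') == ident
        if not clone:
            return 'D'                                 # alien: defect to the end
        if turn == ident + 1:
            return 'C'                                 # forgive the mutual defection
        return 'D' if opp[-1] == 'D' else 'C'          # TFT for the rest
    return move

print(match(tft_dnc(3), tft_nd(2)))    # (-6, -13)
print(match(tft_dnc(3), tft_nd(4)))    # (-13, -6)
print(match(tft_dnc(3), tft_dnc(3)))   # (-3, -3)
```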
In any situation with only two strategies both being TFT-DnC variants, TFT-D3C will exterminate any other TFT-DnC strategies once that strategy falls below a certain threshold, because no other TFT-DnC strategy can score higher against TFT-D3C or itself than TFT-D3C does against itself. This is trivially true for all TFT-DnC strategies starting from n = 2:
TFT-D2C vs TFT-D2C = -3 : -3
TFT-D3C vs TFT-D2C = -6 : -13
The threshold decreases for increasing n.
TFT-D3C is also certain to get a better start than any other TFT-DnC strategy with n > 3, due to higher gains from the TFT-nD strategies with n <= 3 that dominate the early game. TFT-D1C is essentially the same as TFT-1D and equally outperformed by TFT-2D, while TFT-D2C cannot prevent TFT-D3C from crossing its threshold, as TFT-D3C can feed on TFT-2D as well as on TFT-3D / TFT-D2C. This pretty much leaves TFT-D3C as the only viable TFT-DnC strategy to survive in a selective tournament of this kind. However, this doesn't take into account true parasites, which I will talk more about later.
CliqueBots
CliqueBots have fared very poorly in this tournament, but that is mostly because they have been very poorly designed. A CliqueBot that needs five defections to identify a clone is doomed from the start, and any strategy that turns into DefectBot after identifying an alien strategy before the last turns is very doomed anyway. It seems that even though participants anticipated TFT-1D and even TFT-2D, no one considered cooperative CliqueBots a possible solution, which is surprising. So let me introduce a CliqueBot that could technically be considered another optimum of TFT-DnC, although it behaves slightly differently (it is not an Efficient CliqueBot). The main idea is to defect on the very first turn and cooperate if the opponent's previous move matches its own previous move, i.e. if facing a clone, and to play TFT-2D or TFT-3D otherwise. Also, in case the game begins with defect-coop followed by coop-defect, the CliqueBot will cooperate for a second time, since the opponent can be considered a TFT variant and a high number of mutual cooperations is beneficial. To keep the names short and follow the established naming system, I will from here on refer to CliqueBot strategies that defect once on turn i and otherwise behave pretty much as TFT-nD strategies as i/TFT-nD.
Some comparison with selected strategies as described in the rules for dominance (four matches each):
1/TFT-3D vs TFT-0D = -14 : -22
1/TFT-3D vs TFT-1D = -14 : -28
1/TFT-3D vs TFT-2D = -14 : -34
1/TFT-3D vs TFT-3D = -26 : -38
1/TFT-3D vs TFT-4D = -34 : -38
1/TFT-3D vs TFT-5D = -40 : -50
However, the survival of any 1/TFT-nD is guaranteed only against TFT-mD with m = n-1. In addition, any TFT-DmC with m >= n-1 will outperform 1/TFT-nD, so 1/TFT-nD (or any other i/TFT-nD for that matter) cannot prevail.
Parasites, Identification And Efficient CliqueBots
A true parasite is any strategy that pretends to be a clone of its host until the last possible opportunity of gainful defection. TFT is essentially a CliqueBot that identifies its clones by sustained cooperation, and TFT-1D is a parasite of TFT as TFT-2D is of TFT-1D and so on.
Parasite rule:
Since parasites trade points gained from encounters with clones for points gained from encounters with hosts, parasites can never go extinct in a scenario with only them and their hosts, no matter how small their population. Any parasite will also ultimately achieve dominance in these scenarios, because the points gained from one-sided defection (a +7 swing: +3 for the defector against -4 for the cooperator) are more than the points lost from mutual defection (-3).
From which follows the dormancy rule:
A strategy can only win an open-ended selective IPD tournament if it stays dormant until its own parasite has gone extinct.
This means that theoretically any parasite only needs to survive long enough until it is the only strategy left besides its host to achieve ultimate victory, as was the case with TFT-3D, parasite of TFT-2D in the thousand generations tournament. This can easily be achieved if the host is the dominant strategy, because the parasite is better equipped to feed on the host than any other strategy in the pool, which is a guarantee of survival. However, the situation becomes very difficult for a parasite of a parasite of a dominant host in the early game, which is why TFT-4D is probably the highest TFT-nD able to survive and therefore to achieve dominance in this kind of tournament with integer rounding.
This brings us back to Efficient CliqueBots. All CliqueBots, including vanilla TFT, use identification patterns to identify clones in order to maximise total point gain. Parasites make use of these patterns in order to pretend to be their hosts and maximise their own point gain. Nice CliqueBots like vanilla TFT identify their clones by continuous cooperation, which makes them unable to identify other nice CliqueBots as non-clones. This is why TFT can't ever defect on the last turn when facing CooperateBot and why nice CliqueBots aren't usually considered "real" CliqueBots. Non-nice CliqueBots detect clones by mutual defections at fixed identification turns, and cooperate from there on if facing a clone in order to maximise total point gain and avoid the danger of losing points in the late game. So what's better than a parasite of a dominant host? Well obviously a parasite of a dominant host that's also a CliqueBot - because that's what TFT-DnC strategies are, Efficient CliqueBots.
The decisive feature of Efficient CliqueBots is that their identification turn is late enough so that in case of an alien opponent, they can simply continue defecting until the end, maximizing their efficiency and allowing them to perform better against TFT-nD strategies with higher n. Now the funny thing is that for example TFT-3D is a parasite of not only TFT-2D, but also TFT-D2C, which is what makes TFT-D3C the optimal TFT-DnC because it still outperforms TFT-4D. But is TFT-4D really the optimal parasite of TFT-D3C? Obviously not, because that would be TFT-D2CD - an Efficient CliqueBot parasite of an Efficient CliqueBot parasite (TFT-D3C) of a parasite (TFT-3D) of a parasite (TFT-2D) of a parasite (TFT-1D) of a nice CliqueBot (TFT-0D). According to the parasite rule, in a scenario with only TFT-D3C and TFT-D2CD, the latter will ultimately achieve dominance. Of course it has its own parasite in TFT-DC2D, which in turn has its own parasite TFT-2D2C, which is the host of TFT-2DCD, host of TFT-4D. And TFT-D4C and so on until DefectBot. Obviously DefectBot isn't the solution, it's only a solution to one specific strategy. So what is the solution? Well that mostly depends on the pool of submitted strategies, but the Efficient CliqueBot TFT-D3C still seems a pretty good guess.
Composite Strategies: Parasite-Host Tandems
So how can we make sure our parasite strategy stays dormant long enough? Well, there's one less-than-obvious choice, which is to turn it into a Composite strategy. A Composite is pretty much what the name says: at the beginning of each match, with some probability execute strategy A, else execute strategy B. For example, a parasite could with 99% probability execute its host's strategy (e.g. TFT-0D), and with 1% execute the actual parasite strategy (TFT-1D). This would in most cases result in the death of the parasite's parasite (TFT-2D) while still resulting in absolute dominance, albeit very slowly. However, this would obviously allow a singleton TFT-1D to feed on the Composite as well as on singleton TFT-0D, achieving dominance very quickly, while at the same time feeding TFT-2D. So 1% doesn't seem to be the ideal percentage, and TFT-2D would have been able to feed on TFT-0D anyway, so TFT-1D is a lost cause from the beginning. Also, even if there never was a singleton TFT-1D and TFT-2D, a similar tandem strategy could simply increase the percentage a bit.
So let's stop talking about Parasite-Host Tandems and instead focus on other kinds of Composites that seem more promising.
Composite Strategies: Independent Tandems
These are Composites of two strategies that don't aim to achieve dormancy by keeping their hosts dominant until any of their own parasites have gone extinct. Instead the idea is that if two independent strategies achieve common dominance, it is less likely that both of their parasites survive. You could obviously increase the number of strategies, but this will create some problems I'll talk more about later. First of all, let us assume a 50-50 tandem of TFT-2D and UtopiaBot, the latter being an imaginary strategy that has no parasites and scores high against itself. Both strategies are highly efficient and will soon eliminate TFT-0D and TFT-1D, but TFT-3D will survive. However, TFT-3D will be unable to achieve dominance, since it can basically only feed on TFT-2D, which makes up only half of the tandem's population, while it still has to deal with UtopiaBot, which it is not designed for.
Of course there is a big issue with Independent Tandems: You need to find two somewhat successful strategies with different parasites who score reasonably high against themselves as well as against each other. An obvious candidate for this would be a tandem of an Efficient CliqueBot like TFT-D3C and a regular CliqueBot that continues to cooperate after a single opponent retaliation and defects on the last turns, like 1/TFT-3D. This CliqueBot would cooperate until the end if the initial defection was not met by retaliation (facing the sibling) and TFT-D3C would cooperate until the end if the opponent defected only on the identification turn which it would not retaliate against.
Composite Strategies: Parasite Killer Tandems
Another option would be a tandem of two strategies of which one is the parasite of the other's parasite, e.g. TFT-2D and TFT-D3C, the latter taking care of both TFT-3D and TFT-D2C. Here, the Efficient CliqueBot TFT-D3C would revert to cooperation if the defections on the 3rd- and 4th-last turns have not been met by retaliation, effectively turning it into TFT-2D2C, and TFT-2D would not retaliate after defections on these turns. The main problem with Parasite Killer Tandems like these is that a singleton TFT-2D will also profit from TFT-2D2C killing TFT-3D strategies. This is somewhat offset by TFT-2D2C also slowly killing singleton TFT-2D, but possibly not enough to prohibit TFT-3D from feeding on it. This depends on the strategies in the tandem and the number of surviving parasites. In addition, the parasite killer's parasite (in this case TFT-D2CD) may prove a problem, as well as parasites of both strategies (TFT-4D), although these should be pretty low in numbers at that point, probably unable to achieve dominance.
Composite Strategies: Random CliqueBots
A Composite can consist of an arbitrarily high number of strategies, combining Parasite Killers with independent strategies. The only limitations are the effectiveness of the individual strategies and the ability of these strategies to score high against each other, although the latter is not so much a problem, as coop-defect loses only 1 point.
In fact, coop-defect is so much superior to defect-defect that this brings us to another kind of Composite strategy: Random CliqueBots, which are basically i/TFT-nD CliqueBots with randomized i. For example, if i ranges from 1 to 10, this strategy wouldn't retaliate against an opponent's first defection on any of turns 1 to 10, thus reducing point loss from identifying clones/siblings. With increasing ranges of i, point loss approaches 1, which is much lower than the 6 points lost by regular CliqueBots with fixed i.
Random CliqueBot vs Efficient CliqueBot dominance ratio:
TFT-D3C vs TFT-D3C = -3 : -3
1-∞/TFT-4D vs TFT-D3C = -13 : -13
1-∞/TFT-4D vs TFT-D3C = -13 : -13
1-∞/TFT-4D vs 1-∞/TFT-4D = +3 : -4
This nets -27 : -32 (Random: -13 - 13 + 3 - 4 = -27; Efficient: -3 - 3 - 13 - 13 = -32). However, as TFT-D3C always scores 1 point higher against TFT-nD and other TFT-DnC strategies than 1-∞/TFT-4D does, the winner of this duel will depend heavily on the pool of submitted strategies.
In any case, this concludes Composite strategies, so let's get on with finding some replacements for boring ol' TFT.
CRD vs TFT
In the late game, TFT is successful because it doesn't defect first, while its ability to retaliate matters little except during the very last turns. In the early game, TFT is successful because it doesn't let strategies like DefectBots take too much advantage of it while maximising points gained from other nice strategies. But maybe there are other strategies with these same qualities that don't turn into a bloody mess after one random defection, or that score higher against RandomBots? Here's an observation that no one seems to have noticed; I wouldn't have noticed it either had I not been specifically looking for it: in both round-robin tournaments of the control group, the highest-finishing TFT-0D variant is C6. Any strategies that finished higher were either TFT-1D or TFT-2D variants. And what is C6?
It is a strategy that always forgives the opponent's first defection.
Had C6 defected on the last two turns, I suspect it would have dominated the tournament until the emergence of TFT-3D, i.e. won the 100-generations tournament. Had it been a forgiving TFT-D3C, it would have won the 1000-generations tournament. There were other strategies that could forgive, like TF2T, but they forgave every defection, allowing DefectBots and RandomBots to trample all over them.
In order to maximise gains from RandomBots and other insane strategies, it might also be worthwhile to switch to DefectBot after a total of three, or two consecutive, opponent defections. For the sake of simplicity and three-letter acronyms, I call this strategy CRD (Cooperate, Retaliate, Defect).
Should CRD prove to outperform TFT in the early game, any other strategy would get one early defection for free, which is pretty much parasite CliqueBot heaven. On the other hand, CRD strategies would be able to feed on any Random CliqueBots.
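Since the post leaves CRD's exact trigger informal, here is a minimal sketch under the reading above (forgive the first defection, play TFT after the second, go full DefectBot after three total or two consecutive defections); the (my_hist, opp_hist, n) calling convention is invented here for illustration:

    def crd(my_hist, opp_hist, n):
        """CRD (Cooperate, Retaliate, Defect) sketch: forgive the opponent's
        first defection, retaliate tit-for-tat style after the second, and
        switch permanently to DefectBot after three total, or two consecutive,
        opponent defections."""
        total = opp_hist.count('D')
        two_in_a_row = any(a == b == 'D' for a, b in zip(opp_hist, opp_hist[1:]))
        if total >= 3 or two_in_a_row:
            return 'D'              # Defect: opponent looks hostile or random
        if total <= 1:
            return 'C'              # Cooperate: forgive the first defection
        return opp_hist[-1]         # Retaliate: plain TFT in between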
Final Remarks
This pretty much concludes my thoughts on the theoretical nature of the selective Iterated Prisoner's Dilemma. As I see it, the strategies with the highest chance of success are:
Efficient CliqueBots:
Might work if the correct host is chosen and no true parasites are present. If there is a good chance that other participants will come to the same conclusion regarding the host, while it is unlikely that any of the strategy's parasites will survive, it might be beneficial to experiment with TFT variants such as CRD in order to maximise early-game growth. This is what gave it the lasting advantage over C4 in the thousand-generations tournament.
Regular CliqueBots:
Might work if there is a high number of Efficient CliqueBots hindering each other's progress, especially if there are also specialized parasites of those. Regular CliqueBots will also have an advantage if there are lots of forgiving strategies.
Parasite Killer Tandems:
Might work if the correct parasite is chosen as a host and no parasites of the parasite killer survive.
Random CliqueBots:
Will most likely work, unless there are early dominant Efficient CliqueBot strategies on which the Random CliqueBot cannot feed, or the dominant strategies are of the forgiving (CRD) kind.
Regular TFT-nD strategies will be exterminated by Efficient CliqueBots.
There also are a few general guidelines:
- Any strategy needs to be carefully designed, which for (non-Efficient) CliqueBots includes forgiving after opponent retaliation as well as updating the opponent's presumed identity in case of defections before or after the identification turn.
- CliqueBots and Composites should not waste a single point when identifying clones, siblings and aliens.
- Any strategy needs to score at least as high against any other strategy as its potential competitors do against that strategy, most importantly the competitor itself.
- Any strategy needs to score against itself as high as possible.
- It is not necessary or important to win direct encounters if the guidelines above are followed.
- The probability that I have not made a single arithmetic error in all of this post is pretty low, so double-checking the calculations relevant to the strategy in question seems rational; a minimal match harness for doing so is sketched below.
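To that end, a bare-bones harness, assuming the standard payoffs (T, R, P, S) = (5, 3, 1, 0), which the post doesn't restate, and the illustrative (my_hist, opp_hist, n) strategy convention used in the sketches above:

    # Bare-bones match harness for double-checking scores.
    # Assumes standard PD payoffs (T, R, P, S) = (5, 3, 1, 0).
    PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
              ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

    def play_match(strat_a, strat_b, n=100):
        hist_a, hist_b = [], []
        score_a = score_b = 0
        for _ in range(n):
            move_a = strat_a(hist_a, hist_b, n)
            move_b = strat_b(hist_b, hist_a, n)
            gain_a, gain_b = PAYOFF[(move_a, move_b)]
            score_a += gain_a
            score_b += gain_b
            hist_a.append(move_a)
            hist_b.append(move_b)
        return score_a, score_b

    def tft(my_hist, opp_hist, n):
        return 'C' if not opp_hist else opp_hist[-1]

    print(play_match(tft, tft))  # (300, 300): full mutual cooperation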
Please also see this comment for a graphical comparison of some of the discussed strategies.
Lastly, a few selected strategies written in pseudocode, with n being the number of turns per match:
Efficient CliqueBot: TFT-D3C
Cooperate on first turn.
Continue with TFT.
If opponent defects on any of turns 1 to n-4:
    Continue with TFT.
    Defect on turns n-1 and n.
    If opponent defects on any of turns n-5 to n:
        Defect until end.
Else:
    Defect on turn n-3.
    If opponent has defected on turn n-3:
        Continue with cooperation.
        If opponent defects on any turn:
            Defect until end.
    Else:
        Defect until end.
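For readers who prefer running code, here is one way TFT-D3C might translate into the same illustrative (my_hist, opp_hist, n) convention; treat it as a sketch of my reading of the pseudocode above (assuming n comfortably larger than 6), not a reference implementation:

    def tft_d3c(my_hist, opp_hist, n):
        """Efficient CliqueBot TFT-D3C, as read from the pseudocode above:
        play TFT, defect once on turn n-3 as identification, cooperate to
        the end against a sibling that did the same, defect to the end
        otherwise. Histories are lists of 'C'/'D'; turns are 1-indexed."""
        turn = len(my_hist) + 1

        if 'D' in opp_hist[:n - 4]:             # opponent defected on turns 1..n-4
            # Alien: plain TFT, defect on the last two turns, and defect to
            # the end if it defects during the final stretch (turns n-5..n).
            if 'D' in opp_hist[n - 6:]:
                return 'D'
            if turn >= n - 1:
                return 'D'
            return opp_hist[-1]                 # TFT (history non-empty here)

        if turn < n - 3:
            return 'C' if not opp_hist else opp_hist[-1]  # plain TFT so far
        if turn == n - 3:
            return 'D'                          # identifying defection
        # Turns n-2, n-1, n: cooperate with a confirmed sibling, else defect.
        sibling = opp_hist[n - 4] == 'D' and 'D' not in opp_hist[n - 3:]
        return 'C' if sibling else 'D'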
Regular CliqueBot: 1/TFT-2D
Defect on first turn.
Cooperate on second turn.
If opponent has defected on first turn and cooperated on second turn:
    Cooperate on third turn.
    Continue with TFT.
    If opponent defects on any of turns 3 to n:
        Defect on turns n-1 (if still possible) and n.
Else:
    If opponent has defected on both turns:
        Defect on third turn.
    Else:
        Cooperate on third turn.
    Continue with TFT until turn n-2.
    Defect on turns n-1 and n.
    If opponent defects on any of turns n-5 to n:
        Defect until end.
Random CliqueBot: 10-19/TFT-2D
Randomly pick an integer i from 10 to 19.
Cooperate on first turn.
Play TFT until turn 9.
If opponent has defected on any of turns 1 to 9:
    Continue with TFT.
    Defect on turns n-1 and n.
Else:
    Continue with cooperation until turn 19.
    If opponent has defected a total of two times:
        Continue with TFT.
        Defect on turns n-1 and n.
    Else if opponent has defected once before turn 19:
        Cooperate after the opponent's defection.
        Continue with TFT.
    Else:
        Defect on turn i.
        Continue with cooperation.
        If opponent has defected on turn i:
            If opponent has cooperated on turn i+1:
                Continue with TFT.
            Else:
                Continue with TFT.
                Defect on turns n-1 and n.
        Else:
            If i < 19 and opponent has cooperated on turn i+1:
                Continue with TFT.
                If opponent defects:
                    Defect on turns n-1 and n.
            Else:
                Cooperate.
                Continue with TFT.
                Defect on turns n-1 and n.
If opponent defects on any of turns n-5 to n:
    Defect until end.
P(X = exact value) = 0: Is it really counterintuitive?
I'm probably not going to say anything new here. Someone must have pondered over this already. However, hopefully it will invite discussion and clear things up.
Let X be a random variable with a continuous distribution over the interval [0, 10]. Then, by the definition of probability over continuous domains, P(X = 1) = 0. The same is true for P(X = 10), P(X = sqrt(2)), P(X = π), and in general, the probability that X equals any exact number is always zero, since it is the integral of the density over a single point.
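For concreteness, writing f for the density (a name introduced here), the point probability is a degenerate integral, while an interval of positive width keeps positive probability wherever f is positive:

$$P(X = a) = \int_a^a f(x)\,dx = 0, \qquad P(a - \varepsilon \le X \le a + \varepsilon) = \int_{a - \varepsilon}^{a + \varepsilon} f(x)\,dx.$$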
This is sometimes described as counterintuitive: surely, at any measurement, X must be equal to something, and thus the probability cannot be zero, since it has clearly happened. It can, of course, be argued that mathematical probability is an abstract function that does not exactly map onto our intuitive understanding of probability, but in this case, I would argue that it does.
What if X is the x-coordinate of a physical object? If classical physics is in question - for example, we pointed a needle at a random point on a 10 cm ruler - then the needle cannot be a point object and must have a nonzero size. Thus, we can measure the probability of the 1 cm point lying within the space the needle's tip occupies, a probability that is clearly defined and nonzero.
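A quick numerical illustration of the needle picture (a sketch using NumPy; the 0.1 cm tip width is an arbitrary choice, not from the post):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=10**7)     # X ~ Uniform(0, 10): the 10 cm ruler

    print(np.mean(x == 1.0))               # exact hit on the 1 cm mark: ~0.0
    eps = 0.05                             # pretend the needle tip is 0.1 cm wide
    print(np.mean(np.abs(x - 1.0) < eps))  # ~0.01, i.e. 2*eps/10, clearly nonzero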
But even if we're talking about a point object, while it may well occupy a definite and exact coordinate in classical physics, we'll never know exactly what it is. For one, our measuring tools are not that precise. But even if they had infinite precision, statements like "X equals exactly 2.(0)" or "X equals exactly π" contain infinite information, since they specify all the decimal digits of the coordinate out to infinity. We would need an infinite number of measurements to confirm them. So while X may objectively equal exactly 2 or π - again, under classical physics - the measurers would never know it. At any given point, to the measurers, X would lie in an interval.
Then of course there is quantum physics, where it is literally impossible for any physical object, including point objects, to have a definite coordinate with arbitrary precision. In this case, the purely mathematical notion that any exact value is an impossible event turns out (by coincidence?) to match how the universe actually works.
I am submitting this on behalf of MazeHatter, who originally posted it here in the most recent open thread. Go there to upvote it if you like this submission.
Begin MazeHatter:
I grew up thinking that the Big Bang was the beginning of it all. In 2013 and 2014, a good number of observations threw some of our basic assumptions about the theory into question. There were anomalies observed in the CMB, previously ignored and now confirmed by Planck:
http://www.esa.int/Our_Activities/Space_Science/Planck/Planck_reveals_an_almost_perfect_Universe
We are also getting a better look at galaxies at greater distances; we expected them all to be young galaxies, and are finding they are not:
http://carnegiescience.edu/news/some_galaxies_early_universe_grew_quickly
http://mq.edu.au/newsroom/2014/03/11/granny-galaxies-discovered-in-the-early-universe/
B. D. Simmons et al. Galaxy Zoo: CANDELS Barred Disks and Bar Fractions. Monthly Notices of the Royal Astronomical Society, 2014 DOI: 10.1093/mnras/stu1817
http://www.sciencedaily.com/releases/2014/10/141030101241.htm
http://www.nasa.gov/jpl/spitzer/splash-project-dives-deep-for-galaxies/#.VBxS4o938jg
It seems we don't even have to look so far away to find evidence that galaxy formation is inconsistent with the Big Bang timeline:
http://www.natureworldnews.com/articles/7528/20140611/galaxy-formation-theories-undermined-dwarf-galaxies.htm
http://arxiv.org/abs/1406.1799
Another observation is that lithium abundances are way too low for the theory in other places, not just here:
http://news.nationalgeographic.com/news/2014/09/140910-space-lithium-m54-star-cluster-science/
It also seems that large-scale structure keeps being discovered at scales larger than the Big Bang is thought to account for:
http://www.sciencedaily.com/releases/2014/11/141119084506.htm
D. Hutsemékers, L. Braibant, V. Pelgrims, D. Sluse. Alignment of quasar polarizations with large-scale structures. Astronomy & Astrophysics, 2014
http://www.sciencedaily.com/releases/2013/01/130111092539.htm
These observations have all been made quite recently. It seems that in the 1980s, when I was first introduced to the Big Bang as a child, the experts in the field already knew there were problems with it, and devised inflation as a solution. And today, the validity of that solution is being called into question by those same experts:
http://www.physics.princeton.edu/~steinh/0411036.pdf
What are the odds that 2015 will be more like 2014, when we (again) found larger and older galaxies at greater distances, than like 1983?