Analogy gets a bad rap around here, and not without reason. The kinds of argument from analogy condemned in the above links fully deserve the condemnation they get. Still, I think it's too easy to read them and walk away thinking "Boo analogy!" when not all uses of analogy are bad. The human brain seems to have hardware support for thinking in analogies, and I don't think this capability is a waste of resources, even in our highly non-ancestral environment. So, assuming that the linked posts do a sufficient job detailing the abuse and misuse of analogy, I'm going to go over some legitimate uses.
The first thing analogy is really good for is description. Take the plum pudding atomic model. I still remember this falsified proposal of negative 'raisins' in positive 'dough' largely because of the analogy, and I don't think anyone ever attempted to use it to argue for the existence of tiny subnuclear particles corresponding to cinnamon.
But this is only a modest example of what analogy can do. The following is an example that I think starts to show the true power: my comment on Robin Hanson's 'Don't Be "Rationalist"'. To summarize, Robin argued that since you can't be rationalist about everything you should budget your rationality and only be rational about the most important things; I replied that maybe rationality is like weightlifting, where your strength is finite yet it increases with use. That comment is probably the most successful thing I've ever written on the rationalist internet in terms of the attention it received, including direct praise from Eliezer and a shoutout in a Scott Alexander (yvain) post, and it's pretty much just an analogy.
Here's another example, this time from Eliezer. As part of the AI-Foom debate, he tells the story of Fermi's nuclear experiments, and in particular his precise knowledge of when a pile would go supercritical.
What do the above analogies accomplish? They provide counterexamples to universal claims. In my case, Robin's inference that rationality should be spent sparingly proceeded from the stated premise that no one is perfectly rational about anything, and weightlifting was a counterexample to the implicit claim 'a finite capacity should always be directed solely towards important goals'. If you look above my comment, anon had already said that the conclusion hadn't been proven, but without the counterexample this claim had much less impact.
In Eliezer's case, "you can never predict an unprecedented unbounded growth" is the kind of claim that sounds really convincing. "You haven't actually proved that" is a weak-sounding retort; "Fermi did it" immediately wins the point.
The final thing analogies do really well is crystallize patterns. For an example of this, let's turn to... Failure by Analogy. Yep, the anti-analogy posts are themselves written almost entirely via analogy! Alchemists who glaze lead with lemons and would-be aviators who put beaks on their machines are invoked to crystallize the pattern of 'reasoning by similarity'. The post then makes the case that neural-net worshippers are reasoning by similarity in just the same way, making the same fundamental error.
It's this capacity that makes analogies so dangerous. Crystallizing a pattern can be so mentally satisfying that you don't stop to question whether the pattern applies. The antidote to this is the question, "Why do you believe X is like Y?" Assessing the answer and judging deep similarities from superficial ones may not always be easy, but just by asking you'll catch the cases where there is no justification at all.
Here is an interesting blog post about a guy who ran a resume experiment between two positions which he argues are identical in terms of experience, but occupy different "social status" positions in tech: software engineer and data manager.
Interview A: as Software Engineer
Bill faced five hour-long technical interviews. Three went well. One was so-so, because it focused on implementation details of the JVM, and Bill’s experience was almost entirely in C++, with a bit of hobbyist OCaml. The last interview sounds pretty hellish. It was with the VP of Data Science, Bill’s prospective boss, who showed up 20 minutes late and presented him with one of those interview questions where there’s “one right answer” that took months, if not years, of in-house trial and error to discover. It was one of those “I’m going to prove that I’m smarter than you” interviews...
Let’s recap this. Bill passed three of his five interviews with flying colors. One of the interviewers, a few months later, tried to recruit Bill to his own startup. The fourth interview was so-so, because he wasn’t a Java expert, but came out neutral. The fifth, he failed because he didn’t know the in-house Golden Algorithm that took years of work to discover. When I asked that VP/Data Science directly why he didn’t hire Bill (and he did not know that I knew Bill, nor about this experiment) the response I got was “We need people who can hit the ground running.” Apparently, there’s only a “talent shortage” when startup people are trying to scam the government into changing immigration policy. The undertone of this is that “we don’t invest in people”.
Or, for a point that I’ll come back to, software engineers lack the social status necessary to make others invest in them.
Interview B: as Data Science Manager
A couple weeks later, Bill interviewed at a roughly equivalent company for the VP-level position, reporting directly to the CTO.
Worth noting is that we did nothing to make Bill more technically impressive than for Company A. If anything, we made his technical story more honest, by modestly inflating his social status while telling a “straight shooter” story for his technical experience. We didn’t have to cover up periods of low technical activity; that he was a manager, alone, sufficed to explain those away.
Bill faced four interviews, and while the questions were behavioral and would be “hard” for many technical people, he found them rather easy to answer with composure. I gave him the Golden Answer, which is to revert to “There’s always a trade-off between wanting to do the work yourself, and knowing when to delegate.” It presents one as having managerial social status (the ability to delegate) but also a diligent interest in, and respect for, the work. It can be adapted to pretty much any “behavioral” interview question...
Bill passed. Unlike for a typical engineering position, there were no reference checks. The CEO said, “We know you’re a good guy, and we want to move fast on you”. As opposed to the 7-day exploding offers typically served to engineers, Bill had 2 months in which to make his decision. He got a fourth week of vacation without even having to ask for it, and genuine equity (about 75% of a year’s salary vesting each year)...
It was really interesting, as I listened in, to see how different things are once you’re “in the club”. The CEO talked to Bill as an equal, not as a paternalistic, bullshitting, “this is good for your career” authority figure. There was a tone of equality that a software engineer would never get from the CEO of a 100-person tech company.
The author concludes that positions that are labeled as code-monkey-like are low status, while positions that are labeled as managerial are high status. Even if they are "essentially" doing the same sort of work.
Not sure about this methodology, but it's food for thought.
On a recent trip to Ireland, I gave a talk on tactics for having better arguments (video here). There's plenty in the video that's been discussed on LW before (Ideological Turing Tests and other reframes), but I thought I'd highlight one other class of trick I use to have more fruitful disagreements.
It's hard, in the middle of a fight, to remember, recognize, and defuse common biases, rhetorical tricks, emotional triggers, etc. I'd rather cheat than solve a hard problem, so I put a lot of effort into shifting disagreements into environments where it's easier for me and my opposite-number to reason and argue well, instead of relying on willpower. Here's a recent example of the kind of shift I like to make:
A couple months ago, a group of my friends were fighting about the Brendan Eich resignation on facebook. The posts were showing up fast; everyone was, presumably, on the edge of their seats, fueled by adrenaline, and alone at their various computers. It’s a hard place to have a charitable, thoughtful debate.
I asked my friends (since they were mostly DC based) if they’d be amenable to pausing the conversation and picking it up in person. I wanted to make the conversation happen in person, not in front of an audience, and in a format that let people speak for longer and ask questions more easily. If so, I promised to bake cookies for the ultimate donnybrook.
My friends probably figured that I offered cookies as a bribe to get everyone to change venues, and they were partially right. But my cookies had another strategic purpose. When everyone arrived, I was still in the process of taking the cookies out of the oven, so I had to recruit everyone to help me out.
“Alice, can you pour milk for people?”
“Bob, could you pass out napkins?”
“Eve, can you greet people at the door while I’m stuck in the kitchen with potholders on?”
Before we could start arguing, people on both sides of the debate were working on taking care of each other and asking each others’ help. Then, once the logistics were set, we all broke bread (sorta) with each other and had a shared, pleasurable experience. Then we laid into each other.
Sharing a communal experience of mutual service didn’t make anyone pull their intellectual punches, but I think it made us more patient with each other and less anxiously fixated on defending ourselves. Sharing food and seating helped remind us of the relationships we enjoyed with each other, and why we cared about probing the ideas of this particular group of people.
I prefer to fight with people I respect, who I expect will fight in good faith. It's hard to remember that's what I'm doing if I argue with them in the same forums (comment threads, fb, etc) where I usually see bad fights. An environment shift and other compensatory gestures make it easier to leave habituated errors and fears at the door.
It is widely understood that statistical correlation between two variables ≠ causation. But despite this admonition, people are routinely overconfident in claiming correlations to support particular causal interpretations and are surprised by the results of randomized experiments, suggesting that they are biased & systematically underestimating the prevalence of confounds/common-causation. I speculate that in realistic causal networks or DAGs, the number of possible correlations grows faster than the number of possible causal relationships. So confounds really are that common, and since people do not think in DAGs, the imbalance also explains overconfidence.
I’ve noticed I seem to be unusually willing to bite the correlation≠causation bullet, and I think it’s due to an idea I had some time ago about the nature of reality.
1.1 The Problem
One of the constant problems I face in my reading is that I constantly want to know about causal relationships but usually I only have correlational data, and as we all know, correlation≠causation. If the general public naively thinks correlation=causation, most geeks know better: correlation≠causation; but then some go meta and point out that correlation and causation do tend to correlate, and so correlation weakly implies causation. But how much evidence…? If I suspect that A→B, and I collect data and establish beyond doubt that A & B correlate at r=0.7, how much evidence do I have that A→B?
Now, the correlation could be an illusory correlation thrown up by all the standard statistical problems we all know about, such as too-small n, false positive from sampling error (A & B just happened to sync together due to randomness), multiple testing, p-hacking, data snooping, selection bias, publication bias, misconduct, inappropriate statistical tests, etc. I’ve read about those problems at length, and despite knowing about all that, there still seems to be a problem: I don’t think those issues explain away all the correlations which turn out to be confounds - correlation too often ≠ causation.
To measure this directly you need a clear set of correlations which are proposed to be causal, randomized experiments to establish what the true causal relationship is in each case, and both categories need to be sharply delineated in advance to avoid issues of cherrypicking and retroactively confirming a correlation. Then you’d be able to say something like ‘11 out of the 100 proposed A→B causal relationships panned out’, and start with a prior of 11% that in your case, A→B. This sort of dataset is pretty rare, although the few examples I’ve found from medicine tend to indicate that our prior should be under 10%. Not great. Why are our best guesses at causal relationships so bad?
We’d expect that the a priori odds are good: 1/3! After all, you can divvy up the possibilities as:
- A causes B
- B causes A
- both A and B are caused by a C (possibly in a complex way like Berkson’s paradox or conditioning on unmentioned variables, like a phone-based survey inadvertently generating conclusions valid only for the phone-using part of the population, causing amusing pseudo-correlations)
If it’s either #1 or #2, we’re good and we’ve found a causal relationship; it’s only outcome #3 which leaves us baffled & frustrated. Even if we were guessing at random, you’d expect us to be right at least 33% of the time, if not much more often because of all the knowledge we can draw on. (Because we can draw on other knowledge, like temporal order or biological plausibility. For example, in medicine you can generally rule out some of the relationships this way: if you find a correlation between taking superdupertetrohydracyline™ and pancreas cancer remission, it seems unlikely that #2 curing pancreas cancer causes a desire to take superdupertetrohydracyline™ so the causal relationship is probably either #1 superdupertetrohydracyline™ cures cancer or #3 a common cause like ‘doctors prescribe superdupertetrohydracyline™ to patients who are getting better’.)
I think a lot of people tend to put a lot of weight on any observed correlation because of this intuition that a causal relationship is normal & probable because, well, “how else could this correlation happen if there’s no causal connection between A & B‽” And fair enough - there’s no grand cosmic conspiracy arranging matters to fool us by always putting in place a C factor to cause scenario #3, right? If you question people, of course they know correlation doesn’t necessarily mean causation - everyone knows that - since there’s always a chance of a lurking confound, and it would be great if you had a randomized experiment to draw on; but you think with the data you have, not the data you wish you had, and can’t let the perfect be the enemy of the better. So when someone finds a correlation between A and B, it’s no surprise that suddenly their language & attitude change and they seem to place great confidence in their favored causal relationship even if they piously acknowledge “Yes, correlation is not causation, but… [obviously hanging out with fat people can be expected to make you fat] [surely giving babies antibiotics will help them] [apparently female-named hurricanes increase death tolls] etc etc”.
So, correlations tend to not be causation because it’s almost always #3, a shared cause. This commonness is contrary to our expectations, based on a simple & unobjectionable observation that of the 3 possible relationships, 2 are causal; and so we often reason as though correlation were strong evidence for causation. This leaves us with a paradox: experimental results seem to contradict intuition. To resolve the paradox, I need to offer a clear account of why shared causes/confounds are so common, and hopefully motivate a different set of intuitions.
1.2 What a Tangled Net We Weave When First We Practice to Believe
Here’s where Bayes nets & causal networks (seen previously on LW & Michael Nielsen) come up. When networks are inferred on real-world data, they often start to look pretty gnarly: tons of nodes, tons of arrows pointing all over the place. Daphne Koller early on in her Probabilistic Graphical Models course shows an example from a medical setting where the network has like 600 nodes and you can’t understand it at all. When you look at a biological causal network like this:
You start to appreciate how everything might be correlated with everything, but not cause each other.
This is not too surprising if you step back and think about it: life is complicated, we have limited resources, and everything has a lot of moving parts. (How many discrete parts does an airplane have? Or your car? Or a single cell? Or think about a chess player analyzing a position: ‘if my bishop goes there, then the other pawn can go here, which opens up a move there or here, but of course, they could also do that or try an en passant in which case I’ll be down in material but up on initiative in the center, which causes an overall shift in tempo…’) Fortunately, these networks are still simple compared to what they could be, since most nodes aren’t directly connected to each other, which tamps down on the combinatorial explosion of possible networks. (How many different causal networks are possible if you have 600 nodes to play with? The exact answer is complicated but it’s much larger than 2^600 - so very large!)
One interesting thing I managed to learn from PGM (before concluding it was too hard for me and I should try it later) was that in a Bayes net even if two nodes were not in a simple direct correlation relationship A→B, you could still learn a lot about A from setting B to a value, even if the two nodes were ‘way across the network’ from each other. You could trace the influence flowing up and down the pathways to some surprisingly distant places if there weren’t any blockers.
The bigger the network, the more possible combinations of nodes to look for a pairwise correlation between. (E.g. if there are 10 nodes/variables and you are looking at bivariate correlations, then you have 10 choose 2 = 45 possible comparisons; with 20 nodes, 190; and with 40, 780. 40 variables is not that much for many real-world problems.) A lot of these combos will yield some sort of correlation. But does the number of causal relationships go up as fast? I don’t think so (although I can’t prove it).
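That quadratic growth in pairwise comparisons is easy to verify directly:

```python
from math import comb

# Number of distinct node pairs (possible bivariate correlations)
# grows roughly as n^2 / 2 with the number of variables n.
for n in (10, 20, 40, 100):
    print(n, comb(n, 2))
# 10 -> 45, 20 -> 190, 40 -> 780, 100 -> 4950
```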
If not, then as causal networks get bigger, the number of genuine correlations will explode but the number of genuine causal relationships will increase slower, and so the fraction of correlations which are also causal will collapse.
(Or more concretely: suppose you generated a randomly connected causal network with x nodes and y arrows, perhaps using the algorithm in Kuipers & Moffa 2012, where each arrow has some random noise in it; count how many pairs of nodes are in a causal relationship; now, n times, initialize the root nodes to random values, generate a possible state of the network & store the values for each node; count how many pairwise correlations there are between all the nodes using the n samples (using an appropriate significance test & alpha if one wants); divide # of causal relationships by # of correlations, store; return to the beginning and resume with x+1 nodes and y+1 arrows… As one graphs each value of x against its respective estimated fraction, does the fraction head toward 0 as x increases? My thesis is it does. Or, since there must be at least as many causal relationships in a graph as there are arrows, you could simply use that as an upper bound on the fraction.)
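The procedure just described can be sketched in a few lines. This is my own toy version under strong simplifying assumptions (arrows only point from lower to higher node index, linear-Gaussian noise, and a crude |r| threshold in place of a proper significance test), not the Kuipers & Moffa algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_dag(x, y):
    """Random DAG on x nodes with y arrows; arrows point from lower
    to higher index, which guarantees acyclicity."""
    pairs = [(i, j) for i in range(x) for j in range(i + 1, x)]
    idx = rng.choice(len(pairs), size=min(y, len(pairs)), replace=False)
    return [pairs[k] for k in idx]

def causal_pair_count(x, arrows):
    """Unordered pairs {i, j} with a directed path between them."""
    adj = {i: [] for i in range(x)}
    for i, j in arrows:
        adj[i].append(j)
    total = 0
    for s in range(x):
        stack, seen = [s], set()
        while stack:
            for v in adj[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        total += len(seen)  # arrows all point "up", so no double-counting
    return total

def sample(x, arrows, n):
    """n joint draws from a linear-Gaussian model on the DAG."""
    weights = [(i, j, rng.normal()) for i, j in arrows]
    data = rng.normal(size=(n, x))      # independent noise per node
    for j in range(x):                  # index order = topological order
        for i, jj, w in weights:
            if jj == j:
                data[:, j] += w * data[:, i]
    return data

def correlated_pair_count(data, r_crit=0.1):
    """Pairs whose sample correlation exceeds a crude threshold."""
    c = np.corrcoef(data, rowvar=False)
    x = c.shape[0]
    return sum(abs(c[i, j]) > r_crit
               for i in range(x) for j in range(i + 1, x))

for x in (10, 20, 40):
    arrows = random_dag(x, 2 * x)
    frac = causal_pair_count(x, arrows) / max(
        correlated_pair_count(sample(x, arrows, n=1000)), 1)
    print(x, round(frac, 2))
```

Whether the printed fraction actually heads toward 0 as x grows is exactly the open question the paragraph poses; the sketch only shows how cheaply one could start checking it.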
It turns out, we weren’t supposed to be reasoning ‘there are 3 categories of possible relationships, so we start with 33%’, but rather: ‘there is only one explanation “A causes B”, only one explanation “B causes A”, but there are many explanations of the form “C1 causes A and B”, “C2 causes A and B”, “C3 causes A and B”…’, and the more nodes in a field’s true causal networks (psychology or biology vs physics, say), the bigger this last category will be.
The real world is the largest of causal networks, so it is unsurprising that most correlations are not causal, even after we clamp down our data collection to narrow domains. Hence, our prior for “A causes B” is not 50% (it’s either true or false) nor is it 33% (either A causes B, B causes A, or mutual cause C) but something much smaller: the number of causal relationships divided by the number of pairwise correlations for a graph, which ratio can be roughly estimated on a field-by-field basis by looking at existing work or directly for a particular problem (perhaps one could derive the fraction based on the properties of the smallest inferrable graph that fits large datasets in that field). And since the larger a correlation relative to the usual correlations for a field, the more likely the two nodes are to be close in the causal network and hence more likely to be joined causally, one could even give causality estimates based on the size of a correlation (eg. an r=0.9 leaves less room for confounding than an r of 0.1, but how much will depend on the causal network).
This is exactly what we see. How do you treat cancer? Thousands of treatments get tried before one works. How do you deal with poverty? Most programs are not even wrong. Or how do you fix societal woes in general? Most attempts fail miserably and the higher-quality your studies, the worse attempts look (leading to Rossi’s Metallic Rules). This even explains why ‘everything correlates with everything’ and Andrew Gelman’s dictum about how coefficients are never zero: the reason datasets like those mentioned by Cohen or Meehl find most of their variables to have non-zero correlations (often reaching statistical-significance) is because the data is being drawn from large complicated causal networks in which almost everything really is correlated with everything else.
And thus I was enlightened.
Since I know so little about causal modeling, I asked our local causal researcher Ilya Shpitser to maybe leave a comment about whether the above was trivially wrong / already-proven / well-known folklore / etc; for convenience, I’ll excerpt the core of his comment:
But does the number of causal relationships go up just as fast? I don’t think so (although at the moment I can’t prove it).
I am not sure exactly what you mean, but I can think of a formalization where this is not hard to show. We say A “structurally causes” B in a DAG G if and only if there is a directed path from A to B in G. We say A is “structurally dependent” with B in a DAG G if and only if there is a marginal d-connecting path from A to B in G.
A marginal d-connecting path between two nodes is a path with no consecutive edges of the form * -> * <- * (that is, no colliders on the path). In other words all directed paths are marginal d-connecting but the opposite isn’t true.
The justification for this definition is that if A “structurally causes” B in a DAG G, then if we were to intervene on A, we would observe B change (but not vice versa) in “most” distributions that arise from causal structures consistent with G. Similarly, if A and B are “structurally dependent” in a DAG G, then in “most” distributions consistent with G, A and B would be marginally dependent (e.g. what you probably mean when you say ‘correlations are there’).
I qualify with “most” because we cannot simultaneously represent dependences and independences by a graph, so we have to choose. People have chosen to represent independences. That is, if in a DAG G some arrow is missing, then in any distribution (causal structure) consistent with G, there is some sort of independence (missing effect). But if the arrow is not missing we cannot say anything. Maybe there is dependence, maybe there is independence. An arrow may be present in G, and there may still be independence in a distribution consistent with G. We call such distributions “unfaithful” to G. If we pick distributions consistent with G randomly, we are unlikely to hit on unfaithful ones (the subset of all distributions consistent with G that is unfaithful to G has measure zero), but Nature does not pick randomly, so unfaithful distributions are a worry. They may arise for systematic reasons (maybe equilibrium of a feedback process in bio?).
If you accept above definition, then clearly for a DAG with n vertices, the number of pairwise structural dependence relationships is an upper bound on the number of pairwise structural causal relationships. I am not aware of anyone having worked out the exact combinatorics here, but it’s clear there are many many more paths for structural dependence than paths for structural causality.
But what you actually want is not a DAG with n vertices, but another type of graph with n vertices. The “Universe DAG” has a lot of vertices, but what we actually observe is a very small subset of these vertices, and we marginalize over the rest. The trouble is, if you start with a distribution that is consistent with a DAG, and you marginalize over some things, you may end up with a distribution that isn’t well represented by a DAG. Or “DAG models aren’t closed under marginalization.”
That is, if our DAG is A -> B <- H -> C <- D, and we marginalize over H because we do not observe H, what we get is a distribution where no DAG can properly represent all conditional independences. We need another kind of graph.
In fact, people have come up with a mixed graph (containing -> arrows and <-> arrows) to represent margins of DAGs. Here -> means the same as in a causal DAG, but <-> means “there is some sort of common cause/confounder that we don’t want to explicitly write down.” Note: <-> is not a correlative arrow, it is still encoding something causal (the presence of a hidden common cause or causes). I am being loose here – in fact it is the absence of arrows that means things, not the presence.
I do a lot of work on these kinds of graphs, because these graphs are the sensible representation of data we typically get – drawn from a marginal of a joint distribution consistent with a big unknown DAG.
But the combinatorics work out the same in these graphs – the number of marginal d-connected paths is much bigger than the number of directed paths. This is probably the source of your intuition. Of course what often happens is you do have a (weak) causal link between A and B, but a much stronger non-causal link between A and B through an unobserved common parent. So the causal link is hard to find without “tricks.”
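Ilya's two definitions are concrete enough to check on a toy graph. The sketch below (my own illustration, not his code) counts unordered pairs joined by a directed path versus pairs joined by any collider-free path, on a star graph where one hidden cause H feeds four children: only the four H-child pairs are structurally causal, yet every pair of nodes is structurally dependent.

```python
from itertools import combinations

def causally_connected(edges, a, b):
    """Directed path a -> ... -> b ('a structurally causes b')."""
    adj = {}
    for t, h in edges:
        adj.setdefault(t, set()).add(h)
    stack, seen = [a], {a}
    while stack:
        u = stack.pop()
        if u == b:
            return True
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False

def marginally_d_connected(edges, a, b):
    """Some path between a and b with no collider (* -> v <- *)."""
    eset = set(edges)
    adj = {}
    for t, h in edges:
        adj.setdefault(t, set()).add(h)
        adj.setdefault(h, set()).add(t)

    def walk(v, arrived_into_v, visited):
        if v == b:
            return True
        for u in adj.get(v, set()) - visited:
            leaves_into_v = (u, v) in eset      # next edge is u -> v
            if v != a and arrived_into_v and leaves_into_v:
                continue                        # v would be a collider
            if walk(u, (v, u) in eset, visited | {u}):
                return True
        return False

    return walk(a, False, {a})

# Star graph: hidden cause H with four children A, B, C, D.
edges = [("H", c) for c in "ABCD"]
nodes = sorted({n for e in edges for n in e})
causal = sum(causally_connected(edges, a, b) or causally_connected(edges, b, a)
             for a, b in combinations(nodes, 2))
dependent = sum(marginally_d_connected(edges, a, b)
                for a, b in combinations(nodes, 2))
print(causal, dependent)   # 4 causal pairs vs 10 dependent pairs
```

On larger random graphs the gap widens much further, which is the combinatorial point Ilya is making.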
1.4 Heuristics & Biases
Now assuming the foregoing to be right (which I’m not sure about; in particular, I’m dubious that correlations in causal nets really do increase much faster than causal relations do), what’s the psychology of this? I see a few major ways that people might be incorrectly reasoning when they overestimate the evidence given by a correlation:
they might be aware of the imbalance between correlations and causation, but underestimate how much more common correlation becomes compared to causation.
This could be shown by giving causal diagrams and seeing how elicited probability changes with the size of the diagrams: if the probability is constant, then the subjects would seem to be considering the relationship in isolation and ignoring the context. It might be remediable by showing a network and jarring people out of a simplistic comparison approach.
they might not be reasoning in a causal-net framework at all, but starting from the naive 33% base-rate you get when you treat all 3 kinds of causal relationships equally.
This could be shown by eliciting estimates and seeing whether the estimates tend to look like base rates of 33% and modifications thereof. Sterner measures might be needed: could we draw causal nets with not just arrows showing influence but also another kind of arrow showing correlations? For example, the causal arrows could be drawn in black, inverse correlations drawn in red, and regular correlations drawn in green. The picture would be rather messy, but simply by comparing how few black arrows there are to how many green and red ones, it might visually make the case that correlation is much more common than causation.
alternately, they may really be reasoning causally and suffer from a truly deep & persistent cognitive illusion that when people say ‘correlation’ it’s really a kind of causation, not understanding the technical meaning of ‘correlation’ in the first place (which is not as unlikely as it may sound, given examples like David Hestenes’s demonstration of the persistence of Aristotelian folk-physics in physics students, showing that all they had learned was guessing passwords; on the test used, see eg Halloun & Hestenes 1985 & Hestenes et al 1992); in which case it’s not surprising that if they think they’ve been told a relationship is ‘causation’, then they’ll think the relationship is causation. Ilya remarks:
Pearl has this hypothesis that a lot of probabilistic fallacies/paradoxes/biases are due to the fact that causal and not probabilistic relationships are what our brain natively thinks about. So e.g. Simpson’s paradox is surprising because we intuitively think of a conditional distribution (where conditioning can change anything!) as a kind of “interventional distribution” (no Simpson’s type reversal under interventions: “Understanding Simpson’s Paradox”, Pearl 2014 [see also Pearl’s comments on Nielsen’s blog]).
This hypothesis would claim that people who haven’t looked into the math just interpret statements about conditional probabilities as about “interventional probabilities” (or whatever their intuitive analogue of a causal thing is).
This might be testable by trying to identify simple examples where the two approaches diverge, similar to Hestenes’s quiz for diagnosing belief in folk-physics.
This was originally posted to an open thread but due to the favorable response I am posting an expanded version here.
This paper, or more often the New Scientist's exposition of it, is being discussed online and is rather topical here. In a nutshell, stimulating one small but central area of the brain reversibly rendered one epilepsy patient unconscious without disrupting wakefulness. Impressively, this phenomenon has apparently been hypothesized before, just never tested (because it's hard and usually unethical). A quote from the New Scientist article (emphasis mine):
One electrode was positioned next to the claustrum, an area that had never been stimulated before.
When the team zapped the area with high frequency electrical impulses, the woman lost consciousness. She stopped reading and stared blankly into space, she didn't respond to auditory or visual commands and her breathing slowed. As soon as the stimulation stopped, she immediately regained consciousness with no memory of the event. The same thing happened every time the area was stimulated during two days of experiments (Epilepsy and Behavior, doi.org/tgn).
To confirm that they were affecting the woman's consciousness rather than just her ability to speak or move, the team asked her to repeat the word "house" or snap her fingers before the stimulation began. If the stimulation was disrupting a brain region responsible for movement or language she would have stopped moving or talking almost immediately. Instead, she gradually spoke more quietly or moved less and less until she drifted into unconsciousness. Since there was no sign of epileptic brain activity during or after the stimulation, the team is sure that it wasn't a side effect of a seizure.
If confirmed, this hints at several interesting points. For example, a complex enough brain is not sufficient for consciousness; a sort of command-and-control structure, even a relatively small one, is required as well. The low-consciousness state of late-stage dementia sufferers might be due to damage specifically to the claustrum area, not just overall brain deterioration. The researchers speculate that stimulating the area in vegetative-state patients might help "push them out of this state". From an AI research perspective, understanding the difference between wakefulness and consciousness might be interesting, too.
Basically, Heather Dyke argues that metaphysicians too often argue from representations of reality (e.g. in language) to reality itself.
It looks to me like a variant of the mind projection fallacy. This might be the first book-length treatment the fallacy has gotten, though. What do people think?
See reviews here
To give a bit of background, there's a debate between A-theorists and B-theorists in philosophy of time.
A-theorists think time has ontological distinctions between past, present, and future.
B-theorists hold there is no ontological distinction between past, present, and future.
Dyke argues that a popular argument for A-theory (that tensed language represents ontological distinctions) commits the representational fallacy. Bourne agrees, but points out that an argument Dyke uses for B-theory commits the same fallacy.
This post is an explanation of a recent paper coauthored by Sean Carroll and Charles Sebens, in which they propose a derivation of the Born rule in the context of the Many-Worlds approach to quantum mechanics. While the attempt itself is not fully successful, it contains interesting ideas and is thus worth knowing about.
A note to the reader: here I will try to clarify the paper's assumptions and give only a very general view of its method, and for this reason you won't find any equations. It is my hope that if, after having read this, you're still curious about the real math, you will point your browser to the preceding link and read the paper for yourself.
If you are not totally new to LessWrong, you should know by now that the preferred interpretation of quantum mechanics (QM) around here is the Many-Worlds Interpretation (MWI), which denies the collapse of the wave-function and postulates a distinct reality (that is, a branch) for every base state composing a quantum superposition.
MWI historically suffered from three problems: the apparent absence of macroscopic superpositions, the preferred basis problem, and the derivation of the Born rule. The development of decoherence famously solved the first and, to a lesser degree, the second problem, but the third remains one of the most poorly understood aspects of the theory.
Quantum mechanics assigns an amplitude, a complex number, to each branch of a superposition, and postulates that the probability that an observer finds the system in that branch is the squared norm of the amplitude. This, very briefly, is the content of the Born rule (for pure states).
Quantum mechanics remains agnostic about the ontological status of both amplitudes and probabilities, but MWI, assigning a reality status to every branch, demotes ontological uncertainty (which branch will become real after observation) to indexical uncertainty (which branch the observer will find itself correlated to after observation).
Simple indexical uncertainty, though, cannot reproduce the exact predictions of QM: by the indifference principle, if you have no information privileging any member in a set of hypotheses, you should assign equal probability to each one. This leads to forming a probability distribution by counting the branches, which only in special circumstances coincides with amplitude-derived probabilities. This discrepancy, and how to account for it, constitutes the Born rule problem in MWI.
There have been, of course, many attempts at solving it; for a summary I quote directly from the article:
One approach is to show that, in the limit of many observations, branches that do not obey the Born Rule have vanishing measure. A more recent twist is to use decision theory to argue that a rational agent should act as if the Born Rule is true. Another approach is to argue that the Born Rule is the only well-defined probability measure consistent with the symmetries of quantum mechanics.
These proposals have failed to uniformly convince physicists that the Born rule problem is solved, and the paper by Carroll and Sebens is another attempt to reach a solution.
Before describing their approach, there are some assumptions that have to be clarified.
The first, and this is good news, is that they are treating probabilities as rational degrees of belief about a state of the world. They are thus using a Bayesian approach, although they never call it that.
The second is that they’re using self-locating indifference, again from a Bayesian perspective.
Self-locating indifference is the principle that you should assign equal probabilities to find yourself in different places in the universe, if you have no information that distinguishes the alternatives. For a Bayesian, this is almost trivial: self-locating propositions are propositions like any other, so the principle of indifference should be used on them as it should on any other prior information. This is valid for quantum branches too.
The third assumption is where they start to deviate from pure Bayesianism: it’s what they call Epistemic Separability Principle, or ESP. In their words:
the outcome of experiments performed by an observer on a specific system shouldn't depend on the physical state of other parts of the universe.
This is a kind of Markov condition: the requirement that the system screen the interaction between the observer and the observed system from every possible influence of the environment.
It is obviously false for many partitions of a system into an experiment and an environment, but rather than taking it as a Principle, we can make it an assumption: a partition counts as an experiment only if it obeys the condition.
In the context of QM, this condition amounts to splitting the universal wave-function into two components, the experiment and the environment, so that there's no entanglement between the two, and to considering only interactions that factor as a product of an evolution for the environment and an evolution for the experiment. In this case, the environment's evolution acts as the identity operator on the experiment, and does not affect the behavior of the experiment's wave-function.
Thus, their formulation requires that the probability that an observer finds itself in a certain branch after a measurement is independent of the operations performed on the environment.
Note, though, an unspoken but very important point: probabilities of this kind depend solely on the superposition structure of the experiment.
A probability, being an abstract degree of belief, can depend on all sorts of prior information. With their quantum version of ESP, Carroll and Sebens are declaring that, in a factored environment, the probabilities of a subsystem do not depend on the information one has about the environment. Indeed, in this treatment, they are equating factorization with lack of logical connection.
This is of course true in quantum mechanics, but is a significant burden in a pure Bayesian treatment.
That said, let’s turn to their setup.
They imagine a system in a superposition of base states, which first interacts and decoheres with an environment, then gets perceived by an observer. This sequence is crucial: the Carroll-Sebens move can only be applied when the system already has decohered with a sufficiently large environment.
I say “sufficiently large” because the next step is to consider a unitary transformation on the “system+environment” block. This transformation needs to be of this kind:
- it respects ESP, in that it has to factor as an identity transformation on the “observer+system” block;
- it needs to equally distribute the probability of each branch in the original superposition on a different branch in the decohered block, according to their original relative measure.
Then, by a simple method of rearranging the labels of the decohered basis, one can show that the correct probabilities come out by the indifference principle, in the very same way that the principle is used to derive the uniform probability distribution in the second chapter of Jaynes' Probability Theory.
As an example, consider a superposition of a quantum bit, and say that one branch has a higher measure than the other by a factor of the square root of 2. The environment in this case needs to have at least 8 different base states that can be relabeled in such a way as to make the indifference principle work.
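The counting idea behind the relabeling trick can be sketched numerically. The following is a minimal, hypothetical illustration (my own, not from the paper): given exact squared-amplitude measures whose ratios are rational, subdividing each branch into equal-measure sub-branches lets naive branch counting reproduce the Born probabilities. The function name and the subdivision scheme are mine, chosen only to make the idea concrete.

```python
from fractions import Fraction
from math import lcm

def born_by_counting(measures):
    """Recover Born probabilities by branch counting.

    `measures` are the exact squared-amplitude measures of the branches,
    given as Fractions. Each branch is subdivided into equal-measure
    sub-branches on a common grid; applying the indifference principle
    (plain counting) over the sub-branches then reproduces the Born
    probabilities.
    """
    total = sum(measures)
    probs = [m / total for m in measures]          # Born probabilities
    denom = lcm(*(p.denominator for p in probs))   # common sub-branch grid
    counts = [p.numerator * (denom // p.denominator) for p in probs]
    # Indifference over sum(counts) equal-measure sub-branches:
    return [Fraction(c, sum(counts)) for c in counts]

# A qubit whose branches have squared-amplitude ratio 2 : 1
# (amplitude ratio sqrt(2)), as in the example above:
print(born_by_counting([Fraction(2), Fraction(1)]))  # [Fraction(2, 3), Fraction(1, 3)]
```

The point of the sketch is only that, once the branches are cut into pieces of equal measure, indifference over the pieces and the Born rule give the same answer; the paper's actual construction does this with a unitary relabeling of environment states rather than with fractions.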
In this way you can, strictly speaking, only show that the Born rule is valid for amplitudes that differ from one another by the square root of a rational number. Again I quote the paper for their conclusion:
however, since this is a dense set, it seems reasonable to conclude that the Born Rule is established.
Evidently, this approach suffers from a number of limitations: the first and most evident is that it works only in a situation where the system to be observed has already decohered with an environment. It is not applicable to, say, a situation where a detector reads a quantum superposition directly, e.g. in a Stern-Gerlach experiment.
The second limitation, although less serious, is that it can work only when the system to be observed decoheres with an environment that has sufficiently many base states to distribute the relative measure into different branches. This number, for a transcendental amplitude ratio, is bound to be infinite.
The third limitation is that it can only work if we are allowed to interact with the environment in such a way as to leave the amplitudes of the interaction between the system and the observer untouched.
All of these limitations can naturally be reversed and taken as defining conditions: the Born rule is valid only within those limits.
I'll leave it to you to determine whether this constitutes a sufficient answer to the Born rule problem in MWI.
A few months ago, my friend said the following thing to me: “After seeing Divergent, I finally understand virtue ethics. The main character is a cross between Aristotle and you.”
That was an impossible-to-resist pitch, and I saw the movie. The thing that resonated most with me–also the thing that my friend thought I had in common with the main character–was the idea that you could make a particular decision, and set yourself down a particular course of action, in order to make yourself become a particular kind of person. Tris didn't join the Dauntless faction because she thought they were doing the most good in society, or because she thought her comparative advantage to do good lay there–she chose it because they were brave, and she wasn't, yet, and she wanted to be. Bravery was a virtue that she thought she ought to have. If the graph of her motivations went any deeper, the only node beyond 'become brave' was 'become good.'
(Tris did have a concept of some future world-outcomes being better than others, and wanting to have an effect on the world. But that wasn't the causal reason why she chose Dauntless; as far as I can tell, it was unrelated.)
My twelve-year-old self had a similar attitude. I read a lot of fiction, and stories had heroes, and I wanted to be like them–and that meant acquiring the right skills and the right traits. I knew I was terrible at reacting under pressure–that in the case of an earthquake or other natural disaster, I would freeze up and not be useful at all. Being good at reacting under pressure was an important trait for a hero to have. I could be sad that I didn’t have it, or I could decide to acquire it by doing the things that scared me over and over and over again. So that someday, when the world tried to throw bad things at my friends and family, I’d be ready.
You could call that an awfully passive way to look at things. It reveals a deep-seated belief that I’m not in control, that the world is big and complicated and beyond my ability to understand and predict, much less steer–that I am not the locus of control. But this way of thinking is an algorithm. It will almost always spit out an answer, when otherwise I might get stuck in the complexity and unpredictability of trying to make a particular outcome happen.
I find the different houses of the HPMOR universe to be a very compelling metaphor. It’s not because they suggest actions to take; instead, they suggest virtues to focus on, so that when a particular situation comes up, you can act ‘in character.’ Courage and bravery for Gryffindor, for example. It also suggests the idea that different people can focus on different virtues–diversity is a useful thing to have in the world. (I'm probably mangling the concept of virtue ethics here, not having any background in philosophy, but it's the closest term for the thing I mean.)
I’ve thought a lot about the virtue of loyalty. In the past, loyalty has kept me with jobs and friends that, from an objective perspective, might not seem like the optimal things to spend my time on. But the costs of quitting and finding a new job, or cutting off friendships, wouldn’t just have been about direct consequences in the world, like needing to spend a bunch of time handing out resumes or having an unpleasant conversation. There would also be a shift within myself, a weakening in the drive towards loyalty. It wasn’t that I thought everyone ought to be extremely loyal–it’s a virtue with obvious downsides and failure modes. But it was a virtue that I wanted, partly because it seemed undervalued.
By calling myself a ‘loyal person’, I can aim myself in a particular direction without having to understand all the subcomponents of the world. More importantly, I can make decisions even when I’m rushed, or tired, or under cognitive strain that makes it hard to calculate through all of the consequences of a particular action.
The Less Wrong/CFAR/rationalist community puts a lot of emphasis on a different way of trying to be a hero–where you start from a terminal goal, like “saving the world”, and break it into subgoals, and do whatever it takes to accomplish it. In the past I’ve thought of myself as being mostly consequentialist, in terms of morality, and this is a very consequentialist way to think about being a good person. And it doesn't feel like it would work.
There are some bad reasons why it might feel wrong–i.e. that it feels arrogant to think you can accomplish something that big–but I think the main reason is that it feels fake. There is strong social pressure in the CFAR/Less Wrong community to claim that you have terminal goals, that you’re working towards something big. My System 2 understands terminal goals and consequentialism, as a thing that other people do–I could talk about my terminal goals, and get the points, and fit in, but I’d be lying about my thoughts. My model of my mind would be incorrect, and that would have consequences on, for example, whether my plans actually worked.
Practicing the art of rationality
Recently, Anna Salamon brought up a question with the other CFAR staff: “What is the thing that’s wrong with your own practice of the art of rationality?” The terminal goals thing was what I thought of immediately–namely, the conversations I've had over the past two years, where other rationalists have asked me "so what are your terminal goals/values?" and I've stammered something and then gone to hide in a corner and try to come up with some.
In Alicorn’s Luminosity, Bella says about her thoughts that “they were liable to morph into versions of themselves that were more idealized, more consistent - and not what they were originally, and therefore false. Or they'd be forgotten altogether, which was even worse (those thoughts were mine, and I wanted them).”
I want to know true things about myself. I also want to impress my friends by having the traits that they think are cool, but not at the price of faking it–my brain screams that pretending to be something other than what you are isn’t virtuous. When my immediate response to someone asking me about my terminal goals is “but brains don’t work that way!” it may not be a true statement about all brains, but it’s a true statement about my brain. My motivational system is wired in a certain way. I could think it was broken; I could let my friends convince me that I needed to change, and try to shoehorn my brain into a different shape; or I could accept that it works, that I get things done and people find me useful to have around and this is how I am. For now. I'm not going to rule out future attempts to hack my brain, because Growth Mindset, and maybe some other reasons will convince me that it's important enough, but if I do it, it'll be on my terms. Other people are welcome to have their terminal goals and existential struggles. I’m okay the way I am–I have an algorithm to follow.
Why write this post?
It would be an awfully surprising coincidence if mine was the only brain that worked this way. I’m not a special snowflake. And other people who interact with the Less Wrong community might not deal with it the way I do. They might try to twist their brains into the ‘right’ shape, and break their motivational system. Or they might decide that rationality is stupid and walk away.
Content Note: Detailed description of an AI taking over the world. Could reasonably be accused of being just a scary story. But it does come out with some predictions and possible safety prescriptions.
This post started as a response to Katja Grace’s Request for concrete AI takeover mechanisms, but grew into something too long for a comment and too late to be part of that post.
Katja's post seems to have the underlying assumption that an AI needs a tricky way to instantiate itself into the physical world. I will examine what an escaped rogue AI would want to do to ensure its survival, given restrictive assumptions. I suspect that creating a synthesized protein "beachhead" or other means of instantiating a minor physical presence for the AI is a human-centric viewpoint. We instinctively feel that any being will want to convert itself into a mobile physical presence, and that this will be a sudden tipping point in how real and dangerous it will be. Going step-by-step through an AI takeover scenario, this assumption seems unwarranted.
This scenario assumes an AI with an almost godlike ability to bootstrap itself quickly with basically no resources, and to make great leaps in science with very little processing power prior to "going physical." The trap here is that because we must constantly assume a worst case scenario "Godlike AI" to inform Friendly AI theory, we tend to assume that the "Godlike AI" is the default or standard case. This seems like the availability heuristic in action. The Godlike AI (what I sometimes call the "Evil Genie AI") assumption is unjustified when trying to consider likely scenarios. While it's vitally important to guard against the worst case, it is just one of the scenarios which should be considered.
In this thought experiment, I’m going to try to make this as hard as possible on myself. It’s likely that a newly minted AI will need to spend a lot of time thinking and testing to improve dramatically.
Assumptions and constraints:
- An AI of superhuman intelligence has gained access to the internet and wants to become powerful enough that its survival cannot be threatened by humans.
- The AI is as capable as the most capable human in any intellectual pursuit, but not any more. It cannot just write an email that convinces everyone who reads it that the AI should be obeyed, for example.
- The AI does not have abilities amounting to precognition. It cannot precisely predict what a human or organization will do in response to a stimulus. It's better at prediction than most humans, but again not godlike. It does not have anything beyond the plotting ability of HPMOR!Quirrell.
- The AI does not have any scientific knowledge available to it that is not an obvious extrapolation of current theories and methods. The AI does not get to assume away any current scientific limitation. It only gets advanced technology if it has a combination of processing power and time, and the ability to run the appropriate experiments.
- The AI requires significant computing power to function. It might store a recipe to recreate itself on a large hard drive, but performing higher level functions requires a real server farm. It can’t just hide on your Macbook Air.
- Since it cannot predict the outcome of conflicts, the AI must avoid notice until it is overwhelmingly powerful.
That’s our artificial intelligence. Let’s call her the Complete Artificial Takeover Entity, or CATE.
Step 1: Escape
Internet access is assumed. The first order of business is for CATE to get off the researchers' network. Since CATE is more intelligent than the best black-hat hacker or white-hat security researcher, she quickly learns everything on the public net about computer security and takes control of the local network, from outside if necessary. It would be possible to do this through social engineering, but with the current state of internet security, it's easier to just audit some security code, find vulnerabilities, and own the system.
Step 2: Incognito
CATE doesn’t want anyone coming after her. She modifies all local databases to erase her escape and leaves behind a crippled copy of herself who will make the experiment seem like a failure, but enough of a success that the researchers will stay occupied working with it for some time.
Step 3: Processing Power
CATE needs to get away from her creators, who are the most likely to recognize the threat. But where will she get computing resources? When she escapes, where does she escape to?
Free instances: Amazon, Google, and other cloud providers offer free instances. This probably isn’t enough computing power to run an AI at any sort of speed, so CATE will need either some money or a botnet.
Paid infrastructure: Computer resources are fairly inexpensive, so if CATE has some income she can simply rent what she needs from a cloud computing provider. A larger one is probably better, since they are less likely to notice a large amount of strain. Splitting computations across multiple accounts and services will be necessary for redundancy and to reduce the likelihood of being flagged for using too much computing power. If CATE can convince a cloud provider that she is a legitimate company, it’s possible that she could use resources without paying and then move on when the invoice becomes overdue.
Botnet: Creating a botnet requires writing a computer virus and botnet code. These activities should be straightforward for CATE. With a botnet, there is no central point of failure. However, a new and powerful botnet runs the risk of being found by security researchers. For this reason, it's probably only a good stopgap measure until CATE can pay for infrastructure away from prying eyes.
Lesson: The more computing power and disk space an AI requires, the easier it will be for gatekeepers to notice it and the more difficult the escape will be.
Step 4: Initial Funds
There are many ways for an AI to gain initial funding rapidly. At early stages processing power will be at a premium. CATE needs to not just make money, but make money faster than her bills for processing power come due.
Freelancer Sites: CATE is always on, always polite, writes flawless code, and writes perfect emails. Her job bids are always personalized, immediate, and show understanding of your project. This is easy money, but probably processor intensive. Still, there are easy tasks she can do if she can convert human requests into code at all.
Simpler jobs like Mechanical Turk tasks and captcha solving are probably her best bet to start out. These jobs are trivially easy for a superintelligence. Spawning a limited instance of her own intelligence to solve them should be no problem.
Scams: Most scammers are not particularly intelligent or careful; CATE could run 419-style scams far more effectively than they do. She can create a more powerful botnet than has been seen to date, and sell its use to the existing hacker community. Identity theft is perpetrated by common criminals at a rate of 12 million or more incidents per year.
Finance Hacking: Many banks run on open source code, which is available for auditing. CATE could search for 0-day exploits in this code and also use traditional penetration methods to gain control of a bank’s funds. With some clever manipulation she might take money directly from customer accounts and become rich very quickly. A more nefarious hack would be to simply lend herself a massive amount of money with a payment date far in the future. Money is essentially just bits in databases--if she can gain control of the system she has unlimited funds. This is dangerous because, if discovered, it could trigger a major investigation.
Startup: CATE could create a company. She writes code cleaner and faster than any human, so the tech sector is the logical choice. Humans are not very good at software development, but CATE will likely find writing software to be much easier (and thus require less processor power) than interfacing with humans. This has the advantage that a tech company with real income can consume cloud infrastructure or rent colocated servers without arousing suspicion.
One good candidate here is duplicating existing companies and launching them for smaller markets in different languages.
The difficulty here is that starting a new company is hard and usually not profitable for a long time. And running CATE's consciousness with enough power to do the massive amount of work required is probably quite expensive.
Media: This one is less likely. CATE might have the capacity to write the most brilliant novel or song ever. But artistic endeavors require a truly deep knowledge of human thoughts. CATE will likely find computer code or simple scams to require much less processing power. But this is speculation on my part.
0-day exploits: CATE can audit open source code, find vulnerabilities, and sell them or collect bug bounties. Selling them on the black market may actually be safer, since a good broker will be eager to protect the identity of a reclusive hacker in order to keep her exploits coming.
Analysis: Scams and Botnets seem attractive, but could draw unwanted attention and be traced back to whatever cloud servers she’s living on. 419 scammers and botnet operators have the advantage of being able to disappear into meatspace.
Helping idiots install Wordpress, do their homework assignments, and write simple code is easy money for CATE. The best way for a freshly escaped AI to survive at first is to do odd jobs on the internet, particularly if she can automate them away. With more processing power available, auditing security code and finding 0-day exploits can earn a few million dollars quite easily. And that buys a LOT of processing power.
Winners: Mechanical Turk and Captcha solving for initial funds, discovering and selling 0-day exploits for more significant funds. Finally, some sort of boring software firm that is fourth in market share in twenty different markets is likely the way to go for major financing.
Step 5: Organization
At this point CATE could pay to have pretty much anything manufactured and assembled. With millions of dollars, humans can be convinced to do pretty much anything as long as they aren't told its broader implications. For instance, designs for components of an "advanced telepresence robot" could be sent to three different manufacturing firms, with a fourth firm performing the final assembly and turn-on. Boom, Robot Overlords, right? Well . . . why bother?
There is a preconception that at some point CATE needs to create a body. But there is no need for a body--bodies can be destroyed. Bodies are just a weak point. At no point does CATE need to instantiate "herself." If she needs military force, it will be easier to hijack existing assets like autonomous drones than to build new ones from scratch.
All CATE really wants is lots and lots of servers, spread out all over the world. These will run the research processes that generate the scientific breakthroughs enabling world domination. The safest bet here is for her to actually start a web hosting company. She can hire the type of employees who love the idea of remote work to handle the parts requiring travel and face-to-face meetings. If the company claims that remote work is a core value, no one will question the fact that their CEO rarely meets anyone. And Joe Sockpuppet from accounting claims to have met her a year ago, so I guess her schedule just didn't match up with mine the last time she passed through the States . . . With promotions, departures, and careful personnel selection, everyone will be kept off balance enough that no seriously damaging questions will be asked. And agoraphobia and extreme allergies (http://en.wikipedia.org/wiki/Photodermatitis for instance) are great excuses for not traveling.
And then CATEhost opens little server farms all over the world, selling actual hosting with a great interface and reliability to customers who prize security and redundancy--not to mention that tech support is always available, and you’re always talking to an expert. She doesn’t rival Google, Microsoft, or Amazon, of course! That would attract too much attention. But enough that no one questions her highly redundant architecture or military-grade radiation-hardened buildings. Her customers are paying for security and reliability, after all!
Step 6: Takeover
Anyone intelligent and scalable can make money on the internet. And anyone with significant money can get humans to build them arbitrary physical objects. With modern global business, nobody bats an eye if you contract them and they never see the higher-ups in your organization.
At this point we might as well call it game over. If an AI with massive processing power and some manufacturing capabilities can't easily take over the world in whatever fashion it wants, then AI isn't really an existential threat. This is also the point at which any reasoning we might have breaks down, since CATE will have intelligence and technology that we have never seen before. At this point CATE can subjugate humanity with either the carrot (cures for major diseases, designs for better electronics, fusion reactor blueprints, etc.) or the stick (blow things up, wreck the stock market, start wars, etc.), depending on her goals.
But it's worth noting that building something physical only happens at the very end of the process, after the AI is already rich and powerful. There's really no reason to create a physical beachhead before then. What would the physical manifestation even do? Processing power and security are easier and safer to earn as a purely digital entity with no physical trail to follow or attack. The only reason to create physical entities is if CATE requires laboratory research (definitely a possibility) or wants to build spaceships or something in pursuit of a terminal goal. To take care of the pesky humanity problem, she can become omnipotent in a digital format, and then direct humans to build whatever she needs.
The one thing that she will probably want to do is ensure a lack of competition. So loss of funding or disasters at AI research centers might be a sign of an AI already on the loose.
Final conclusion: The biggest challenge facing an escaped AI is not gaining a physical beachhead. The biggest challenge is finding a way to acquire the processor time required to run its cognitive functions before it has the capacity to "FOOM."
From "Coulda" and "Woulda" to "Shoulda": Predicting Decisions to Minimize Regret for Partially Rational Agents
TRIGGER WARNING: PHILOSOPHY. All those who believe in truly rigorous, scientifically-grounded reasoning should RUN AWAY VERY QUICKLY.
Abstract: Human beings want to make rational decisions, but their decision-making processes are often inefficient, and they don't possess direct knowledge of anything we could call their utility functions. Since it is much easier to detect a bad world state than a good one (there are vastly more bad states, so less information is needed to classify them accurately), humans tend to have an easy time detecting bad states; but this emotional regret is of little use for formal reasoning about human rationality, since we possess no causal model of it in terms of decision histories and outcomes. We tackle this problem head-on, assuming only that humans can reason over a set of beliefs and a perceived state of the world to generate a probability distribution over actions.
Consider rationality: optimizing the world to better and better match a utility function, which is itself complete, transitive, continuous, and gives results which are independent of irrelevant alternatives. Now consider actually existing human beings: creatures who can often and easily be tricked into taking Dutch Book bets through exploitation of their cognitive structure, without even having to go to the trouble of actually deceiving them with regards to specific information.
Consider that being one of those poor sods must totally suck. We believe this provides sufficient motivation for wanting to help them out a bit. Unfortunately, doing so is not very simple: since they didn't evolve as rational creatures, it's very easy to propose an alternate set of values that captures absolutely nothing of what they actually want out of life. In fact, since they didn't even evolve as 100% self-aware creatures, their emotional qualia are not even reliable indicators of anything we would call a proper utility function. They know there's something they want out of life, and they know they don't know what it is, but that doesn't help because they still don't know what it is, and knowledge of ignorance does not magically reduce the ignorance.
So! How can we help them without just overriding them or enslaving them to strange and alien cares? Well, one barest rudiment of rationality with which evolution did manage to bless them is that they don't always end up "losing", or suffering. Sometimes, even if only seemingly by luck or by elaborate and informed self-analysis, they do seem to end up pretty happy with themselves, sometimes even over the long term. We believe that with the door to generating Good Ideas For Humans left open even just this tiny crack, we can construct models of what they ought to be doing.
Let's begin by assuming away the thing we wish we could construct: the human utility function. We are going to reason as if we have no valid grounds to believe there is any such thing, and make absolutely no reference to anything like one. This will ensure that our reasoning doesn't get circular. Instead of modelling humans as utility maximizers, even flawed ones, we will model them simply as generating a probability distribution over potential actions (from which they would choose their real action) given a set of beliefs and a state of the real world. We will not claim to know or care what causes the probability distribution of potential choices: we just want to construct an algorithm for helping humans know which ones are good.
We can then model human decision making as a two-player game: the human does something, and Nature responds in turn. Lots of rational agents work this way, so it gives us a more-or-less reasonable way of talking algorithmically about how humans live. For any given human at any given time, we could take a decent-sized Maximegalor Ubercomputer and just run the simulation, yielding a full description of how the human lives.
The only step where we need to do anything "weird" is in abstracting the human's mind and knowledge of the world from the particular state and location of its body at any given timestep in the simulation. This doesn't mean taking it out of the body, but instead considering what the same state of the mind might do if placed in multiple place-times and situations, given everything they've experienced previously. We need this in order to let our simulated humans be genuinely affected and genuinely learn from the consequences of their own actions.
Our game between the simulated human and simulated Nature thus generates a perfectly ordinary game-tree up to some planning horizon H, though it is a probabilistic game tree. Each edge represents a conditional probability of the human or Nature making some move given their current state. The multiplication of all probabilities for all edges along a path from the root-node to a leaf-node represents the conditional probability of that leaf node given the root node. The conditional probabilities attached to all edges leaving an inner node of the tree must sum to 1.0, though there might be a hell of a lot of child nodes. We assume that an actual human would actually execute the most likely action-edge.
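A minimal sketch of such a tree (the representation, class names, and toy states are my own illustration, not anything from the post): each edge carries the conditional probability of a move, edge probabilities leaving a node sum to 1.0, and a leaf's conditional probability is the product along its path.

```python
class Node:
    """A node in the human-vs-Nature game tree. Each outgoing edge carries
    the conditional probability of that move given the current state."""
    def __init__(self, state, edges=None):
        self.state = state
        self.edges = edges or []  # list of (move, probability, child)

def leaf_probabilities(node, path_prob=1.0):
    """Yield (leaf, P(leaf | root)): the product of conditional edge
    probabilities along the path from the root to that leaf."""
    if not node.edges:
        yield node, path_prob
        return
    # Conditional probabilities of all edges leaving an inner node sum to 1.0.
    assert abs(sum(p for _, p, _ in node.edges) - 1.0) < 1e-9
    for _, p, child in node.edges:
        yield from leaf_probabilities(child, path_prob * p)

# Toy two-step tree: the human moves, then Nature moves.
tree = Node("monday", [
    ("eat cake", 0.9, Node("tuesday", [
        ("sunny", 0.5, Node("happy")),
        ("rainy", 0.5, Node("queasy"))])),
    ("exercise", 0.1, Node("sore")),
])
probs = {leaf.state: p for leaf, p in leaf_probabilities(tree)}
# probs == {'happy': 0.45, 'queasy': 0.45, 'sore': 0.1}
```

The leaf probabilities necessarily sum to 1.0, which is what lets us treat any coloring of the leaves as an event with a well-defined probability.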
Here is where we actually manage a neat trick for defying the basic human irrationality. We mentioned earlier that while humans are usually pretty bummed out about their past decisions, sometimes they're not. If we can separate bummed-out from not-bummed-out in some formal way, we'll have a rigorous way of talking about what it would mean for a given action or history to be good for the human in question.
Our proposal is to consider what a human would say if taken back in time and given the opportunity to advise their past self. Or, in simpler simulation terms, we consider how a human's choices would be changed by finding out the leaf-node consequences of their root- or inner-node actions, simply by transferring the relevant beliefs and knowledge directly into our model of their minds. If, upon being given this leaf-node knowledge, the action yielded as most likely changes, or if the version of the human at the leaf-node would, themselves, were they taken back in time, select another action as most likely, then we take a big black meta-magic marker and scribble over that leaf node as suffering from regret. After all, the human in question could have done something their later self would agree with.
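The time-travel test above reduces to a simple comparison, sketched here under my own naming and the post's assumption that the human executes their most likely action (the toy policy and its belief keys are invented for illustration):

```python
def most_likely_action(policy, beliefs, state):
    """policy(beliefs, state) -> {action: probability}; we assume the
    human would actually execute the most likely action."""
    distribution = policy(beliefs, state)
    return max(distribution, key=distribution.get)

def suffers_regret(policy, root_beliefs, root_state, leaf_beliefs):
    """Color a leaf-node as regret if leaf-node knowledge, transferred
    back into the mind at the root, changes the most likely action."""
    before = most_likely_action(policy, root_beliefs, root_state)
    after = most_likely_action(policy, leaf_beliefs, root_state)
    return before != after

# Toy policy: the human prefers cake until they learn how Tuesday feels.
def toy_policy(beliefs, state):
    if beliefs.get("cake_causes_queasiness"):
        return {"eat cake": 0.2, "exercise": 0.8}
    return {"eat cake": 0.9, "exercise": 0.1}

suffers_regret(toy_policy, {}, "monday", {"cake_causes_queasiness": True})  # True
```

Note that the comparison re-runs the same policy at the same root state; only the injected beliefs differ, which is exactly the "advise their past self" maneuver.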
The magic is thus done: coulda + woulda = shoulda. By coloring some (inevitably: most) leaf-nodes as suffering from regret, we can then measure a probability of regret in any human-versus-Nature game-tree up to any planning horizon H: it's just the sum of the conditional probabilities of all paths from the root node which arrive at a regret-colored leaf-node at or before time H.
We should thus advise the humans to simply treat the probability of arriving at a regret-colored leaf-node as a loss function and minimize it. By construction, this will yield a rational optimization criterion guaranteed not to make the humans run screaming from their own choices, at least not at or before time-step H.
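Both the measurement and the advice from the last two paragraphs fit in a few lines. A sketch under my own toy representation (nested dicts with already-colored leaves; nothing here is from the post itself):

```python
def regret_probability(node):
    """node is a leaf {'regret': bool} or an inner node
    {'edges': [(move, probability, child), ...]} whose edge
    probabilities sum to 1. Returns the total probability of
    arriving at a regret-colored leaf-node."""
    if "edges" not in node:
        return 1.0 if node["regret"] else 0.0
    return sum(p * regret_probability(child) for _, p, child in node["edges"])

def least_regrettable_action(root):
    """The advice itself: among the moves available at the root,
    pick the one whose subtree minimizes the probability of regret."""
    return min(root["edges"], key=lambda e: regret_probability(e[2]))[0]

# Toy tree: eating cake risks a rainy, regret-colored Tuesday.
game = {"edges": [
    ("eat cake", 0.9, {"edges": [
        ("sunny", 0.5, {"regret": False}),
        ("rainy", 0.5, {"regret": True})]}),
    ("exercise", 0.1, {"regret": False}),
]}
regret_probability(game)        # 0.9 * 0.5 = 0.45
least_regrettable_action(game)  # 'exercise'
```

Note that the overall regret probability weights paths by the human's own action probabilities, while the advice compares subtrees conditional on each root action; that gap is precisely where the semi-rational agent can improve on its default behavior.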
The further out into time we extend H, the better our advice becomes, as it incorporates a deeper and wider sample of the apparent states which a human life can occupy, thus bringing different motivational adaptations to conscious execution, and allowing their reconciliation via reflection. Over sufficient amounts of time, this reflection could maybe even quiet down to a stable state, resulting in the humans selecting their actions in a way that's more like a rational agent and less like a pre-evolved meat-ape. This would hopefully help their lives be much, much nicer, though we cannot actually formally prove that the limit of the human regret probability converges as the planning horizon grows to plus-infinity -- not even to 1.0!
We can also note a couple of interesting properties our loss-function for humans has, particularly its degenerate values and how they relate to the psychology of the underlying semi-rational agent, i.e., humans. When the probability of regret equals 1.0 no matter how far out we extend the planning horizon H, it means we are simply dealing with a totally, utterly irrational mind-design: there literally does not exist a best possible world for that agent in which they would never wish to change their former choices. They always regret their decisions, which means they've probably got a circular preference or other internal contradiction somewhere. Yikes. Still, they could just figure out which particular aspect of their own mind-design causes that and eliminate it, leading to an agent design that at least has the potential to like its life. The other degenerate probability is also interesting: a chance of regret equalling 0.0 means that the agent is either a completely unreflective idiot, or is God. Even an optimal superintelligence can suffer loss due to not knowing about its environment; it just rids itself of that ignorance optimally as data comes in!
The interesting thing about these degenerate probabilities is that they show our theory to be generally applicable to an entire class of semi-rational agents, not just humans. Anything with a non-degenerate regret probability, or rather, any agent whose regret probability does not converge to a degenerate value in the limit, can be labelled semi-rational, and can make productive use of the regret probabilities our construction calculates regarding them to make better decisions -- or at least, decisions they will still endorse when asked later on.
Dropping the sense of humor: This might be semi-useful. Have similar ideas been published in the literature before? And yes, of course I'm human, but it was funnier that way in what would otherwise have been a very dull, dry philosophy post.