Looking for a likely cause of a mental phenomenon
I'm currently testing a promising direction for a possible collection of units at CFAR. (For those who have attended some CFAR events or test sessions, this is a collection of refinements to the fudoshin/"panic" unit.) I've hit on what I think is a key puzzle whose answer might unlock a lot of the emerging art of rationality. I - and possibly most people here, eventually - would very much appreciate any insight you have to share.
The puzzle is how thought incubation works, ideally expressed in terms of neural systems or neuroanatomical structures. I'll first explain the phenomenon and then suggest the general reference class from which I'm hoping to get an answer.
The Phenomenon:
Mathematicians frequently report that one of the most helpful things they can do to solve a problem they're stuck on is to step away from it. Jacques Hadamard (1949) examined his own experiences and also talked to many of his colleagues to work out what the common structure of this experience was, and determined that there seems to be a fairly predictable sequence to it:
(1) Intensely focus on the problem, working through every permutation you can think of that's likely to produce an answer.
(2) Walk away from the problem and think about something else.
(3) The magic genie in your head might eventually, and often unexpectedly, yell a possible insight into your awareness.
For instance, Henri Poincaré reported struggling to work on Fuchsian functions over the course of several weeks and then being forced to walk away from the proof he had been stuck on due to a planned vacation. One day he was stepping onto a bus with his mind certainly not on mathematics, and suddenly the key insight he needed to finish the proof appeared in his mind. It was as though a part of his mind had been secretly working on the problem and then brought the finished product into his awareness. In this particular case it also came with a feeling of total confidence that verification would pan out (although Hadamard notes that the validation step after the insight is still essential because sometimes that feeling of total confidence is mistaken).
I definitely relate to this from when I was working on graduate mathematics. However, it also pattern-matches with other mental phenomena that are much more common. For instance, sometimes I think I know what a person's name is, but struggle as I might I can't quite remember it - and then a few minutes later after I've given up remembering the name the answer loudly announces itself, often quite out-of-context. Or when I'm trying to figure out a way of improving a throw in martial arts and then find the answer suddenly dawning on me at a random time.
I'm under the impression that this is a fairly universal kind of experience. I suspect you can think of examples in your own life where this has happened. ("Oh, now I remember where I put those keys!")
Reference Class for an Explanation:
I'm going to offer some overly simplistic examples of the kind of explanation I'm looking for. In this case, I think overly simplistic might be okay since I'm just trying to get a reasonable handle on how to munchkin the interaction between a few different neural systems. If it turns out that a more detailed and technically correct version is important, I'll probably dig into it (pending the VOI versus cost-of-information comparison).
There seems to be some evidence that one of the reasons children are as impulsive as they are is that they haven't yet developed their prefrontal cortices (PFCs) to the degree adults have. The prefrontal cortex seems to do at least two things: (1) hold long-term goals in mind and (2) engage executive function (i.e., halt orders on impulses, typically ones that don't match up with the long-term goals). This neuroanatomical structure seems to continue growing until sometime in one's early 20s - which might be why we also find that teenagers typically have less impulse control than twentysomethings but more than middle-schoolers, whereas we don't find such a clear distinction between twentysomethings and thirtysomethings. (Yes, this could also or even instead be cultural. I know it's complicated.) Incidentally, I understand that the PFC is also one of the neural structures most deactivated by alcohol - although my impression is that it shuts down the long-term goals thing and not the executive function. (This is based on my and others' experience that precommitment works perfectly well. It seems to me that saying things like "I couldn't help myself because I was drunk!" is more a social excuse than an actual explanation. But I'm only around 65% confident of this as a general claim.)
On a related note, it would seem that there's something in the same rough space as theory of mind that goes beyond the ability to pass the false-belief test. According to Rebecca Saxe, the capacity for empathy seems to come from a particular bit of the brain that doesn't finish growing until the mid-20s. Saxe also provides some evidence that a sufficiently strong and precisely directed magnet can basically deactivate that part of one's theory of mind temporarily. It seems quite plausible to me (though I really don't know) that activation of the sympathetic nervous system (SNS), such as in fight-or-flight reactions, decreases activation of this empathy part of the brain. This might be why, in a perceived crisis, some people switch to an almost tool-like view of others (e.g., knowing that overcoming the bystander effect requires pointing at a specific person and saying "You! Call 911!" but not really getting a sense in that moment of what that person's experience is like to be so singled out).
I'm quite aware that much of the above is speculation. I think speculation is fine, but having it grounded in some actual known neuroscience is ideal. That would give me something to dig into. E.g., if there's some reason to believe that this phenomenon is related to the enteric nervous system, I can start digging into the literature on that system to better understand how to munchkin its interactions with the (rest of the) autonomic nervous system.
An example of something outside the reference class I'm looking for is a "little man in the subconscious" explanation. I first read about this about twenty years ago as a model for how mental incubation works: you concentrate on a problem in order to communicate to a little man in your subconscious what you want to have done, and then you stop talking to him so he can go do what you just told him to do. Then he comes back with an answer once he's done, without regard to what you're doing when he's done. I agree that this seems to be a reasonable metaphor for what's going on, but it doesn't tell me for instance why the "little man" seems to respond so much more to SNS activity than parasympathetic activity, or why he can't go do his job once he has the instructions even if we continue to think about the problem.
More generally, psychodynamic "explanations" are unlikely to be helpful here. Talking about this as the "domain of the iNtuiting function" in reference to Jungian psychodynamic theory or Myers-Briggs will tell me hardly anything about how this relates to stress oscillation.
So... Any suggestions about what this mysterious "little man" might actually be made of?
One thousand tips do not make a system
So, I've been thinking. We ought to have a system for rationality. What do I mean?
Well, consider a real-time strategy game like Starcraft II. One of the most important things to do in SC2 is macromanagement: making sure that your resources are all being used sensibly. Now, macromanagement could be learned as a big, long list of tips. Like this:
- Try to mine minerals.
- Recruit lots of soldiers.
- Recruit lots of workers.
- It's a good idea for a mineral site to have between 22 and 30 workers.
- Workers are recruited at a command center.
- Soldiers are recruited at a barracks.
- In order to build anything, you need workers.
- In order to build anything, you also need minerals.
- For that matter, in order to recruit more units, you need minerals.
- Workers mine minerals.
- Minerals should be used immediately; if you're storing them, you're wasting them.
[LINK] "Moral Machines" article in the New Yorker links to SI paper
Within two or three decades the difference between automated driving and human driving will be so great you may not be legally allowed to drive your own car, and even if you are allowed, it would be immoral of you to drive, because the risk of you hurting yourself or another person will be far greater than if you allowed a machine to do the work.
That moment will be significant not just because it will signal the end of one more human niche, but because it will signal the beginning of another: the era in which it will no longer be optional for machines to have ethical systems.
The discussion itself is mainly concerned with the behavior of self-driving cars and robot soldiers rather than FAI, but Marcus does obliquely reference the prickliness of the problem. After briefly introducing wireheading (presumably as an example of what can go wrong), he links to http://singularity.org/files/SaME.pdf, saying:
Almost any easy solution that one might imagine leads to some variation or another on the Sorcerer’s Apprentice, a genie that’s given us what we’ve asked for, rather than what we truly desire.
He also mentions FHI and Yale Bioethics Center along with SingInst:
A tiny cadre of brave-hearted souls at Oxford, Yale, and the Berkeley California Singularity Institute are working on these problems, but the annual amount of money being spent on developing machine morality is tiny.
It's a mainstream introduction, and perhaps not the best or most convincing one, but I think it's a positive development that machine ethics is getting a serious treatment in the mainstream media.
LW Women- Minimizing the Inferential Distance
Standard Intro
The following section will be at the top of all posts in the LW Women series.
About two months ago, I put out a call for anonymous submissions by the women on LW, with the idea that I would compile them into some kind of post. There is a LOT of material, so I am breaking them down into more manageable-sized themed posts.
Seven women submitted, totaling about 18 pages.
Crocker's Warning- Submitters were told to not hold back for politeness. You are allowed to disagree, but these are candid comments; if you consider candidness impolite, I suggest you not read this post.
To the submitters- If you would like to respond anonymously to a comment (for example if there is a comment questioning something in your post, and you want to clarify), you can PM your message and I will post it for you. If this happens a lot, I might create a LW_Women sockpuppet account for the submitters to share.
Standard Disclaimer- Women have many different viewpoints, and just because I am acting as an intermediary to allow for anonymous communication does NOT mean that I agree with everything that will be posted in this series. (It would be rather impossible to, since there are some posts arguing opposite sides!)
Please do NOT break anonymity, because it lowers the anonymity of the rest of the submitters.
[link] Misinformation and Its Correction: Continued Influence and Successful Debiasing
http://psi.sagepub.com/content/13/3/106.full
Abstract.
The widespread prevalence and persistence of misinformation in contemporary societies, such as the false belief that there is a link between childhood vaccinations and autism, is a matter of public concern. For example, the myths surrounding vaccinations, which prompted some parents to withhold immunization from their children, have led to a marked increase in vaccine-preventable disease, as well as unnecessary public expenditure on research and public-information campaigns aimed at rectifying the situation.
We first examine the mechanisms by which such misinformation is disseminated in society, both inadvertently and purposely. Misinformation can originate from rumors but also from works of fiction, governments and politicians, and vested interests. Moreover, changes in the media landscape, including the arrival of the Internet, have fundamentally influenced the ways in which information is communicated and misinformation is spread.
We next move to misinformation at the level of the individual, and review the cognitive factors that often render misinformation resistant to correction. We consider how people assess the truth of statements and what makes people believe certain things but not others. We look at people’s memory for misinformation and answer the questions of why retractions of misinformation are so ineffective in memory updating and why efforts to retract misinformation can even backfire and, ironically, increase misbelief. Though ideology and personal worldviews can be major obstacles for debiasing, there nonetheless are a number of effective techniques for reducing the impact of misinformation, and we pay special attention to these factors that aid in debiasing.
We conclude by providing specific recommendations for the debunking of misinformation. These recommendations pertain to the ways in which corrections should be designed, structured, and applied in order to maximize their impact. Grounded in cognitive psychological theory, these recommendations may help practitioners—including journalists, health professionals, educators, and science communicators—design effective misinformation retractions, educational tools, and public-information campaigns.
This is a fascinating article with many, many interesting points. I'm excerpting some of them below, but mostly just to get you to read it: if I were to quote everything interesting, I'd have to pretty much copy the entire (long!) article.
Rumors and fiction
[...] A related but perhaps more surprising source of misinformation is literary fiction. People extract knowledge even from sources that are explicitly identified as fictional. This process is often adaptive, because fiction frequently contains valid information about the world. For example, non-Americans’ knowledge of U.S. traditions, sports, climate, and geography partly stems from movies and novels, and many Americans know from movies that Britain and Australia have left-hand traffic. By definition, however, fiction writers are not obliged to stick to the facts, which creates an avenue for the spread of misinformation, even by stories that are explicitly identified as fictional. A study by Marsh, Meade, and Roediger (2003) showed that people relied on misinformation acquired from clearly fictitious stories to respond to later quiz questions, even when these pieces of misinformation contradicted common knowledge. In most cases, source attribution was intact, so people were aware that their answers to the quiz questions were based on information from the stories, but reading the stories also increased people’s illusory belief of prior knowledge. In other words, encountering misinformation in a fictional context led people to assume they had known it all along and to integrate this misinformation with their prior knowledge (Marsh & Fazio, 2006; Marsh et al., 2003).
The effects of fictional misinformation have been shown to be stable and difficult to eliminate. Marsh and Fazio (2006) reported that prior warnings were ineffective in reducing the acquisition of misinformation from fiction, and that acquisition was only reduced (not eliminated) under conditions of active on-line monitoring—when participants were instructed to actively monitor the contents of what they were reading and to press a key every time they encountered a piece of misinformation (see also Eslick, Fazio, & Marsh, 2011). Few people would be so alert and mindful when reading fiction for enjoyment. These links between fiction and incorrect knowledge are particularly concerning when popular fiction pretends to accurately portray science but fails to do so, as was the case with Michael Crichton’s novel State of Fear. The novel misrepresented the science of global climate change but was nevertheless introduced as “scientific” evidence into a U.S. Senate committee (Allen, 2005; Leggett, 2005).
Writers of fiction are expected to depart from reality, but in other instances, misinformation is manufactured intentionally. There is considerable peer-reviewed evidence pointing to the fact that misinformation can be intentionally or carelessly disseminated, often for political ends or in the service of vested interests, but also through routine processes employed by the media. [...]
Assessing the Truth of a Statement: Recipients’ Strategies
Misleading information rarely comes with a warning label. People usually cannot recognize that a piece of information is incorrect until they receive a correction or retraction. For better or worse, the acceptance of information as true is favored by tacit norms of everyday conversational conduct: Information relayed in conversation comes with a “guarantee of relevance” (Sperber & Wilson, 1986), and listeners proceed on the assumption that speakers try to be truthful, relevant, and clear, unless evidence to the contrary calls this default into question (Grice, 1975; Schwarz, 1994, 1996). Some research has even suggested that to comprehend a statement, people must at least temporarily accept it as true (Gilbert, 1991). On this view, belief is an inevitable consequence of—or, indeed, precursor to—comprehension.
Although suspension of belief is possible (Hasson, Simmons, & Todorov, 2005; Schul, Mayo, & Burnstein, 2008), it seems to require a high degree of attention, considerable implausibility of the message, or high levels of distrust at the time the message is received. So, in most situations, the deck is stacked in favor of accepting information rather than rejecting it, provided there are no salient markers that call the speaker’s intention of cooperative conversation into question. Going beyond this default of acceptance requires additional motivation and cognitive resources: If the topic is not very important to you, or you have other things on your mind, misinformation will likely slip in. [...]
Is the information compatible with what I believe?
As numerous studies in the literature on social judgment and persuasion have shown, information is more likely to be accepted by people when it is consistent with other things they assume to be true (for reviews, see McGuire, 1972; Wyer, 1974). People assess the logical compatibility of the information with other facts and beliefs. Once a new piece of knowledge-consistent information has been accepted, it is highly resistant to change, and the more so the larger the compatible knowledge base is. From a judgment perspective, this resistance derives from the large amount of supporting evidence (Wyer, 1974); from a cognitive-consistency perspective (Festinger, 1957), it derives from the numerous downstream inconsistencies that would arise from rejecting the prior information as false. Accordingly, compatibility with other knowledge increases the likelihood that misleading information will be accepted, and decreases the likelihood that it will be successfully corrected.
When people encounter a piece of information, they can check it against other knowledge to assess its compatibility. This process is effortful, and it requires motivation and cognitive resources. A less demanding indicator of compatibility is provided by one’s meta-cognitive experience and affective response to new information. Many theories of cognitive consistency converge on the assumption that information that is inconsistent with one’s beliefs elicits negative feelings (Festinger, 1957). Messages that are inconsistent with one’s beliefs are also processed less fluently than messages that are consistent with one’s beliefs (Winkielman, Huber, Kavanagh, & Schwarz, 2012). In general, fluently processed information feels more familiar and is more likely to be accepted as true; conversely, disfluency elicits the impression that something doesn’t quite “feel right” and prompts closer scrutiny of the message (Schwarz et al., 2007; Song & Schwarz, 2008). This phenomenon is observed even when the fluent processing of a message merely results from superficial characteristics of its presentation. For example, the same statement is more likely to be judged as true when it is printed in high rather than low color contrast (Reber & Schwarz, 1999), presented in a rhyming rather than nonrhyming form (McGlone & Tofighbakhsh, 2000), or delivered in a familiar rather than unfamiliar accent (Lev-Ari & Keysar, 2010). Moreover, misleading questions are less likely to be recognized as such when printed in an easy-to-read font (Song & Schwarz, 2008).
As a result, analytic as well as intuitive processing favors the acceptance of messages that are compatible with a recipient’s preexisting beliefs: The message contains no elements that contradict current knowledge, is easy to process, and “feels right.”
Is the story coherent?
Whether a given piece of information will be accepted as true also depends on how well it fits a broader story that lends sense and coherence to its individual elements. People are particularly likely to use an assessment strategy based on this principle when the meaning of one piece of information cannot be assessed in isolation because it depends on other, related pieces; use of this strategy has been observed in basic research on mental models (for a review, see Johnson-Laird, 2012), as well as extensive analyses of juries’ decision making (Pennington & Hastie, 1992, 1993).
A story is compelling to the extent that it organizes information without internal contradictions in a way that is compatible with common assumptions about human motivation and behavior. Good stories are easily remembered, and gaps are filled with story-consistent intrusions. Once a coherent story has been formed, it is highly resistant to change: Within the story, each element is supported by the fit of other elements, and any alteration of an element may be made implausible by the downstream inconsistencies it would cause. Coherent stories are easier to process than incoherent stories are (Johnson-Laird, 2012), and people draw on their processing experience when they judge a story’s coherence (Topolinski, 2012), again giving an advantage to material that is easy to process. [...]
Is the information from a credible source?
[...] People’s evaluation of a source’s credibility can be based on declarative information, as in the above examples, as well as experiential information. The mere repetition of an unknown name can cause it to seem familiar, making its bearer “famous overnight” (Jacoby, Kelley, Brown, & Jasechko, 1989)—and hence more credible. Even when a message is rejected at the time of initial exposure, that initial exposure may lend it some familiarity-based credibility if the recipient hears it again.
Do others believe this information?
Repeated exposure to a statement is known to increase its acceptance as true (e.g., Begg, Anas, & Farinacci, 1992; Hasher, Goldstein, & Toppino, 1977). In a classic study of rumor transmission, Allport and Lepkin (1945) observed that the strongest predictor of belief in wartime rumors was simple repetition. Repetition effects may create a perceived social consensus even when no consensus exists. Festinger (1954) referred to social consensus as a “secondary reality test”: If many people believe a piece of information, there’s probably something to it. Because people are more frequently exposed to widely shared beliefs than to highly idiosyncratic ones, the familiarity of a belief is often a valid indicator of social consensus. But, unfortunately, information can seem familiar for the wrong reason, leading to erroneous perceptions of high consensus. For example, Weaver, Garcia, Schwarz, and Miller (2007) exposed participants to multiple iterations of the same statement, provided by the same communicator. When later asked to estimate how widely the conveyed belief is shared, participants estimated consensus to be greater the more often they had read the identical statement from the same, single source. In a very real sense, a single repetitive voice can sound like a chorus. [...]
The extent of pluralistic ignorance (or of the false-consensus effect) can be quite striking: In Australia, people with particularly negative attitudes toward Aboriginal Australians or asylum seekers have been found to overestimate public support for their attitudes by 67% and 80%, respectively (Pedersen, Griffiths, & Watt, 2008). Specifically, although only 1.8% of people in a sample of Australians were found to hold strongly negative attitudes toward Aboriginals, those few individuals thought that 69% of all Australians (and 79% of their friends) shared their fringe beliefs. This represents an extreme case of the false-consensus effect. [...]
The Continued Influence Effect: Retractions Fail to Eliminate the Influence of Misinformation
We first consider the cognitive parameters of credible retractions in neutral scenarios, in which people have no inherent reason or motivation to believe one version of events over another. Research on this topic was stimulated by a paradigm pioneered by Wilkes and Leatherbarrow (1988) and H. M. Johnson and Seifert (1994). In it, people are presented with a fictitious report about an event unfolding over time. The report contains a target piece of information: For some readers, this target information is subsequently retracted, whereas for readers in a control condition, no correction occurs. Participants’ understanding of the event is then assessed with a questionnaire, and the number of clear and uncontroverted references to the target (mis-)information in their responses is tallied.
A stimulus narrative commonly used in this paradigm involves a warehouse fire that is initially thought to have been caused by gas cylinders and oil paints that were negligently stored in a closet (e.g., Ecker, Lewandowsky, Swire, & Chang, 2011; H. M. Johnson & Seifert, 1994; Wilkes & Leatherbarrow, 1988). Some participants are then presented with a retraction, such as “the closet was actually empty.” A comprehension test follows, and participants’ number of references to the gas and paint in response to indirect inference questions about the event (e.g., “What caused the black smoke?”) is counted. In addition, participants are asked to recall some basic facts about the event and to indicate whether they noticed any retraction.
Research using this paradigm has consistently found that retractions rarely, if ever, have the intended effect of eliminating reliance on misinformation, even when people believe, understand, and later remember the retraction (e.g., Ecker, Lewandowsky, & Apai, 2011; Ecker, Lewandowsky, Swire, & Chang, 2011; Ecker, Lewandowsky, & Tang, 2010; Fein, McCloskey, & Tomlinson, 1997; Gilbert, Krull, & Malone, 1990; Gilbert, Tafarodi, & Malone, 1993; H. M. Johnson & Seifert, 1994, 1998, 1999; Schul & Mazursky, 1990; van Oostendorp, 1996; van Oostendorp & Bonebakker, 1999; Wilkes & Leatherbarrow, 1988; Wilkes & Reynolds, 1999). In fact, a retraction will at most halve the number of references to misinformation, even when people acknowledge and demonstrably remember the retraction (Ecker, Lewandowsky, & Apai, 2011; Ecker, Lewandowsky, Swire, & Chang, 2011); in some studies, a retraction did not reduce reliance on misinformation at all (e.g., H. M. Johnson & Seifert, 1994).
When misinformation is presented through media sources, the remedy is the presentation of a correction, often in a temporally disjointed format (e.g., if an error appears in a newspaper, the correction will be printed in a subsequent edition). In laboratory studies, misinformation is often retracted immediately and within the same narrative (H. M. Johnson & Seifert, 1994). Despite this temporal and contextual proximity to the misinformation, retractions are ineffective. More recent studies (Seifert, 2002) have examined whether clarifying the correction (minimizing misunderstanding) might reduce the continued influence effect. In these studies, the correction was thus strengthened to include the phrase “paint and gas were never on the premises.” Results showed that this enhanced negation of the presence of flammable materials backfired, making people even more likely to rely on the misinformation in their responses. Other additions to the correction were found to mitigate to a degree, but not eliminate, the continued influence effect: For example, when participants were given a rationale for how the misinformation originated, such as, “a truckers’ strike prevented the expected delivery of the items,” they were somewhat less likely to make references to it. Even so, the influence of the misinformation could still be detected. The wealth of studies on this phenomenon have documented its pervasive effects, showing that it is extremely difficult to return the beliefs of people who have been exposed to misinformation to a baseline similar to those of people who were never exposed to it.
Multiple explanations have been proposed for the continued influence effect. We summarize their key assumptions next. [...]
Concise recommendations for practitioners
[...] We summarize the main points from the literature in Figure 1 and in the following list of recommendations:
- Consider what gaps in people’s mental event models are created by debunking and fill them using an alternative explanation.
- Use repeated retractions to reduce the influence of misinformation, but note that the risk of a backfire effect increases when the original misinformation is repeated in retractions and thereby rendered more familiar.
- To avoid making people more familiar with misinformation (and thus risking a familiarity backfire effect), emphasize the facts you wish to communicate rather than the myth.
- Provide an explicit warning before mentioning a myth, to ensure that people are cognitively on guard and less likely to be influenced by the misinformation.
- Ensure that your material is simple and brief. Use clear language and graphs where appropriate. If the myth is simpler and more compelling than your debunking, it will be cognitively more attractive, and you will risk an overkill backfire effect.
- Consider whether your content may be threatening to the worldview and values of your audience. If so, you risk a worldview backfire effect, which is strongest among those with firmly held beliefs. The most receptive people will be those who are not strongly fixed in their views.
- If you must present evidence that is threatening to the audience’s worldview, you may be able to reduce the worldview backfire effect by presenting your content in a worldview-affirming manner (e.g., by focusing on opportunities and potential benefits rather than risks and threats) and/or by encouraging self-affirmation.
- You can also circumvent the role of the audience’s worldview by focusing on behavioral techniques, such as the design of choice architectures, rather than overt debiasing.
The Useful Idea of Truth
(This is the first post of a new Sequence, Highly Advanced Epistemology 101 for Beginners, setting up the Sequence Open Problems in Friendly AI. For experienced readers, this first post may seem somewhat elementary; but it serves as a basis for what follows. And though it may be conventional in standard philosophy, the world at large does not know it, and it is useful to know a compact explanation. Kudos to Alex Altair for helping in the production and editing of this post and Sequence!)
I remember this paper I wrote on existentialism. My teacher gave it back with an F. She’d underlined true and truth wherever it appeared in the essay, probably about twenty times, with a question mark beside each. She wanted to know what I meant by truth.
-- Danielle Egan
I understand what it means for a hypothesis to be elegant, or falsifiable, or compatible with the evidence. It sounds to me like calling a belief ‘true’ or ‘real’ or ‘actual’ is merely the difference between saying you believe something, and saying you really really believe something.
-- Dale Carrico
What then is truth? A movable host of metaphors, metonymies, and anthropomorphisms: in short, a sum of human relations which have been poetically and rhetorically intensified, transferred, and embellished, and which, after long usage, seem to a people to be fixed, canonical, and binding.
-- Friedrich Nietzsche
The Sally-Anne False-Belief task is an experiment used to tell whether a child understands the difference between belief and reality. It goes as follows:
- The child sees Sally hide a marble inside a covered basket, as Anne looks on.
- Sally leaves the room, and Anne takes the marble out of the basket and hides it inside a lidded box.
- Anne leaves the room, and Sally returns.
- The experimenter asks the child where Sally will look for her marble.
Children under the age of four say that Sally will look for her marble inside the box. Children over the age of four say that Sally will look for her marble inside the basket.
Luke is doing an AMA on Reddit
I'm sure most of us are used to just being able to badger him about things in the comments here on LW, but for anyone interested here's the link.
Reply to Holden on The Singularity Institute
Holden Karnofsky of GiveWell has objected to the Singularity Institute (SI) as a target for optimal philanthropy. As someone who thinks that existential risk reduction is really important and also that the Singularity Institute is an important target of optimal philanthropy, I would like to explain why I disagree with Holden on these subjects. (I am also SI's Executive Director.)
Mostly, I'd like to explain my views to a broad audience. But I'd also like to explain my views to Holden himself. I value Holden's work, I enjoy interacting with him, and I think he is both intelligent and capable of changing his mind about Big Things like this. Hopefully Holden and I can continue to work through the arguments together, though of course we are both busy with many other things.
I appreciate the clarity and substance of Holden's objections, and I hope to reply in kind. I begin with an overview of some basic points that may be familiar to most Less Wrong veterans, and then I reply point-by-point to Holden's post. In the final section, I summarize my reply to Holden.
Holden raised many different issues, so unfortunately this post needed to be long. My apologies to Holden if I have misinterpreted him at any point.
Contents
- Existential risk reduction is a critical concern for many people, given their values and given many plausible models of the future. Details here.
- Among existential risks, AI risk is probably the most important. Details here.
- SI can purchase many kinds of AI risk reduction more efficiently than other groups can. Details here.
- These points and many others weigh against many of Holden's claims and conclusions. Details here.
- Summary of my reply to Holden
CFAR website launched
The new Center for Applied Rationality website has launched! We'll be adding content as time goes by. Let us know if you find broken links, etc.
Real World Solutions to Prisoners' Dilemmas
Why should there be real world solutions to Prisoners' Dilemmas? Because such dilemmas are a real-world problem.
If I am assigned to work on a school project with a group, I can either cooperate (work hard on the project) or defect (slack off while reaping the rewards of everyone else's hard work). If everyone defects, the project doesn't get done and we all fail - a bad outcome for everyone. If I defect but you cooperate, then I get to spend all day on the beach and still get a good grade - the best outcome for me, the worst for you. And if we all cooperate, then it's long hours in the library but at least we pass the class - a “good enough” outcome, though not quite as good as me defecting against everyone else's cooperation. This exactly mirrors the Prisoner's Dilemma.
Diplomacy - both the concept and the board game - involves Prisoners' Dilemmas. Suppose Ribbentrop of Germany and Molotov of Russia agree to a peace treaty that demilitarizes their mutual border. If both cooperate, they can move their forces to other theaters, and have moderate success there - a good enough outcome. If Russia cooperates but Germany defects, it can launch a surprise attack on an undefended Russian border and enjoy spectacular success there (for a while, at least!) - the best outcome for Germany and the worst for Russia. But if both defect, then neither has any advantage at the German-Russian border, and they lose the use of those troops in other theaters as well - a bad outcome for both. Again, the Prisoner's Dilemma.
Civilization - again, both the concept and the game - involves Prisoners' Dilemmas. If everyone follows the rules and creates a stable society (cooperates), we all do pretty well. If everyone else works hard and I turn barbarian and pillage you (defect), then I get all of your stuff without having to work for it and you get nothing - the best solution for me, the worst for you. If everyone becomes a barbarian, there's nothing to steal and we all lose out. Prisoner's Dilemma.
If everyone who worries about global warming cooperates in cutting emissions, climate change is averted and everyone is moderately happy. If everyone else cooperates in cutting emissions, but one country defects, climate change is still mostly averted, and the defector is at a significant economic advantage. If everyone defects and keeps polluting, the climate changes and everyone loses out. Again a Prisoner's Dilemma.
Prisoners' Dilemmas even come up in nature. In baboon tribes, when a female is in “heat”, males often compete for the chance to woo her. The most successful males are those who can get a friend to help fight off the other monkeys, and who then helps that friend find his own monkey loving. But these monkeys are tempted to take their friend's female as well. Two males who cooperate each seduce one female. If one cooperates and the other defects, he has a good chance at both females. But if the two can't cooperate at all, then they will be beaten off by other monkey alliances and won't get to have sex with anyone. Still a Prisoner's Dilemma!
So one might expect the real world to have produced some practical solutions to Prisoners' Dilemmas.
One of the best known such systems is called “society”. You may have heard of it. It boasts a series of norms, laws, and authority figures who will punish you when those norms and laws are broken.
Imagine that the two criminals in the original example were part of a criminal society - let's say the Mafia. The Godfather makes Alice and Bob an offer they can't refuse: turn against one another, and they will end up “sleeping with the fishes” (this concludes my knowledge of the Mafia). Now the incentives are changed: defecting against a cooperator doesn't mean walking free, it means getting murdered.


Both prisoners cooperate, and amazingly the threat of murder ends up making them both better off (this is also the gist of some of the strongest arguments against libertarianism: in Prisoner's Dilemmas, threatening force against rational agents can increase the utility of all of them!)
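To make the change in incentives concrete, here is a minimal sketch in Python. The jail terms and the size of the Godfather's penalty are made-up illustrative numbers, not figures from the original example, and for simplicity the sketch assumes the penalty lands on any defection. The names JAIL_YEARS, MURDER_COST, cost, and best_response are hypothetical.

```python
# Years in jail for (my_move, their_move); lower is better.
# "C" = cooperate (stay silent), "D" = defect (turn on your partner).
# Illustrative numbers only; they are assumptions, not from the post.
JAIL_YEARS = {
    ("C", "C"): 1,
    ("C", "D"): 10,
    ("D", "C"): 0,
    ("D", "D"): 5,
}

# Hypothetical stand-in for "sleeping with the fishes": a cost far larger
# than any jail term, charged to anyone who defects (an assumption).
MURDER_COST = 1000


def cost(my_move, their_move, godfather=False):
    """Cost to me of one outcome, optionally including the Godfather's threat."""
    total = JAIL_YEARS[(my_move, their_move)]
    if godfather and my_move == "D":
        total += MURDER_COST
    return total


def best_response(their_move, godfather=False):
    """The move that minimizes my cost, given what the other prisoner does."""
    return min("CD", key=lambda move: cost(move, their_move, godfather))


for godfather in (False, True):
    label = "with Godfather" if godfather else "no Godfather"
    print(label, {their: best_response(their, godfather) for their in "CD"})
```

Under these assumed numbers, defecting is the best response to either move when there is no Godfather, and cooperating is the best response to either move once the threat is in place.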
Even when there is no godfather, society binds people by concern about their “reputation”. If Bob got a reputation as a snitch, he might never be able to work as a criminal again. If a student gets a reputation for slacking off on projects, she might get ostracized on the playground. If a country gets a reputation for backstabbing, others might refuse to make treaties with them. If a person gets a reputation as a bandit, she might incur the hostility of those around her. If a country gets a reputation for not doing enough to fight global warming, it might...well, no one ever said it was a perfect system.
Aside from humans in society, evolution is also strongly motivated to develop a solution to the Prisoner's Dilemma. The Dilemma troubles not only lovestruck baboons, but ants, minnows, bats, and even viruses. Here the payoff is denominated not in years of jail time, nor in dollars, but in reproductive fitness and number of potential offspring - so evolution will certainly take note.
Most people, when they hear the rational arguments in favor of defecting every single time on the iterated 100-crime Prisoner's Dilemma, will feel some kind of emotional resistance. Thoughts like “Well, maybe I'll try cooperating anyway a few times, see if it works”, or “If I promised to cooperate with my opponent, then it would be dishonorable for me to defect on the last turn, even if it helps me out”, or even “Bob is my friend! Think of all the good times we've had together, robbing banks and running straight into waiting police cordons. I could never betray him!”
And if two people with these sorts of emotional hangups play the Prisoner's Dilemma together, they'll end up cooperating on all hundred crimes, getting out of jail in a mere century and leaving rational utility maximizers to sit back and wonder how they did it.
Here's how: imagine you are a supervillain designing a robotic criminal (who's that go-to supervillain Kaj always uses for situations like this? Dr. Zany? Okay, let's say you're him). You expect to build several copies of this robot to work as a team, and expect they might end up playing the Prisoner's Dilemma against each other. You want them out of jail as fast as possible so they can get back to furthering your nefarious plots. So rather than have them bumble through the whole rational utility maximizing thing, you just insert an extra line of code: “in a Prisoner's Dilemma, always cooperate with other robots”. Problem solved.
Evolution followed the same strategy (no it didn't; this is a massive oversimplification). The emotions we feel around friendship, trust, altruism, and betrayal are partly a built-in hack to succeed in cooperating on Prisoner's Dilemmas where a rational utility-maximizer would defect a hundred times and fail miserably. The evolutionarily dominant strategy is commonly called “Tit-for-tat” - basically, cooperate if and only if your opponent did so last time.
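A rough sketch of Tit-for-tat on the 100-crime game, in Python. One year per jointly cooperated crime matches the "mere century" mentioned earlier; the other jail terms are assumptions chosen only to keep the Prisoner's Dilemma ordering. Opening with cooperation is the usual convention for Tit-for-tat rather than something stated above, and the function names are hypothetical.

```python
# (years for A, years for B) given (A's move, B's move); lower is better.
# Only the (C, C) entry is implied by the text; the rest are assumed.
JAIL_YEARS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (10, 0),
    ("D", "C"): (0, 10),
    ("D", "D"): (5, 5),
}


def tit_for_tat(opponent_history):
    """Cooperate first (assumed convention), then copy the opponent's last move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """The 'rational utility maximizer' of the iterated game: defect every time."""
    return "D"


def play(strategy_a, strategy_b, rounds=100):
    """Run the iterated game and return total jail years for each player."""
    moves_a, moves_b = [], []
    years_a = years_b = 0
    for _ in range(rounds):
        # Each strategy sees only the other player's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        ya, yb = JAIL_YEARS[(a, b)]
        years_a += ya
        years_b += yb
    return years_a, years_b


print(play(tit_for_tat, tit_for_tat))      # (100, 100)
print(play(always_defect, always_defect))  # (500, 500)
print(play(tit_for_tat, always_defect))    # (505, 495)
```

With these numbers, two Tit-for-tat players serve a century each, two constant defectors serve five centuries each, and Tit-for-tat loses only slightly when paired with a constant defector, since it is exploited once and then retaliates.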
This so-called “superrationality” appears even more clearly in the Ultimatum Game. Two players are given $100 to distribute among themselves in the following way: the first player proposes a distribution (for example, “Fifty for me, fifty for you”) and then the second player either accepts or rejects the distribution. If the second player accepts, the players get the money in that particular ratio. If the second player refuses, no one gets any money at all.
The first player's reasoning goes like this: “If I propose $99 for myself and $1 for my opponent, that means I get a lot of money and my opponent still has to accept. After all, she prefers $1 to $0, which is what she'll get if she refuses.”
In the Prisoner's Dilemma, when players were able to communicate beforehand they could settle upon a winning strategy of precommitting to reciprocate: to take an action beneficial to their opponent if and only if their opponent took an action beneficial to them. Here, the second player should consider the same strategy: precommit to an ultimatum (hence the name) that unless Player 1 distributes the money 50-50, she will reject the offer.
But as in the Prisoner's Dilemma, this fails when you have no reason to expect your opponent to follow through on her precommitment. Imagine you're Player 2, playing a single Ultimatum Game against an opponent you never expect to meet again. You dutifully promise Player 1 that you will reject any offer less than 50-50. Player 1 offers 80-20 anyway. You reason “Well, my ultimatum failed. If I stick to it anyway, I walk away with nothing. I might as well admit it was a good try, give in, and take the $20. After all, rejecting the offer won't magically bring my chance at $50 back, and there aren't any other dealings with this Player 1 guy for it to influence.”
This is seemingly a rational way to think, but if Player 1 knows you're going to think that way, she offers 99-1, same as before, no matter how sincere your ultimatum sounds.
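The reasoning can be sketched as a tiny search in Python: Player 1 tries every whole-dollar offer and keeps whichever leaves her the most money, given how she expects Player 2 to respond. The function names are hypothetical, and restricting offers to whole dollars and having the money maximizer turn down $0 are simplifying assumptions.

```python
TOTAL = 100  # dollars to be split


def best_offer(accepts):
    """Offer to Player 2 that maximizes Player 1's take.

    `accepts` is Player 1's model of Player 2: it maps an offer to True/False.
    """
    def proposer_payoff(offer):
        # A rejected offer leaves both players with nothing.
        return TOTAL - offer if accepts(offer) else 0

    return max(range(TOTAL + 1), key=proposer_payoff)


def money_maximizer(offer):
    """Takes anything better than nothing (assumed to turn down $0)."""
    return offer > 0


def believed_ultimatum(offer):
    """A Player 2 whose 'nothing less than 50-50' threat is actually believed."""
    return offer >= 50


print(best_offer(money_maximizer))     # 1
print(best_offer(believed_ultimatum))  # 50
```

The only thing that differs between the two runs is what Player 1 believes Player 2 will do, which is why a sincere-sounding but unbelieved ultimatum changes nothing.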
Notice all the similarities to the Prisoner's Dilemma: playing as a "rational economic agent" gets you a bad result, it looks like you can escape that bad result by making precommitments, but since the other player can't trust your precommitments, you're right back where you started.
If evolutionary solutions to the Prisoners' Dilemma look like trust or friendship or altruism, solutions to the Ultimatum Game involve different emotions entirely. The Sultan presumably does not want you to elope with his daughter. He makes an ultimatum: “Touch my daughter, and I will kill you.” You elope with her anyway, and when his guards drag you back to his palace, you argue: “Killing me isn't going to reverse what happened. Your ultimatum has failed. All you can do now by beheading me is get blood all over your beautiful palace carpet, which hurts you as well as me - the equivalent of pointlessly passing up the last dollar in an Ultimatum Game where you've just been offered a 99-1 split.”
The Sultan might counter with an argument from social institutions: “If I let you go, I will look dishonorable. I will gain a reputation as someone people can mess with without any consequences. My choice isn't between bloody carpet and clean carpet, it's between bloody carpet and people respecting my orders, or clean carpet and people continuing to defy me.”
But he's much more likely to just shout an incoherent stream of dreadful Arabic curse words. Because just as friendship is the evolutionary solution to a Prisoner's Dilemma, so anger is the evolutionary solution to an Ultimatum Game. As various gurus and psychologists have observed, anger makes us irrational. But this is the good kind of irrationality; it's the kind of irrationality that makes us pass up a 99-1 split even though the decision costs us a dollar.
And if we know that humans are the kind of life-form that tends to experience anger, then if we're playing an Ultimatum Game against a human, and that human precommits to rejecting any offer less than 50-50, we're much more likely to believe her than if we were playing against a rational utility-maximizing agent - and so much more likely to give the human a fair offer.
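A back-of-the-envelope version of this in Python, under strong simplifying assumptions that are mine rather than the post's: Player 1 chooses only between 99-1 and 50-50, the fair offer is always accepted, and an angry Player 2 rejects the lowball with some probability p_reject. The function names are hypothetical.

```python
def expected_lowball_take(p_reject):
    """Player 1's expected dollars from offering 99-1, if it is rejected
    with probability p_reject (rejection leaves her with nothing)."""
    return (1 - p_reject) * 99


def prefers_fair_offer(p_reject):
    """True if a guaranteed $50 beats the expected take from the lowball."""
    return 50 > expected_lowball_take(p_reject)


for p in (0.0, 0.4, 0.6):
    print(p, expected_lowball_take(p), prefers_fair_offer(p))
```

Under these assumptions the fair offer wins once the chance of an angry rejection rises above 49/99, a little under one half. Player 2 does not need to be certain to reject, only likely enough to.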
It is distasteful and a little bit contradictory to the spirit of rationality to believe it should lose out so badly to simple emotion, and the problem might be correctable. Here we risk crossing the poorly charted border between game theory and decision theory and reaching ideas like timeless decision theory: that one should act as if one's choices determined the output of the algorithm one instantiates (or more simply, you should assume everyone like you will make the same choice you do, and take that into account when choosing.)
More practically, however, most real-world solutions to Prisoner's Dilemmas and Ultimatum Games still hinge on one of three things: threats of reciprocation when the length of the game is unknown, social institutions and reputation systems that make defection less attractive, and emotions ranging from cooperation to anger that are hard-wired into us by evolution. In the next post, we'll look at how these play out in practice.
