Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
Tl;dr: Articles on LW are, if unchecked (for now by you), heavily distorting a useful view (yours) on what matters.
[This is (though in part only) a five-year update to Patrissimo’s article Self-Improvement or Shiny Distraction: Why Less Wrong is anti-Instrumental Rationality. However, I wrote most of this article before I became aware of its predecessor. Then again, this reinforces both our articles' main critique.]
I claim that rational discussions in person, conferences, forums, social media, and blogs suffer from adverse selection and promote unwished-for phenomena such as the availability heuristic. Bluntly stated, they (like all other discussions) tend to support ever worse, unimportant, or wrong opinions and articles. More importantly, articles of high relevancy regarding some topics are conspicuously missing. This can also be observed on Less Wrong. It is not the purpose of this article to determine the exact extent of this problem. It shall merely bring to attention that “what you get is not what you should see.” However, I am afraid this effect is largely undervalued.
This result is by design and therefore to be expected. A rational agent will, by definition, post incorrectly, incompletely, or not at all in the following instances:
- Cost-benefit analysis: A rational agent will not post information that reduces his utility by enabling others to compete better and, more importantly, by causing him any effort unless some gain (status, monetary, happiness,…) offsets the former effect. Example: Have you seen articles by Mark Zuckerberg? But I also argue that for random John Doe the personal cost-benefit-analysis from posting an article is negative. Even more, the value of your time should approach infinity if you really drink the LW Kool-Aid, however, this shall be the topic of a subsequent article. I suspect the theme of this article may also be restated as a free-riding problem as it postulates the non-production or under-production of valuable articles and other contributions.
- Conflicting with law: Topics like drugs (in the western world) and maybe politics or sexuality in other parts of the world are biased due to the risk of persecution, punishment, extortion, etc. And many topics such as in the spheres of rationality, transhumanism, effective altruism, are at least highly sensitive, especially when you continue arguing until you reach their moral extremes.
- Inconvenience of disagreement: Due to the effort of posting truly anonymously (which currently requires a truly anonymous e-mail address and so forth), disagreeing posts will be avoided, particularly when the original poster is of high status, which increases the risk that the disagreement rubs off on one’s other articles. This is obviously even truer for personal interactions. Side note: The reverse situation may also apply: more agreement (likes) with high status.
- Dark knowledge: Even if I know how to acquire a sniper gun that cannot be traced, I will not share this knowledge (as for all other reasons, there are substantially better examples, but I do not want to make spreading dark knowledge a focus of this article).
- Signaling: Seriously, would you discuss your affiliation to LW in a job interview?! Or tell your friends that you are afraid we live in a simulation? (If you don’t see my point, your rationality is totally off base, see the next point). LW user “Timtyler” commented before: “I also found myself wondering why people remained puzzled about the high observed levels of disagreement. It seems obvious to me that people are poor approximations of truth-seeking agents—and instead promote their own interests. If you understand that, then the existence of many real-world disagreements is explained: people disagree in order to manipulate the opinions and actions of others for their own benefit.”
- WEIRD-M-LW: It is a known problem that articles on LW are going to be written by authors that are in the overwhelming majority western, educated, industrialized, rich, democratic, and male. The LW surveys show distinctly that there are most likely many further attributes in which the population on LW differs from the rest of the world. LW user “Jpet” argued in a comment very nicely: “But assuming that the other party is in fact totally rational is just silly. We know we're talking to other flawed human beings, and either or both of us might just be totally off base, even if we're hanging around on a rationality discussion board.” LW could certainly use more diversity. Personal anecdote: I was dumbfounded by the current discussion around LW T-shirts sporting slogans such as "Growing Mentally Stronger" which seemed to me intuitively highly counterproductive. I then asked my wife who is far more into fashion and not at all into LW. Her comment (Crocker's warning): “They are great! You should definitely buy one for your son if you want him to go to high school and to be all for himself for the next couple of years; that is, except for the mobbing, maybe.”
- Genes, minds, hormones & personal history: (Even) rational agents are highly influenced by those factors. This fact seems underappreciated. Think of SSC's "What universal human experiences are you missing without realizing it?" Think of inferential distances and the typical mind fallacy. Think of slight changes in beliefs after drinking coffee, working out, falling deeply in love for the first time, seeing your child born, being extremely hungry, or wanting to stand, and then standing, on top of a mountain (especially Mt. Everest). Russell pointed out the interesting and strong effect of Schopenhauer’s and Nietzsche’s personal history on their misogyny. However, it would be a stretch to simply call them irrational. In every discussion, you have to start somewhere, but finding a starting point is a lot more difficult when the discussion partners are more diverse. All factors may not result in direct misinformation on LW but certainly shape the conversation (see also the next point).
- Priorities: Specific “darlings” of the LW sphere such as Newcomb’s paradox or MW are regularly discussed. Just one moment of not paying attention to your biases, and you may assume they are really relevant. For those of us currently not programming FAI, they aren’t, and they steal attention from more important issues.
- Other beliefs/goals: Close to selfishness, but not quite the same. If an agent’s beliefs and goals differ from most others, the discussion would benefit from your post. Even so, that by itself may not be a sufficient reason for an agent to post. Example: Imagine somebody like Ben Goertzel. His beliefs on AI, for instance, differed from the mainstream on LW. This did not necessarily result in him posting an article on LW. And to my knowledge, he won’t, at least not directly. Plus, LW may try to slow him down as he seems less concerned about the F of FAI.
- Vanity: Considering the amount of self-help threads, nerdiness, and the like on LW, it may be suspected that some refrain from posting due to self-respect. E.g. I do not want to signal to myself that I belong to this tribe. This may sound outlandish but then again, have a look at the Facebook groups of LW and other rationalists where people frequently ask how they can be more interesting, or how “they can train how to pause for two seconds before they speak to increase their charisma.” Again, if this sounds perfectly fine to you, that may be bad news.
- Barriers to entry: Your first post requires creating an account. Karma that signals the quality of your post is still absent. An aspiring author may question the relative importance of his opinion (especially for highly complex topics), his understanding of the problem, the quality of his writing, and if his research on the chosen topic is sufficient.
- Nothing new under the sun: Writing an article requires the bold assumption that its marginal utility is significantly above zero. The likelihood of this probably decreases with the number of posts, which is, as of now, quite impressive. Patrissimo’s article addresses the same point; others mention being afraid of "reinventing the wheel."
- Error: I should point out that most of the reasons brought forward in this list talk about deliberate misinformation. In many cases, an article will just be wrong without the author realizing it. Examples: facts (the earth is flat), predictions (planes cannot fly), and, seriously underestimated, horizon effects (if more information is provided the rational agent realizes that his action did not yield the desired outcome, e.g. ban of plastic bags).
- Protection of the group: Opinions, though important, may not be discussed in order to protect the group or its image to outsiders. See “is LW a c***” and “Roko’s ***.” This argument can also be brought forward much more subtly: an agent may, for example, hold the opinion that rationality concepts are information hazards by nature if they reduce the happiness of the otherwise blissfully unaware.
- Topicality: This is a problem specific to LW. Many of the great posts as well as the sequences originated about five to ten years ago. While the interest in AI has now reached mainstream awareness, the solid intellectual basis (centered around a few individuals) which LW offered seems to be gradually breaking away, and rationality topics are experiencing their diaspora. What remains is a less balanced account of important topics in the sphere of rationality, and new authors are discouraged from entering the conversation.
- Russell’s antinomy: Is the contribution that states its futility ever expressed? Random example article title: “Writing articles on LW is useless because only nerds will read them."
- +Redundancy: If any of the above reasons apply, I may choose not to post. However, I also expect a rational agent with sufficiently close knowledge to attain the same knowledge himself so it is at the same time not absolutely necessary to post. An article will “only” speed up the time required to understand a new concept and reduce the likelihood of rationalists diverting due to disagreement (if Aumann is ignored) or faulty argumentation.
This list is not exhaustive. If you do not find a factor in this list that you expect to account for much of the effect, I will appreciate a hint in the comments.
There are a few outstanding examples pointing in the opposite direction. They appear to provide uncensored accounts of their way of thinking and take arguments to their logical extremes when necessary. Most notably Bostrom and Gwern, but then again, feel free to read the latter’s posts on endured extortion attempts.
A somewhat flippant conclusion (more in a FB than LW voice): After reading the article from 2010, I cannot expect this article (or the ones possibly following that have already been written) to have a serious impact. It thus can be concluded that it should not have been written. Then again, observing our own thinking patterns, we can identify influences of many thinkers who may have suspected the same (hubris not intended). And step by step, we will be standing on the shoulders of giants. At the same time, keep in mind that articles from LW won’t get you there. They represent only a small piece of the jigsaw. You may want to read some, observe how instrumental rationality works in the “real world," and, finally, you have to draw the critical conclusions for yourself. Nobody truly rational will lay them out for you. LW is great if you have an IQ of 140 and are tired of superficial discussions with the hairstylist in your village X. But keep in mind that the instrumental rationality of your hairstylist may still surpass yours, and I don’t even need to say much about the one of your president, business leader, and club Casanova. And yet, they may be literally dead wrong, because they have overlooked AI and SENS.
A final personal note: Kudos to the giants for building this great website and starting point for rationalists and the real-life progress in the last couple of years! This is a rather skeptical article to start with, but it does have its specific purpose of laying out why I, and I suspect many others, almost refrained from posting.
Cross-posted from my blog here.
One of the greatest successes of mankind over the last few centuries has been the enormous amount of wealth that has been created. Once upon a time virtually everyone lived in grinding poverty; now, thanks to the forces of science, capitalism and total factor productivity, we produce enough to support a much larger population at a much higher standard of living.
EAs being a highly intellectual lot, our preferred form of ritual celebration is charts. The ordained chart for celebrating this triumph of our people is the Declining Share of People Living in Extreme Poverty Chart.
However, as a heretic, I think this chart is a mistake. What is so great about reducing the share? We could achieve that by killing all the poor people, but that would not be a good thing! Life is good, and poverty is not death; it is simply better to be rich.
As such, I think this is a much better chart. Here we show the world population. Those in extreme poverty are in purple – not red, for their existence is not bad. Those who the wheels of progress have lifted into wealth unbeknownst to our ancestors, on the other hand, are depicted in blue, rising triumphantly.
Long may their rise continue.
Disclaimer: This post is mainly relevant to those who are interested in Effective Altruism
As a Less Wronger and Effective Altruist who is skilled at marketing, education, and outreach, I think we can do a lot of good if we improve the effectiveness of Effective Altruism outreach. I am not talking about EA pitches in particular, although these are of course valuable in the right time and place, but more broadly issues of strategy. I am talking about making Effective Altruism outreach effective through relying on research-based strategies of effective outreach.
To be clear, I should say that I have been putting my money/efforts where my mouth is, and devoting a lot of my time and energy to a project, Intentional Insights, of spreading rationality and effective altruism to a broad audience, as I think I can do the most good through convincing others to do the most good, through their giving and through rational thinking. Over the last year, I devoted approximately 2400 hours and $33000 to this project. Here's what I found helpful in my own outreach efforts to non-EAs, and lots of these ideas also apply to my outreach regarding rationality more broadly.
I found it quite helpful to focus much more on speaking to people's emotions rather than their cognition. Now, this was not intuitive to me. I'm much more motivated by data than the typical person, and I bet you are too. But I think we need to remember that we suffer from a typical mind fallacy, in that most EAs are much more data-driven than the typical person. Moreover, once we get into the EA movement, we forget how weird it looks from the outside - we suffer from the curse of knowledge.
Among my friends interested in rationality, effective altruism, and existential risk reduction, I often hear: "If you want to have a real positive impact on the world, grad school is a waste of time. It's better to use deliberate practice to learn whatever you need instead of working within the confines of an institution."
While I'd agree that grad school will not make you do good for the world, if you're a self-driven person who can spend time in a PhD program deliberately acquiring skills and connections for making a positive difference, I think you can make grad school a highly productive path, perhaps more so than many alternatives. In this post, I want to share some advice that I've been repeating a lot lately for how to do this:
- Find a flexible program. PhD programs in mathematics, statistics, philosophy, and theoretical computer science tend to give you a great deal of free time and flexibility, provided you can pass the various qualifying exams without too much studying. By contrast, sciences like biology and chemistry can require time-consuming laboratory work that you can't always speed through by being clever.
- Choose high-impact topics to learn about. AI safety and existential risk reduction are my favorite examples, but there are others, and I won't spend more time here arguing their case. If you can't make your thesis directly about such a topic, choosing a related more popular topic can give you valuable personal connections, and you can still learn whatever you want during the spare time a flexible program will afford you.
- Teach classes. Grad programs that let you teach undergraduate tutorial classes provide a rare opportunity to practice engaging a non-captive audience. If you just want to work on general presentation skills, maybe you practice on your friends... but your friends already like you. If you want to learn to win over a crowd that isn't particularly interested in you, try teaching calculus! I've found this skill particularly useful when presenting AI safety research that isn't yet mainstream, which requires carefully stepping through arguments that are unfamiliar to the audience.
- Use your freedom to accomplish things. I used my spare time during my PhD program to cofound CFAR, the Center for Applied Rationality. Alumni of our workshops have gone on to do such awesome things as creating the Future of Life Institute and sourcing a $10MM donation from Elon Musk to fund AI safety research. I never would have had the flexibility to volunteer for weeks at a time if I'd been working at a typical 9-to-5 or a startup.
- Organize a graduate seminar. Organizing conferences is critical to getting the word out on important new research, and in fact, running a conference on AI safety in Puerto Rico is how FLI was able to bring so many researchers together on its Open Letter on AI Safety. It's also where Elon Musk made his donation. During grad school, you can get lots of practice organizing research events by running seminars for your fellow grad students. In fact, several of the organizers of the FLI conference were grad students.
- Get exposure to experts. A top 10 US school will have professors around that are world-experts on myriad topics, and you can attend departmental colloquia to expose yourself to the cutting edge of research in fields you're curious about. I regularly attended cognitive science and neuroscience colloquia during my PhD in mathematics, which gave me many perspectives that I found useful working at CFAR.
- Learn how productive researchers get their work done. Grad school surrounds you with researchers, and by getting exposed to how a variety of researchers do their thing, you can pick and choose from their methods and find what works best for you. For example, I learned from my advisor Bernd Sturmfels that, for me, quickly passing a draft back and forth with a coauthor can get a paper written much more quickly than agonizing about each revision before I share it.
- Remember you don't have to stay in academia. If you limit yourself to only doing research that will get you good post-doc offers, you might find you aren't able to focus on what seems highest impact (because often what makes a topic high impact is that it's important and neglected, and if a topic is neglected, it might not be trendy enough to land you a good post-doc). But since grad school is run by professors, becoming a professor is usually the most salient path forward for most grad students, and you might end up pressuring yourself to follow the standards of that path. When I graduated, I got my top choice of post-doc, but then I decided not to take it and to instead try earning to give as an algorithmic stock trader, and now I'm a research fellow at MIRI. In retrospect, I might have done more valuable work during my PhD itself if I'd decided in advance not to do a typical post-doc.
That's all I have for now. The main sentiment behind most of this, I think, is that you have to be deliberate to get the most out of a PhD program, rather than passively expecting it to make you into anything in particular. Grad school still isn't for everyone, and far from it. But if you were seriously considering it at some point, and "do something more useful" felt like a compelling reason not to go, be sure to first consider the most useful version of grad school that you could reliably make for yourself... and then decide whether or not to do it.
Please email me (firstname.lastname@example.org) if you have more ideas for getting the most out of grad school!
When you think of "ultimatums", what comes to mind?
Manipulativeness, maybe? Ultimatums are typically considered a negotiation tactic, and not a very pleasant one.
But there's a different thing that can happen, where an ultimatum is made, but where articulating it isn't a speech act but rather an observation. As in, the ultimatum wasn't created by the act of stating it, but rather, it already existed in some sense.
Some concrete examples: negotiating relationships
I had a tense relationship conversation a few years ago. We'd planned to spend the day together in the park, and I was clearly angsty, so my partner asked me what was going on. I didn't have a good handle on it, but I tried to explain what was uncomfortable for me about the relationship, and how I was confused about what I wanted. After maybe 10 minutes of this, she said, "Look, we've had this conversation before. I don't want to have it again. If we're going to do this relationship, I need you to promise we won't have this conversation again."
I thought about it. I spent a few moments simulating the next months of our relationship. I realized that I totally expected this to come up again, and again. Earlier on, when we'd had the conversation the first time, I hadn't been sure. But it was now pretty clear that I'd have to suppress important parts of myself if I was to keep from having this conversation.
"...yeah, I can't promise that," I said.
"I guess that's it then."
"I guess so."
I think a more self-aware version of me could have recognized, without her prompting, that my discomfort represented an unreconcilable part of the relationship, and that I basically already wanted to break up.
The rest of the day was a bit weird, but it was at least nice that we had resolved this. We'd realized that it was a fact about the world that there wasn't a serious relationship that we could have that we both wanted.
I sensed that when she posed the ultimatum, she wasn't doing it to manipulate me. She was just stating what kind of relationship she was interested in. It's like if you go to a restaurant and try to order a pad thai, and the waiter responds, "We don't have rice noodles or peanut sauce. You either eat somewhere else, or you eat something other than a pad thai."
An even simpler example would be that at the start of one of my relationships, my partner wanted to be monogamous and I wanted to be polyamorous (i.e. I wanted us both to be able to see other people and have other partners). This felt a bit tug-of-war-like, but eventually I realized that actually I would prefer to be single than be in a monogamous relationship.
I expressed this.
It was an ultimatum! "Either you date me polyamorously or not at all." But it wasn't me "just trying to get my way".
I guess the thing about ultimatums in the territory is that there's no bluff to call.
It happened in this case that my partner turned out to be really well-suited for polyamory, and so this worked out really well. We'd decided that if she got uncomfortable with anything, we'd talk about it, and see what made sense. For the most part, there weren't issues, and when there were, the openness of our relationship ended up just being a place where other discomforts were felt, not a generator of disconnection.
Normal ultimatums vs ultimatums in the territory
I use "in the territory" to indicate that this ultimatum isn't just a thing that's said but a thing that is true independently of anything being said. It's a bit of a poetic reference to the map-territory distinction.
No bluffing: preferences are clear
The key distinguishing piece with UITTs is, as I mentioned above, that there's no bluff to call: the ultimatum-maker isn't secretly really really hoping that the other person will choose one option or the other. These are the two best options as far as they can tell. They might have a preference: in the second story above, I preferred a polyamorous relationship to no relationship. But I preferred both of those to a monogamous relationship, and the ultimatum in the territory was me realizing and stating that.
This can actually be expressed formally, using what's called a preference vector. This comes from Keith Hipel at University of Waterloo. If the tables in this next bit don't make sense, don't worry about it: all important conclusions are expressed in the text.
First, we'll note that since each of us has two options, a table can be constructed which shows four possible states (numbered 0-3 in the boxes).
This representation is sometimes referred to as matrix form or normal form, and has the advantage of making it really clear who controls which state transitions (movements between boxes). Here, my decision controls which column we're in, and my partner's decision controls which row we're in.
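As a rough illustration of that four-state table, here is a minimal Python sketch. The numbering rule (state = 2 × "I insist on polyamory" + "my partner dates me") is inferred from how states 0, 1, and 2 are described later in the text, and the names `state_number`, `LABELS`, and `normal_form` are hypothetical, not part of Hipel's notation.

```python
# Assumption: state = 2 * (I insist on polyamory) + (partner dates me).
# This matches the prose descriptions of states 0, 1 and 2 in the post.
def state_number(i_insist: bool, partner_dates: bool) -> int:
    return 2 * int(i_insist) + int(partner_dates)

# Plain-language labels for the four states (wording is illustrative).
LABELS = {
    0: "no relationship (I did not insist)",
    1: "monogamous relationship",
    2: "no relationship (I insisted)",
    3: "polyamorous relationship",
}

def normal_form() -> list:
    """Rows follow the partner's decision (no, yes);
    columns follow my decision (don't insist, insist)."""
    return [
        [state_number(i_insist, partner_dates) for i_insist in (False, True)]
        for partner_dates in (False, True)
    ]

if __name__ == "__main__":
    for row in normal_form():
        print(row)
    for number, label in LABELS.items():
        print(number, label)
```

Moving between columns is my choice and moving between rows is my partner's, which is the point of the matrix form.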
Next, we can consider: of these four possible states, which are most and least preferred, by each person? Here's my preferences, ordered from most to least preferred, left to right. The 1s in the boxes mean that the statement on the left is true.
The order of the states represents my preferences (as I understand them) regardless of what my potential partner's preferences are. I only control movement in the top row (do I insist on polyamory or not). It's possible that they prefer no relationship to a poly relationship, in which case we'll end up in state 2. But I still prefer this state over state 1 (mono relationship) and state 0 (in which I don't ask for polyamory and my partner decides not to date me anyway). So whatever my partner's preferences are, I've definitely made a good choice for me, by insisting on polyamory.
This wouldn't be true if I were bluffing (if I preferred state 1 to state 2 but insisted on polyamory anyway). If I preferred 1 to 2, but I bluffed by insisting on polyamory, I would basically be betting on my partner preferring polyamory to no relationship, but this might backfire and get me no relationship, when both of us (in this hypothetical) would have preferred a monogamous relationship to that. I think this phenomenon is one reason people dislike bluffy ultimatums.
My partner's preferences turned out to be...
You'll note that they preferred a poly relationship to no relationship, so that's what we got! Although as I said, we didn't assume that everything would go smoothly. We agreed that if this became uncomfortable for my partner, then they would tell me and we'd figure out what to do. Another way to think about this is that after some amount of relating, my partner's preference vector might actually shift such that they preferred no relationship to our polyamorous one. In which case it would no longer make sense for us to be together.
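The reasoning in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state numbering is inferred from the prose, and every ranking below is hypothetical except for the facts the text gives (I rank 3 above 2, and 2 above both 1 and 0; my partner ranks 3 above 2). In particular, the relative order of states 0 and 1 in my ranking is a guess, and the check does not depend on it.

```python
# Assumption: state = 2 * (I insist on polyamory) + (partner dates me).
def state_number(i_insist, partner_dates):
    return 2 * int(i_insist) + int(partner_dates)

def prefers(ranking, a, b):
    """True if state a appears before state b in a ranking
    ordered from most to least preferred."""
    return ranking.index(a) < ranking.index(b)

def partner_reply(i_insist, partner_ranking):
    """Given my choice, the partner picks whichever of the two
    reachable states they rank higher."""
    yes = state_number(i_insist, True)
    no = state_number(i_insist, False)
    return yes if prefers(partner_ranking, yes, no) else no

def insisting_is_safe(my_ranking):
    """No bluff to call: every state reachable by insisting (2, 3)
    beats every state reachable by not insisting (0, 1)."""
    return all(prefers(my_ranking, a, b) for a in (2, 3) for b in (0, 1))

MY_RANKING = [3, 2, 1, 0]         # order of 1 and 0 is an assumption
BLUFFER_RANKING = [3, 1, 2, 0]    # hypothetical: secretly prefers mono to nothing
PARTNER_ACCEPTS = [1, 3, 0, 2]    # hypothetical: prefers poly to no relationship
PARTNER_DECLINES = [1, 0, 2, 3]   # hypothetical: prefers no relationship to poly

if __name__ == "__main__":
    print(insisting_is_safe(MY_RANKING))
    print(insisting_is_safe(BLUFFER_RANKING))
    print(partner_reply(True, PARTNER_ACCEPTS))
    print(partner_reply(True, PARTNER_DECLINES))
```

With the bluffer's ranking and a partner who declines, insisting lands both people in state 2 even though each ranks state 1 higher, which is the backfire described above.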
UITTs release tension, rather than creating it
In writing this post, I skimmed a wikihow article about how to give an ultimatum, in which they say:
"Expect a negative reaction. Hardly anyone likes being given an ultimatum. Sometimes it may be just what the listener needs but that doesn't make it any easier to hear."
I don't know how accurate the above is in general. I think they're talking about ultimatums like "either you quit smoking or we break up". I can say that I expect these properties of an ultimatum to contribute to the negative reaction:
- stated angrily or otherwise demandingly
- more extreme than your actual preferences, because you're bluffing
- refers to what they need to do, versus your own preferences
So this already sounds like UITTs would have less of a negative reaction.
But I think the biggest reason is that they represent a really clear articulation of what one party wants, which makes it much simpler for the other party to decide what they want to do. Ultimatums in the territory tend to also be more of a realization that you then share, versus a deliberate strategy. And this realization causes a noticeable release of tension in the realizer too.
"Either you quit smoking or we break up!"
"I'm realizing that as much as I like our relationship, it's really not working for me to be dating a smoker, so I've decided I'm not going to. Of course, my preferred outcome is that you stop smoking, not that we break up, but I realize that might not make sense for you at this point."
Of course, what's said here doesn't necessarily correspond to the preference vectors shown above. Someone could say the demanding first thing when they actually do have a UITT preference-wise, and someone who's trying to be really NVCy or something might say the second thing even though they're actually bluffing. But I think that in general they'll correlate pretty well.
The "realizing" seems similar to what happened to me 2 years ago on my own, when I realized that the territory was issuing me an ultimatum: either you change your habits or you fail at your goals. This is how the world works: your current habits will get you X, and you're declaring you want Y. On one level, it was sad to realize this, because I wanted to both eat lots of chocolate and to have a sixpack. Now this ultimatum is really in the territory.
Another example could be realizing that not only is your job not really working for you, but that it's already not-working to the extent that you aren't even really able to be fully productive. So you don't even have the option of just working a bit longer, because things are only going to get worse at this point. Once you realize that, it can be something of a relief, because you know that even if it's hard, you're going to find something better than your current situation.
More thoughts on the break-up story
One exercise I have left to the reader is creating the preference vectors for the break-up in the first story. HINT: (rot13'd) Vg'f fvzvyne gb gur cersrerapr irpgbef V qvq fubj, jvgu gjb qrpvfvbaf: fur pbhyq vafvfg ba ab shgher fhpu natfgl pbairefngvbaf be abg, naq V pbhyq pbagvahr gur eryngvbafuvc be abg.
An interesting note is that to some extent in that case I wasn't even expressing a preference but merely a prediction that my future self would continue to have this angst if it showed up in the relationship. So this is even more in the territory, in some senses. In my model of the territory, of course, but yeah. You can also think of this sort of as an unconscious ultimatum issued by the part of me that already knew I wanted to break up. It said "it's preferable for me to express angst in this relationship than to have it be angst free. I'd rather have that angst and have it cause a breakup than not have the angst."
I think that ultimatums in the territory are also connected to what I've called Reveal Culture (closely related to Tell Culture, but framed differently). Reveal cultures have the assumption that in some fundamental sense we're on the same side, which makes negotiations a very different thing... more of a collaborative design process. So it's very compatible with the idea that you might just clearly articulate your preferences.
Note that there doesn't always exist a UITT to express. In the polyamory example above, if I'd preferred a mono relationship to no relationship, then I would have had no UITT (though I could have bluffed). In this case, it would be much harder for me to express my preferences, because if I leave them unclear then there can be kind of implicit bluffing. And even once articulated, there's still no obvious choice. I prefer this, you prefer that. We need to compromise or something. It does seem clear that, with these preferences, if we don't end up with some relationship at the end, we messed up... but deciding how to resolve it is outside the scope of this post.
Knowing your own preferences is hard
Another topic this post will point at but not explore is: how do you actually figure out what you want? I think this is a mix of skill and process. You can get better at the general skill by practising trying to figure it out (and expressing it / acting on it when you do, and seeing if that works out well). One process I can think of that would be helpful is Gendlin's Focusing. Nate Soares has written about how introspection is hard and to some extent you don't ever actually know what you want: You don't get to know what you're fighting for. But, he notes,
"There are facts about what we care about, but they aren't facts about the stars. They are facts about us."
And they're hard to figure out. But to the extent that we can do so and then act on what we learn, we can get more of what we want, in relationships, in our personal lives, in our careers, and in the world.
(This article crossposted from my personal blog.)
Previously, I talked about the mystery of pain and pleasure, and how little we know about what sorts of arrangements of particles intrinsically produce them.
Up now: should FAI researchers care about this topic? Is research into the information theory of pain and pleasure relevant for FAI? I believe so! Here are the top reasons I came up with while thinking about this topic.
An important caveat: much depends on whether pain and pleasure (collectively, 'valence') are simple or complex properties of conscious systems. If they’re on the complex end of the spectrum, many points on this list may not be terribly relevant for the foreseeable future. On the other hand, if they have a relatively small “Kolmogorov complexity” (e.g., if a ‘hashing function’ to derive valence could fit on a t-shirt), crisp knowledge of valence may be possible sooner rather than later, and could have some immediate relevance to current FAI research directions.
Additional caveats: it’s important to note that none of these ideas are grand, sweeping panaceas, or are intended to address deep metaphysical questions, or aim to reinvent the wheel; instead, they’re intended to help resolve empirical ambiguities and modestly enlarge the current FAI toolbox.
1. Valence research could simplify the Value Problem and the Value Loading Problem. If pleasure/happiness is an important core part of what humanity values, or should value, having the exact information-theoretic definition of it on-hand could directly and drastically simplify the problems of what to maximize, and how to load this value into an AGI.
2. Valence research could form the basis for a well-defined ‘sanity check’ on AGI behavior. Even if pleasure isn’t a core terminal value for humans, it could still be used as a useful indirect heuristic for detecting value destruction. I.e., if we’re considering having an AGI carry out some intervention, we could ask it what the expected effect is on whatever pattern precisely corresponds to pleasure/happiness. If there would be a lot less of that pattern, the intervention is probably a bad idea.
3. Valence research could help us be humane to AGIs and WBEs. There’s going to be a lot of experimentation involving intelligent systems, and although many of these systems won’t be “sentient” in the way humans are, some system types will approach or even surpass human capacity for suffering. Unfortunately, many of these early systems won’t work well— i.e., they’ll be insane. It would be great if we had a good way to detect profound suffering in such cases and halt the system.
4. Valence research could help us prevent Mind Crimes. Nick Bostrom suggests in Superintelligence that AGIs might simulate virtual humans to reverse-engineer human preferences, but that these virtual humans might be sufficiently high-fidelity that they themselves could meaningfully suffer. We can tell AGIs not to do this- but knowing the exact information-theoretic pattern of suffering would make it easier to specify what not to do.
5. Valence research could enable radical forms of cognitive enhancement. Nick Bostrom has argued that there are hard limits on traditional pharmaceutical cognitive enhancement, since if the presence of some simple chemical would help us think better, our brains would probably already be producing it. On the other hand, there seem to be fewer a priori limits on motivational or emotional enhancement. And sure enough, the most effective “cognitive enhancers” such as Adderall, modafinil, and so on seem to work by making cognitive tasks seem less unpleasant or more interesting. If we had a crisp theory of valence, this might enable particularly powerful versions of these sorts of drugs.
6. Valence research could help align an AGI’s nominal utility function with visceral happiness. There seems to be a lot of confusion with regard to happiness and utility functions. In short: they are different things! Utility functions are goal abstractions, generally realized either explicitly through high-level state variables or implicitly through dynamic principles. Happiness, on the other hand, seems like an emergent, systemic property of conscious states, and like other qualia but unlike utility functions, it’s probably highly dependent upon low-level architectural and implementational details and dynamics. In practice, most people most of the time can be said to have rough utility functions which are often consistent with increasing happiness, but this is an awfully leaky abstraction.
My point is that constructing an AGI whose utility function is to make paperclips, and constructing a sentient AGI who is viscerally happy when it makes paperclips, are very different tasks. Moreover, I think there could be value in being able to align these two factors— to make an AGI which is viscerally happy to the exact extent it’s maximizing its nominal utility function.
(Why would we want to do this in the first place? There is the obvious semi-facetious-but-not-completely-trivial answer— that if an AGI turns me into paperclips, I at least want it to be happy while doing so—but I think there’s real potential for safety research here also.)
7. Valence research could help us construct makeshift utility functions for WBEs and Neuromorphic AGIs. How do we make WBEs or Neuromorphic AGIs do what we want? One approach would be to piggyback off of what they already partially and imperfectly optimize for, and build a makeshift utility function out of pleasure. Trying to shoehorn a utility function onto any evolved, emergent system is going to involve terrible imperfections, uncertainties, and dangers, but if research trends make neuromorphic AGI likely to occur before other options, it may be a case of “something is probably better than nothing.”
One particular application: constructing a “cryptographic reward token” control scheme for WBEs/neuromorphic AGIs. Carl Shulman has suggested we could incentivize an AGI to do what we want by giving it a steady trickle of cryptographic reward tokens that fulfill its utility function- it knows if it misbehaves (e.g., if it kills all humans), it’ll stop getting these tokens. But if we want to construct reward tokens for types of AGIs that don’t intrinsically have crisp utility functions (such as WBEs or neuromorphic AGIs), we’ll have to understand, on a deep mathematical level, what they do optimize for, which will at least partially involve pleasure.
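To make the "reward token" idea a little more concrete, here is a minimal sketch of tokens that are cheap to check but infeasible to forge without a secret key. It is only an illustration under stated assumptions, not the scheme Carl Shulman proposed: the function names `issue_token` and `verify_token` are hypothetical, the overseers are assumed to hold the key, and verification is assumed to happen in a trusted reward module. Because an HMAC verifier needs the same key as the issuer, a real design where the agent itself verifies tokens would need public-key signatures instead.

```python
import hashlib
import hmac

TAG_LEN = 32  # SHA-256 digest size in bytes


def issue_token(key: bytes, step: int) -> bytes:
    """Overseer side (hypothetical): sign the counter for one reward step."""
    msg = step.to_bytes(8, "big")
    tag = hmac.new(key, msg, hashlib.sha256).digest()
    return msg + tag


def verify_token(key: bytes, token: bytes) -> bool:
    """Reward-module side (hypothetical): accept only tokens signed with `key`."""
    if len(token) != 8 + TAG_LEN:
        return False
    msg, tag = token[:8], token[8:]
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    return hmac.compare_digest(tag, expected)
```

A token issued with the overseers' key verifies; a token with a single flipped bit, or one signed with a different key, does not. None of this touches the hard part named in the text, which is knowing what a WBE or neuromorphic AGI would actually treat as reward.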
8. Valence research could help us better understand, and perhaps prevent, AGI wireheading. How can AGI researchers prevent their AGIs from wireheading (direct manipulation of their utility functions)? I don’t have a clear answer, and it seems like a complex problem which will require complex, architecture-dependent solutions, but understanding the universe’s algorithm for pleasure might help clarify what kind of problem it is, and how evolution has addressed it in humans.
9. Valence research could help reduce general metaphysical confusion. We’re going to be facing some very weird questions about philosophy of mind and metaphysics when building AGIs, and everybody seems to have their own pet assumptions on how things work. The better we can clear up the fog which surrounds some of these topics, the lower our coordinational friction will be when we have to directly address them.
Successfully reverse-engineering a subset of qualia (valence- perhaps the easiest type to reverse-engineer?) would be a great step in this direction.
10. Valence research could change the social and political landscape AGI research occurs in. This could take many forms: at best, a breakthrough could lead to a happier society where many previously nihilistic individuals suddenly have “skin in the game” with respect to existential risk. At worst, it could be a profound information hazard, and irresponsible disclosure or misuse of such research could lead to mass wireheading, mass emotional manipulation, and totalitarianism. Either way, it would be an important topic to keep abreast of.
These are not all independent issues, and not all are of equal importance. But, taken together, they do seem to imply that reverse-engineering valence will be decently relevant to FAI research, particularly with regard to the Value Problem, reducing metaphysical confusion, and perhaps making the hardest safety cases (e.g., neuromorphic AGIs) a little bit more tractable.
As part of a broader project of promoting rationality, Raelifin and I had some luck in getting media coverage of rationality-informed approaches to probabilistic thinking (1, 2), mental health (1, 2), and reaching life goals through finding purpose and meaning (1, 2). The media includes mainstream media such as the main newspaper in Cleveland, OH; reason-oriented media such as Unbelievers Radio; student-oriented media such as the main newspaper for Ohio State University; and self improvement-oriented media such as the Purpose Revolution.
This is part of our strategy to reach out both to mainstream and to niche groups interested in a specific spin on rationality-informed approaches to winning at life. I wanted to share these here, and see if any of you had suggestions for optimizations of our performance, connections with other media channels both mainstream and niche, and any other thoughts on improving outreach. Thanks!
Irrationality is ingrained in our humanity. It is fundamental to who we are. This is because being human means that you are implemented on kludgy and limited wetware (a human brain). A consequence of this is that biases ↓ and irrational thinking are not mistakes per se; they are not misfirings or accidental activations of neurons. They are the default mode of operation for wetware that has been optimized for purposes other than truth maximization.
If you want something to blame for the fact that you are innately irrational, then you can blame evolution ↓. Evolution tends not to produce optimal organisms, but instead produces ones that are kludgy ↓, limited and optimized for criteria relating to ancestral environments rather than for criteria relating to optimal thought.
A kludge is a clumsy or inelegant, yet surprisingly effective, solution to a problem. The human brain is an example of a kludge. It contains many distinct substructures dating from widely separated periods of evolutionary development ↓. An example of this is the two kinds of processes in human cognition, where one is fast (type 1) and the other is slow (type 2) ↓.
There are many other characteristics of the brain that induce irrationality. The main ones are that:
- The brain is innately limited in its computational abilities and so it must use heuristics ↓, which are mental shortcuts that ease the cognitive load of making a decision.
- The brain has a tendency to blindly use salient or pre-existing responses to questions rather than developing new answers or thoroughly checking pre-existing solutions ↓.
- The brain does not inherently value truth. One of the main reasons for this is that many of the biases can actually be adaptive. An example of an adaptive bias is the sexual overperception bias ↓ in men. From a truth-maximization perspective, young men who assume that all women want them are showing severe social-cognitive inaccuracies, judgment biases, and probably narcissistic personality disorder. However, from an evolutionary perspective, the same young men are behaving in a more optimal manner, one which has consistently maximized the reproductive success of their male ancestors. Another similar example is the bias for positive perception of partners ↓.
- The brain acts more like a coherence maximiser than a truth maximiser, which makes people liable to believing falsehoods ↓. If you want to believe something, or you are often in situations in which two things just happen to be related, then your brain will often by default treat the belief or the relationship as if it were true ↓.
- The brain trusts its own version of reality much more than other people's. This makes people defend their beliefs even when doing so is extremely irrational ↓. It also makes it hard for people to change their minds ↓ and to accept when they are wrong ↓.
- Disbelief requires System 2 thought ↓. This means that if System 2 is otherwise engaged (busy with something else), then we are liable to believe pretty much anything. System 1 is gullible and biased to believe. It is System 2 that is in charge of doubting and disbelieving.
One important non-brain related factor is that we must make use of and live with our current adaptations ↓. People cannot reconform themselves to fulfill purposes suitable to their current environment, but must instead make use of pre-existing machinery that has been optimised for other environments. This means that there is probably never going to be a miracle cure for irrationality, because eradicating it would require that you were so fundamentally altered that you were no longer human.
One of the first major steps on the path to becoming more rational is the realisation that you are not only irrational by default, but also always fundamentally compromised. This doesn't mean that improving your rationality is impossible. It just means that if you stop applying your knowledge of what improves rationality then you will slip back into irrationality. This is because the brain is a kludge. It works most of the time, but in some cases its innate and natural course of action must be diverted if we are to be rational. The good news is that this kind of diversion is possible. This is because humans possess second order thinking ↓. This means that they can observe their inherent flaws and systematic errors. Then, through studying the laws of thought and action, they can apply second order corrections and so become more rational.
The process of applying these second order corrections or training yourself to mitigate the effects of your propensities is called debiasing ↓. Debiasing is not a thing that you can do once and then forget about. It is something that you must either be doing constantly or that you must instill into habits so that it occurs without volitional effort. There are generally three main types of debiasing and they are described below:
- Counteracting the effects of bias - this can be done by adjusting your estimates or opinions in order to avoid errors due to biases. This is probably the hardest of the three types of debiasing because to do it correctly you need to know exactly how much you are already biased. This is something that people are rarely aware of.
- Catching yourself when you are being or could be biased and applying a cognitive override. The basic idea behind this is that you observe and track your own thoughts and emotions so that you can catch yourself before you move too deeply into irrational modes of thinking. This is hard because it requires that you have superb self-awareness skills and these often take a long time to develop and train. Once you have caught yourself it is often best to resort to using formal thought in algebra, logic, probability theory or decision theory etc. It is also useful to instill habits in yourself that would allow this observation to occur without conscious and volitional effort. It should be noted that incorrectly applying the first two methods of debiasing can actually make you more biased and that this is a common conundrum and problem faced by beginners to rationality training ↓.
- Understanding the situations which make you biased so that you can avoid them ↓ - the best way to achieve this is simply to ask yourself: how can I become more objective? You do this by taking your biased and faulty perspective as much as possible out of the equation. For example, instead of taking measurements yourself you could get them taken automatically by some scientific instrument.
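The first type of debiasing above, adjusting your estimates, depends on knowing how biased you already are. One way to get at that number is to keep a record of past estimates and what actually happened. The sketch below is only an illustration under that assumption: the function names `bias_factor` and `corrected_estimate` are hypothetical, and a single multiplicative correction is a simplification that will not fit every kind of bias.

```python
from statistics import median


def bias_factor(past_estimates, past_actuals):
    """Typical ratio of actual outcome to original estimate (hypothetical helper).

    A value above 1 means you tend to underestimate; below 1, overestimate.
    The median is used so that one wild miss does not dominate the correction.
    """
    ratios = [a / e for e, a in zip(past_estimates, past_actuals) if e > 0]
    if not ratios:
        raise ValueError("need at least one past estimate greater than zero")
    return median(ratios)


def corrected_estimate(raw_estimate, factor):
    """Scale a fresh gut-feeling estimate by the measured bias factor."""
    return raw_estimate * factor


# Example: three past tasks estimated at 2, 4 and 5 hours that took 3, 8 and 7.5.
factor = bias_factor([2, 4, 5], [3, 8, 7.5])
adjusted = corrected_estimate(10, factor)
```

With this made-up track record the ratios are 1.5, 2.0 and 1.5, so the factor is 1.5 and a new 10-hour guess becomes 15 hours. Without such a record the correction is itself a guess, which is why this type of debiasing is described as the hardest.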
- Bias - refers to the obstacles to truth which are produced by our kludgy and limited wetware (brains) working exactly the way that they should. ↩
- Evolutionary psychology - the idea of evolution as the idiot designer of humans - that our brains are not consistently well-designed - is a key element of many of the explanations of human errors that appear on this website.
- Slowness of evolution- The tremendously slow timescale of evolution, especially for creating new complex machinery (as opposed to selecting on existing variance), is why the behavior of evolved organisms is often better interpreted in terms of what did in fact work ↩
- Alief - an independent source of emotional reaction which can coexist with a contradictory belief. For example, the fear felt when a monster jumps out of the darkness in a scary movie is based on the alief that the monster is about to attack you, even though you believe that it cannot.
- Wanting and liking - The reward system consists of three major components:
- Liking: The 'hedonic impact' of reward, comprised of (1) neural processes that may or may not be conscious and (2) the conscious experience of pleasure.
- Wanting: Motivation for reward, comprised of (1) processes of 'incentive salience' that may or may not be conscious and (2) conscious desires.
- Learning: Associations, representations, and predictions about future rewards, comprised of (1) explicit predictions and (2) implicit knowledge and associative conditioning (e.g. Pavlovian associations). ↩
- Heuristics and biases - a program in cognitive psychology that tries to work backward from biases (experimentally reproducible human errors) to heuristics (the underlying mechanisms at work in the brain). ↩
- Cached thought – is an answer that was arrived at by recalling a previously-computed conclusion, rather than performing the reasoning from scratch. ↩
- Sympathetic Magic - humans seem to naturally generate a series of concepts known as sympathetic magic, a host of theories and practices which have certain principles in common, two of which are of overriding importance: the Law of Contagion holds that two things which have interacted, or were once part of a single entity, retain their connection and can exert influence over each other; the Law of Similarity holds that things which are similar or treated the same establish a connection and can affect each other. ↩
- Motivated Cognition - an academic/technical term for various mental processes that lead to desired conclusions regardless of the veracity of those conclusions.
- Rationalization - Rationalization starts from a conclusion, and then works backward to arrive at arguments apparently favoring that conclusion. Rationalization argues for a side already selected; rationality tries to choose between sides. ↩
- Oops - There is a powerful advantage to admitting you have made a large mistake. It's painful. It can also change your whole life. ↩
- Adaptation executors - Individual organisms are best thought of as adaptation-executers rather than as fitness-maximizers. Our taste buds do not find lettuce delicious and cheeseburgers distasteful once we are fed a diet too high in calories and too low in micronutrients. Taste buds are adapted to an ancestral environment in which calories, not micronutrients, were the limiting factor. Evolution operates on too slow a timescale to re-adapt to new conditions (such as a diet).
- Corrupted hardware - our brains do not always allow us to act the way we should. Corrupted hardware refers to those behaviors and thoughts that act for ancestrally relevant purposes rather than for stated moralities and preferences. ↩
- Debiasing - The process of overcoming bias. It takes serious study to gain meaningful benefits, half-hearted attempts may accomplish nothing, and partial knowledge of bias may do more harm than good. ↩
- Costs of rationality - Becoming more epistemically rational can only guarantee one thing: what you believe will include more of the truth. Knowing that truth might help you achieve your goals, or cause you to become a pariah. Be sure that you really want to know the truth before you commit to finding it; otherwise, you may flinch from it.
- Valley of bad rationality - It has been observed that when someone is just starting to learn rationality, they appear to be worse off than they were before. Others, with more experience at rationality, claim that after you learn more about rationality, you will be better off than you were before you started. The period before this improvement is known as "the valley of bad rationality".
- Dunning–Kruger effect - is a cognitive bias wherein unskilled individuals suffer from illusory superiority, mistakenly assessing their ability to be much higher than is accurate. This bias is attributed to a metacognitive inability of the unskilled to recognize their ineptitude. Conversely, highly skilled individuals tend to underestimate their relative competence, erroneously assuming that tasks that are easy for them are also easy for others. ↩
- Shut up and multiply - In cases where we can actually do calculations with the relevant quantities, the ability to shut up and multiply, to trust the math even when it feels wrong, is a key rationalist skill. ↩
- Cognitive science of rationality - discusses fast (Type 1) and slow (Type 2) processes of cognition, thinking errors and the three kinds of minds (reflective, algorithmic, autonomous). ↩
- The Lens That Sees Its Own Flaws - a human brain is a flawed lens that can understand its own flaws—its systematic errors, its biases—and apply second-order corrections to them. ↩
- We Change Our Minds Less Than We Think - between hindsight bias, fake causality, positive bias, anchoring/priming, et cetera et cetera, and above all the dreaded confirmation bias, once an idea gets into your head, it's probably going to stay there. ↩
- You Are A Brain - 'You Are A Brain' is a presentation by Liron Shapira that is tailored for a general audience and provides an introduction to some of the core LessWrong concepts.
- Your intuitions are not magic - blindly following our intuitions can cause our careers, relationships or lives to crash and burn, because we did not think of the possibility that we might be wrong.
- To Spread Science, Keep It Secret - People seem to have holes in their minds for Esoteric Knowledge, Deep Secrets, the Hidden Truth. We've gotten into the habit of presenting the Hidden Truth in a very unsatisfying way, wrapped up in false mundanity.
- Marcus, Kluge: The Haphazard Evolution of the Human Mind ↩
- Chabris, The Invisible Gorilla: How Our Intuitions Deceive Us
- Kurzban, Why Everyone (Else) Is a Hypocrite: Evolution and the Modular Mind
- Dawkins, The Selfish Gene: 30th Anniversary Edition--with a new Introduction by the Author
- McCauley, Why Religion is Natural and Science is Not ↩
- Haselton, M. (2003). The sexual overperception bias: Evidence of a systematic bias in men from a survey of naturally occurring events. Journal of Research in Personality, 34-47.
- Haselton, M., & Buss, D. (2000). Error Management Theory: A New Perspective on Biases in Cross-Sex Mind Reading. Journal of Personality and Social Psychology, 81-91. ↩
- Murray, S., Griffin, D., & Holmes, J. (1996). The Self-Fulfilling Nature of Positive Illusions in Romantic Relationships: Love Is Not Blind, but Prescient. Journal of Personality and Social Psychology, 1155-1180. ↩
- Gilbert, D.T., Tafarodi, R.W. and Malone, P.S. (1993) You can't not believe everything you read. Journal of Personality and Social Psychology, 65, 221-233 ↩
Notes on decisions I have made while creating this post
(these notes will not be in the final draft):
- This post doesn't have any specific details on debiasing or the biases. I plan to provide these details in later posts. The main point of this post is to convey the idea in the title.
I recently encountered something that is, in my opinion, one of the most absurd failure modes of the human brain. I first encountered this after introspection on useful things that I enjoy doing, such as programming and writing. I noticed that my enjoyment of the activity doesn't seem to help much when it comes to motivation for earning income. This was not boredom from too much programming, as it did not affect my interest in personal projects. What it seemed to be was the brain categorizing activities into "work" and "fun" boxes. On one memorable occasion, after taking a break due to being exhausted with work, I entertained myself by programming some more, this time on a personal hobby project (as a freelancer, I pick the projects I work on, so this is not from being told what to do). Relaxing by doing the exact same thing that made me exhausted in the first place.
The absurdity of this becomes evident when you think about what distinguishes "work" and "fun" in this case, which is added value. Nothing changes about the activity except the addition of more utility, making a "work" strategy always dominate a "fun" strategy, assuming the activity is the same. If you are having fun doing something, handing you some money can't make you worse off. Yet here, making an outcome better makes you avoid it. This means that the brain is adopting a strategy that has a (side?) effect of minimizing future utility, and it seems like it is utility and not just money here: as anyone who took a class in an area that personally interested them knows, other benefits like grades recreate this effect just as well. This is the reason I think this is among the most absurd biases. I can understand akrasia, wanting the happiness now and hyperbolically discounting what happens later, or biases that make something seem like the best option when it really isn't. But knowingly punishing what brings happiness just because it also benefits you in the future? It's like the discounting curve dips into the negative region. I would really like to learn where the dividing line is between the kinds of added value that create this effect and the kinds that don't (money obviously does, and immediate enjoyment obviously doesn't). Currently I'm led to believe that the difference is present utility vs. future utility (as I mentioned above), or final vs. instrumental goals, and please correct me if I'm wrong here.
This is an effect that has been studied in psychology and is known as the overjustification effect, so called because the leading theory explains it in terms of the brain assuming the motivation comes from the instrumental gain instead of the direct enjoyment, and then reducing the motivation accordingly. This would suggest that the brain has trouble seeing a goal as being both instrumental and final, and for some reason the instrumental side always wins in a conflict. However, its explanation in terms of self-perception bothers me a little, since I find it hard to believe that a recent creation like self-perception can override something as ancient and low-level as enjoyment of final goals. I searched LessWrong for discussions of the overjustification effect, and the ones I found discussed it in the context of self-perception, not decision-making and motivation. It is the latter that I wanted to ask for your thoughts on.
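The dominance argument above can be checked with a few lines of arithmetic. The sketch below assumes the common one-parameter hyperbolic discounting form, value = amount / (1 + k × delay); the function names, the discount rate k and all the numbers are made up for illustration.

```python
def hyperbolic_value(amount, delay, k=1.0):
    """Present value of `amount` received after `delay`, discounted hyperbolically.

    For k >= 0 and delay >= 0 the weight 1 / (1 + k * delay) stays in (0, 1],
    so a positive future payoff never has negative present value.
    """
    return amount / (1.0 + k * delay)


def activity_value(enjoyment_now, future_payoff=0.0, delay=0.0, k=1.0):
    """Immediate enjoyment plus the discounted value of any later payoff."""
    return enjoyment_now + hyperbolic_value(future_payoff, delay, k)


# Same activity, same enjoyment; "work" just adds a payment 30 days later.
fun = activity_value(enjoyment_now=10.0)
work = activity_value(enjoyment_now=10.0, future_payoff=50.0, delay=30.0)
```

Here `work` exceeds `fun` by 50 / 31, and no non-negative choice of k or delay can reverse the ordering. For "work" to come out worse than "fun" in this model, the discount weight itself would have to be negative, which is the sense in which the curve would have to dip into the negative region.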