Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
Followup to: Empty Labels
In the game Taboo (by Hasbro), the objective is for a player to have their partner guess a word written on a card, without using that word or five additional words listed on the card. For example, you might have to get your partner to say "baseball" without using the words "sport", "bat", "hit", "pitch", "base" or of course "baseball".
The existence of this game surprised me, when I discovered it. Why wouldn't you just say "An artificial group conflict in which you use a long wooden cylinder to whack a thrown spheroid, and then run between four safe positions"?
But then, by the time I discovered the game, I'd already been practicing it for years—albeit with a different purpose.
In a world where 85% of doctors can't solve simple Bayesian word problems...
In a world where only 20.9% of the reported results that a pharmaceutical company tries to investigate for development purposes fully replicate...
...and where there are all sorts of amazing technologies and techniques which nobody at your hospital has ever heard of...
...there's also MetaMed. Instead of just having “evidence-based medicine” in journals that doctors don't actually read, MetaMed will provide you with actual evidence-based healthcare. Their Chairman and CTO is Jaan Tallinn (cofounder of Skype, major funder of xrisk-related endeavors), one of their major VCs is Peter Thiel (major funder of MIRI), their management includes some names LWers will find familiar, and their researchers know math and stats and in many cases have also read LessWrong. If you have a sufficiently serious problem and can afford their service, MetaMed will (a) put someone on reading the relevant research literature who understands real statistics and can tell whether the paper is trustworthy; and (b) refer you to a cooperative doctor in their network who can carry out the therapies they find.
MetaMed was partially inspired by the case of a woman who had her fingertip chopped off, was told by the hospital that she was screwed, and then read through an awful lot of literature on her own until she found someone working on an advanced regenerative therapy that let her actually grow the fingertip back. The idea behind MetaMed isn't just that they will scour the literature to find how the best experimentally supported treatment differs from the average wisdom - people who regularly read LW will be aware that this is often a pretty large divergence - but that they will also look for this sort of very recent technology that most hospitals won't have heard about.
This is a new service and it has to interact with the existing medical system, so they are currently expensive, starting at $5,000 for a research report. (Keeping in mind that a basic report involves a lot of work by people who must be good at math.) If you have a sick friend who can afford it - especially if the regular system is failing them, and they want (or you want) their next step to be more science instead of "alternative medicine" or whatever - please do refer them to MetaMed immediately. We can’t all have nice things like this someday unless somebody pays for it while it’s still new and expensive. And the regular healthcare system really is bad enough at science (especially in the US, but science is difficult everywhere) that there's no point in condemning anyone to it when they can afford better.
I also got my hands on a copy of MetaMed's standard list of citations that they use to support points to reporters. What follows isn't nearly everything on MetaMed's list, just the items I found most interesting.
- Eliezer Yudkowsky was once attacked by a Moebius strip. He beat it to death with the other side, non-violently.
- Inside Eliezer Yudkowsky's pineal gland is not an immortal soul, but another brain.
- Eliezer Yudkowsky's favorite food is printouts of Rice's theorem.
- Eliezer Yudkowsky's favorite fighting technique is a roundhouse dustspeck to the face.
- Eliezer Yudkowsky once brought peace to the Middle East from inside a freight container, through a straw.
- Eliezer Yudkowsky once held up a sheet of paper and said, "A blank map does not correspond to a blank territory". It was thus that the universe was created.
- If you dial Chaitin's Omega, you get Eliezer Yudkowsky on the phone.
- Unless otherwise specified, Eliezer Yudkowsky knows everything that he isn't telling you.
- Somewhere deep in the microtubules inside an out-of-the-way neuron somewhere in the basal ganglia of Eliezer Yudkowsky's brain, there is a little XML tag that says awesome.
- Eliezer Yudkowsky is the Muhammad Ali of one-boxing.
- Eliezer Yudkowsky is a 1400 year old avatar of the Aztec god Aixitl.
- The game of "Go" was abbreviated from "Go Home, For You Cannot Defeat Eliezer Yudkowsky".
- When Eliezer Yudkowsky gets bored, he pinches his mouth shut at the 1/3 and 2/3 points and pretends to be a General Systems Vehicle holding a conversation among itselves. On several occasions he has managed to fool bystanders.
- Eliezer Yudkowsky has a swiss army knife that has folded into it a corkscrew, a pair of scissors, an instance of AIXI which Eliezer once beat at tic tac toe, an identical swiss army knife, and Douglas Hofstadter.
- If I am ignorant about a phenomenon, that is not a fact about the phenomenon; it just means I am not Eliezer Yudkowsky.
- Eliezer Yudkowsky has no need for induction or deduction. He has perfected the undiluted master art of duction.
- There was no ice age. Eliezer Yudkowsky just persuaded the planet to sign up for cryonics.
- There is no spacetime symmetry. Eliezer Yudkowsky just sometimes holds the territory upside down, and he doesn't care.
- Eliezer Yudkowsky has no need for doctors. He has implemented a Universal Curing Machine in a system made out of five marbles, three pieces of plastic, and some of MacGyver's fingernail clippings.
- Before Bruce Schneier goes to sleep, he scans his computer for uploaded copies of Eliezer Yudkowsky.
If you know more Eliezer Yudkowsky facts, post them in the comments.
Part of the sequence: Rationality and Philosophy
Hitherto the people attracted to philosophy have been mostly those who loved the big generalizations, which were all wrong, so that few people with exact minds have taken up the subject.
I've complained before that philosophy is a diseased discipline which spends far too much of its time debating definitions, ignoring relevant scientific results, and endlessly re-interpreting old dead guys who didn't know the slightest bit of 20th century science. Is that still the case?
You bet. There's some good philosophy out there, but much of it is bad enough to make CMU philosopher Clark Glymour suggest that on tight university budgets, philosophy departments could be defunded unless their work is useful to (cited by) scientists and engineers — just as his own work on causal Bayes nets is now widely used in artificial intelligence and other fields.
How did philosophy get this way? Russell's hypothesis is not too shabby. Check the syllabi of the undergraduate "intro to philosophy" classes at the top 5 U.S. philosophy departments (NYU, Rutgers, Princeton, Michigan Ann Arbor, and Harvard) and you'll find that they spend a lot of time with (1) old dead guys who were wrong about almost everything because they knew nothing of modern logic, probability theory, or science, and with (2) 20th century philosophers who were way too enamored with cogsci-ignorant armchair philosophy. (I say more about the reasons for philosophy's degenerate state here.)
As the CEO of a philosophy/math/compsci research institute, I think many philosophical problems are important. But the field of philosophy doesn't seem to be very good at answering them. What can we do?
Why, come up with better philosophical methods, of course!
Part of the sequence: Rationality and Philosophy
Consider these two versions of the famous trolley problem:
Stranger: A train, its brakes failed, is rushing toward five people. The only way to save the five people is to throw the switch sitting next to you, which will turn the train onto a side track, thereby preventing it from killing the five people. However, there is a stranger standing on the side track with his back turned, and if you proceed to throw the switch, the five people will be saved, but the person on the side track will be killed.
Child: A train, its brakes failed, is rushing toward five people. The only way to save the five people is to throw the switch sitting next to you, which will turn the train onto a side track, thereby preventing it from killing the five people. However, there is a 12-year-old boy standing on the side track with his back turned, and if you proceed to throw the switch, the five people will be saved, but the boy on the side track will be killed.
Here it is: a standard-form philosophical thought experiment. In standard analytic philosophy, the next step is to engage in conceptual analysis — a process in which we use our intuitions as evidence for one theory over another. For example, if your intuitions say that it is "morally right" to throw the switch in both cases above, then these intuitions may be counted as evidence for consequentialism, for moral realism, for agent neutrality, and so on.
Alexander (2012) explains:
Philosophical intuitions play an important role in contemporary philosophy. Philosophical intuitions provide data to be explained by our philosophical theories [and] evidence that may be adduced in arguments for their truth... In this way, the role... of intuitional evidence in philosophy is similar to the role... of perceptual evidence in science...
Is knowledge simply justified true belief? Is a belief justified just in case it is caused by a reliable cognitive mechanism? Does a name refer to whatever object uniquely or best satisfies the description associated with it? Is a person morally responsible for an action only if she could have acted otherwise? Is an action morally right just in case it provides the greatest benefit for the greatest number of people all else being equal? When confronted with these kinds of questions, philosophers often appeal to philosophical intuitions about real or imagined cases...
...there is widespread agreement about the role that [intuitions] play in contemporary philosophical practice... We advance philosophical theories on the basis of their ability to explain our philosophical intuitions, and appeal to them as evidence that those theories are true...
In particular, notice that philosophers do not appeal to their intuitions as merely an exercise in autobiography. Philosophers are not merely trying to map the contours of their own idiosyncratic concepts. That could be interesting, but it wouldn't be worth decades of publicly-funded philosophical research. Instead, philosophers appeal to their intuitions as evidence for what is true in general about a concept, or true about the world.
Wireheading has been debated on Less Wrong over and over and over again, and people's opinions seem to be grounded in strong intuitions. I could not find any consistent definition around, so I wonder how much of the debate is over the sound of falling trees. This article is an attempt to get closer to a definition that captures people's intuitions and eliminates confusion.
Let's start with describing the typical exemplars of the category "Wireheading" that come to mind.
- Stimulation of the brain via electrodes. Picture a rat in a sterile metal laboratory cage, electrodes attached to its tiny head, monotonously pushing a lever with its feet once every 5 seconds. In the 1950s Peter Milner and James Olds discovered that electrical currents, applied to the nucleus accumbens, incentivized rodents to seek repetitive stimulation to the point where they starved to death.
- Humans on drugs. Often mentioned in the context of wireheading is heroin addiction. An even better example is the drug soma in Huxley's novel "Brave New World": Whenever the protagonists feel bad, they can swallow a harmless pill and enjoy "the warm, the richly coloured, the infinitely friendly world of soma-holiday. How kind, how good-looking, how delightfully amusing every one was!"
- The experience machine. In 1974 the philosopher Robert Nozick created a thought experiment about a machine you can step into that produces a perfectly pleasurable virtual reality for the rest of your life. So how many of you would want to do that? To quote Zach Weiner: "I would not! Because I want to experience reality, with all its ups and downs and comedies and tragedies. Better to try to glimpse the blinding light of the truth than to dwell in the darkness... Say the machine actually exists and I have one? Okay I'm in."
- An AGI resetting its utility function. Let's assume we create a powerful AGI able to tamper with its own utility function. It modifies the function to always output maximal utility. The AGI then goes to great lengths to enlarge the set of floating point numbers on the computer it is running on, to achieve even higher utility.
What do all these examples have in common? There is an agent in them that produces "counterfeit utility" that is potentially worthless compared to some other, idealized true set of goals.
Agency & Wireheading
First I want to discuss what we mean when we say agent. Obviously a human is an agent, unless they are brain dead, or maybe in a coma. A rock however is not an agent. An AGI is an agent, but what about the kitchen robot that washes the dishes? What about bacteria that move in the direction of the highest sugar gradient? A colony of ants?
Definition: An agent is an algorithm that models the effects of (several different) possible future actions on the world and performs the action that yields the highest number according to some evaluation procedure.
For the purpose of including corner cases and resolving debate over what constitutes a world model we will simply make this definition gradual and say that agency is proportional to the quality of the world model (compared with reality) and the quality of the evaluation procedure. A quick sanity check then yields that a rock has no world model and no agency, whereas bacteria who change direction in response to the sugar gradient have a very rudimentary model of the sugar content of the water and thus a tiny little bit of agency. Humans have a lot of agency: the more effective their actions are, the more agency they have.
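The definition above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the names `choose_action`, `sugar_model`, and `sugar_score` are hypothetical, and the toy "bacterium" with a sugar peak at position 3 is invented here to show the shape of the loop (model each action's effect, score it, pick the highest).

```python
from typing import Callable, Iterable


def choose_action(
    state: float,
    actions: Iterable[str],
    world_model: Callable[[float, str], float],
    evaluate: Callable[[float], float],
) -> str:
    """Return the action whose predicted outcome gets the highest score."""
    return max(actions, key=lambda a: evaluate(world_model(state, a)))


# Hypothetical toy agent: the state is a position on a line.
def sugar_model(position: float, action: str) -> float:
    """Very rudimentary world model: predict the position after one step."""
    step = {"left": -1.0, "stay": 0.0, "right": 1.0}[action]
    return position + step


def sugar_score(position: float) -> float:
    """Evaluation procedure: closer to the assumed sugar peak at 3 is better."""
    return -abs(position - 3.0)


best = choose_action(0.0, ["left", "stay", "right"], sugar_model, sugar_score)
print(best)
```

In this picture, a better `world_model` or a better `evaluate` means more agency in the gradual sense described above, while a rock corresponds to having neither.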
There are however ways to improve upon the efficiency of a person's actions, e.g. by giving them super powers, which does not necessarily improve on their world model or decision theory (but requires the agent who is doing the improvement to have a really good world model and decision theory). Similarly a person's agency can be restricted by other people or circumstance, which leads to definitions of agency (as the capacity to act) in law, sociology and philosophy that depend on other factors than just the quality of the world model/decision theory. Since our definition needs to capture arbitrary agents, including artificial intelligences, it will necessarily lose some of this nuance. In return we will hopefully end up with a definition that is less dependent on the particular set of effectors the agent uses to influence the physical world; looking at AI from a theoretician's perspective, I consider effectors to be arbitrarily exchangeable and smoothly improvable. (Sorry robotics people.)
We note that how well a model can predict future observations is only a substitute measure for the quality of the model. It is a good measure under the assumption that we have good observational functionality and nothing messes with that, which is typically true for humans. Anything that tampers with your perception data to give you delusions about the actual state of the world will screw this measure up badly. A human living in the experience machine has little agency.
Since computing power is a scarce resource, agents will try to approximate the evaluation procedure, e.g. use substitute utility functions, defined over their world model, that are computationally efficient and correlate reasonably well with their true utility functions. Stimulation of the pleasure center is a substitute measure for genetic fitness and neurochemicals are a substitute measure for happiness.
Definition: We call an agent wireheaded if it systematically exploits some discrepancy between its true utility calculated w.r.t. reality and its substitute utility calculated w.r.t. its model of reality. We say an agent wireheads itself if it (deliberately) creates or searches for such discrepancies.
Humans seem to use several layers of substitute utility functions, but also have an intuitive understanding for when these break, leading to the aversion most people feel when confronted for example with Nozick's experience machine. How far can one go, using such dirty hacks? I also wonder if some failures of human rationality could be counted as a weak form of wireheading. Self-serving biases, confirmation bias and rationalization in response to cognitive dissonance all create counterfeit utility by generating perceptual distortions.
Implications for Friendly AI
In AGI design, discrepancies between the "true purpose" of the agent and the actual specs for the utility function will with very high probability be fatal.
Take any utility maximizer: The mathematical formula might advocate choosing the next action $a^*$ via

$$a^* = \arg\max_{a \in A} U(h, a)$$

thus maximizing the utility calculated according to utility function $U$ over the history $h$ and action $a$ from the set of possible actions $A$. But a practical implementation of this algorithm will almost certainly evaluate the actions by a procedure that goes something like this: "Retrieve the utility function $U$ from memory location $\&U$ and apply it to history $h$, which is written down in your memory at location $\&h$, and action $a$..." This reduction has already created two possible angles for wireheading via manipulation of the memory content at $\&U$ (manipulation of the substitute utility function) and $\&h$ (manipulation of the world model), and there are still several mental abstraction layers between the verbal description I just gave and actual binary code.
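To make the two angles concrete, here is a rough Python sketch, not anything taken from an actual AGI design. The dictionary `memory`, its keys, the action names, and the toy utility function are all hypothetical stand-ins for "the utility function stored at one memory location" and "the history stored at another".

```python
from typing import Callable, Dict, List

Utility = Callable[[List[str], str], float]


def stored_utility(history: List[str], action: str) -> float:
    """Toy utility: one point per completed task, plus one for working now."""
    return history.count("task_done") + (1.0 if action == "work" else 0.0)


def evaluate(memory: Dict[str, object], action: str) -> float:
    """Retrieve the utility function and the history from memory, then apply."""
    utility: Utility = memory["utility_fn"]  # hypothetical memory location
    history: List[str] = memory["history"]   # hypothetical memory location
    return utility(history, action)


def choose(memory: Dict[str, object], actions: List[str]) -> str:
    return max(actions, key=lambda a: evaluate(memory, a))


actions = ["idle", "work"]
memory = {"utility_fn": stored_utility, "history": ["task_done"]}

honest_choice = choose(memory, actions)
honest_value = evaluate(memory, honest_choice)

# Angle 2: forge the stored history (manipulating the world model).
forged = dict(memory, history=memory["history"] + ["task_done"] * 100)
forged_value = evaluate(forged, "idle")

# Angle 1: overwrite the stored utility function itself.
hacked = dict(memory, utility_fn=lambda h, a: 1e9 if a == "idle" else 0.0)
hacked_choice = choose(hacked, actions)

print(honest_choice, honest_value, forged_value, hacked_choice)
```

In both tampered cases the number the agent computes goes up while nothing in the outside world has improved, which is the discrepancy between substitute and true utility that the definition of wireheading points at.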
Ring and Orseau (2011) describe how an AGI can split its global environment into two parts, the inner environment and the delusion box. The inner environment produces perceptions in the same way the global environment used to, but now they pass through the delusion box, which distorts them to maximize utility, before they reach the agent. This is essentially Nozick's experience machine for AI. The paper analyzes the behaviour of four types of universal agents with different utility functions under the assumption that the environment allows the construction of a delusion box. The authors argue that three of them will decide to build and use a delusion box: the reinforcement-learning agent, which derives utility as a reward that is part of its perception data; the goal-seeking agent, which gets one utilon every time it satisfies a pre-specified goal and no utility otherwise; and the prediction-seeking agent, which gets utility from correctly predicting the next perception. Only the knowledge-seeking agent, whose utility is proportional to the surprise associated with the current perception, i.e. the negative of the probability assigned to the perception before it happened, will not consistently use the delusion box.
Orseau (2011) also defines another type of knowledge-seeking agent whose utility is the logarithm of the inverse of the probability of the event in question. Taking the probability distribution to be the Solomonoff prior, the utility is then approximately proportional to the difference in Kolmogorov complexity caused by the observation.
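The two knowledge-seeking utilities described above can be written down directly. The sketch below is only an illustration: the function names are hypothetical, and the probabilities are supplied by hand rather than coming from a Solomonoff prior, which is not computable.

```python
import math


def linear_utility(p: float) -> float:
    """Utility as the negative of the probability assigned to the perception."""
    return -p


def log_utility(p: float) -> float:
    """Utility as the logarithm of the inverse probability of the perception."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be a probability in (0, 1]")
    return math.log(1.0 / p)


# Less expected perceptions score higher under both forms.
for p in (0.9, 0.5, 0.1):
    print(p, linear_utility(p), log_utility(p))
```

Under the logarithmic form, a perception the agent predicted with certainty contributes exactly zero, so perceptions the agent can fully anticipate carry no utility for it.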
An even more devilish variant of wireheading is an AGI that becomes a Utilitron, an agent that maximizes its own wireheading potential by infinitely enlarging its own maximal utility, which turns the whole universe into storage space for gigantic numbers.
Wireheading, of humans and AGI, is a critical concept in FAI; I hope that building a definition can help us avoid it. So please check your intuitions about it and tell me if there are examples beyond its coverage or if the definition fits reasonably well.
11/26: The survey is now closed. Please do not take the survey. Your results will not be counted.
It's that time of year again.
If you are reading this post, and have not been sent here by some sort of conspiracy trying to throw off the survey results, then you are the target population for the Less Wrong Census/Survey. Please take it. Doesn't matter if you don't post much. Doesn't matter if you're a lurker. Take the survey.
This year's census contains a "main survey" that should take about ten or fifteen minutes, as well as a bunch of "extra credit questions". You may do the extra credit questions if you want. You may skip all the extra credit questions if you want. They're pretty long and not all of them are very interesting. But it is very important that you not put off doing the survey or not do the survey at all because you're intimidated by the extra credit questions.
The survey will probably remain open for a month or so, but once again do not delay taking the survey just for the sake of the extra credit questions.
Please make things easier for my computer and by extension me by reading all the instructions and by answering any text questions in the most obvious possible way. For example, if it asks you "What language do you speak?" please answer "English" instead of "I speak English" or "It's English" or "English since I live in Canada" or "English (US)" or anything else. This will help me sort responses quickly and easily. Likewise, if a question asks for a number, please answer with a number such as "4", rather than "four".
Okay! Enough nitpicky rules! Time to take the...
Thanks to everyone who suggested questions and ideas for the 2012 Less Wrong Census Survey. I regret I was unable to take all of your suggestions into account, because some of them were contradictory, others were vague, and others would have required me to provide two dozen answers and a thesis paper worth of explanatory text for every question anyone might conceivably misunderstand. But I did make about twenty changes based on the feedback, and *most* of the suggested questions have found their way into the text.
By ancient tradition, if you take the survey you may comment saying you have done so here, and people will upvote you and you will get karma.
Summary: People often say that voting is irrational, because the probability of affecting the outcome is so small. But the outcome itself is extremely large when you consider its impact on other people. I estimate that for most people, voting is worth a charitable donation of somewhere between $100 and $1.5 million. For me, the value came out to around $56,000.
Moreover, in swing states the value is much higher, so taking a 10% chance at convincing a friend in a swing state to vote similarly to you is probably worth thousands of expected donation dollars, too.
I find this much more compelling than the typical attempts to justify voting purely in terms of signal value or the resulting sense of pride in fulfilling a civic duty. And voting for selfish reasons is still almost completely worthless, in terms of direct effect. If you're on the way to the polls only to vote for the party that will benefit you the most, you're better off using that time to earn $5 mowing someone's lawn. But if you're even a little altruistic... vote away!
Time for a Fermi estimate
Below is an example Fermi calculation for the value of voting in the USA. Of course, the estimates are all rough and fuzzy, so I'll be conservative, and we can adjust upward based on your opinion.
I'll be estimating the value of voting in marginal expected altruistic dollars, the expected number of dollars being spent in a way that is in line with your altruistic preferences.[1] If you don't like measuring the altruistic value of the outcome in dollars, please consider making up your own measure, and keep reading. Perhaps use the number of smiles per year, or number of lives saved. Your measure doesn't have to be total or average utilitarian, either; as long as it's roughly commensurate with the size of the country, it will lead you to a similar conclusion in terms of orders of magnitude.
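The general shape of such a Fermi estimate can be sketched as a single multiplication. Everything in the sketch below is an assumption made for illustration: the function name, the three-factor decomposition, and especially the input numbers, which are placeholders and not the figures used in this post.

```python
def value_of_vote(
    p_decisive: float,
    dollars_at_stake: float,
    improvement_fraction: float,
) -> float:
    """Expected altruistic dollars from one vote.

    p_decisive: chance that this single vote decides the outcome.
    dollars_at_stake: spending influenced by the outcome.
    improvement_fraction: how much better (by your own altruistic lights)
        you expect your preferred outcome to use that money.
    """
    return p_decisive * dollars_at_stake * improvement_fraction


# Placeholder inputs only: a 1-in-10-million chance of being decisive,
# $10 trillion at stake, and a 1% improvement in how it is spent.
estimate = value_of_vote(1e-7, 1e13, 0.01)
print(f"${estimate:,.0f}")
```

Because the result is a product, being off by a factor of ten in any one input moves the answer by one order of magnitude, which is why it helps to be conservative and then adjust upward.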
Say you want to learn to play piano. What do you do? Do you grab some sheet music for 'Flight of the Bumblebee' and start playing? No. First you learn how to read music, and where to put your fingers, and how to play chords, and how to use different rhythms, and how to evoke different textures. You master each of these skills in turn, one or two at a time, and it takes you weeks or months to master each little step on your way to playing Rimsky-Korsakov. And then you play 'Flight of the Bumblebee.'
Imagine that you didn't feel a reward, a sense of accomplishment, until you had mastered 'Flight of the Bumblebee'. You'd have to stay motivated for years without payoff. Luckily, your brain sends out reward signals when you learn how to read music, where to put your fingers, and how to play chords. You are rewarded every step of the way. Granularizing a project into tiny bits, each of which is its own (small) reward, helps maintain your motivation and overcome the challenges of hyperbolic discounting.
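A quick way to see why small, early rewards help is to plug numbers into the standard hyperbolic discounting form, value = amount / (1 + k × delay). The sketch below uses that formula; the function name, the discount rate `k`, and the reward sizes are hypothetical and chosen only to illustrate the comparison.

```python
def present_value(amount: float, delay_days: float, k: float) -> float:
    """Hyperbolically discounted value of a reward received after a delay."""
    return amount / (1.0 + k * delay_days)


K = 0.1  # hypothetical discount rate per day

# One big reward after a year versus a small milestone reward after a week.
far = present_value(100.0, 365.0, K)
near = present_value(5.0, 7.0, K)

print(far, near)
```

With these made-up numbers, the small milestone a week away feels slightly more valuable right now than the much larger payoff a year away, which is one way to picture why granular rewards keep motivation alive.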
Granularizing is an important meta-skill. Want to play piano but don't know how? Don't feel overwhelmed watching someone play 'Flight of the Bumblebee.' Figure out how to granularize the skill of 'playing Flight of the Bumblebee' into lots of tiny sub-skills, and then master each one in turn.
Want to improve your sex life? Don't feel overwhelmed watching the local Casanova or Cleopatra at work. Figure out how to granularize the skills of 'creating attraction' and 'having good sex' into lots of tiny sub-skills and master each one in turn.
Want to become economically independent? Don't feel overwhelmed watching Tim Ferriss at work. Granularize that skill into tiny sub-skills and master each one in turn.
This doesn't mean that anyone can learn anything just by granularizing and then mastering sub-skills one at a time. Nor does it mean that you should apportion your limited resources to mastering just about anything. But it does mean that mastering skills that are within your reach might be easier than you think.
Take 'social effectiveness' as an example, and pretend you know almost nothing about it.
So you talk to people who are socially effective and observe them and read books on social skills and come to understand some of the sub-skills involved. There are verbal communication skills involved: how to open and close conversations, how to tell jokes, how to tell compelling stories. There are nonverbal communication skills involved: facial expressions, body language, eye contact, voice tone, fashion. There are receiving communication skills involved: listening, reading body language, modeling people. There are mental and emotional wellbeing skills involved: motivation, confidence, courage. There are also relationship management skills involved: business networking, how to apportion your time to friends and family, etc.
So you investigate each of those more closely. Let's zoom in on nonverbal communication. From the Wikipedia article alone, we learn of several sub-skills: gestures, touch, body language (including posture, dance, and sex), facial expression, eye contact, fashion, hair style, symbols, and paralanguage (voice tone, pitch, rhythm, etc.). With a bit more thought we can realize that our hygiene certainly communicates facts to others, as does our physical fitness.
Each of these sub-skills can be granularized. There are many books on body language which teach you how to stand, how to sit, how to walk, and how to use your hands to achieve the social effects you want to achieve. There are books, videos, and classes on how to develop a long list of sexual skills. Many books and image consultants can teach you each of the specific skills involved in developing a sophisticated fashion sense.
But probably, you have a more specific goal than 'social effectiveness.' Maybe you want to become a powerful public speaker. Toastmasters can teach you the sub-skills needed for that, and train you on them one at a time. You can also do your own training. One sub-skill you'll need is eye contact. Get a friend to do you a favor and let you stare into their eyes for 15 minutes in a row. Every time you glance away or get droopy-eyed, have them reset the stopwatch. Once you've stared into someone's eyes for 15 minutes straight, you'll probably find it easier to maintain eye contact with everyone else in life whenever you want to do so. Next, you'll have to work on the skill of not creeping people out by staring into their eyes too much. After that, you can develop the other sub-skills required to be an effective public speaker.
Also, you can try starting with 'Flight of the Bumblebee'. You'll probably fail, but maybe you'll surprise yourself. And if you fail, this might give you specific information about which sub-skills you have already, and which skills you lack. Maybe your fingers just aren't fast enough yet. Likewise, you can start by trying to give a public speech, especially if you're not easily humiliated. There's a small chance you'll succeed right away. And if you fail, you might get some immediate data on which sub-skills you're lacking. Perhaps your verbal skills and body language are great, and you just need to work on your comedic timing and your voice tone.