I am surrounded by well-meaning people trying to take responsibility for the future of the universe. I think that this attitude – prominent among Effective Altruists – is causing great harm. I noticed this as part of a broader change in outlook, which I've been trying to describe on this blog in manageable pieces (and sometimes failing at the "manageable" part).
I'm going to try to contextualize this by outlining the structure of my overall argument.
Why I am worried
Effective Altruists often say they're motivated by utilitarianism. At its best, this leads to things like Katja Grace's excellent analysis of when to be a vegetarian. We need more of this kind of principled reasoning about tradeoffs.
At its worst, this leads to some people angsting over whether it's ethical to spend money on a cup of coffee when they might have saved a life, and others using the greater good as license to say things that are not quite true, socially pressure others into bearing inappropriate burdens, and make ever-increasing claims on resources without a correspondingly strong verified track record of improving people's lives. I claim that these actions are not in fact morally correct, and that people keep winding up endorsing those conclusions because they are using the wrong cognitive approximations to reason about morality.
Summary of the argument
- When people take responsibility for something, they try to control it. So, universal responsibility implies an attempt at universal control.
- Maximizing control has destructive effects:
  - An adversarial stance towards other agents.
  - Decision paralysis.
- These failures are not accidental, but baked into the structure of control-seeking. We need a practical moral philosophy to describe strategies that generalize better, and benefit from the existence of other benevolent agents, rather than treating them primarily as threats.
Responsibility implies control
In practice, the way I see the people around me applying utilitarianism, it seems to make two important moral claims:
- You - you, personally - are responsible for everything that happens.
- No one is allowed their own private perspective - everyone must take the public, common perspective.
The first principle is almost but not quite simple consequentialism. But it's important to note that it actually doesn't generalize; it's massive double-counting if each individual person is responsible for everything that happens. I worked through an example of the double-counting problem in my post on matching donations.
The second principle follows from the first one. If you think you're personally responsible for everything that happens, and obliged to do something about that rather than weigh your taste accordingly – and you also believe that there are ways to have an outsized impact (e.g. that you can reliably save a life for a few thousand dollars) – then in some sense nothing is yours. The money you spent on that cup of coffee could have fed a poor family for a day in the developing world. It's only justified if the few minutes you save somehow produce more value.
One way of resolving this is simply to decide that you're entitled to only as much as the global poor, and try to do without the rest to improve their lot. This is the reasoning behind the notorious demandingness of utilitarianism.
But of course, other people are also making suboptimal uses of resources. So if you can change that, then it becomes your responsibility to do so.
In general, if Alice and Bob both have some money, and Alice is making poor use of money by giving to the Society to Cure Rare Diseases in Cute Puppies, and Bob is giving money to comparatively effective charities like the Against Malaria Foundation, then if you can cause one of them to have access to more money, you'd rather help Bob than Alice.
There's no reason for this to be different if you are one of Bob and Alice. And since you've already rejected your own private right to hold onto things when there are stronger global claims to do otherwise, there's no principled reason not to try to reallocate resources from the other person to you.
What you're willing to do to yourself, you'll be willing to do to others. Respecting their autonomy becomes a mere matter of either selfishly indulging your personal taste for "deontological principles," or a concession made because they won't accept your leadership if you're too demanding - not a principled way to cooperate with them. You end up trying to force yourself and others to obey your judgment about what actions are best.
If you think of yourself as a benevolent agent, and think of the rest of the world and all the people in it as objects with regular, predictable behaviors you can use to improve outcomes, then you'll feel morally obliged - and therefore morally sanctioned - to shift as much of the locus of control as possible to yourself, for the greater good.
If someone else seems like a better candidate, then the right thing to do seems like throwing your lot in with them, and transferring as much as you can to them rather than to yourself. So this attitude towards doing good leads either to personal control-seeking, or support of someone else's bid for the same.
I think that this reasoning is tacitly accepted by many Effective Altruists, and explains two seemingly opposite things:
- Some EAs get their act together and make power plays, implicitly claiming the right to deceive and manipulate to implement their plan.
- Some EAs are paralyzed by the impossibility of weighing the consequences for the universe of every act, and collapse into perpetual scrupulosity and anxiety, mitigated only by someone else claiming legitimacy, telling them what to do, and telling them how much is enough.
Interestingly, people in the second category are somewhat useful for people following the strategy of the first category, as they demonstrate demand for the service of telling other people what to do. (I think the right thing to do is largely to decline to meet this demand.)
Objectivists sometimes criticize "altruistic" ventures by insisting on Ayn Rand's definition of altruism as the drive to self-abnegation, rather than benevolence. I used to think that this was obnoxiously missing the point, but now I think this might be a fair description of a large part of what I actually see. (I'm very much not sure I'm right. I am sure I'm not describing all of Effective Altruism – many people are doing good work for good reasons.)
Control-seeking is harmful
You have to interact with other people somehow, since they're where most of the value is in our world, and they have a lot of causal influence on the things you care about. If you don't treat them as independent agents, and you don't already rule over them, you will default to going to war against them (and more generally trying to attain control and then make all the decisions) rather than trading with them (or letting them take care of a lot of the decisionmaking). This is bad because it destroys potential gains from trade and division of labor, because you win conflicts by destroying things of value, and because even when you win you unnecessarily become a bottleneck.
People who think that control-seeking is the best strategy for benevolence tend to adopt plans like this:
Step 1 – acquire control over everything.
Step 2 – optimize it for the good of all sentient beings.
The problem with this is that step 1 does not generalize well. There are lots of different goals for which step 1 might seem like an appealing first step, so you should expect lots of other people to be trying, and their interests will all be directly opposed to yours. Your methods will be nearly the same as the methods for someone with a different step 2. You'll never get to step 2 of this plan; it's been tried many times before, and failed every time.
Lots of different types of people want more resources. Many of them are very talented. You should be skeptical about your ability to win without some massive advantage. So, what you're left with are your proximate goals. Your impact on the world will be determined by your means, not your ends.
What are your means?
Even though you value others' well-being intrinsically, when pursuing your proximate goals, their agency mostly threatens to muck up your plans. Consequently, it will seem like a bad idea to give them info or leave them resources that they might misuse.
You will want to make their behavior more predictable to you, so you can influence it better. That means telling simplified stories designed to cause good actions, rather than to directly transmit relevant information. Withholding, rather than sharing, information. Message discipline. I wrote about this problem in my post on the humility argument for honesty.
And if the words you say are tools for causing others to take specific actions, then you're corroding their usefulness for literally true descriptions of things far away or too large or small to see. Peter Singer's claim that you can save a life for hundreds of dollars by giving to developing-world charities no longer means that you can save a life for hundreds of dollars by giving to developing-world charities. It simply means that Peter Singer wants to motivate you to give to developing-world charities. I wrote about this problem in my post on bindings and assurances.
More generally, you will try to minimize others' agency. If you believe that other people are moral agents with common values, then e.g. withholding information means that the friendly agents around you are more poorly informed, which is obviously bad, even before taking into account trust considerations! This plan only makes sense if you basically believe that other people are moral patients, but independent, friendly agents do not exist; that you are the only person in the world who can be responsible for anything.
Another specific behavioral consequence is that you'll try to acquire resources even when you have no specific plan for them. For instance, GiveWell's impact page tracks costs they've imposed on others – money moved, and attention in the form of visits to their website – but not independent measures of outcomes improved, or the opportunity cost of people who made a GiveWell-influenced donation. The implication is that people weren't doing much good with their money or time anyway, so it's a "free lunch" to gain control over these.<fn>Their annual metrics report goes into more detail and does track this, and finds that about a quarter of GiveWell-influenced donations were reallocated from other developing-world charities (and another quarter from developed-world charities).</fn> By contrast, the Gates foundation's Valentine's day report to Warren Buffet tracks nothing but developing-world outcomes (but then absurdly takes credit for 100% of the improvement).
As usual, I'm not picking on GiveWell because they're unusually bad – I'm picking on GiveWell because they're unusually open. You should assume that similar but more secretive organizations are worse by default, not better.
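To make the opportunity-cost point concrete, here is a toy calculation. The reallocation fractions come from the footnote above; the relative-value figures are entirely made up for illustration and are not GiveWell's numbers:

```python
# Toy counterfactual-impact adjustment. All figures except the reallocation
# fractions are hypothetical, chosen only to illustrate the accounting point.
money_moved = 100.0          # headline "impact": $M moved (hypothetical)
frac_from_other_dev = 0.25   # reallocated from other developing-world charities
frac_from_developed = 0.25   # reallocated from developed-world charities
rel_value_other_dev = 0.8    # assumed value per $ at the charities money left
rel_value_developed = 0.3    # assumed value per $ at developed-world charities

# Value that would have been produced anyway, had the money stayed put.
counterfactual = money_moved * (frac_from_other_dev * rel_value_other_dev
                                + frac_from_developed * rel_value_developed)

# Net new value is the headline figure minus the counterfactual.
net_new_value = money_moved - counterfactual
print(net_new_value)
```

Under these made-up assumptions the net figure comes out well below the headline "money moved" number, which is the whole point of tracking outcomes and opportunity costs rather than costs imposed.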
This kind of divergent strategy doesn't just directly inflict harms on other agents. It takes resources away from other agents that aren't defending themselves, which forces them into a more adversarial stance. It also earns justified mistrust, which means that if you follow this strategy, you burn cooperative bridges, forcing yourself farther down the adversarial path.
I've written more about the choice between convergent and divergent strategies in my post about the neglectedness consideration.
Simple patches don't undo the harms from adversarial strategies
Since you're benevolent, you have the advantage of a goal in common with many other people. Without abandoning your basic acquisitive strategy, you could try to have a secret handshake among people trying to take over the world for good reasons rather than bad. Ideally, this would let the benevolent people take over the world, cooperating among themselves. But, in practice, any simple shibboleth can be faked; anyone can say they're acquiring power for the greater good.
It's a commonplace in various discussions among Effective Altruists, when someone identifies an individual or organization doing important work, to suggest that we "persuade them to become an EA" or "get an EA in the organization", rather than talking directly about ways to open up a dialogue and cooperate. This is straightforwardly an attempt to get them to agree to the same shibboleths in order to coordinate on a power-grabbing strategy. And yet, the standard of evidence we're using is mostly "identifies as an EA".
When Gleb Tsipursky tried to extract resources from the Effective Altruism movement with straightforward low-quality mimesis, mouthing the words but not really adding value, and grossly misrepresenting what he was doing and his level of success, it took EAs a long time to notice the pattern of misbehavior. I don't think this is because Gleb is especially clever, or because EAs are especially bad at noticing things. I think this is because EAs identify each other by easy-to-mimic shibboleths rather than meaningful standards of behavior.
Nor is Effective Altruism unique in suffering from this problem. When the Roman empire became too big to govern, gradually emperors hit upon the solution of dividing the empire in two and picking someone to govern the other half. This occasionally worked very well, when the two emperors had a strong preexisting bond, but generally they distrusted each other enough that the two empires behaved like rival states as often as they behaved like allies. Even though both emperors were Romans, and often close relatives!
Using "believe me" as our standard of evidence will not work out well for us. The President of the United States seems to have followed the strategy of saying the thing that's most convenient, whether or not it happens to be true, and won an election based on this. Others can and will use this strategy against us.
We can do better
The above is all a symptom of not including other moral agents in your model of the world. We need a moral theory that takes this into account in its descriptions (rather than having to do a detailed calculation each time), and yet is scope-sensitive and consequentialist the way EAs want to be.
There are two important desiderata for such a theory:
- It needs to take into account the fact that there are other agents who also have moral reasoning. We shouldn't be sad to learn that others reason the way we do.
- Graceful degradation. We can't be so trusting that we can be defrauded by anyone willing to say they're one of us. Our moral theory has to work even if not everyone follows it. It should also degrade gracefully within an individual – you shouldn't have to be perfect to see benefits.
One thing we can do now is stop using wrong moral reasoning to excuse destructive behavior. Until we have a good theory, the honest answer is that we don't know whether your clever argument is valid.
On the explicit and systematic level, the divergent force is so dominant in our world that sincere benevolent people simply assume, when they see someone overtly optimizing for an outcome, that this person is optimizing for evil. This leads perceptive people who don't like doing harm, like Venkatesh Rao, to explicitly advise others to minimize their measurable impact on the world.
I don't think this impact-minimization is right, but on current margins it's probably a good corrective.
One encouraging thing is that many people using common-sense moral reasoning already behave according to norms that respect and try to cooperate with the moral agency of others. I wrote about this in Humble Charlie.
I've also begun to try to live up to cooperative heuristics even if I don't have all the details worked out, and help my friends do the same. For instance, I'm happy to talk to people making giving decisions, but usually I don't go any farther than connecting them with people they might be interested in, or coaching them through heuristics. Doing more would be harmful: it would destroy information, and I'm not omniscient - otherwise I'd be richer.
A movement like Effective Altruism, explicitly built around overt optimization, can only succeed in the long run at actually doing good with (a) a clear understanding of this problem, (b) a social environment engineered to robustly reject cost-maximization, and (c) an intellectual tradition of optimizing only for actually good things that people can anchor on and learn from.
This was only a summary. I don't expect many people to be persuaded by this alone. I'm going to fill in the details in future posts. If you want to help me write things that are relevant, you can respond to this (preferably publicly), letting me know:
- What seems clearly true?
- Which parts seem most surprising and in need of justification or explanation?
(Cross-posted at my personal blog.)
Original post: http://bearlamp.com.au/deriving-techniques-on-the-fly/
Last year Lachlan Cannon came back from a CFAR reunion and commented that instead of just having the CFAR skills we need the derivative skills. The skills that say, "I need a technique for this problem" and let you derive a technique, system, strategy, plan, idea for solving the problem on the spot.
By analogy to an old classic,
Give a man a fish and he will eat for a day. Teach a man to fish and he will never go hungry again.
This concept always felt off to me until I met Anna, an American who used to live in Alaska, where the rivers have enough fish that any time you go fishing you catch one, and one big enough to eat. In contrast, I had been fishing several times when I was little (in Australia) and never caught anything, or only caught fish that were too small to feed one person, let alone many people.
Silly fishing misunderstandings aside, I think the old classic speaks to something interesting but misses a point. To that effect, I want to add something:
Teach a man to derive the skill of fishing when he needs it, and he will never stop growing.
Do we need to go more meta than that? I'm afraid it's turtles all the way down.
To help you derive, you need to start by noticing when there is a need. There are two parts to noticing: a trigger, and a "what next" action.
Answer the question, "What's my first possible clue that I'm about to encounter the problem?" If your problem is "I don't respond productively to being confused," then the first sign a crucial moment is coming might be "a fleeting twinge of surprise". Whatever that feels like in real time from the inside of your mind, that's your trigger.
Whenever you notice your trigger, make a precise physical gesture. Snap your fingers, tap your foot, touch your pinky finger with your thumb - whatever feels comfortable. Do it every time you notice that fleeting twinge of surprise.
I guess. I remember or imagine a few specific instances of encountering weak contrary evidence (such as when I thought my friend wasn't attracted to me, but when I made eye contact with him across the room at a party he smiled widely). On the basis of those simulations, I make a prediction about what it will feel like, in terms of immediate subjective experience, to encounter weak contrary evidence in the future. The prediction is a tentative trigger. For me, this would be "I feel a sort of matching up with one of my beliefs, there's a bit of dissonance, a tiny bit of fear, and maybe a small impulse to direct my attention away from these sensations and away from thoughts about the observation causing all of this".
I test my guess. I keep a search going on in the background for anything in the neighborhood of the experience I predicted. Odds are good I'll miss several instances of weak contrary evidence, but as soon as I realize I've encountered one, I go into reflective attention so I'm aware of as many details of my immediate subjective experience as possible. I pay attention to what's going on in my mind right now, and also what's still looping in my very short-term memory of a few moments before I noticed. Then I compare those results to my prediction, noting anything I got wrong, and I feed that information into a new prediction for next time. (I might have gotten something wrong that caused the trigger to go off at the wrong time, which probably means I need to narrow my prediction.) The new prediction is the new trigger.
I repeat the test until my trigger seems to be accurate and precise. Now I've got a good trigger to match a good action.
Derivations (as above) are a "what next" action.
My derivations come from asking myself that question or other similar questions, then attempting to answer them:
- What should I do next?
- How do I solve this problem?
- Why don't other people have this problem?
- Can I make this problem go away?
- How do I design a system to make this not matter any more?
(You may notice this is stimulating introspection - that is exactly what it is.)
The post that led me to write about derivations is a post on how to present a problem, hopefully to be published tomorrow.
This post took ~1 hour to write.
Cross-posted to LessWrong.
Follow Up to: Dominance, care, and social touch
One thing Ben said in his latest post especially resonated with me, and I wanted to offer some expanded thoughts on it:
Sometimes, when I feel let down because someone close to me dropped the ball on something important, they try to make amends by submitting to me. This would be a good appeasement strategy if I mainly felt bad because I wanted them to assign me a higher social rank. But, the thing I want is actually the existence of another agent in the world who is independently looking out for my interests. So when they respond by submitting, trying to look small and incompetent, I perceive them as shirking. My natural response to this kind of shirking is anger - but people who are already trying to appease me by submitting tend to double down on submission if they notice I'm upset at them - which just compounds the problem!
My main strategy for fixing this has been to avoid leaning on this sort of person for anything important. I've been experimenting with instead explicitly telling them I don't want submission and asking them to take more responsibility, and this occasionally works a bit, but it's slow and frustrating and I'm not sure it's worth the effort.
This resonated on multiple levels.
There is the basic problem of someone dropping the ball, and offering submission rather than fixing the problem on some level. As someone who tried to run a company, this is especially maddening. I do not want you to show your submission, I want you to tell me how you are going to fix what went wrong, and avoid making the same mistake again! I want you to tell me how you have learned from this experience. That makes everyone perform better. I also want to see you take responsibility. These are all highly useful, whereas submission usually is not. However, you have to hammer this home, over and over again, for not just some but most people - far too many people to get by without ever relying on such folks.
Different people have different reactions they want to see when someone lets them down or makes a mistake. I have one set of reactions I use at work, one set I use at home, another I use with other rationalists, and so on, and for people I know well, I customize further.
The bigger problem, also described here, is the anger feedback loop, which is the main thing I want to talk about. Ben gives an example of it:
A: Sorry I let you down, I suck. And other submissive things.
Ben (gets angry): Why are you doing that? I don't want that reaction!
A (seeing Ben is mad): Oh, I made you mad! So sorry I let you down, I suck. And other even more submissive things than before.
Ben (gets angrier): Aaaarrgggh!
...and so on, usually until A also gets angry at Ben (in my experience), and a real fight ensues that often eclipses by far the original problem. This is Ben's particular form of this, but more common to my experience is this, the most basic case:
A: You screwed up!
B: You're angry at me! How dare you get angry at me? I'm angry!
A: How dare you get angry at me for being angry? I'm even angrier!
B: How dare you get angry at me for being angry at your being angry? Oh boy am I angry!
When things go down this path, something very minor can turn into a huge fight. Whether or not you signed up for it, you're in a dominance contest. One or both participants has to choose not to be angry, or at least to act as if they are not angry. Sometimes this comes only after a large number of iterations, which makes the task very difficult, and it plays out like a game of chicken: one person credibly commits to being completely incapable of defusing the situation before it results in destruction of property, so the other now has no choice but to appear to put themselves in the required emotional state, at a time when they feel beyond justified - which usually involves saying things like "I'm not angry" a lot, when that claim is not exactly credible. Having to do all this really sucks.
The only real alternative I know about is to physically leave, and wait for things to calm down.
Then there are the even worse variations, where the original sin that you are fighting over is failure to be in the proper emotional state. In these cases, not only is submission demanded, but voluntary, happy, you-are-right style submission. You can end up with this a lot:
A: I demand X!
B: OK, fine, X.
A: How dare you not be happy about this?
B: I'm happy about it.
A: No you're not! You're pretending to be happy about it! How dare you!
B: No, really, I am! I am blameworthy for many things, but for this I am not blameworthy, I have the emotional response you demand oh mighty demander!
A: I don't believe you.
And so on - and it can go on quite a while. With begging and pleading. B was my father. A lot. It is painful even to listen to. It was painful to even write this.
So essentially - and I have been in situations like this, including at various jobs - you end up on constant emotional notice. You must, at all times, present the right response to everything that is happening. So you try hard to do this, and perhaps often this is helpful, because people acting cheerful can make things better. But what happens the moment this facade starts to break down? Too many things push your buttons in a row? That happens at exactly the moment when it has become too expensive to keep the facade up. Then they detect it.
They tell you this is bad. You must be happy about this; you have no right to be upset! And of course, now you're also mad about them telling you what you have no right to be mad about... and the cycle begins. Because your job just got a lot harder, and if you slip again, it's going to get really ugly.
Even when reasonably big things are at stake and there is actual disagreement, this is where most of the real ugliness seems to come from - one party decides the emotional response of the other party is illegitimate and their reaction to this reinforces the reaction.
This is something we need to be super vigilant about not doing.
Within reason, and somewhat beyond it, people who want to be upset need to be allowed to be upset. As long as they can do it quietly they need to be allowed to be angry. If the person is being disruptive and actively wrecking things, that is something else, but if someone decides to let the wookie win, and you are the wookie, you need to let them let the wookie win. The argument really is over. If you've got what you want on the substance, that has to be good enough.
They also need to be allowed to be submissive. People instinctively go into this mode in order to avoid these fights and dominance contests. Yes, it's not the most productive thing they could be doing right now. You can explain to them later, in a different conversation, that this isn't necessary with you. Eventually they might even believe it. For now, let them have this. If you do not, what is likely to happen is, as Ben observes, they interpret your being upset with them as them not being submissive enough. That is a reasonable guess, and more often than not they will be right.
Rising above this is, of course, even better. Here's something along those lines that happened to me recently.
For a while I had been busy, and therefore mostly out of rationalist circles. I had been spending a lot of time in other good (if not quite as good) epistemic circles, and I'd learned the habit, when someone calls you out on having screwed up, of acknowledging I had screwed up, apologizing, fixing it to the extent that was still relevant, and assuring that I knew how to not have it happen again. If everyone in the world started doing that, I would take that reaction in a second, and life would be a lot better.
It's not as good as understanding on a deep level exactly why you made the mistake in the first place. So the other person got frustrated, expecting better and holding me to a higher standard, and I was then called out on my reaction to being called out, because the other person respected me enough to do that: I don't want your apology, I want you to figure out why you did that and I think you can do it. I then caught myself doing the same submission thing a second time, which resulted in me realizing what was wrong in a much more important sense than the original error. As a result, instead of simply putting a band-aid over the local issue, I got a moment that stuck with me.
We should all strive for such a standard - from both sides.
[Cross-posted at my personal blog]
This post is for all the people who have been following Arbital's progress since 2015 via whispers, rumors, and clairvoyant divination. That is to say: we didn't do a very good job of communicating. I hope this post corrects some of that.
The top question on your mind is probably: "Man, I was promised that Arbital will solve X! Why hasn't it solved X already?" Where X could be intuitive explanations, online debate, all LessWrong problems, AGI, or just cancer. Well, we did try to solve the first two and it didn't work. Math explanations didn't work because we couldn't find enough people who would spend the time to write good math explanations. (That said, we did end up with some decent posts on abstract algebra. Thank you to everyone who contributed!) Debates didn't work because... well, it's a very complicated problem. There was also some disagreement within the team about the best approach, and we ended up moving too slowly.
So what now?
You are welcome to use Arbital in its current version. It's mostly stable, though a little slow sometimes. It has a few features some might find very helpful for their type of content. Eliezer is still writing AI Alignment content on it, and he heavily relies on the specific Arbital features, so it's pretty certain that the platform is not going away. In fact, if the venture fails completely, it's likely MIRI will adopt Arbital for their personal use.
I'm starting work on Arbital 2.0. It's going to be a (micro-)blogging platform. (If you are a serious blogger / Tumblr user, let me know; I'd love to ask you some questions!) I'm not trying to solve online debates, build LW 2.0, or cure cancer. It's just going to be a damn good blogging platform. If it goes well, then at some point I'd love to revisit the Arbital dream.
I'm happy to answer any and all questions in the comments.
Some of the ensuing responses discussed the fidelity with which such a simulation would need to be run, in order to keep the population living within it guessing as to whether they were in a digital simulation, which is a topic that's been discussed before on LessWrong:
If a simulation can be not just run, but also loaded from previous saved states then edited, it should be possible for the simulation's Architect to start it running with low granularity, wait for some inhabitant to notice an anomaly, then rewind a little, use a more accurate but computing intensive algorithm in the relevant parts of the inhabitant's timecone and edit the saved state to include that additional detail, before setting the simulation running again and waiting for the next anomaly.
construct a system with easy-to-verify but arbitrarily-hard-to-compute behavior ("Project: Piss Off God"), and then scrupulously observe its behavior. Then we could keep making it more expensive until we got to a system that really shouldn't be practically computable in our universe.
but I'm wondering how easy that would be.
The problem would need to be physical (for example, make a net with labelled strands of differing lengths joining the nodes, then hang it from one corner), else humanity would have to be doing as much work as the simulation.
The solution should be discrete (for example, what are the labels on the strands making up the limiting path that prevents the lowest point from hanging further down).
The solution should be not just analytic, but also difficult to get via numerical analysis.
The problem should be scalable to very large sizes (so, for example, the net problem wouldn't work, because with large-size nets, making the strands sufficiently different in length that you could tell two close solutions apart would be a limiting factor).
And, ideally, the problem would be one that occurs (and is solved) naturally, such that humanity could just record data in multiple locations over a period of years, then later decide which examples of the problem to verify. (See this paper by Scott Aaronson: "NP-complete Problems and Physical Reality")
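The "easy to verify, arbitrarily hard to compute" property being asked for is exactly what characterises NP problems (as in the Aaronson paper). A toy illustration with subset-sum, where finding a solution takes exponential search but checking one is a single addition; the particular numbers below are arbitrary:

```python
from itertools import combinations

def find_subset(nums, target):
    """Exponential-time search for a subset of nums summing to target."""
    for r in range(1, len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return combo
    return None

def verify(combo, nums, target):
    """Linear-time check: just re-add the numbers."""
    return combo is not None and sum(combo) == target

# Arbitrary instance, chosen so that a solution exists:
nums = [267, 493, 869, 961, 1000, 1153, 1246, 1598, 1766, 1922]
target = 4188  # = 267 + 493 + 869 + 961 + 1598
print(verify(find_subset(nums, target), nums, target))  # True
```

Scaling the instance up makes the search side arbitrarily expensive while the verification side stays trivial, which is the asymmetry the proposed experiment would rely on.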
[Epistemic Status: I suspect that this is at least partially wrong. But I don’t know why yet, and so I figured I’d write it up and let people tell me. First post on Less Wrong, for what that’s worth.]
First thesis: IQ is more akin to a composite measure of performance such as the decathlon than it is to a single characteristic such as height or speed.
Second thesis: When looking at extraordinary performance in any specific field, IQ will usually be highly correlated with success, but it will not fully explain or predict top-end performance, because extraordinary performance in a specific field is a result of extraordinary talent in a sub-category of intelligence (or even a sub-category of a sub-category), rather than truly top-end achievement in the composite metric.
Before we go too far, here are some widely accepted claims that I’m not disputing:
- IQ is largely immutable (though perhaps not totally immutable).
- IQ is a heritable, polygenic trait.
- IQ is highly correlated with a variety of achievement measures, including academic performance, longevity, wealth, happiness, and health.
- Parenting and schooling matter far less than IQ in predicting performance.
- IQ matters more than “grit” and “mindset” when explaining performance.
- Most extraordinary performers, from billionaire tech founders to chess prodigies, to writers and artists and musicians, will possess well-above-average IQ.
Here is one area where I’m certain I’m in the minority:
- I believe that Spearman’s G is a reification. At least one smart person has also expressed this opinion, but most experts disagree with him (this ties in with the First Thesis).
Here is the issue where I’m not sure if my opinion is controversial, and thus why I’m writing to get feedback:
- While IQ is almost certainly highly correlated with high-end performance, IQ fails as a metric to explain or, more importantly, to predict top-end individual performance (the Second Thesis).
Why IQ Isn’t Like Height
Height is a single, measurable characteristic. Speed over any distance is a single, measurable characteristic. Ability to bench-press is a single, measurable characteristic.
But intelligence is more like the concept of athleticism than it is the concept of height, speed, or the ability to bench-press.
Here is an excerpt from the Slate Star Codex article Talents part 2, Attitude vs. Altitude:
The average eminent theoretical physicist has an IQ of 150-160. The average NBA player has a height of 6’ 7”. Both of these are a little over three standard deviations above their respective mean. Since z-scores are magic and let us compare unlike domains, we conclude that eminent theoretical physicists are about as smart as pro basketball players are tall.
Any time people talk about intelligence, height is a natural sanity check. It’s another strongly heritable polygenic trait which is nevertheless susceptible to environmental influences, and which varies in a normal distribution across the population – but which has yet to accrete the same kind of cloud of confusion around it that IQ has.
All of this is certainly true. But here’s what I’d like to discuss more in depth:
Height is a trait that can be measured in a single stroke. IQ has to be measured by multiple sub-tests.
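The z-score arithmetic from the excerpt above can be made concrete. The IQ norms (mean 100, SD 15) are standard; the height mean and SD below are rough assumed figures for US adult men, not numbers from the article:

```python
def z_score(value, mean, sd):
    """How many standard deviations above the mean a value sits."""
    return (value - mean) / sd

# An eminent-physicist-level IQ of 155, on the usual mean-100, SD-15 norming:
print(round(z_score(155, 100, 15), 2))   # 3.67

# 6'7" = 79 inches; ~69.3in mean and ~2.9in SD are assumed figures:
print(round(z_score(79, 69.3, 2.9), 2))  # 3.34
```

Both land a little over three standard deviations out, which is the whole force of the comparison.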
IQ measures the following sub-components of intelligence:
- Verbal Intelligence
- Mathematical Ability
- Spatial Reasoning Skills
- Visual/Perceptual Skills
- Classification Skills
- Logical Reasoning Skills
- Pattern Recognition Skills
Even though both height and intelligence are polygenic traits, there is a category difference between the two.
That’s why I think that athleticism is a better polygenic-trait comparator to intelligence than height. Obviously, people are born with different degrees of athletic talent. Athleticism can be affected by environmental factors (nutrition, lack of access to athletic facilities, etc.). And because athleticism, like intelligence, is composed of different sub-variables (speed, agility, and coordination on the one hand; verbal intelligence, mathematical intelligence, and spatial reasoning skills on the other), it can be measured in a variety of ways. You could measure athleticism with an athlete’s performance in the decathlon, or you could measure it with a series of other tests. Those results would be highly correlated, but not identical. And those results would probably be highly correlated with lots of seemingly unrelated but important physical outcomes.
Measure intelligence with an LSAT vs. IQ test vs. GRE vs. SAT vs. ACT vs. an IQ test from 1900 vs. 1950 vs. 2000 vs. the blink test, and the results will be highly correlated, but again, not identical.
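That distinction is easy to demonstrate with a quick simulation (the loadings and noise levels below are made up for illustration): two noisy tests of the same latent ability correlate highly but rank people differently, while two unit conversions of height rank people identically.

```python
import random

random.seed(0)
n = 1000
latent = [random.gauss(0, 1) for _ in range(n)]

# Two different "intelligence tests": each loads on the latent factor
# plus independent test-specific noise (loadings assumed for illustration).
test_a = [g + random.gauss(0, 0.5) for g in latent]
test_b = [g + random.gauss(0, 0.5) for g in latent]

def rank(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos
    return r

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Highly correlated (0.8 in expectation), but the rankings disagree:
print(round(pearson(test_a, test_b), 2))
print(rank(test_a) == rank(test_b))   # False

# Height in cm vs. feet: a unit change preserves the ranking exactly.
height_cm = [random.gauss(175, 7) for _ in range(n)]
height_ft = [h / 30.48 for h in height_cm]
print(rank(height_cm) == rank(height_ft))  # True
```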
Whether you measure height in centimeters or feet, however, the ranking of the people you measure will be identical no matter how you measure it.
To me, that distinction matters.
I think this athleticism/height distinction explains part (but not all) of the “cloud” surrounding IQ.
Athletic Quotient (“AQ”)
Play along with me for a minute.
Imagine we created a single, composite metric to measure overall athletic ability. Let’s call it AQ, or Athletic Quotient. We could measure AQ just as we measure IQ, with 100 as the median score, and with two standard deviations above at 130 and four standard deviations above at 160.
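On the scale just described, an AQ score is only a z-score dressed up in IQ-style units. A trivial sketch (the AQ scale itself is, of course, this post's invention):

```python
def to_standard_score(z, mean=100, sd=15):
    """Map a z-score onto an IQ-style scale: mean 100, 15 points per SD."""
    return mean + sd * z

print(to_standard_score(0))   # 100 (the median athlete)
print(to_standard_score(2))   # 130 (two SDs above)
print(to_standard_score(4))   # 160 (four SDs above)
```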
For the sake of simplicity, let’s measure athletes’ athletic ability with the decathlon. This event is an imperfect test of speed, strength, jumping ability, and endurance.
An Olympic-caliber decathlete could compete at a near-professional level in most sports. But the best decathletes aren’t the people whom we think of when we think of the best athletes in the world. When we think of great athletes, we think of the top performers in one individual discipline, rather than the composite.
When people think of the best athlete in the world, they think of Leo Messi or Lebron James, not Ashton Eaton.
IQ and Genius
Here’s where my ideas might start to get controversial.
I don’t think most of the people we consider geniuses necessarily had otherworldly IQs. People with 200-plus IQs are like Olympic decathletes. They’re amazingly intelligent people who can thrive in any intellectual environment. They’re intellectual heavyweights without specific weaknesses. But those aren’t necessarily the superstars of the intellectual world. The Einsteins, Mozarts, Picassos, or the Magnus Carlsens of the world – they’re great because of domain-specific talent, rather than general intelligence.
Phlogiston and Albert Einstein’s IQ
Check out this article.
The article declares, without evidence, that Einstein had an IQ of 205-225.
The thinking seems to go like this: Most eminent physicists have IQs of around 150-160. Albert Einstein created a paradigm shift in physics (or perhaps multiple such shifts). So he must have had an IQ around 205-225. We’ll just go ahead and retroactively apply that IQ to this man who’s been dead for 65 years and that’ll be great for supporting the idea that IQ and high-end field-specific performance are perfectly correlated.
As an explanation of intelligence, that’s no more helpful than phlogiston in chemistry.
But here’s the thing: It’s easy to ascribe super-high IQs retroactively to highly accomplished dead people, but I have never heard of IQ predicting an individual’s world-best achievement in a specific field. I have never read an article that says, “this kid has an IQ of 220; he’s nearly certain to create a paradigm-shift in physics in 20 years.” There are no Nate Silvers predicting individual achievement based on IQ. IQ does not predict Nobel Prize winners or Fields Medal winners or the next chess #1. A kid with a 220 IQ may get a Ph.D. at age 17 from CalTech, but that doesn’t mean he’s going to be the next Einstein.
Einstein was Einstein because he was an outsider. Because he was intransigent. Because he was creative. Because he was an iconoclast. Because he had the ability to focus. According to the Isaacson biography, at least, there were other pre-eminent physicists who were stronger at math than he was. Of course he was super smart. But there's no evidence he had a super-high IQ (as in, above 200).
We’ve been using IQ as a measure of intelligence for over 100 years and it has never predicted an Einstein, a Musk, or a Carlsen. Who is the best counter-example to this argument? Terence Tao? Without obvious exception, those who have been recognized for early-age IQ are still better known for their achievements as prodigies than their achievements as adults.
Is it unfair to expect that predictive capacity from IQ? Early-age prediction of world-class achievement does happen. Barcelona went and scooped up Leo Messi from the hinterlands of Argentina at age 12 and he went and became Leo Messi. Lebron James was on the cover of Sports Illustrated when he was in high school.
In some fields, predicting world-best performance happens at an early age. But IQ – whatever its other merits – does not seem to serve as an effective mechanism for predicting world-best performance in specific individualized activities.
Magnus Carlsen’s IQ
When I type Magnus Carlsen’s name into Google, the first thing that autofills (after chess) is “Magnus Carlsen IQ.”
People seem to want to believe that his IQ score can explain why he is the Mozart of chess.
We don’t know what his IQ is, but the instinct people have to try to explain his performance in terms of IQ feels very similar to people’s desire to ascribe an IQ of 225 to Einstein. It’s phlogiston.
Magnus Carlsen probably has a very high IQ. He obviously has well above-average intelligence. Maybe his IQ is 130, 150, or 170 (there's a website called ScoopWhoop that claims, without citation, that it's 190). But however high his IQ, doubtless there are many or at least a few chess players in the world who have higher IQs than he has. But he’s the #1 chess player in the world – not his competitors with higher IQs. And I don’t think the explanation for why he’s so great is his “mindset” or “grit” or anything like that.
It’s because IQ is akin to an intellectual decathlon, whereas chess is a single-event competition. If we dug deep into the sub-components of Carlsen’s IQ (or perhaps the sub-components of the sub-components), we’d probably find some sub-component where he measured off the charts. I’m not saying there’s a “chess gene,” but I suspect there is a trait, measurable as a sub-component of intelligence and more specific than IQ, that would explain his abilities better than raw IQ does.
Leo Messi isn’t the greatest soccer player in the world because he’s the best overall athlete in the world. He’s the best soccer player in the world because of his agility and quickness in incredibly tight spaces. Because of his amazing coordination in his lower extremities. Because of his ability to change direction with the ball before defenders have time to react. These are all natural talents. But they are only particularly valuable because of the arbitrary constraints in soccer.
Leo Messi is a great natural athlete. If we had a measure of AQ, he’d probably be in the 98th or 99th percentile. But that doesn’t begin to explain his otherworldly soccer-playing talents. He probably could have been a passable high-school point guard at a school of 1000 students. He would have been a well-above-average decathlete (though I doubt he could throw the shot put worth a damn).
But it’s the unique athletic gifts that are particularly well suited to soccer that enabled him to be the best in the world at soccer. So, too, with Magnus Carlsen with chess, Elon Musk with entrepreneurialism, and Albert Einstein with paradigm-shifting physics.
The decathlon won’t predict the next Leo Messi or the next Lebron James. And IQ won’t predict the next Magnus Carlsen, Elon Musk, Picasso, Mozart, or Albert Einstein.
And so we shouldn’t seek it out as an after-the-fact explanation for their success, either.
 Of course, high performance in some fields is probably more closely correlated with IQ than others: physics professor > english professor > tech founder > lawyer > actor > bassist in grunge band. [Note: this footnote is total unfounded speculation]
 The other part is that people don’t like to be defined by traits that they feel they cannot change or improve.
 Let me know if I am missing any famous examples here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
I need help getting out of a logical trap I've found myself in after reading The Age of Em.
Some statements needed to set the trap:
If mind-uploading is possible, then a mind can theoretically exist for an arbitrary length of time.
If a mind is contained in software, it can be copied, and therefore can be stolen.
An uploaded mind can retain human attributes indefinitely.
Some subset of humans are sadistic jerks, and many of these humans have temporal power.
All humans, under certain circumstances, can behave like sadistic jerks.
Human power relationships will not simply disappear with the advent of mind uploading.
Some minor negative implications:
Torture becomes embarrassingly parallel.
US states with the death penalty may adopt death plus simulation as a penalty for some offenses.
Over a long enough timeline, the probability of a copy of any given uploaded mind falling into the power of a sadistic jerk approaches unity. Once an uploaded mind has fallen under the power of a sadistic jerk, there is no guarantee that it will ever be 'free', and the quantity of experienced suffering could be arbitrarily large, due in part to the embarrassingly parallel nature of torture enabled by running multiple copies of a captive mind.
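The "approaches unity" step is just the complement rule for repeated exposure. A sketch, with a per-period capture probability that is entirely made up:

```python
def capture_probability(p_per_period, periods):
    """P(captured at least once) = 1 - P(surviving every period),
    assuming independent periods with capture probability p_per_period."""
    return 1 - (1 - p_per_period) ** periods

# Even a one-in-ten-thousand risk per period compounds relentlessly
# over the unbounded lifespan of an uploaded mind:
for periods in (1, 100, 10_000, 1_000_000):
    print(periods, capture_probability(1e-4, periods))
```

Each extra running copy multiplies the exposure, which is why the parallelism point above makes this worse rather than better.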
Therefore! If you believe that mind uploading will become possible in a given individual's lifetime, the most ethical thing you can do, from the utilitarian standpoint of minimizing aggregate suffering, is to ensure that the person's mind is securely deleted before it can be uploaded.
Imagine the heroism of a soldier who, faced with capture by an enemy capable of uploading minds and willing to parallelize torture, spends his time ensuring that his buddies' brains are unrecoverable at the cost of his own capture.
I believe that mind uploading will become possible in my lifetime. Please convince me that running through the streets with a blender screaming for brains is not an example of effective altruism.
On a more serious note, can anyone else think of examples of really terrible human decisions that would be incentivised by the development of AGI or mind uploading? This problem appears related to AI safety.
Hello guys, I am currently writing my master's thesis on biases in the investment context. One sub-sample that I am studying is people who are educated about biases in a general context, but not in the investment context. I guess LW is the right place to find some of those, so I would be very happy if some of you would participate, especially since people who are aware of biases are hard to come by elsewhere. Also, I explicitly ask about activity in the LW community in the survey, so if enough LWers participate I could analyse them as an individual sub-sample. It would be interesting to know how LWers perform compared to psychology students, for example. The link to the survey is: https://survey.deadcrab.de/
It’s only been recently that I’ve been thinking about epistemics in the context of figuring out my behavior and debiasing. Aside from trying to figure out how I actually behave (as opposed to what I merely profess I believe), I’ve been thinking about how to confront uncertainty—and what it feels like.
For many areas of life, I think we shy away from confronting uncertainty and instead flee into the comforting non-falsifiability of vagueness.
Consider these examples:
1) You want to get things done today. You know that writing things down can help you finish more things. However, it feels aversive to write down what you specifically want to do. So you don’t write things down and instead just keep a hazy notion of “I will do things today”.
2) You try to make a confidence interval for a prediction where money is on the line. You notice yourself feeling uncomfortable, no matter what your bounds are; it feels bad to set down any number at all, which is accompanied by a dread feeling of finality.
3) You’re trying to find solutions to a complex, entangled problem. Coming up with specific solutions feels bad because none of them seem to completely solve the problem. So instead you decide to create a meta-framework that produces solutions, or argue in favor of some abstract process like a “democratized system that focuses on holistic workarounds”.
In each of the above examples, it feels like we move away from making specific claims because that opens us up to specific criticism. But instead of trying to improve the strengths of specific claims, we retreat to fuzzily-defined notions that allow us to incorporate any criticism without having to really update.
I think there’s a sense in which, in some areas of life, we embrace shoddy epistemology (e.g. not wanting to validate or falsify our beliefs) out of a fear of failing, or of the effort it would take to update. I think this fear is what fuels the feeling of aversion.
It seems useful to face this feeling of badness or aversion with the understanding that this is what confronting uncertainty feels like. The best action doesn’t always feel comfortable and easy; it can just as easily feel aversive and final.
Look for situations where you might be flinching away from making specific claims and replacing them with vacuous claims that are compatible with any evidence you might see.
If you never put your beliefs to the test with specific claims, then you can never verify them in the real world. And if your beliefs don’t map well onto the real world, they don’t seem very useful to even have in the first place.
Not really sure where else I might post this, but there seems to be a UI issue on the site. When I hit the homepage of lesswrong.com while logged in I no longer see the user sidebar or the header links for Main and Discussion. This is kind of annoying because I have to click into an article first to get to a page where I can access those things. Would be nice to have them back on the front page.
About a year ago, I made a setting of the Litany of Tarski for four-part a cappella (i.e. unaccompanied) chorus.
More recently, in the process of experimenting with MuseScore for potential use in explaining musical matters on the internet (it makes online sharing of playback-able scores very easy), the thought occurred to me that perhaps the Tarski piece might be of interest to some LW readers (if no one else!), so I went ahead and re-typeset it in MuseScore for your delectation.
Here it is (properly notated :-)).
Here it is (alternate version designed to avoid freaking out those who aren't quite the fanatical enthusiasts of musical notation that I am).
Home appliances, such as washing machines, are apparently much less durable now than they were decades ago. [ETA: Thanks to commenters for providing lots of reasons to doubt this claim (especially from here and here).]
Perhaps this is a kind of mirror image of "cost disease". In many sectors (education, medicine), we pay much more now for a product that is no better than what we got decades ago at a far lower cost, even accounting for inflation. It takes more money to buy the same level of quality. Scott Alexander (Yvain) argues that the cause of cost disease is a mystery. There are several plausible accounts, but they don't cover all the cases in a satisfying way. (See the link for more on the mystery of cost disease.)
Now, what if the mysterious cause of cost disease were to set to work in a sector where price can't go up, for whatever reason? Then you would expect quality to take a nosedive. If price per unit quality goes up, but total price can't go up, then quality must go down. So maybe the mystery of crappy appliances is just cost disease in another guise.
In the spirit of inadequate accounts of cost disease, I offer this inadequate account of crappy appliances:
As things get better globally, they get worse locally.
Global goodness provides a buffer against local badness. This makes greater local badness tolerable. That is, the cheapest tolerable thing gets worse. Thus, worse and worse things dominate locally as things get better globally.
This principle applies in at least two ways to washing machines:
Greater global wealth: Consumers have more money, so they can afford to replace washing machines more frequently. Thus, manufacturers can sell machines that require frequent replacement.
Manufacturers couldn't get away with this if people were poorer and could buy only one machine every few decades. If you're poor, you prioritize durability more. In the aggregate, the market will reward durability more. But a rich market accepts less durability.
Better materials science: Globally, materials science has improved. Hence, at the local level, manufacturers can get away with making worse materials.
Rich people might tolerate a washer that lasts 3 years, give or take. But even they don't want a washer that breaks in one month. If you build washers, you need to be sure that nearly every single one lasts a full month, at least. But, with poor materials science, you have to overshoot by a lot to ensure of that. Maybe you have to aim for a mean duration of decades to guarantee that the minimum duration doesn't fall below one month. On the other hand, with better materials science, you can get the distribution of duration to cluster tightly around 3 years. You still have very few washers lasting only one month, but the vast majority of your washers are far less durable than they used to be.
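That overshoot argument can be sketched numerically. The lifetime distributions below are invented for illustration (real appliance lifetimes aren't normal), but they show the shape of the claim: a wide spread forces a decades-long mean to protect the one-month floor, while a tight spread protects the same floor with a three-year mean.

```python
from statistics import NormalDist

MONTH = 1 / 12  # years

# Assumed toy distributions of machine lifetime, in years:
old = NormalDist(mu=20, sigma=6)   # poor materials science: wide spread
new = NormalDist(mu=3, sigma=0.7)  # good materials science: tight spread

# Both keep the fraction failing within the first month negligible...
print(f"old: {old.cdf(MONTH):.4%} fail in month 1")
print(f"new: {new.cdf(MONTH):.4%} fail in month 1")

# ...but the typical machine now lasts 3 years instead of 20.
print(old.mean, new.mean)
```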
Maybe this is just Nassim Taleb's notion of antifragility. I haven't read the book, but I gather that the idea is that individuals grow stronger in environments that contain more stressors (within limits). Conversely, if you take away the stressors (i.e., make the environment globally better), then you get more fragile individuals (i.e., things are locally worse).
In this post, I'll argue that Joyce's equilibrium CDT (eCDT) can be made into FDT (functional decision theory) with the addition of an intermediate step - a step that should have no causal consequences. This would show that eCDT is unstable under causally irrelevant changes, and is in fact a partial version of FDT.
Joyce's principle is:
Full Information. You should act on your time-t utility assessments only if those assessments are based on beliefs that incorporate all the evidence that is both freely available to you at t and relevant to the question about what your acts are likely to cause.
When confronted by a problem with a predictor (such as Death in Damascus or the Newcomb problem), this allows an eCDT agent to recursively update its probabilities of the predictor's behaviour, based on its own estimates of its own actions, until this process reaches equilibrium. This allows it to behave like FDT/UDT/TDT on some (but not all) problems. I'll argue that you can modify the setup to make eCDT into a full FDT.
Death in Damascus
In this problem, Death has predicted whether the agent will stay in Damascus (S) tomorrow, or flee to Aleppo (F). And Death has promised to be in the same city as the agent (D or A), to kill them. Having made its prediction, Death then travels to that city to wait for the agent. Death is known to be a perfect predictor, and the agent values survival at $1000, while fleeing costs $1.
Then eCDT recommends fleeing to Aleppo with probability 999/2000. To check this, let x be the probability of fleeing to Aleppo (F), and y the probability of Death being there (A). The expected utility is then
1000(x(1-y) + (1-x)y) - x    (1)
Differentiating this with respect to x gives 999-2000y, which is zero for y=999/2000. Since Death is a perfect predictor, y=x and eCDT's expected utility is 499.5.
The true expected utility, however, is -999/2000, since Death will get the agent anyway, and the only cost is the trip to Aleppo.
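As a quick numerical sanity check of those numbers (mine, not part of the original post):

```python
def utility(x, y):
    """Equation (1): x = P(flee to Aleppo), y = P(Death in Aleppo)."""
    return 1000 * (x * (1 - y) + (1 - x) * y) - x

y_eq = 999 / 2000

# At y = 999/2000 the utility is flat in x (the derivative 999 - 2000y vanishes):
assert abs(utility(0.2, y_eq) - utility(0.8, y_eq)) < 1e-9

# eCDT's (mistaken) valuation at the equilibrium y = x = 999/2000:
print(utility(y_eq, y_eq))   # 499.5, up to float rounding

# What actually happens against a perfect predictor: Death always finds
# the agent, so the agent just pays the expected fleeing cost.
print(-y_eq)                 # -0.4995
```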
The eCDT decision process seems rather peculiar. It seems to allow updating of the value of y dependent on the value of x - hence allow acausal factors to be considered - but only in a narrow way. Specifically, it requires that the probability of F and A be equal, but that those two events remain independent. And it then differentiates utility according to the probability of F only, leaving that of A fixed. So, in a sense, x correlates with y, but small changes in x don't correlate with small changes in y.
That's somewhat unsatisfactory, so consider the problem now with an extra step. The eCDT agent no longer considers whether to stay or flee; instead, it outputs X, a value between 0 and 1. There is a uniform random process Z, also valued between 0 and 1. If Z<X, then the agent flees to Aleppo; if not, it stays in Damascus.
This seems identical to the original setup, for the agent. Instead of outputting a decision as to whether to flee or stay, it outputs the probability of fleeing. This has moved the randomness in the agent's decision from inside the agent to outside it, but this shouldn't make any causal difference, because the agent knows the distribution of Z.
Death remains a perfect predictor, which means that it can predict X and Z, and will move to Aleppo if and only if Z<X.
Now let the eCDT agent consider outputting X=x for some x. In that case, it updates its opinion of Death's behaviour, expecting that Death will be in Aleppo if and only if Z<x. Then it can calculate the expected utility of setting X=x, which is simply 0 (Death will always find the agent) minus x (the expected cost of fleeing to Aleppo), hence -x. Among the "pure" strategies, X=0 is clearly the best.
Now let's consider mixed strategies, where the eCDT agent can consider a distribution PX over values of X (this is a sort of second order randomness, since X and Z already give randomness over the decision to move to Aleppo). If we wanted the agent to remain consistent with the previous version, the agent then models Death as sampling from PX, independently of the agent. The probability of fleeing is just the expectation of PX; but the higher the variance of PX, the harder it is for Death to predict where the agent will go. The best option is as before: PX will set X=0 with probability 1001/2000, and X=1 with probability 999/2000.
But is this a fair way of estimating mixed strategies?
Average Death in Aleppo
Consider a weaker form of Death, Average Death. Average Death cannot predict X, but can predict PX, and will use that to determine its location, sampling independently from it. Then, from eCDT's perspective, the mixed-strategy behaviour described above is the correct way of dealing with Average Death.
But that means that the agent above is incapable of distinguishing between Death and Average Death. Joyce argues strongly for considering all the relevant information, and the distinction between Death and Average Death is relevant. Thus it seems when considering mixed strategies, the eCDT agent must instead look at the pure strategies, compute their value (-x in this case) and then look at the distribution over them.
One might object that this is no longer causal, but the whole equilibrium approach undermines the strictly causal aspect anyway. It feels daft to be allowed to update on Average Death predicting PX, but not on Death predicting X. Especially since moving from PX to X is simply some random process Z' that samples from the distribution PX. So Death is allowed to predict PX (which depends on the agent's reasoning) but not Z'. It's worse than that, in fact: Death can predict PX and Z', and the agent can know this, but the agent isn't allowed to make use of this knowledge.
Given all that, it seems that in this situation, the eCDT agent must be able to compute the mixed strategies correctly and realise (like FDT) that staying in Damascus (X=0 with certainty) is the right decision.
Let's recurse again, like we did last summer
This deals with Death, but not with Average Death. Ironically, the "X=0 with probability 1001/2000..." solution is not the correct solution for Average Death. To get that, we need to take equation (1), set x=y first, and then differentiate with respect to x. This gives x=1999/4000, so setting "X=0 with probability 2001/4000 and X=1 with probability 1999/4000" is actually the FDT solution for Average Death.
And we can make the eCDT agent reach that. Simply recurse to the next level, and have the agent choose PX directly, via a distribution PPX over possible PX.
But these towers of recursion are clunky and unnecessary. It's simpler to state that eCDT is unstable under recursion, and that it's a partial version of FDT.
You should always cooperate with an identical copy of yourself in the prisoner's dilemma. This is obvious, because you and the copy will reach the same decision.
That justification implicitly assumes that you and your copy are (somewhat) antagonistic: that you have opposite aims. But the conclusion doesn't require that at all. Suppose that you and your copy were instead trying to ensure that one of you got maximal reward (it doesn't matter which). Then you should still jointly cooperate, because (C,C) is possible, while (C,D) and (D,C) are not (I'm ignoring randomising strategies for the moment).
Now look at the Newcomb problem. Your decision enters twice: once when you decide how many boxes to take, and once when Omega is simulating or estimating you to decide how much money to put in box B. You would dearly like your two "copies" (one of which may just be an estimate) to be out of sync - for the estimate to 1-box while the real you two-boxes. But without any way of distinguishing between the two, you're stuck with taking the same action - (1-box,1-box). Or, seeing it another way, (C,C).
This also makes the Newcomb problem into an anti-coordination game, where you and your copy/estimate try to pick different options. But, since this is not possible, you have to stick to the diagonal. This is why the Newcomb problem can be seen both as an anti-coordination game and a prisoners' dilemma - the differences only occur in the off-diagonal terms that can't be reached.
Note: This post is in error, I've put up a corrected version of it here. I'm leaving the text in place, as historical record. The source of the error is that I set Pa(S)=Pe(D) and then differentiated with respect to Pa(S), while I should have differentiated first and then set the two values to be the same.
Nate Soares and Ben Levinstein have a new paper out on "Functional Decision Theory", the most recent development of UDT and TDT.
It's good. Go read it.
This post further analyses the "Death in Damascus" problem, and shows that Joyce's "equilibrium" version of CDT (causal decision theory) is in a certain sense intermediate between CDT and FDT. If eCDT is this equilibrium theory, then it can deal with a certain class of predictors, which I'll call distribution predictors.
Death in Damascus
In the original Death in Damascus problem, Death is a perfect predictor. It finds you in Damascus, and says that it's already planned its trip for tomorrow - and it'll be in the same place you will be.
You value surviving at $1000, and can flee to Aleppo for $1.
Classical CDT will put some prior P over Death being in Damascus (D) or Aleppo (A) tomorrow. And then, if P(A)>999/2000, you should stay (S) in Damascus, while if P(A)<999/2000, you should flee (F) to Aleppo.
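The CDT threshold can be checked directly: staying is worth 1000·P(A), fleeing is worth 1000·(1−P(A))−1, and the two cross at P(A)=999/2000. A minimal sketch (the function name is mine, not the post's):

```python
from fractions import Fraction

def cdt_choice(p_aleppo):
    """Classical CDT: treat Death's location tomorrow as fixed, with
    P(A) = p_aleppo, and compare the expected utilities of the two acts."""
    stay = 1000 * p_aleppo             # survive iff Death is in Aleppo
    flee = 1000 * (1 - p_aleppo) - 1   # survive iff Death is in Damascus; fleeing costs $1
    return "stay" if stay > flee else "flee"

# Indifference: 1000p = 1000(1-p) - 1, i.e. p = 999/2000.
assert cdt_choice(Fraction(1, 2)) == "stay"   # P(A) > 999/2000
assert cdt_choice(Fraction(1, 4)) == "flee"   # P(A) < 999/2000
```

Using `Fraction` keeps the comparison exact at the threshold, where floating point would be unreliable.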
FDT estimates that Death will be wherever you will be, and thus there's no point in F, as that would just cost you $1 for no reason.
But it's interesting what eCDT produces. This decision theory requires that Pe (the equilibrium probability of A and D) be consistent with the action distribution that eCDT computes. Let Pa(S) be the action probability of S. Since Death knows what you will do, Pa(S)=Pe(D).
The expected utility is 1000·Pa(S)·Pe(A) + 1000·Pa(F)·Pe(D) - Pa(F). At equilibrium, this is 2000·Pe(A)·(1-Pe(A)) - Pe(A). And that quantity is maximised when Pe(A)=1999/4000 (and thus the probability of you fleeing is also 1999/4000).
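A quick numeric check of that maximisation (note the correction post linked above disputes this derivation; this sketch only verifies the arithmetic as stated here):

```python
from fractions import Fraction

def equilibrium_eu(p):
    """The post's equilibrium expected utility with Pe(A) = p and
    Pa(S) = Pe(D) = 1 - p:  2000*p*(1-p) - p."""
    return 2000 * p * (1 - p) - p

# Setting the derivative 1999 - 4000p to zero gives p = 1999/4000.
p_star = Fraction(1999, 4000)
eps = Fraction(1, 10**6)

# p_star is a local (and, for a downward parabola, global) maximum:
assert equilibrium_eu(p_star) >= equilibrium_eu(p_star + eps)
assert equilibrium_eu(p_star) >= equilibrium_eu(p_star - eps)
```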
This is still the wrong decision, as paying the extra $1 is pointless, even if you no longer pay it with certainty.
So far, nothing interesting: both CDT and eCDT fail. But consider the next example, on which eCDT does not fail.
Statistical Death in Damascus
Let's assume now that Death has an assistant, Statistical Death, that is not a perfect predictor, but is a perfect distribution predictor. It can predict the distribution of your actions, but not your actual decision. Essentially, you have access to a source of true randomness that it cannot predict.
It informs you that its probability over whether to be in Damascus or Aleppo will follow exactly the same distribution as yours.
Classical CDT follows the same reasoning as before. So does eCDT: since Statistical Death follows the same distribution as you do, Pa(S)=Pe(D) still holds.
But what about FDT? Well, note that FDT will reach the same conclusion as eCDT. This is because 1000·Pa(S)·Pe(A) + 1000·Pa(F)·Pe(D) - Pa(F) is the correct expected utility, the Pa(S)=Pe(D) assumption is correct for Statistical Death, and (S,F) is independent of (A,D) once the action probabilities have been fixed.
So on the Statistical Death problem, eCDT and FDT say the same thing.
Factored joint distributions versus full joint distributions
What's happening is that there is a joint distribution over (S,F) (your actions) and (D,A) (Death's actions). FDT is capable of reasoning over all types of joint distributions, and fully assessing how its choice of Pa acausally affects Death's choice of Pe.
But eCDT is only capable of reasoning over ones where the joint distribution factors into a distribution over (S,F) times a distribution over (D,A). Within the confines of that limitation, it is capable of (acausally) changing Pe via its choice of Pa.
Death in Damascus does not factor into two distributions, so eCDT fails on it. Statistical Death in Damascus does so factor, so eCDT succeeds on it. Thus eCDT seems to be best conceived of as a version of FDT that is strangely limited in terms of which joint distributions it's allowed to consider.
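The factoring distinction can be made concrete. In the sketch below (the probabilities are illustrative, not from the post), the perfect-predictor joint distribution is perfectly correlated and cannot be written as a product of marginals, while Statistical Death's distribution can:

```python
# Sketch: a full joint distribution over ((S,F), (D,A)) versus one that
# factors into a product of marginals. Probabilities are illustrative.

# Death in Damascus: perfect correlation between your act and Death's location.
death = {("S", "D"): 0.5, ("S", "A"): 0.0,
         ("F", "D"): 0.0, ("F", "A"): 0.5}

# Statistical Death: same marginals, but your act and Death's location are
# independent once the action probabilities are fixed.
stat_death = {(you, d): 0.5 * 0.5 for you in "SF" for d in "DA"}

def factors(joint):
    """True iff joint(x, y) == marginal_x(x) * marginal_y(y) everywhere."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return all(abs(p - px[x] * py[y]) < 1e-9 for (x, y), p in joint.items())
```

On this toy example, `factors(death)` is false and `factors(stat_death)` is true - the class of problems eCDT can handle, on the post's account.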
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
Attractor Theory: A Model of Minds and Motivation
[Epistemic status: Moderately strong. Attractor Theory is a model based on the well-researched concept of time-inconsistent preferences combined with anecdotal evidence that extends the theory to how actions affect our preferences in general. See the Caveats at the end for a longer discussion on what this model is and isn’t.]
<Cross-posted from mindlevelup>
I’ve been thinking about minds and motivation on and off for about a year now, and I think I now have a model that merges some related ideas together into something useful. The model is called Attractor Theory, and it brings together ideas from Optimizing Your Mindstate, behavioral economics, and flow.
Attractor Theory is my attempt to provide a way of looking at the world that hybridizes ideas from the Resolve paradigm (where humans Actually Try and exert their will) and the “click-whirr” paradigm (where humans are driven by “if-then” loops and proceduralized habits).
As a brief summary, Attractor Theory basically states that you should consider any action you take as being easier to continue than to start, as well as having meta-level effects on changing your perception of which actions feel desirable.
Here’s a metaphor that provides most of the intuitions behind Attractor Theory:
Imagine that you are in a hamster ball:
As a human inside this ball, you can kinda roll around by exerting energy. But it’s hard to do so all of the time — you’d likely get tired. Still, if you really wanted to, you could push the ball and move.
Scattered across the terrain are Utilons. They represent productivity hours, lives saved, HPMOR fanfictions written, or anything else you care about maximizing. You are trying to roll around and collect as many Utilons as possible.
But the terrain isn’t actually smooth. Instead, there are all these Attractors that pull you towards them. Attractors are like valleys, or magnets, or point charges. Or maybe electrically charged magnetic valleys. (I’m probably going to Physics Hell for that.)
The point is that they draw you towards them, and it’s hard to resist their pull.
Also, Attractors have an interesting property: Once you’re being pulled in by one, this actually modifies other Attractors. This usually manifests by changing how strongly other ones are pulling you in. Sometimes, though, this even means that some Attractors will disappear, and new ones may appear.
As a human, your goal is to navigate this tangle of Utilons and Attractors from your hamster ball, trying to collect Utilons.
Now you could just try to take a direct path to all the nearest Utilons, but that would mean exerting a lot of energy to fight the pull of Attractors that pull you in Utilon-sparse directions.
Instead, given that you can’t avoid Attractors (they’re everywhere!) and that you want to get as many Utilons as possible, the best thing to do seems to be to strategically choose which Attractors you’re drawn to and selectively choose when to exert energy to move from one to another to maximize your overall trajectory.
In the above metaphor, actions and situations serve as Attractors, which are like slippery slopes that pull you in. Your agency is represented by the “meta-human” that inhabits the ball, which has some limited control when it comes to choosing which Attractor-loops to dive into and which ones to pop out of.
So the default view of humans and decisions seems to be something like viewing actions as time-chunks that we can just slot into our schedule. Attractor Theory attempts to present a model that moves away from that and shifts our intuitions to:
a) think less about our actions in a vacuum / individually
b) consider starting / stopping costs more
c) see our preferences in a more mutable light
It’s my hope that thinking about actions as “things that draw you in” can improve our intuitions about global optimization:
My point here is that, phenomenologically, it feels like our actions change the sorts of things we might want. Every time we take an action, this will, in turn, prime how we view other actions, often in somewhat predictable ways. I might not know exactly how they’ll change, but we can get good, rough ideas from past experience and our imaginations.
For example, the set of things that feel desirable to me after running a marathon may differ greatly from the set of things after I read a book on governmental corruption.
(I may still have core values, like wanting everyone to be happy, which I place higher up in my sense of self, which aren’t affected by these, but I’m mainly focusing on how object-level actions feel for this discussion. There’s a longer decision-theoretic discussion here that I’ll save for a later post.)
It’s useful to start seeing your actions in terms of not just their direct effects, but also their effects on how you can take further actions. It changes your decision algorithm to be something like:
“Choose actions such that their meta-level effects on me by my taking them allow me to take more actions of this type in the future and maximize the number of Utilons I can earn in the long run.”
By phrasing it this way, it makes it more clear that most things in life are a longer-term endeavor that involve trying to globally optimize, rather than locally. It also provides a model for evaluating actions on a new axis — the extent to which they influence your future, which seems like an important thing to consider.
(While it’s arguable that a naive view of maximization should by default take this into account from a consequentialist lens, I think making it explicitly clear, as the above formulation does, is a useful distinction.)
This allows us to better evaluate actions which, by themselves, might not be too useful, but do a good job of reorienting ourselves into a better state of mind. For example, spending a few minutes outside to get some air might not be directly useful, but it’ll likely help clear my mind, which has good benefits down the line.
Along the same lines, you want to view actions not as one-time deals, but as a sort of process that actively changes how you perceive other actions. In fact, these effects should sometimes be as important a consideration as time or effort when looking at a task.
Attractor Theory also conceptually models the idea of precommitment:
Humans often face situations where we fall prey to “in the moment” urges, which soon turn to regret. These are known as time-inconsistent preferences, where what we want quickly shifts, often because we are in the presence of something that really tempts us.
An example of this is the dieter who proclaims “I’ll just give in a little today” when seeing a delicious cake on the restaurant menu, and then feeling “I wish I hadn’t done that” right after gorging themselves.
Precommitment is the idea that you can often “lock-in” your choices beforehand, such that you will literally be unable to give into temptation when the actual choice comes before you, or entirely avoid the opportunity to even face the choice.
An example from the above would be something like having a trustworthy friend bring food over instead of eating out, so you can’t stuff yourself on cake because you weren’t even the one who ordered food.
There seems to be a general principle here of going “upstream”: targeting the places where you have the most control, so that you can improve your experiences later down the line. This seems to be a useful idea, whether the question is about finding leverage or self-control.
Attractor Theory views all actions and situations as self-reinforcing slippery slopes. As such, it more realistically models the act of taking certain actions as leading you to other Attractors, so you’re not just looking at things in isolation.
In this model, we can reasonably predict, for example, that any video on YouTube will likely lead to more videos because the “sucked-in-craving-more-videos Future You” will have different preferences than “needing-some-sort-of-break Present You”.
This view allows you to better see certain “traps”, where an action will lead you deeper and deeper down an addiction/reward cycle, like a huge bag of chips or a webcomic. These are situations where, after the initial buy-in, it becomes incredibly attractive to continue down the same path, as these actions reinforce themselves, making it easy to continue on and on…
Under the Attractor metaphor, your goal, then, is to focus on finding ways of being drawn to certain actions and avoiding others. You want to find ways to avoid specific actions which could lead you down bad spirals, even if the initial actions themselves may not be that distracting.
The result is chaining together actions and their effects on how you perceive things in an upstream way, like precommitment.
Exploring, Starting, and Stopping:
Local optima are also visually represented by this model: we can get caught in certain chains of actions that do a good job of netting Utilons. Similar to the above traps, it can be hard to try new things once we’ve found an effective route already.
Chances are, though, that there’s probably even more Utilons to be had elsewhere. In which case, being able to break out to explore new areas could be useful.
Attractor Theory also does a good job of modeling how actions seem much harder to start than to continue. Moving from one Attractor to a disparate one can be costly in terms of energy, as you need to move against the pull of the current Attractor.
Once you’re pulled in, though, it’s usually easier to keep going with the flow. So using this model ascribes costs to starting and places less of a cost on continuing actions.
By “pulled in”, I mean making it feel effortless or desirable to continue with the action. I’m thinking of the feeling you get when you have a decent album playing music, and you feel sort of tempted to switch it to a better album, except that, given that this good song is already playing, you don’t really feel like switching.
Given the costs of switching, you want to invest your efforts and agency not into always choosing the immediate Utilon-maximizing action moment-by-moment, but into choosing the actions / situations whose Attractors pull you in desirable directions, or make other desirable paths easier to take.
Summary and Usefulness:
Attractor Theory attempts to retain willpower as a coherent idea, while also hopefully more realistically modeling how actions can affect our preferences with regards to other actions.
It can serve as an additional intuition pump behind using willpower in certain situations. Thinking about “activation energy” in terms of putting in some energy to slide into positive Attractors removes the mental block I’ve recently had on using willpower. (I’d been stuck in the “motivation should come from internal cooperation” mindset.)
The meta-level considerations when looking at how Attractors affect how other Attractors affect us provides a clearer mental image of why you might want to precommit to avoid certain actions.
For example, when thinking about taking breaks, I now think about which actions can help me relax without strongly modifying my preferences. This means things like going outside, eating a snack, and drawing as far better break-time activities than playing an MMO or watching Netflix.
This is because the latter are powerful self-reinforcing Attractors that also pull me towards more reward-seeking directions, which might distract me from the task at hand. The former activities can also serve as breaks, but they don’t do much to alter your preferences, and thus help keep you focused.
I see Attractor Theory as being useful when it comes to thinking upstream and providing an alternative view of motivation that isn’t exactly internally based.
Hopefully, this model can be useful when you look at your schedule to identify potential choke-points / bottlenecks that can arise from factors you hadn’t previously considered when evaluating actions.
Attractor Theory assumes that different things can feel desirable depending on the situation. It relinquishes some agency by assuming that you can’t always choose what you “want” because of external changes to how you perceive actions. It also doesn’t try to explain internal disagreements, so it’s still largely at odds with the Internal Double Crux model.
I think this is fine. The goal here isn’t exactly to create a wholly complete prescriptive or descriptive model. Rather, it’s an attempt to distill a simplified model of humans, behavior, and motivation into a concise, appealing form your intuitions can crystallize around, similar to the System 1 and System 2 distinction.
I admit that if you tend to use an alternate ontology when it comes to viewing how your actions relate to the concept of “you”, this model might be less useful. I think that’s also fine.
This is not an attempt to capture all of the nuances / considerations in decision-making. It’s simply an attempt to partially take a few pieces and put them together in a more coherent framework. Attractor Theory merely takes a few pieces that I’d previously had as disparate nodes and chunks them together into a more unified model of how we think about doing things.
Rationalists like to live in group houses. We are also as a subculture moving more and more into a child-having phase of our lives. These things don't cooperate super well - I live in a four bedroom house because we like having roommates and guests, but if we have three kids and don't make them share we will in a few years have no spare rooms at all. This is frustrating in part because amenable roommates are incredibly useful as alloparents if you value things like "going to the bathroom unaccompanied" and "eating food without being screamed at", neither of which are reasonable "get a friend to drive for ten minutes to spell me" situations. Meanwhile there are also people we like living around who don't want to cohabit with a small child, which is completely reasonable, small children are not for everyone.
For this and other complaints ("househunting sucks", "I can't drive and need private space but want friends accessible", whatever) the ideal solution seems to be somewhere along the spectrum between "a street with a lot of rationalists living on it" (no rationalist-friendly entity controls all those houses and it's easy for minor fluctuations to wreck the intentional community thing) and "a dorm" (sorta hard to get access to those once you're out of college, usually not enough kitchens or space for adult life). There's a name for a thing halfway between those, at least in German - "baugruppe" - buuuuut this would require community or sympathetic-individual control of a space and the money to convert it if it's not already baugruppe-shaped.
Maybe if I complain about this in public a millionaire will step forward or we'll be able to come up with a coherent enough vision to crowdfund it or something. I think there is easily enough demand for a couple of ten-to-twenty-adult baugruppen (one in the east bay and one in the south bay) or even more/larger, if the structures materialized. Here are some bulleted lists.
- Units that it is really easy for people to communicate across and flow between during the day - to my mind this would be ideally to the point where a family who had more kids than fit in their unit could move the older ones into a kid unit with some friends for permanent sleepover, but still easily supervise them. The units can be smaller and more modular the more this desideratum is accomplished.
- A pricing structure such that the gamut of rationalist financial situations (including but not limited to rent-payment-constraining things like "impoverished app academy student", "frugal Google engineer effective altruist", "NEET with a Patreon", "CfAR staffperson", "not-even-ramen-profitable entrepreneur", etc.) could live there. One thing I really like about my house is that Spouse can pay for it himself and would by default anyway, and we can evaluate roommates solely on their charming company (or contribution to childcare) even if their financial situation is "no". However, this does require some serious participation from people whose financial situation is "yes" and a way to balance the two so arbitrary numbers of charity cases don't bankrupt the project.
- Variance in amenities suited to a mix of Soylent-eating restaurant-going takeout-ordering folks who only need a fridge and a microwave and maybe a dishwasher, and neighbors who are not that, ideally such that it's easy for the latter to feed neighbors as convenient.
- Some arrangement to get repairs done, ideally some compromise between "you can't do anything to your living space, even paint your bedroom, because you don't own the place and the landlord doesn't trust you" and "you have to personally know how to fix a toilet".
- I bet if this were pulled off at all it would be pretty easy to have car-sharing bundled in, like in Benton House That Was which had several people's personal cars more or less borrowable at will. (Benton House That Was may be considered a sort of proof of concept of "20 rationalists living together" but I am imagining fewer bunk beds in the baugruppe.) Other things that could be shared include longish-term storage and irregularly used appliances.
- Dispute resolution plans and resident- and guest-vetting plans which thread the needle between "have to ask a dozen people before you let your brother crash on the couch, let alone a guest unit" and "cannot expel missing stairs". I think there are some rationalist community Facebook groups that have medium-trust networks of the right caution level and experiment with ways to maintain them.
- Bikeshedding. Not that it isn't reasonable to bikeshed a little about a would-be permanent community edifice that you can't benefit from or won't benefit from much unless it has X trait - I sympathize with this entirely - but too much from too many corners means no baugruppen go up at all even if everything goes well, and that's already dicey enough, so please think hard on how necessary it is for the place to be blue or whatever.
- Location. The only really viable place to do this for rationalist population critical mass is the Bay Area, which has, uh, problems, with new construction. Existing structures are likely to be unsuited to the project both architecturally and zoningwise, although I would not be wholly pessimistic about one of those little two-story hotels with rooms that open to the outdoors or something like that.
- Principal-agent problems. I do not know how to build a dormpartment building and probably neither do you.
- Community norm development with buy-in and a good match for typical conscientiousness levels even though we are rules-lawyery contrarians.
Please share this wherever rationalists may be looking; it's definitely the sort of thing better done with more eyes on it.