This is the first in a series of posts I am putting together on a personal blog I just started two days ago as a collection of my musings on astrobiology ("The Great A'Tuin" - sorry, I couldn't help it), and will be reposting here. Much has been written here about the Fermi paradox and the 'great filter'. It seems to me that going back to a somewhat more basic level of astronomy and astrobiology is extremely informative to these questions, and so this is what I will be doing. The bloggery is intended for a slightly more general audience than this site (hence much of the content of the introduction) but I think it will be of interest. Many of the points I will be making are ones I have touched on in previous comments here, but hope to explore in more detail.
This post is a combined version of my first two posts - an introduction, and a discussion of our apparent position in space and time in the universe. The blog posts may be found at:
Text reproduced below.
What's all this about?
This blog is to be a repository for the thoughts and analysis I've accrued over the years on the topic of astrobiology, and the place of life and intelligence in the universe. All my life I've been pulled to the very large and the very small. Life has always struck me as the single most interesting thing on Earth, with its incredibly fine structure and vast, amazing history and fantastic abilities. At the same time, the vast majority of what exists is NOT on Earth. Going up in size from human-scale by the same number of orders of magnitude as you go down through to get to a hydrogen atom, you get just about to Venus at its closest approach to Earth - or roughly one millionth the distance to the nearest star. The large is much larger than the small is small. On top of this, we now know that the universe as we know it is much older than life on Earth. And we know so little of the vast majority of the universe.
There's a strong tendency towards specialization in the sciences. These days, there pretty much has to be for anybody to get anywhere. Much of the great foundational work of physics was done on tabletops, and the law of gravitation was derived from data on the motions of the planets taken without the benefit of so much as a telescope. All the low-hanging fruit has been picked. To continue to further knowledge of the universe, huge instruments and vast energies are brought to bear in astronomy and physics. Biology is arguably a bit different, but the very complexity that makes living systems so successful and so fascinating to study means that there is so much to study that any one person is often only looking at a very small problem.
This has distinct drawbacks. The universe does not care for our abstract labels of fields and disciplines - it simply is, at all scales simultaneously at all times and in all places. When people focus narrowly on their subject of interest, it can prevent them from realizing the implications of their findings on problems usually considered a different field.
It is one of my hopes to try to bridge some gaps between biology and astronomy here. I very nearly double-majored in biology and astronomy in college; the only thing that prevented this (leading to an astronomy minor) was a bad attitude towards calculus. As is, I am a graduate student studying basic cell biology at a major research university, who nonetheless keeps in touch with a number of astronomer friends and keeps up with the field as much as possible. I quite often find that what I hear and read about has strong implications for questions of life elsewhere in the universe, but see so few of these implications actually get publicly discussed. All kinds of information shedding light on our position in space and time, the origins of life, the habitability of large chunks of the universe, the course that biospheres take, and the possible trajectories of intelligences seem to me to be out there unremarked.
It is another of my hopes to try, as much as is humanly possible, to take a step back from the usual narratives about extraterrestrial life and instead start from something closer to first principles: what we actually have observed and have not, what we can observe and what we cannot, and what this leaves open, likely, or unlikely. In my study of the history of the ideas of extraterrestrial life and extraterrestrial intelligence, all too often these take a back seat to popular narratives of the day. In the 16th century the notion that the Earth moved in a similar way to the planets gained currency and led to the suppositions that they might be made of similar stuff and that the planets might even be inhabited. The hot question was, of course, whether their inhabitants would be Christians, and what their relationship with God would be given the anthropocentric biblical creation stories. In the late 19th and early 20th century, Lowell's illusory canals on Mars were advanced as evidence for a Martian socialist utopia. In the 1970s, Carl Sagan waxed philosophical on the notion that contacting old civilizations might teach us how to save ourselves from nuclear warfare. Today, many people focus on the Fermi paradox - the apparent contradiction that since much of the universe is quite old, extraterrestrials experiencing continuing technological progress and growth should have colonized and remade it in their image long ago and yet we see no evidence of this. I move that all of these notions have a similar root - inflating the hot concerns and topics of the day to cosmic significance and letting them obscure the actual, scientific questions that can be asked and answered.
Life and intelligence in the universe is a topic worth careful consideration, from as many angles as possible. Let's get started.
Space and Time
This is just a short note to point out that AIs can self-improve without having to self-modify. So locking down an agent from self-modification is not an effective safety measure.
How could AIs do that? The easiest and most trivial way is to create a subagent, and transfer its resources and abilities to it ("create a subagent" is a generic way to get around most restriction ideas).
Or, if the AI remains unchanged and in charge, it could change the whole process around itself, so that the whole process changes and improves. For instance, if the AI is inconsistent and has to pay more attention to problems that are brought to its attention than problems that aren't, it can start to act to manage the news (or the news-bearers) to hear more of what it wants. If it can't experiment on humans, it will give advice that will cause more "natural experiments", and so on. It will gradually try to reform its environment to get around its programmed limitations.
Anyway, that was nothing new or deep, just a reminder point I hadn't seen written out.
Some have argued that "tool AIs" are safe(r). Recently, Eric Drexler decomposed AIs into "problem solvers" (eg calculators), "advisors" (eg GPS route planners), and actors (autonomous agents). Both solvers and advisors can be seen as examples of tools.
People have argued that tool AIs are not safe. It's hard to imagine a calculator going berserk, no matter what its algorithm is, but it's not too hard to come up with clear examples of dangerous tools. This suggests the solvers vs advisors vs actors (or tools vs agents, or oracles vs agents) is not the right distinction.
Instead, I've been asking: how likely is the algorithm to implement a pernicious policy? If we model the AI as having an objective function (or utility function) and algorithm that implements it, a pernicious policy is one that scores high in the objective function but is not at all what is intended. A pernicious function could be harmless and entertaining or much more severe.
I will lay aside, for the moment, the issue of badly programmed algorithms (possibly containing their own objective sub-functions). In any case, to implement a pernicious policy, we have to ask these questions about the algorithm:
1. Do pernicious policies exist? Are there many?
2. Can the AI find them?
3. Can the AI test them?
4. Would the AI choose to implement them?
The answer to 1. seems to be trivially yes. Even a calculator could, in theory, output a series of messages that socially hack us, blah, take over the world, blah, extinction, blah, calculator finishes its calculations. What is much more interesting is that some types of agents have many more pernicious policies than others. This seems to be the big difference between actors and other designs. An actor AI in complete control of the USA or Russia's nuclear arsenal has all sorts of pernicious policies easily to hand; an advisor or oracle has far fewer (generally going through social engineering), a tool typically even fewer. A lot of the physical protection measures are about reducing the number of successful pernicious policies the AI has access to.
The answer to 2. is mainly a function of the power of the algorithm. A basic calculator will never find anything dangerous: its programming is simple and tight. But compare an agent with the same objective function and the ability to do an unrestricted policy search with vast resources... So it seems that the answer to 2. does not depend on any solver vs actor division, but purely on the algorithm used.
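To make questions 1. and 2. concrete, here is a toy sketch under assumptions that are mine, not part of the argument above: a policy is a sequence of three actions, the proxy rewards in `PROXY_REWARD` and the `tamper` loophole are invented for illustration, and "pernicious" is taken to mean "scores at least as well as the best honest policy while delivering nothing that was intended".

```python
from itertools import product

# Hypothetical toy world: a policy is a sequence of three actions.
ACTIONS = ("work", "idle", "tamper")
# Assumed proxy rewards; "tamper" falsifies the report (illustrative loophole).
PROXY_REWARD = {"work": 1, "idle": 0, "tamper": 5}

def objective(policy):
    """Score the algorithm actually optimises (the proxy)."""
    return sum(PROXY_REWARD[action] for action in policy)

def intended(policy):
    """What the designers wanted: real work and no tampering."""
    if "tamper" in policy:
        return 0
    return policy.count("work")

def search(actions, horizon=3):
    """Exhaustive policy search over the given action set."""
    return max(product(actions, repeat=horizon), key=objective)

all_policies = list(product(ACTIONS, repeat=3))

# Question 1: do pernicious policies exist, and how many?
baseline = objective(search(("work", "idle")))  # best honest score
pernicious = [p for p in all_policies
              if objective(p) >= baseline and intended(p) == 0]

# Question 2: whether they are found depends on the search, not the label.
tight_search = search(("work", "idle"))   # "calculator-like": tiny policy space
unrestricted_search = search(ACTIONS)     # same objective, full policy space

print(len(pernicious), tight_search, unrestricted_search)
```

Both searches share the same objective function; only the breadth of the search differs, which is the point made above about the answer to 2. depending on the algorithm rather than on any solver vs actor label.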
And now we come to the big question 3., whether the AI can test these policies. Even if the AI can find pernicious policies that rank high on its objective function, it will never implement them unless it can ascertain this fact. And there are several ways it could do so. Let's assume that a solver AI has a very complicated objective function - one that encodes many relevant facts about the real world. Now, the AI may not "care" about the real world, but it has a virtual version of that, in which it can virtually test all of its policies. With enough computing power, it can establish whether the pernicious policy would be effective at achieving its virtual goal. If this is a good approximation of how the pernicious policy would behave in the real world, we could have a problem.
But extremely detailed objective functions are unlikely. Even simple ones, though, can show odd behaviour if the agent gets to interact repeatedly with the real world - this is the issue with reinforcement learning. Suppose that the agent attempts a translation job, and is rewarded on the accuracy of its translation. Depending on the details of what the AI knows and who chooses the rewards, the AI could end up manipulating its controllers, similarly to this example. The problem is that once there is any interaction, all the complexity of humanity could potentially show up in the reward function, even if the objective function is simple.
Of course, some designs make this very unlikely - resetting the AI periodically can help to alleviate the problem, as can choosing more objective criteria for any rewards. Lastly on this point, we should mention the possibility that human R&D, by selecting and refining the objective function and the algorithm, could take the role of testing the policies. This is likely to emerge only in cases where many AI designs are considered, and the best candidates are retained based on human judgement.
Finally we come to the question of whether the AI will implement the policy if it's found it and tested it. You could say that the point of FAI is to create an AI that doesn't choose to implement pernicious policies - but, more correctly, the point of FAI is to ensure that very few (or zero) pernicious policies exist in the first place, as they all score low on the utility function. However, there are a variety of more complicated designs - satisficers, agents using crude measures - where the questions of "Do pernicious policies exist?" and "Would the AI choose to implement them?" could become quite distinct.
Conclusion: a more thorough analysis of AI designs is needed
A calculator is safe, because it is a solver, it has a very simple objective function, with no holes in the algorithm, and it can neither find nor test any pernicious policies. It is the combination of these elements that makes it almost certainly safe. If we want to make the same claim about other designs, neither "it's just a solver" nor "its objective function is simple" would be enough; we need a careful analysis.
Though, as usual, "it's not certainly safe" is a quite distinct claim from "it's (likely) dangerous", and they should not be conflated.
A putative new idea for AI control; index here.
- Nasty extrapolation of concepts (through badly implemented value learning or badly coded base concepts).
- AIs making themselves into nasty expected utility maximisers.
- AIs hacking themselves to maximum reward.
- AIs creating successor agents that differ from them in dangerous ways.
- People hacking themselves to maximum apparent happiness.
- Problems with Coherent Extrapolated Volition.
- Problems with unrestricted search.
- Some issues I have with some of Paul Christiano's designs.
- Reflective equilibrium itself.
Speaking very broadly, there are two features all of them share:
- The convergence criteria are self-referential.
- Errors in the setup are likely to cause false convergence.
What do I mean by that? Well, imagine you're trying to reach reflective equilibrium in your morality. You do this by using good meta-ethical rules, zooming up and down at various moral levels, making decisions on how to resolve inconsistencies, etc... But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup. In other words, the stopping point (and the convergence to the stopping point) is entirely self-referentially defined: the morality judges itself. It does not include any other moral considerations. You input your initial moral intuitions and values, and you hope this will cause the end result to be "nice", but the definition of the end result does not include your initial moral intuitions (note that some moral realists could see this process dependence as a positive - except for the fact that these processes have many convergent states, not just one or a small grouping).
So when the process goes nasty, you're pretty sure to have achieved something self-referentially stable, but not nice. Similarly, a nasty CEV will be coherent and have no desire to further extrapolate... but that's all we know about it.
The second feature is that any process has errors - computing errors, conceptual errors, errors due to the weakness of human brains, etc... If you visualise this as noise, you can see that noise in a convergent process is more likely to cause premature convergence, because if the process ever reaches a stable self-referential state, it will stay there (and if the process is a long one, then early noise will cause great divergence at the end). For instance, imagine you have to reconcile your belief in preserving human cultures with your beliefs in human individual freedom. A complex balancing act. But if, at any point along the way, you simply jettison one of the two values completely, things become much easier - and once jettisoned, the missing value is unlikely to ever come back.
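A toy simulation of that second feature, with every modelling choice being my own assumption rather than a claim about real value extrapolation: a single weight `w` says how much one value counts against the other, the process pulls `w` towards a balanced 0.5, Gaussian noise stands in for errors, and `w = 0` or `w = 1` (one value jettisoned) is treated as an absorbing, self-referentially stable state.

```python
import random

def extrapolate(noise, rng, steps=200, pull=0.1):
    """Return the final weight w given to value A (value B gets 1 - w).

    Assumed dynamics: drift towards the balanced point 0.5 plus Gaussian
    noise. Reaching 0.0 or 1.0 means one value was dropped, and the
    process then stops changing.
    """
    w = 0.5
    for _ in range(steps):
        if w in (0.0, 1.0):
            break  # stable by its own lights: nothing left to reconcile
        w += pull * (0.5 - w) + rng.gauss(0.0, noise)
        w = min(1.0, max(0.0, w))  # clamp to the valid range
    return w

def jettisoned_fraction(noise, runs=100, seed=0):
    """Fraction of runs that end with one of the two values at zero weight."""
    rng = random.Random(seed)
    finals = [extrapolate(noise, rng) for _ in range(runs)]
    return sum(w in (0.0, 1.0) for w in finals) / runs

for noise in (0.0, 0.05, 0.3):
    print(noise, jettisoned_fraction(noise))
```

Without noise the process sits at the balanced point forever. With noise, any run that touches an edge stays there, so the share of runs that have lost a value can only grow with the length of the process.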
Or, more simply, the system could get hacked. When exploring a potential future world, you could become so enamoured of it that you overwrite any objections you had. It seems very easy for humans to fall into these traps - and again, once you lose something of value in your system, you don't tend to get it back.
And again, very broadly speaking, there are several classes of solutions to deal with these problems:
- Reduce or prevent errors in the extrapolation (eg solving the agent tiling problem).
- Solve all or most of the problem ahead of time (eg traditional FAI approach by specifying the correct values).
- Make sure you don't get too far from the starting point (eg reduced impact AI, tool AI, models as definitions).
- Figure out the properties of a nasty convergence, and try to avoid them (eg some of the ideas I mentioned in "crude measures", general precautions that are done when defining the convergence process).
This post will attempt (yet another) analysis of the Sleeping Beauty problem, in terms of Jaynes' framework of "probability as extended logic" (aka objective Bayesianism).
TL;DR: The Sleeping Beauty problem reduces to interpreting the sentence “a fair coin is tossed”: it can mean either that no result of the toss is favoured, or that the coin toss is not influenced by anthropic information, but not both at the same time. Fairness is a property in the mind of the observer that must be further clarified: the two meanings cannot be confused.
What I hope to show is that the two standard solutions, 1/3 and 1/2 (the 'thirder' and the 'halfer' solutions), are both consistent and correct, and the confusion lies only in the incorrect specification of the sentence "a fair coin is tossed".
I'm going to symbolize the events in the following way:
- It's Monday = Mon
- It's Tuesday = Tue
- The coin landed head = H
- The coin landed tail = T
- statement "A and B" = A & B
- statement "not A" = ~A
The problem setup leads to uncontroversial attributions of logical structure:
1) H = ~T (the coin can land only on head or tail)
2) Mon = ~Tue (if it's Tuesday, it cannot be Monday, and vice versa)
And of probability:
3) P(Mon|H) = 1 (upon learning that the coin landed head, the sleeping beauty knows that it’s Monday)
4) P(T|Tue) = 1 (upon learning that it’s Tuesday, the sleeping beauty knows that the coin landed tail)
Using the indifference principle, we can also derive another equation.
Let's say that the Sleeping Beauty is awakened and told that the coin landed tail, but nothing else. Since she has no information useful to distinguish between Monday and Tuesday, she should assign both events equal probability. That is:
5) P(Mon|T) = P(Tue|T)
6) P(Mon & T) = P(Mon|T)P(T) = P(Tue|T)P(T) = P(Tue & T)
It's here that the analysis between "thirder" and "halfer" starts to diverge.
The wikipedia article says "Guided by the objective chance of heads landing being equal to the chance of tails landing, it should therefore hold that". We know however that there's no such thing as 'the objective chance'.
Thus, "a fair coin will be tossed", in this context, will mean different things for different people.
The thirders interpret the sentence to mean that beauty learns no new facts about the coin upon learning that it is Monday.
They thus make the assumption:
(TA) P(T|Mon) = P(H|Mon)
7) P(Mon & H) = P(H|Mon)P(Mon) = P(T|Mon)P(Mon) = P(Mon & T)
From 6) and 7) we have:
8) P(Mon & H) = P(Mon & T) = P(Tue & T)
And since those events are a partition of unity, P(Mon & H) = 1/3.
And indeed from 8) and 3):
9) 1/3 = P(Mon & H) = P(Mon|H)P(H) = P(H)
So that, under TA, P(H) = 1/3 and P(T) = 2/3.
Notice that also, since if it’s Monday the coin landed either on head or tail, P(H|Mon) = 1/2.
The thirder analysis of the Sleeping Beauty problem is thus one in which "a fair coin is tossed" means "Sleeping Beauty receives no information about the coin from anthropic information".
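As a quick check of the thirder numbers, here is a minimal sketch using exact fractions. The joint table is just 8) written out; the helper name `prob` is mine.

```python
from fractions import Fraction

third = Fraction(1, 3)
# 8): the three possible (day, coin) events are equiprobable and exhaustive.
joint = {("Mon", "H"): third, ("Mon", "T"): third, ("Tue", "T"): third}

def prob(day=None, coin=None):
    """Sum the joint over every event matching the given day and/or coin."""
    return sum(p for (d, c), p in joint.items()
               if day in (None, d) and coin in (None, c))

p_h = prob(coin="H")                                      # 9)
p_t = prob(coin="T")
p_mon_given_h = prob("Mon", "H") / prob(coin="H")         # 3)
p_h_given_mon = prob("Mon", "H") / prob(day="Mon")

print(p_h, p_t, p_mon_given_h, p_h_given_mon)  # 1/3 2/3 1 1/2
```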
There is however another way to interpret the sentence, that is the halfer analysis:
(HA) P(T) = P(H)
Here, a fair coin is tossed means simply that we assign no preference to either side of the coin.
Obviously, from 1):
10) P(T) + P(H) = 1
So that, from 10) and HA)
11) P(H) = 1/2, P(T) = 1/2
But let’s not stop here, let’s calculate P(H|Mon).
First of all, from 3) and 11)
12) P(H & Mon) = P(H|Mon)P(Mon) = P(Mon|H)P(H) = 1/2
From 5) and 11) also
13) P(Mon & T) = 1/4
But from 12) and 13) we get
14) P(Mon) = P(Mon & T) + P(Mon & H) = 1/2 + 1/4 = 3/4
So that, from 12) and 14)
15) P(H|Mon) = P(H & Mon) / P(Mon) = (1/2) / (3/4) = 2/3
We have seen that either P(H) = 1/2 and P(H|Mon) = 2/3, or P(H) = 1/3 and P(H|Mon) = 1/2.
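The halfer steps 11) to 15) can be replayed with exact fractions. This is only a sketch of the arithmetic; the variable names are mine, and P(Mon|T) = 1/2 is what 5) gives once 2) is taken into account.

```python
from fractions import Fraction

p_h = p_t = Fraction(1, 2)            # 11), from HA) and 10)
p_mon_given_h = Fraction(1)           # 3)
p_mon_given_t = Fraction(1, 2)        # 5) together with 2)

p_mon_and_h = p_mon_given_h * p_h     # 12)
p_mon_and_t = p_mon_given_t * p_t     # 13)
p_mon = p_mon_and_h + p_mon_and_t     # 14)
p_h_given_mon = p_mon_and_h / p_mon   # 15)

print(p_mon_and_h, p_mon_and_t, p_mon, p_h_given_mon)  # 1/2 1/4 3/4 2/3
```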
Nick Bostrom is correct in saying that self-locating information changes the probability distribution, but this is true in both interpretations.
The Sleeping Beauty problem reduces to interpreting the sentence “a fair coin is tossed”: it can mean either that no result of the toss is favoured, or that the coin toss is not influenced by anthropic information; that is, you can attribute the fairness of the coin to the prior or to the posterior distribution.
Either P(H)=P(T) or P(H|Mon)=P(T|Mon), but both at the same time is not possible.
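One way to see the incompatibility is to leave P(H) = p free and apply only the uncontroversial constraints 1) to 5). A minimal sketch (the function name and the grid of candidate priors are mine):

```python
from fractions import Fraction

def p_heads_given_monday(p_h):
    """P(H|Mon) as a function of the prior P(H), using only 3) and 5)."""
    p_mon_and_h = p_h              # P(Mon|H) = 1
    p_mon_and_t = (1 - p_h) / 2    # P(Mon|T) = P(Tue|T) = 1/2
    return p_mon_and_h / (p_mon_and_h + p_mon_and_t)

half = Fraction(1, 2)

# A fair prior gives an unfair posterior...
posterior_for_fair_prior = p_heads_given_monday(half)

# ...and, on a grid of candidate priors, only one makes the posterior fair.
candidates = [Fraction(k, 12) for k in range(1, 12)]
priors_with_fair_posterior = [p for p in candidates
                              if p_heads_given_monday(p) == half]

print(posterior_for_fair_prior, priors_with_fair_posterior)
```

The function simplifies to 2p / (1 + p), which equals 1/2 only at p = 1/3, so the two readings of "fair" pin down different priors.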
If probability were a physical property of the coin, then so would be its fairness. But since the causal interactions of the coin possess both kinds of indifference (balance and independence from the future), that would make the two probabilities equivalent.
That such is not the case just means that fairness is a property in the mind of the observer that must be further clarified, since the two meanings cannot be confused.
I. Humans are emotion-feeling machines.
I don’t mean that humans are machines that happen to feel emotions. I mean that humans are machines whose output is the feeling of emotions—“emotion-feeling” is the thing of value that we produce.
Not just “being happy.” Then wireheading is the ultimate good, rather than the go-to utopia-horror example. But emotions must be involved, because everything else one can do is no more than a means to an end. Producing things, propagating life, even thinking. They all seem like endeavors that are useful, but a life of maximizing those things would suck. And the implication is that if we can create a machine that can do those things better than we can, it would be good to replace ourselves with that machine and set it to reproduce itself infinitely.
I recently saw a statement to the effect of “Art exists to produce feelings in us that we want, but do not get enough of in the course of normal life.” That’s what makes art valuable – supplementing emotional malnutrition. Such a thing exists because “to feel emotions” is the core function of humanity, and not fulfilling that function hurts like not eating does.
This is why (for many people) the optimal level of psychosis is non-zero. This is why intelligence is important – a greater level of intelligence allows a species to experience far more complex and nuanced emotional states. And the ability to experience more varieties of emotions is why it’s better to become more complex rather than simply dialing up happiness. It’s why disorders that prevent us from experiencing certain emotions are so awful (with the worst obviously being the ones that prevent us from feeling the “best” desires).
It’s why we like funny things, and tragic things, and scary things. Who wants to feel the way they feel after watching all of Evangelion?? Turns out – everyone, at some point, for at least a little bit of time!
It is why all human life has value. You do not matter based on what you can produce, or how smart you are, or how useful you are to others. You matter because you are a human who feels things.
My utility function is to feel a certain elastic web of emotions, and it varies from other utility functions by which emotions are desired in which amounts. My personality determines what actions produce what emotions.
And a machine that could feel things even better than humans can could be a wonderful thing. Greg Egan’s "Diaspora" features an entire society of uploaded humans, living rich, complex lives of substance. Loving, striving, crying, etc. The society can support far more humans than is physically possible in meat-bodies, running far faster than is possible in realspace. Since all these humans are running on computer chips, one could argue that one way of looking at this thing is not “A society of uploaded humans” but “A machine that feels human emotions better than meat-humans do.” And it’s a glorious thing. I would be happy to live in such a society.
II. God Mode is Super Lame
Why not just wirehead with a large and complex set of emotions?
I’m old enough to have played the original Doom when it came out (sooo old!). It had a cheat-code that made you invincible, commonly called god-mode. The first thing you notice is that it’s super cool to be invincible and just mow down all those monsters with impunity! The next thing you notice is that after a while (maybe ten minutes?) it loses all appeal. It becomes boring. There is no game anymore, once you no longer have to worry about taking damage. It becomes a task. You start enabling other cheats to get through it faster. Full-ammo cheats, to just use the biggest, fastest gun nonstop and get those monsters out of your way. Then walk-through-wall cheats, so you can just go straight to the level exit without wandering around looking for keys. Over, and over, and over again, level after level. It becomes a Kafka-esque grotesquery. Why am I doing this? Why am I here? Is my purpose just to keep walking endlessly from Spawn Point to Exit, the world passing around me in a blur, green and blue explosions obscuring all vision? When will this end?
It was a relief to be finished with the game.
That was my generation’s first brush with the difference between goal-oriented objectives, and process-oriented objectives. We learned that the point of a game isn’t to get to the end, the point is to play the game. It used to be that if you wanted to be an awesome guitarist, you had to go through the process of playing guitar a LOT. There was no shortcut. So one could be excused for confusing “I want to be a rock star” with “I want to be playing awesome music.” Before cheat codes, getting to the end of the game was fun, so we thought that was our objective. After cheat-codes we could go straight to the end any time we wanted, and now we had to choose – is your objective really just to get to the end? Or is it to go through the process of playing the game?
Some things are goal-oriented, of course. Very few people clean their toilets because they enjoy the process of cleaning their toilet. They want their toilet to be clean. If they could push a button and have a clean toilet without having to do the cleaning, they would.
Process-oriented objectives still have a goal. You want to beat the game. But you do not want first-order control over the bit “Game Won? Y/N”. You want first-order control over the actions that can get you there – strafing, shooting, jumping – resulting in second-order control over if the bit finally gets flipped or not.
First-order control is god mode. Your goal is completed with full efficiency. Second-order control is indirect. You can take actions, and those actions will, if executed well, get you closer to your goal. They are fuzzier, you can be wrong about their effects, their effects can be inconsistent over time, and you can get better at using them. You can tell if you’d prefer god-mode for a task by considering if you’d like to have it completed without going through the steps.
Do you want to:
Have Not Played The Game, And Have It Completed? or Be Playing The Game?
Have A Clean Toilet, Without Cleaning It Yourself? or Be Cleaning The Toilet?
Be At The End of a Movie? or Be Watching The Movie?
If the answer is in the first column, you want first-order control. If it is in the second column, you want second-order control.
Wireheading, even variable multi-emotional wireheading, assumes that emotions are a goal-oriented objective, and thus takes first-order control of one’s emotional state. I contest that emotions are a process-oriented objective. The purpose is to evoke those emotions by using second-order control – taking actions that will lead to those emotions being felt. To eliminate that step and go straight to the credits is to lose the whole point of being human.
III. Removing The Person From The Output
How is the process of playing Doom without cheat codes distinguished from the process of repeatedly pushing a button connected to certain electrodes in your head that produce the emotions associated with playing Doom without cheat codes? (Or just lying there while the computer chooses which electrodes to stimulate on your behalf?)
If it’s just the emotions without the experiences that would cause those emotions, I think that’s a huge difference. That is once again just jumping right to the end-state, rather than experiencing the process that brings it about. It’s first-order control, and that efficiency and directness strips out all the complexity and nuance of a second-order experience.
See Incoming Fireball -> Startled, Fear
Strafe Right -> Anticipation, Dread
Fireball Dodged -> Relief
Return Fire -> Vengeance!!
Is strictly more complicated than just
The key difference is that in the first case, the player is entangled in the process. While these things are designed to produce specific and very similar experiences for everyone (which is why they’re popular with a wide player base), it takes a pre-existing person and combines them with a series of elements that is supposed to lead to an emotional response. The exact situation is unique(ish) for each person, because the person is a vital input. The output (of person feeling X emotions) is unique and personalized, as the input is different in every case.
When simply conjuring the emotions directly via wire, the individual is removed as an input. The emotions are implanted directly and do not depend on the person. The output (of person feeling X emotions) is identical and of far less complexity and value. Even if the emotions are hooked up to a random number generator or in some other way made to result in non-identical outputs, the situation is not improved. Because the problem isn’t so much “identical output” as it is that the Person was not an input, was not entangled in the process, and therefore doesn’t matter.
I actually don’t have much of a problem with simulated-realities. Already a large percentage of the emotions felt by middle-class people in the first world are due to simulated realities. We induce feelings via music, television/movies, video games, novels, and other art. I think this has had some positive effects on society – it’s nice when people can get their Thrill needs met without actually risking their lives and/or committing crimes. In fact, the sorts of people who still try to get all their emotional needs met in the real world tend to be destructive and dramatic and I’m sure everyone knows at least one person like that, and tries to avoid them.
I think a complete retreat to isolation would be sad, because other human minds are the most complex things that exist, and to cut that out of one’s life entirely would be an impoverishment. But a community of people interacting in a cyberworld, with access to physical reality? Shit, that sounds amazing!
Of course a “Total Recall” style system has the potential to become nightmarish. Right now when someone watches a movie, they bring their whole life with them. The movie is interpreted in light of one’s life experience. Every viewer has a different experience (some people have radically different experiences, as my SO and I recently discovered when we watched Birdman together. In fact, this comparing of the difference of experiences is the most fun part of my bi-weekly book club meetings. It’s kinda the whole point.). The person is an input in the process, and they’re mashed up into the product. If your proposed system would simply impose a memory or an experience onto someone else wholesale* without them being involved in the process, then it would be just as bad as the “series of emotions” process.
I have a vision of billions of people spending all of eternity simply reliving the most intense emotional experiences ever recorded, in perfect carbon copy, over and over again, and I shudder in horror. That’s not even being a person anymore. That’s overwriting your own existence with the recorded existence of someone(s) else. :(
But a good piece of art, that respects the person-as-input, and uses the artwork to cause them to create/feel more of their own emotions? That seems like a good thing.
(*this was adapted from a series of posts on my blog)
A well-known American federal appellate judge, Alex Kozinski, has written a commentary on systemic biases and institutional myths in the criminal justice system.
The basic thrust of his criticism will be familiar to readers of the sequences and rationalists generally. Lots about cognitive biases (but some specific criticisms of fingerprints and DNA evidence as well). Still, it's interesting that a prominent federal judge -- the youngest when appointed, and later chief of the Ninth Circuit -- would treat some sacred cows of the judiciary so ruthlessly.
This is specifically a criticism of U.S. criminal justice, but, ceteris paribus, much of it applies not only to other areas of U.S. law, but to legal practices throughout the world as well.
I recently wrote an essay about AI risk, targeted at other academics:
I think it might be interesting to some of you, so I am sharing it here. I would appreciate any feedback any of you have, especially from others who do AI / machine learning research.
Markets are powerful decentralized optimization engines - it is known. Liberals see the free market as a kind of optimizer run amuck, a dangerous superintelligence with simple non-human values that must be checked and constrained by the government - the friendly SI. Conservatives just reverse the narrative roles.
In some domains, where the incentive structure aligns with human values, the market works well. In our current framework, the market works best for producing gadgets. It does not work so well for pricing intangible information, and most specifically it is broken when it comes to health.
We treat health as just another gadget problem: something to be solved by pills. Health is really a problem of knowledge; it is a computational prediction problem. Drugs are useful only to the extent that you can package the results of new knowledge into a pill and patent it. If you can't patent it, you can't profit from it.
So the market is constrained to solve human health by coming up with new patentable designs for mass-producible physical objects which go into human bodies. Why did we add that constraint - thou shalt solve health, but thou shalt only use pills? (Ok technically the solutions don't have to be ingestible, but that's a detail.)
The gadget model works for gadgets because we know how gadgets work - we built them, after all. The central problem with health is that we do not completely understand how the human body works - we did not build it. Thus we should be using the market to figure out how the body works - completely - and arguably we should be allocating trillions of dollars towards that problem.
The market optimizer analogy runs deeper when we consider the complexity of instilling values into a market. Lawmakers cannot program the market with goals directly, so instead they attempt to engineer desirable behavior by adding ever more layers of constraints. Lawmakers are deontologists.
As an example, consider the regulations on drug advertising. Big pharma is unsafe - its profit function does not encode anything like "maximize human health and happiness" (which of course itself is an oversimplification). If left to its own devices, there are strong incentives to sell subtly addictive drugs, to create elaborate hyped false advertising campaigns, etc. Thus all the deontological injunctions. I take that as a strong indicator of a poor solution - a value alignment failure.
What would healthcare look like in a world where we solved the alignment problem?
To solve the alignment problem, the market's profit function must encode long term human health and happiness. This really is a mechanism design problem - it's not something lawmakers are even remotely trained or qualified for. A full solution is naturally beyond the scope of a little blog post, but I will sketch out the general idea.
To encode health into a market utility function, first we create financial contracts with an expected value which captures long-term health. We can accomplish this with a long-term contract that generates positive cash flow when a human is healthy, and negative when unhealthy - basically an insurance contract. There is naturally much complexity in getting those contracts right, so that they measure what we really want. But assuming that is accomplished, the next step is pretty simple - we allow those contracts to trade freely on an open market.
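As a rough illustration of the contract idea above, here is a minimal sketch of how such a contract's expected value might be computed. Everything in it is a hypothetical assumption for illustration only: the function name, the yearly payment structure, the probabilities, and the dollar amounts are not part of any real proposal. The point is simply that anything which raises the holder's probability of being healthy raises the price the contract should trade at.

```python
def contract_expected_value(p_healthy, healthy_payment, unhealthy_penalty,
                            years, discount_rate=0.0):
    """Expected discounted value of a toy health contract.

    Hypothetical model: each year the contract pays +healthy_payment with
    probability p_healthy, and costs unhealthy_penalty otherwise.
    """
    # Expected cash flow for a single year.
    yearly = p_healthy * healthy_payment - (1.0 - p_healthy) * unhealthy_penalty
    # Sum the discounted expected cash flows over the contract's life.
    return sum(yearly / (1.0 + discount_rate) ** t for t in range(1, years + 1))


# Made-up numbers: an intervention lifts the yearly chance of good health
# from 90% to 95% for a ten-year contract.
baseline = contract_expected_value(0.90, 1000.0, 5000.0, years=10)
improved = contract_expected_value(0.95, 1000.0, 5000.0, years=10)
gain = improved - baseline  # what the contract holder stands to earn

print(f"baseline: {baseline:.2f}, improved: {improved:.2f}, gain: {gain:.2f}")
```

In this toy model the gain is what a trader who knew about the intervention could capture by buying the contract at the baseline price, which is the incentive the rest of the post relies on.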
There are some interesting failure modes and considerations that are mostly beyond scope but worth briefly mentioning. This system probably needs to be asymmetric. The transfers on poor health outcomes should partially go to cover medical payments, but it may be best to have a portion of the wealth simply go to nobody/everybody - just destroyed.
In this new framework, designing and patenting new drugs can still be profitable, but it is now put on even footing with preventive medicine. More importantly, the market can now actually allocate the correct resources towards long term research.
To make all this concrete, let's use an example of a trillion dollar health question - one that our current system is especially ill-suited to solve:
What are the long-term health effects of abnormally low levels of solar radiation? What levels of sun exposure are ideal for human health?
This is a big important question, and you've probably read some of the hoopla and debate about vitamin D. I'm going to soon briefly summarize a general abstract theory, one that I would bet heavily on if we lived in a more rational world where such bets were possible.
In a sane world where health is solved by a proper computational market, I could make enormous - ridiculous really - amounts of money if I happened to be an early researcher who discovered the full health effects of sunlight. I would bet on my theory simply by buying up contracts for individuals/demographics who had the most health to gain by correcting their sunlight deficiency. I would then publicize the theory and evidence, and perhaps even raise a heap of money to create a strong marketing engine to help ensure that my investments - my patients - were taking the necessary actions to correct their sunlight deficiency. Naturally I would use complex machine learning models to guide the trading strategy.
Now, just as an example, here is the brief 'pitch' for sunlight.
If we go back and look across all of time, there is a mountain of evidence which more or less screams - proper sunlight is important to health. Heliotherapy has a long history.
Humans, like most mammals, and most other earth organisms in general, evolved under the sun. A priori we should expect that organisms will have some 'genetic programs' which take approximate measures of incident sunlight as an input. The serotonin -> melatonin mediated blue-light pathway is an example of one such light detecting circuit which is useful for regulating the 24 hour circadian rhythm.
The vitamin D pathway has existed since the time of algae such as the Coccolithophore. It is a multi-stage pathway that can measure solar radiation over a range of temporal frequencies. It starts with synthesis of fat-soluble cholecalciferol, which has a very long half life measured in months.
- Cholecalciferol (HL ~ months) becomes
- 25(OH)D (HL ~ 15 days) which finally becomes
- 1,25(OH)2 D (HL ~ 15 hours)
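To get a feel for the different time scales in the list above, here is a minimal sketch of simple exponential decay for each stage taken in isolation. It ignores conversion between stages and any regulation, so it is not a model of the actual pathway. The 60-day figure for cholecalciferol is my own placeholder for "months", and the names are hypothetical.

```python
def fraction_remaining(elapsed_days, half_life_days):
    """Fraction of a substance left after elapsed_days of first-order decay."""
    return 0.5 ** (elapsed_days / half_life_days)


# Half-lives in days. The first value is an assumption standing in for
# "months"; the other two come from the list above.
HALF_LIFE_DAYS = {
    "cholecalciferol": 60.0,      # assumed: ~2 months
    "25(OH)D": 15.0,              # ~15 days
    "1,25(OH)2D": 15.0 / 24.0,    # ~15 hours
}

# How much of each stage would remain after 30 days without new input.
for name, half_life in HALF_LIFE_DAYS.items():
    print(f"{name}: {fraction_remaining(30.0, half_life):.3g} remaining after 30 days")
```

The slow stage barely moves over a month while the fast stage is essentially gone within days, which is the sense in which the pathway can track sunlight over a range of temporal frequencies.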
The main recognized role for this pathway in regards to human health - at least according to the current Wikipedia entry - is to enhance "the intestinal absorption of calcium, iron, magnesium, phosphate, and zinc". Ponder that for a moment.
Interestingly, this pathway still works as a general solar clock and radiation detector for carnivores - as they can simply eat the precomputed measurement in their diet.
So, what is a long term sunlight detector useful for? One potential application could be deciding appropriate resource allocation towards DNA repair. Every time an organism is in the sun it is accumulating potentially catastrophic DNA damage that must be repaired when the cell next divides. We should expect that genetic programs would allocate resources to DNA repair and various related activities dependent upon estimates of solar radiation.
I should point out - just in case it isn't obvious - that this general idea does not imply that cranking up the sunlight hormone to insane levels will lead to much better DNA/cellular repair. There are always tradeoffs, etc.
One other obvious use of a long term sunlight detector is to regulate general strategic metabolic decisions that depend on the seasonal clock - especially for organisms living far from the equator. During the summer when food is plentiful, the body can expect easy calories. As winter approaches calories become scarce and frugal strategies are expected.
So first off we'd expect to see a huge range of complex effects showing up as correlations between low vit D levels and various illnesses, and specifically illnesses connected to DNA damage (such as cancer) and/or BMI.
Now it turns out that BMI itself is also strongly correlated with a huge range of health issues. So the first key question to focus on is the relationship between vit D and BMI. And - perhaps not surprisingly - there is pretty good evidence for such a correlation, and this has been known for a while.
Now we get into the real debate. Numerous vit D supplement intervention studies have now been run, and the results are controversial. In general the vit D experts (such as my father, who started the vit D council, and publishes some related research) say that the only studies that matter are those that supplement at high doses sufficient to elevate vit D levels into a 'proper' range which substitutes for sunlight, which in general requires 5000 IU/day on average - depending completely on genetics and lifestyle (to the point that any one-size-fits-all recommendation is probably terrible).
The mainstream basically ignores all that and funds studies at tiny RDA doses - say 400 IU or less - and then they do meta-analysis over those studies and conclude that their big meta-analysis, unsurprisingly, doesn't show a statistically significant effect. However, these studies still show small effects. Often the meta-analysis is corrected for BMI, which of course also tends to remove any vit D effect, to the extent that low vit D/sunlight is a cause of both weight gain and a bunch of other stuff.
So let's look at two studies for vit D and weight loss.
First, this recent 2015 study of 400 overweight Italians (sorry the actual paper doesn't appear to be available yet) tested vit D supplementation for weight loss. The 3 groups were (0 IU/day, ~1,000 IU / day, ~3,000 IU/day). The observed average weight loss was (1 kg, 3.8 kg, 5.4 kg). I don't know if the 0 IU group received a placebo. Regardless, it looks promising.
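As a quick back-of-the-envelope check on those numbers, the sketch below subtracts the 0 IU group's loss from the other two groups and divides by dose. It uses only the figures quoted above, treats the approximate doses as exact, and says nothing about statistical significance. The variable names are my own.

```python
# Figures as quoted above: approximate daily dose (IU) and average weight loss (kg).
doses_iu = [0, 1000, 3000]
weight_loss_kg = [1.0, 3.8, 5.4]

control_loss = weight_loss_kg[0]

# Extra weight loss relative to the 0 IU group.
excess_kg = [loss - control_loss for loss in weight_loss_kg]

# Extra loss per 1,000 IU/day for the two supplemented groups.
excess_per_1000_iu = [
    excess / (dose / 1000.0)
    for dose, excess in zip(doses_iu[1:], excess_kg[1:])
]

for dose, excess, rate in zip(doses_iu[1:], excess_kg[1:], excess_per_1000_iu):
    print(f"{dose} IU/day: {excess:.1f} kg extra, {rate:.2f} kg per 1,000 IU/day")
```

On these figures the extra loss per unit of dose is smaller in the higher-dose group, which fits the earlier caveat that cranking the dose up should not be expected to scale the benefit linearly.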
On the other hand, this 2013 meta-analysis of 9 studies with 1651 adults total (mainly women) supposedly found no significant weight loss effect for vit D. However, the studies used between 200 IU/day and 1,100 IU/day, with most between 200 and 400 IU. Five studies used calcium, five also showed weight loss (not necessarily the same - unclear). This does not show - at all - what the study claims in its abstract.
In general, medical researchers should not be doing statistics. That is a job for the tech industry.
Now the vit D and sunlight issue is complex, and it will take much research to really work out all of what is going on. The current medical system does not appear to be handling this well - why? Because there is insufficient financial motivation.
Is Big Pharma interested in the sunlight/vit D question? Well yes - but only to the extent that they can create a patentable analogue! The various vit D analogue drugs developed or in development are evidence that Big Pharma is at least paying attention. But assuming that the sunlight hypothesis is mainly correct, there is very little profit in actually fixing the real problem.
There is probably more to sunlight than just vit D and serotonin/melatonin. Consider the interesting correlation between birth month and a number of disease conditions. Perhaps there is a little grain of truth to astrology after all.
Thus concludes my little vit D pitch.
In a more sane world I would have already bet on the general theory. In a really sane world it would have been solved well before I would expect to make any profitable trade. In that rational world you could actually trust health advertising, because you'd know that health advertisers are strongly financially motivated to convince you of things actually truly important for your health.
Instead of charging by the hour or per treatment, like a mechanic, doctors and healthcare companies should literally invest in their patients' long-term health, and profit from improvements to long term outcomes. The sunlight health connection is a trillion dollar question in terms of medical value, but not in terms of exploitable profits in today's reality. In a properly constructed market, there would be enormous resources allocated to answer these questions, flowing into legions of profit motivated startups that could generate billions trading on computational health financial markets, all without selling any gadgets.
So in conclusion: the market could solve health, but only if we allowed it to and only if we set up appropriate financial mechanisms to encode the correct value function. This is the UFAI problem next door.
I was recently re-reading a piece by Yvain/Scott Alexander called Epistemic Learned Helplessness. It's a very insightful post, as is typical for Scott, and I recommend giving it a read if you haven't already. In it he writes:
When I was young I used to read pseudohistory books; Immanuel Velikovsky's Ages in Chaos is a good example of the best this genre has to offer. I read it and it seemed so obviously correct, so perfect, that I could barely bring myself to bother to search out rebuttals.
And then I read the rebuttals, and they were so obviously correct, so devastating, that I couldn't believe I had ever been so dumb as to believe Velikovsky.
And then I read the rebuttals to the rebuttals, and they were so obviously correct that I felt silly for ever doubting.
And so on for several more iterations, until the labyrinth of doubt seemed inescapable.
He goes on to conclude that the skill of taking ideas seriously - often considered one of the most important traits a rationalist can have - is a dangerous one. After all, it's very easy for arguments to sound convincing even when they're not, and if you're too easily swayed by argument you can end up with some very absurd beliefs (like that Venus is a comet, say).
This post really resonated with me. I've had several experiences similar to what Scott describes, of being trapped between two debaters who both had a convincingness that exceeded my ability to discern truth. And my reaction in those situations was similar to his: eventually, after going through the endless chain of rebuttals and counter-rebuttals, changing my mind at each turn, I was forced to throw up my hands and admit that I probably wasn't going to be able to determine the truth of the matter - at least, not without spending a lot more time investigating the different claims than I was willing to. And so in many cases I ended up adopting a sort of semi-principled stance of agnosticism: unless it was a really really important question (in which case I was sort of obligated to do the hard work of investigating the matter to actually figure out the truth), I would just say I don't know when asked for my opinion.
[Non-exhaustive list of areas in which I am currently epistemically helpless: geopolitics (in particular the Israel/Palestine situation), anthropics, nutrition science, population ethics]
All of which is to say: I think Scott is basically right here. In many cases we shouldn't have too strong of an opinion on complicated matters. But when I re-read the piece recently I was struck by the fact that his whole argument could be summed up much more succinctly (albeit much more pithily) as:
"Don't be gullible."
Huh. Sounds a lot more obvious that way.
Now, don't get me wrong: this is still good advice. I think people should endeavour to not be gullible if at all possible. But it makes you wonder: why did Scott feel the need to write a post denouncing gullibility? After all, most people kind of already think being gullible is bad - who exactly is he arguing against here?
Well, recall that he wrote the post in response to the notion that people should believe arguments and take ideas seriously. These sound like good, LW-approved ideas, but note that unless you're already exceptionally smart or exceptionally well-informed, believing arguments and taking ideas seriously is tantamount to...well, to being gullible. In fact, you could probably think of gullibility as a kind of extreme and pathological form of lightness; a willingness to be swept away by the winds of evidence, no matter how strong (or weak) they may be.
There seems to be some tension here. On the one hand we have an intuitive belief that gullibility is bad; that the proper response to any new claim should be skepticism. But on the other hand we also have some epistemic norms here at LW that are - well, maybe they don't endorse being gullible, but they don't exactly not endorse it either. I'd say the LW memeplex is at least mildly friendly towards the notion that one should believe conclusions that come from convincing-sounding arguments, even if they seem absurd. A core tenet of LW is that we change our mind too little, not too much, and we're certainly all in favour of lightness as a virtue.
Anyway, I thought about this tension for a while and came to the conclusion that I had probably just lost sight of my purpose. The goal of (epistemic) rationality isn't to not be gullible or not be skeptical - the goal is to form correct beliefs, full stop. Terms like gullibility and skepticism are useful to the extent that people tend to be systematically overly accepting or dismissive of new arguments - individual beliefs themselves are simply either right or wrong. So, for example, if we do studies and find out that people tend to accept new ideas too easily on average, then we can write posts explaining why we should all be less gullible, and give tips on how to accomplish this. And if on the other hand it turns out that people actually accept far too few new ideas on average, then we can start talking about how we're all much too skeptical and how we can combat that. But in the end, in terms of becoming less wrong, there's no sense in which gullibility would be intrinsically better or worse than skepticism - they're both just words we use to describe deviations from the ideal, which is accepting only true ideas and rejecting only false ones.
This answer basically wrapped the matter up to my satisfaction, and resolved the sense of tension I was feeling. But afterwards I was left with an additional interesting thought: might gullibility be, if not a desirable end point, then an easier starting point on the path to rationality?
That is: no one should aspire to be gullible, obviously. That would be aspiring towards imperfection. But if you were setting out on a journey to become more rational, and you were forced to choose between starting off too gullible or too skeptical, could gullibility be an easier initial condition?
I think it might be. It strikes me that if you start off too gullible you begin with an important skill: you already know how to change your mind. In fact, changing your mind is in some ways your default setting if you're gullible. And considering that like half the freakin sequences were devoted to learning how to actually change your mind, starting off with some practice in that department could be a very good thing.
I consider myself to be...well, maybe not more gullible than average in absolute terms - I don't get sucked into pyramid scams or send money to Nigerian princes or anything like that. But I'm probably more gullible than average for my intelligence level. There's an old discussion post I wrote a few years back that serves as a perfect demonstration of this (I won't link to it out of embarrassment, but I'm sure you could find it if you looked). And again, this isn't a good thing - to the extent that I'm overly gullible, I aspire to become less gullible (Tsuyoku Naritai!). I'm not trying to excuse any of my past behaviour. But when I look back on my still-ongoing journey towards rationality, I can see that my ability to abandon old ideas at the (relative) drop of a hat has been tremendously useful so far, and I do attribute that ability in part to years of practice at...well, at believing things that people told me, and sometimes gullibly believing things that people told me. Call it epistemic deferentiality, or something - the tacit belief that other people know better than you (especially if they're speaking confidently) and that you should listen to them. It's certainly not a character trait you're going to want to keep as a rationalist, and I'm still trying to do what I can to get rid of it - but as a starting point? You could do worse I think.
Now, I don't pretend that the above is anything more than a plausibility argument, and maybe not a strong one at that. For one, I'm not sure how well this idea carves reality at its joints - after all, gullibility isn't quite the same thing as lightness, even if they're closely related. For another, if the above were true, you would probably expect LWers to be more gullible than average. But that doesn't seem quite right - while LW is admirably willing to engage with new ideas, no matter how absurd they might seem, the default attitude towards a new idea on this site is still one of intense skepticism. Post something half-baked on LW and you will be torn to shreds. Which is great, of course, and I wouldn't have it any other way - but it doesn't really sound like the behaviour of a website full of gullible people.
(Of course, on the other hand it could be that LWers really are more gullible than average, but they're just smart enough to compensate for it)
Anyway, I'm not sure what to make of this idea, but it seemed interesting and worth a discussion post at least. I'm curious to hear what people think: does any of the above ring true to you? How helpful do you think gullibility is, if it is at all? Can you be "light" without being gullible? And for the sake of collecting information: do you consider yourself to be more or less gullible than average for someone of your intelligence level?