Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Entropy and Temperature

26 spxtr 17 December 2014 08:04AM

Eliezer Yudkowsky previously wrote (6 years ago!) about the second law of thermodynamics. Many commenters were skeptical about the statement, "if you know the positions and momenta of every particle in a glass of water, it is at absolute zero temperature," because they don't know what temperature is. This is a common confusion.


To specify the precise state of a classical system, you need to know its location in phase space. For a bunch of helium atoms whizzing around in a box, phase space is the position and momentum of each helium atom. For N atoms in the box, that means 6N numbers to completely specify the system.

Let's say you know the total energy of the gas, but nothing else. A fantastically huge number of points in phase space will be consistent with that energy.* In the absence of any more information it is correct to assign a uniform distribution to this region of phase space. The entropy of a uniform distribution is the logarithm of the number of points, so that's that. If you also know the volume, then the number of points in phase space consistent with both the energy and volume is necessarily smaller, so the entropy is smaller.
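As a small illustration of the claim that the entropy of a uniform distribution is the logarithm of the number of points, here is a minimal sketch. The state counts (1000, 250, 1) are made-up toy numbers, nothing like the real count for a gas, and the function names are my own. Entropy is in units of kB (natural log).

```python
import math

def shannon_entropy(probabilities):
    """Entropy in nats: S = -sum(p * ln p), skipping zero-probability states."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

def uniform(n_states):
    """Uniform distribution over n_states equally likely microstates."""
    return [1.0 / n_states] * n_states

# Toy numbers (assumptions for illustration only).
# Knowing only the energy: say 1000 microstates are consistent with it.
s_energy_only = shannon_entropy(uniform(1000))

# Also knowing the volume rules some microstates out: say 250 remain.
s_energy_and_volume = shannon_entropy(uniform(250))

# Perfect knowledge: exactly one microstate remains, so the entropy is zero.
s_perfect = shannon_entropy(uniform(1))

print(s_energy_only, math.log(1000))
print(s_energy_and_volume, math.log(250))
print(s_perfect)
```

The more you know, the fewer states remain consistent with your knowledge, and the smaller the logarithm gets.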

This might be confusing to chemists, since they memorized a formula for the entropy of an ideal gas, and it's ostensibly objective. Someone with perfect knowledge of the system will calculate the same number on the right side of that equation, but to them, that number isn't the entropy. It's the entropy of the gas if you know nothing more than energy, volume, and number of particles.


The existence of temperature follows from the zeroth and second laws of thermodynamics: thermal equilibrium is transitive, and entropy is maximum in equilibrium. Temperature is then defined as the thermodynamic quantity that is shared by systems in equilibrium.

If two systems are in equilibrium then they cannot increase entropy by flowing energy from one to the other. That means that if we flow a tiny bit of energy from one to the other (δU1 = -δU2), the entropy change in the first must be the opposite of the entropy change of the second (δS1 = -δS2), so that the total entropy (S1 + S2) doesn't change. For systems in equilibrium, this leads to (∂S1/∂U1) = (∂S2/∂U2). Define 1/T = (∂S/∂U), and we are done.
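The step from "total entropy doesn't change" to equal derivatives can be written out as follows. This is just the argument above in symbols, holding volume and particle number fixed (an assumption the text leaves implicit):

```latex
\begin{align}
  \delta S_{\text{total}}
    &= \frac{\partial S_1}{\partial U_1}\,\delta U_1
     + \frac{\partial S_2}{\partial U_2}\,\delta U_2 \\
    &= \left( \frac{\partial S_1}{\partial U_1}
            - \frac{\partial S_2}{\partial U_2} \right) \delta U_1
       \qquad \text{using } \delta U_1 = -\delta U_2 .
\end{align}
% In equilibrium \delta S_total = 0 for any small \delta U_1, so
\begin{equation}
  \frac{\partial S_1}{\partial U_1} = \frac{\partial S_2}{\partial U_2}
  \equiv \frac{1}{T} .
\end{equation}
```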

Temperature is sometimes taught as, "a measure of the average kinetic energy of the particles," because for an ideal gas U/N = (3/2) kBT. This is wrong, for the same reason that the ideal gas entropy isn't the definition of entropy.

Probability is in the mind. Entropy is a function of probabilities, so entropy is in the mind. Temperature is a derivative of entropy, so temperature is in the mind.

Second Law Trickery

With perfect knowledge of a system, it is possible to extract all of its energy as work. EY states it clearly:

So (again ignoring quantum effects for the moment), if you know the states of all the molecules in a glass of hot water, it is cold in a genuinely thermodynamic sense: you can take electricity out of it and leave behind an ice cube.

Someone who doesn't know the state of the water will observe a violation of the second law. This is allowed. Let that sink in for a minute. Jaynes calls it second law trickery, and I can't explain it better than he does, so I won't try:

A physical system always has more macroscopic degrees of freedom beyond what we control or observe, and by manipulating them a trickster can always make us see an apparent violation of the second law.

Therefore the correct statement of the second law is not that an entropy decrease is impossible in principle, or even improbable; rather that it cannot be achieved reproducibly by manipulating the macrovariables {X1, ..., Xn} that we have chosen to define our macrostate. Any attempt to write a stronger law than this will put one at the mercy of a trickster, who can produce a violation of it.

But recognizing this should increase rather than decrease our confidence in the future of the second law, because it means that if an experimenter ever sees an apparent violation, then instead of issuing a sensational announcement, it will be more prudent to search for that unobserved degree of freedom. That is, the connection of entropy with information works both ways; seeing an apparent decrease of entropy signifies ignorance of what were the relevant macrovariables.


I've actually given you enough information on statistical mechanics to calculate an interesting system. Say you have N particles, each fixed in place to a lattice. Each particle can be in one of two states, with energies 0 and ε. Calculate and plot the entropy if you know the total energy: S(E), and then the energy as a function of temperature: E(T). This is essentially a combinatorics problem, and you may assume that N is large, so use Stirling's approximation. What you will discover should make sense using the correct definitions of entropy and temperature.
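For anyone who wants to check their answer, here is one possible route through the exercise, as a sketch rather than the official solution. It assumes units where kB = 1, writes x = E/(Nε) for the fraction of particles in the upper state, and uses Stirling's approximation for ln C(N, Nx). The function names are my own.

```python
import math

def entropy_per_particle(x):
    """S/(N kB) from Stirling's approximation; x = E/(N*eps) is the excited fraction."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def temperature(x, eps=1.0):
    """T in units of eps/kB, from 1/T = dS/dE = (1/eps) * ln((1 - x) / x).

    Undefined at x = 0.5 (where 1/T = 0) and outside 0 < x < 1.
    """
    return eps / math.log((1.0 - x) / x)

def energy_per_particle(T, eps=1.0):
    """Inverse of temperature(): E/N = eps / (1 + exp(eps / T))."""
    return eps / (1.0 + math.exp(eps / T))

# Tabulate a few points on either side of x = 0.5.
for x in (0.1, 0.25, 0.4, 0.6, 0.9):
    print(f"x={x:.2f}  S/N={entropy_per_particle(x):.4f}  T={temperature(x):+.3f}")
```

The entropy peaks at x = 1/2, so the slope ∂S/∂E changes sign there, and with 1/T = ∂S/∂U the temperature is negative for x > 1/2.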

*: How many combinations of 10^23 numbers between 0 and 10 add up to 5×10^23?

How many people am I?

4 Manfred 15 December 2014 06:11PM

Strongly related: the Ebborians

Imagine mapping my brain into two interpenetrating networks. For each brain cell, half of it goes to one map and half to the other. For each connection between cells, half of each connection goes to one map and half to the other. We can call these two mapped out halves Manfred One and Manfred Two. Because neurons are classical, as I think, both of these maps change together. They contain the full pattern of my thoughts. (This situation is even more clear in the Ebborians, who can literally split down the middle.)

So how many people am I? Are Manfred One and Manfred Two both people? Of course, once we have two, why stop there - are there thousands of Manfreds in here, with "me" as only one of them? Put like that it sounds a little overwrought - what's really going on here is the question of what physical system corresponds to "I" in English statements like "I wake up." This may matter.

The impact on anthropic probabilities is somewhat straightforward. With everyday definitions of "I wake up," I wake up just once per day no matter how big my head is. But if the "I" in that sentence is some constant-size physical pattern, then "I wake up" is an event that happens more times if my head is bigger. And so using the variable people-number definition, I expect to wake up with a gigantic head.

The impact on decisions is less big. If I'm in this head with a bunch of other Manfreds, we're all on the same page - it's a non-anthropic problem of coordinated decision-making. For example, if I were to make any monetary bets about my head size, and then donate profits to charity, no matter what definition I'm using, I should bet as if my head size didn't affect anthropic probabilities. So to some extent the real point of this effect is that it is a way anthropic probabilities can be ill-defined. On the other hand, what about preferences that depend directly on person-numbers like how to value people with different head sizes? Or for vegetarians, should we care more about cows than chickens, because each cow is more animals than a chicken is?


According to my common sense, it seems like my body has just one person in it. Why does my common sense think that? I think there are two answers, one unhelpful and one helpful.

The first answer is evolution. Having kids is an action that's independent of what physical system we identify with "I," and so my ancestors never found modeling their bodies as being multiple people useful.

The second answer is causality. Manfred One and Manfred Two are causally distinct from two copies of me in separate bodies but with the same input/output. If a difference between the two separated copies arose somehow, (reminiscent of Dennett's factual account) henceforth the two bodies would do and say different things and have different brain states. But if some difference arises between Manfred One and Manfred Two, it is erased by diffusion.

Which is to say, the map that is Manfred One is statically the same pattern as my whole brain, but it's causally different. So is "I" the pattern, or is "I" the causal system? 

In this sort of situation I am happy to stick with common sense, and thus when I say me, I think the causal system is referring to the causal system. But I'm not very sure.


Going back to the Ebborians, one interesting thing about that post is the conflict between common sense and common sense - it seems like common sense that each Ebborian is equally much one person, but it also seems like common sense that if you looked at an Ebborian dividing, there doesn't seem to be a moment where the amount of subjective experience should change, and so amount of subjective experience should be proportional to thickness. But as it is said, just because there are two opposing ideas doesn't mean one of them is right.

On the questions of subjective experience raised in that post, I think this mostly gets cleared up by precise description and anthropic narrowness. I'm unsure of the relative sizes of this margin and the proof, but the sketch is to replace a mysterious "subjective experience" that spans copies with individual experiences of people who are using a TDT-like theory to choose so that they individually achieve good outcomes given their existence.

Estimating the cost-effectiveness of research

16 owencb 11 December 2014 11:55AM

At a societal level, how much money should we put into medical research, or into fusion research? For individual donors seeking out the best opportunities, how can we compare the expected cost-effectiveness of research projects with more direct interventions?

Over the past few months I've been researching this area for the Global Priorities Project. We've written a variety of articles which focus on different parts of the question. Estimating the cost-effectiveness of research is the central example here, but a lot of the methodology is also applicable to other one-off projects with unknown difficulty (perhaps including political lobbying). I don't think it's all solved, but I do think we've made substantial progress.

I think people here might be interested, so I wanted to share our work. To help you navigate and find the most appropriate pieces, here I collect them, summarise what's contained in each, and explain how they fit together.

  • I gave an overview of my thinking at the Good Done Right conference, held in Oxford in July 2014. The slides and audio of my talk are available; I have developed more sophisticated models for some parts of the area since then.
  • How to treat problems of unknown difficulty introduces the problem: we need to make decisions about when to work more on problems such as research into fusion where we don't know how difficult it will be. It builds some models which allow principled reasoning about how we should act. These models are quite crude but easy to work with: they are intended to lower the bar for Fermi estimates and similar, and provide a starting point for building more sophisticated models.
  • Estimating cost-effectiveness for problems of unknown difficulty picks up from the models in the above post, and asks what they mean for the expected cost-effectiveness of work on the problems. This involves building a model of the counterfactual impact, as solvable research problems are likely to be solved eventually, so the main effect is to move the solution forwards. This post includes several explicit formulae that you can use to produce estimates; it also explains analogies between the explicit model we derive and the qualitative 'three factor' model that GiveWell and 80,000 Hours have used for cause selection.
  • Estimating the cost-effectiveness of research into neglected diseases is an investigation by Max Dalton, which uses the techniques for estimating cost-effectiveness to provide ballpark figures for how valuable we should expect research into vaccines or treatments for neglected diseases to be. The estimates suggest that, if carefully targeted, such research could be more cost-effective than the best direct health interventions currently available for funding.
  • The law of logarithmic returns discusses the question of returns to resources into a field rather than on a single question. With some examples, it suggests that as a first approximation it is often reasonable to assume that diminishing marginal returns take a logarithmic form.
  • Theory behind logarithmic returns explains how some simple generating mechanisms can produce roughly logarithmic returns. This is a complement to the above article: we think having both empirical and theoretical justification for the rule helps us to have higher confidence in it, and to better understand when it's appropriate to generalise to new contexts. In this piece I also highlight areas for further research on the theoretical side, into when the approximation will break down, and what we might want to use instead in these cases.
  • How valuable is medical research? written with Giving What We Can, applies the logarithmic returns model together with counterfactual reasoning to produce an estimate for the cost-effectiveness of medical research as a whole.
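To make the logarithmic-returns idea from the list above concrete, here is a minimal sketch. The functional form value = k · ln(resources) and the constant k are illustrative assumptions of mine, not formulae taken from the linked articles.

```python
import math

def total_value(resources, k=1.0):
    """Hypothetical log-returns model: value = k * ln(resources)."""
    return k * math.log(resources)

def marginal_value(resources, k=1.0):
    """Derivative of total_value: each extra unit is worth k / resources."""
    return k / resources

# Doubling the resources adds the same value (k * ln 2) whatever the
# starting size of the field...
gain_small_field = total_value(2e6) - total_value(1e6)
gain_large_field = total_value(2e9) - total_value(1e9)

# ...so an extra unit of resources is worth 1000 times more in a field
# that is 1000 times smaller.
ratio = marginal_value(1e6) / marginal_value(1e9)

print(gain_small_field, gain_large_field, ratio)
```

Under this assumption, neglectedness matters a great deal: the marginal value of resources falls in proportion to how much has already been spent.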

I've also made a thread in LessWrong Discussion for people to discuss applications of one of the simpler versions of the cost-effectiveness models, to get Fermi estimates for the value of different areas.

[link] The Philosophy of Intelligence Explosions and Advanced Robotics

12 Kaj_Sotala 02 December 2014 03:44AM

The philosopher John Danaher has posted a list of all the posts that he's written on the topic of robotics and AI. Below is the current version of the list: he says that he will keep updating the page as he writes more.

  • The Singularity: Overview and Framework: This was my first attempt to provide a general overview and framework for understanding the debate about the technological singularity. I suggested that the debate could be organised around three main theses: (i) the explosion thesis -- which claims that there will be an intelligence explosion; (ii) the unfriendliness thesis -- which claims that an advanced artificial intelligence is likely to be "unfriendly"; and (iii) the inevitability thesis -- which claims that the creation of an unfriendly AI will be difficult to avoid, if not inevitable.
  • The Singularity: Overview and Framework Redux: This was my second attempt to provide a general overview and framework for understanding the debate about the technological singularity. I tried to reduce the framework down to two main theses: (i) the explosion thesis and (ii) the unfriendliness thesis.
  • AIs and the Decisive Advantage Thesis: Many people claim that an advanced artificial intelligence would have decisive advantages over human intelligences. Is this right? In this post, I look at Kaj Sotala's argument to that effect.
  • Is there a case for robot slaves? - If robots can be persons -- in the morally thick sense of "person" -- then surely it would be wrong to make them cater to our every whim? Or would it? Steve Petersen argues that the creation of robot slaves might be morally permissible. In this post, I look at what he has to say.
  • The Ethics of Robot Sex: A reasonably self-explanatory title. This post looks at the ethical issues that might arise from the creation of sex robots.
  • Will sex workers be replaced by robots? A Precis: A short summary of a longer article examining the possibility of sex workers being replaced by robots. Contrary to the work of others, I suggest that sex work might be resilient to the phenomenon of technological unemployment.
  • Bostrom on Superintelligence (2) The Instrumental Convergence Thesis: The second part in my series on Bostrom's book. This one examines the instrumental convergence thesis, according to which an intelligent agent, no matter what its final goals may be, is likely to converge upon certain instrumental goals that are unfriendly to human beings.

Superintelligence 12: Malignant failure modes

7 KatjaGrace 02 December 2014 02:02AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the twelfth section in the reading guide: Malignant failure modes.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: 'Malignant failure modes' from Chapter 8


  1. Malignant failure mode: a failure that involves human extinction; in contrast with many failure modes where the AI doesn't do much. 
  2. Features of malignant failures 
    1. We don't get a second try 
    2. It supposes we have a great deal of success, i.e. enough to make an unprecedentedly competent agent
  3. Some malignant failures:
    1. Perverse instantiation: the AI does what you ask, but what you ask turns out to be most satisfiable in unforeseen and destructive ways.
      1. Example: you ask the AI to make people smile, and it intervenes on their facial muscles or neurochemicals, instead of via their happiness, and in particular via the bits of the world that usually make them happy.
      2. Possible counterargument: if it's so smart, won't it know what we meant? Answer: Yes, it knows, but its goal is to make you smile, not to do what you meant when you programmed that goal.
      3. AI which can manipulate its own mind easily is at risk of 'wireheading' - that is, a goal of maximizing a reward signal might be perversely instantiated by just manipulating the signal directly. In general, animals can be motivated to do outside things to achieve internal states; however, an AI with sufficient access to its internal state can achieve this more easily by manipulating that state directly.
      4. Even if we think a goal looks good, we should fear it has perverse instantiations that we haven't appreciated.
    2. Infrastructure profusion: in pursuit of some goal, an AI redirects most resources to infrastructure, at our expense.
      1. Even apparently self-limiting goals can lead to infrastructure profusion. For instance, to an agent whose only goal is to make ten paperclips, once it has apparently made ten paperclips it is always more valuable to try to become more certain that there are really ten paperclips than it is to just stop doing anything.
      2. Examples: Riemann hypothesis catastrophe, paperclip maximizing AI
    3. Mind crime: AI contains morally relevant computations, and treats them badly
      1. Example: AI simulates humans in its mind, for the purpose of learning about human psychology, then quickly destroys them.
      2. Other reasons for simulating morally relevant creatures:
        1. Blackmail
        2. Creating indexical uncertainty in outside creatures

Another view

In this chapter Bostrom discussed the difficulty he perceives in designing goals that don't lead to indefinite resource acquisition. Steven Pinker recently offered a different perspective on the inevitability of resource acquisition:

...The other problem with AI dystopias is that they project a parochial alpha-male psychology onto the concept of intelligence. Even if we did have superhumanly intelligent robots, why would they want to depose their masters, massacre bystanders, or take over the world? Intelligence is the ability to deploy novel means to attain a goal, but the goals are extraneous to the intelligence itself: being smart is not the same as wanting something. History does turn up the occasional megalomaniacal despot or psychopathic serial killer, but these are products of a history of natural selection shaping testosterone-sensitive circuits in a certain species of primate, not an inevitable feature of intelligent systems. It’s telling that many of our techno-prophets can’t entertain the possibility that artificial intelligence will naturally develop along female lines: fully capable of solving problems, but with no burning desire to annihilate innocents or dominate the civilization.

Of course we can imagine an evil genius who deliberately designed, built, and released a battalion of robots to sow mass destruction.  But we should keep in mind the chain of probabilities that would have to multiply out before it would be a reality. A Dr. Evil would have to arise with the combination of a thirst for pointless mass murder and a genius for technological innovation. He would have to recruit and manage a team of co-conspirators that exercised perfect secrecy, loyalty, and competence. And the operation would have to survive the hazards of detection, betrayal, stings, blunders, and bad luck. In theory it could happen, but I think we have more pressing things to worry about. 


1. Perverse instantiation is a very old idea. It is what genies are most famous for. King Midas had similar problems. Apparently it was applied to AI by 1947, in With Folded Hands.

2. Adam Elga writes more on simulating people for blackmail and indexical uncertainty.

3. More directions for making AI which don't lead to infrastructure profusion:

  • Some kinds of preferences don't lend themselves to ambitious investments. Anna Salamon talks about risk averse preferences. Short time horizons and goals which are cheap to fulfil should also make long term investments in infrastructure or intelligence augmentation less valuable, compared to direct work on the problem at hand.
  • Oracle and tool AIs are intended to not be goal-directed, but as far as I know it is an open question whether this makes sense. We will get to these later in the book.

5. Often when systems break, or we make errors in them, they don't work at all. Sometimes, they fail more subtly, working well in some sense, but leading us to an undesirable outcome, for instance a malignant failure mode. How can you tell whether a poorly designed AI is likely to just not work, vs. accidentally take over the world? An important consideration for systems in general seems to be the level of abstraction at which the error occurs. We try to build systems so that you can just interact with them at a relatively abstract level, without knowing how the parts work. For instance, you can interact with your GPS by typing places into it, then listening to it, and you don't need to know anything about how it works. If you make an error while typing your address into the GPS, it will fail by taking you to the wrong place, but it will still direct you there fairly well. If you fail by putting the wires inside the GPS into the wrong places, the GPS is more likely to just not work.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Are there better ways to specify 'limited' goals? For instance, to ask for ten paperclips without asking for the universe to be devoted to slightly improving the probability of success?
  2. In what circumstances could you be confident that the goals you have given an AI do not permit perverse instantiations? 
  3. Explore possibilities for malignant failure vs. other failures. If we fail, is it actually probable that we will have enough 'success' for our creation to take over the world?
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about capability control methods, section 13. To prepare, read “Two agency problems” and “Capability control methods” from Chapter 9. The discussion will go live at 6pm Pacific time next Monday December 8. Sign up to be notified here.

Integral versus differential ethics

7 Stuart_Armstrong 01 December 2014 06:04PM

In population ethics...

Most people start out believing that the following are true:

  1. That adding more happy lives is a net positive.
  2. That redistributing happiness more fairly is not a net negative.
  3. That the repugnant conclusion is indeed repugnant.

Some will baulk at the first statement on equality grounds, but most people should accept those three statements as presented. Then they find out about the mere addition paradox.

Someone who then accepts the repugnant conclusion could reason something like this:

Adding happy people and redistributing happiness fairly, if done many, many times, in the way described above, will result in a repugnant conclusion. Each step along the way seems solid, but the conclusion seems wrong. Therefore I will accept the repugnant conclusion, not on its own merits, but because each step is clearly intuitively correct.

Call this the "differential" (or local) way of reasoning about population ethics. As long as each small change seems intuitively an improvement, then the global change must also be.

Someone who instead rejects the repugnant conclusion could reason like this:

Adding happy people and redistributing happiness fairly, if done many, many times, in the way described above, will result in a repugnant conclusion. Each step along the way seems solid, but the conclusion seems wrong. Therefore I will reject (at least) one step, not on its own merits, but because the conclusion is clearly intuitively incorrect.

Call this the "integral" (or global) way of reasoning about population ethics. As long as the overall change seems intuitively a deterioration, then some of the small changes along the way must also be.


In general...

Now, I personally tend towards integral rather than differential reasoning on this particular topic. However, I want to make a more general point: philosophy may be over dedicated to differential reasoning. Mainly because it's easy: you can take things apart, simplify them, abstract details away, and appeal to simple principles - and avoid many potential biases along the way.

But it's also a very destructive tool to use in areas where concepts are unclear and cannot easily be made clear. Take the statement "human life is valuable". This can be taken apart quite easily and critiqued from all directions, its lack of easily described meaning being its weakness. Nevertheless, integral reasoning is almost always applied: something called "human life" is taken to be "valuable", and many caveats and subdefinitions can be added to these terms without changing the fundamental (integral) acceptance of the statement. If we followed the differential approach, we might end up with the definition of "human life" as "energy exchange across a neurone cell membrane" or something equally ridiculous but much more rigorous.

Now, that example is a parody... but only because no-one sensible does that: we know that we'd lose too much value from that kind of definition. We want to build an extensive/integral definition of life, using our analysis to add clarity rather than simplify to a few core underlying concepts. But in population ethics and many other cases, we do feel free to use differential ethics, replacing vague overarching concepts with clear simplified versions that clearly throw away a lot of the initial concept.

Maybe we do it too much. To pick an example I disagree with (always a good habit), maybe there is such a thing as "society", for instance, not simply the total of individuals and their interactions. You can already use pretty crude consequentialist arguments with "societies" as agents subject to predictable actions and reactions (social science does it all the time), but what if we tried to build a rigorous definition of society as something morally valuable, rather than focusing on individuals?

Anyway, we should be aware when, in arguments, we are keeping the broad goal and making the small steps and definitions conform to it, and when we are focusing on the small steps and definitions and following them wherever they lead.

When the uncertainty about the model is higher than the uncertainty in the model

17 Stuart_Armstrong 28 November 2014 06:12PM

Most models attempting to estimate or predict some elements of the world will come with their own estimates of uncertainty. It could be the Standard Model of physics predicting the mass of the Z boson as 91.1874 ± 0.0021 GeV, or the rather wider uncertainty ranges of economic predictions.

In many cases, though, the uncertainties in or about the model dwarf the estimated uncertainty in the model itself - especially for low probability events. This is a problem, because people working with models often try to use the in-model uncertainty and adjust it to get an estimate of the true uncertainty. They often realise the model is unreliable, but don't have a better one, and they have a measure of uncertainty already, so surely doubling and tripling this should do the trick? Surely...
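A back-of-the-envelope sketch of why doubling or tripling the in-model figure can fall badly short for low-probability events. Every number below is a made-up assumption for illustration, and the function name is my own.

```python
def overall_probability(p_in_model, p_model_wrong, p_if_wrong):
    """Law of total probability over 'model is right' vs 'model is wrong'."""
    return (1.0 - p_model_wrong) * p_in_model + p_model_wrong * p_if_wrong

# Hypothetical inputs (assumptions, not taken from any real model):
p_in_model = 1e-9      # the model's own probability for the event
p_model_wrong = 0.01   # our credence that the model is badly wrong
p_if_wrong = 1e-3      # probability of the event if the model is wrong

p_total = overall_probability(p_in_model, p_model_wrong, p_if_wrong)
tripled = 3 * p_in_model

print(f"in-model estimate: {p_in_model:.1e}")
print(f"tripled estimate:  {tripled:.1e}")
print(f"with model doubt:  {p_total:.1e}")
```

With these inputs the term coming from "the model is wrong" swamps the in-model figure by about four orders of magnitude, so no modest multiple of the in-model uncertainty recovers it.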

The following three cases are going to be my go-to examples for showing what a mistake this can be; they cover three situations: extreme error, being in the domain of a hard science, and extreme negative impact.


You have a set amount of "weirdness points". Spend them wisely.

43 peter_hurford 27 November 2014 09:09PM

I've heard of the concept of "weirdness points" many times before, but after a bit of searching I can't find a definitive post describing the concept, so I've decided to make one.  As a disclaimer, I don't think the evidence backing this post is all that strong and I am skeptical, but I do think it's strong enough to be worth considering, and I'm probably going to make some minor life changes based on it.


Chances are that if you're reading this post, you're probably a bit weird in some way.

No offense, of course.  In fact, I actually mean it as a compliment.  Weirdness is incredibly important.  If people weren't willing to deviate from society and hold weird beliefs, we wouldn't have had the important social movements that ended slavery and pushed back against racism, that created democracy, that expanded social roles for women, and that made the world a better place in numerous other ways.

Many things we take for granted now as reasons why our current society is great were once... weird.


Joseph Overton theorized that policy develops through six stages: unthinkable, then radical, then acceptable, then sensible, then popular, then actual policy.  We could see this happen with many policies -- currently same-sex marriage is making its way from popular to actual policy, but not too long ago it was merely acceptable, and not too long before that it was pretty radical.

Some good ideas are currently in the radical range.  Effective altruism itself is such a collection of beliefs typical people would consider pretty radical.  Many people think donating 3% of their income is a lot, let alone the 10% demand that Giving What We Can places, or the 50%+ that some people in the community do.

And that's not all.  Others would suggest that everyone become vegetarian, or advocate for open borders and/or universal basic income, the abolishment of gendered language, putting more resources into mitigating existential risk, focusing on research into Friendly AI, cryonics, and curing death, etc.

While many of these ideas might make the world a better place if made into policy, all of these ideas are pretty weird.


Weirdness, of course, is a drawback.  People take weird opinions less seriously.

The absurdity heuristic is a real bias that people -- even you -- have.  If an idea sounds weird to you, you're less likely to try and believe it, even if there's overwhelming evidence.  And social proof matters -- if fewer people believe something, people will be less likely to believe it.  Lastly, don't forget the halo effect -- if one part of you seems weird, the rest of you will seem weird too!

(Update: apparently this concept is, itself, already known to social psychology as idiosyncrasy credits.  Thanks, Mr. Commenter!)

...But we can use this knowledge to our advantage.  The halo effect can work in reverse -- if we're normal in many ways, our weird beliefs will seem more normal too.  If we have a notion of weirdness as a kind of currency that we have a limited supply of, we can spend it wisely, without looking like a crank.


All of this leads to the following actionable principles:

Recognize you only have a few "weirdness points" to spend.  Trying to convince all your friends to donate 50% of their income to MIRI, become a vegan, get a cryonics plan, and demand open borders will be met with a lot of resistance.  But I hypothesize that if you pick one of these ideas and push it, you'll have a lot more success.

Spend your weirdness points effectively.  Perhaps it's really important that people advocate for open borders.  But, perhaps, getting people to donate to developing world health would overall do more good.  In that case, I'd focus on moving donations to the developing world and leave open borders alone, even though it is really important.  You should triage your weirdness effectively the same way you would triage your donations.

Clean up and look good.  Lookism is a problem in society, and I wish people could look "weird" and still be socially acceptable.  But if you're a guy wearing a dress in public, or some punk rocker vegan advocate, recognize that you're spending your weirdness points fighting lookism, which means fewer weirdness points to spend promoting veganism or something else.

Advocate for more "normal" policies that are almost as good.   Of course, allocating your "weirdness points" on a few issues doesn't mean you have to stop advocating for other important issues -- just consider being less weird about it.  Perhaps universal basic income truly would be a very effective policy to help the poor in the United States.  But reforming the earned income tax credit and relaxing zoning laws would also both do a lot to help the poor in the US, and such suggestions aren't weird.

Use the foot-in-door technique and the door-in-face technique.  The foot-in-door technique involves starting with a small ask and gradually building up the ask, such as suggesting people donate a little bit effectively, and then gradually getting them to take the Giving What We Can Pledge.  The door-in-face technique involves making a big ask (e.g., join Giving What We Can) and then replacing it with a smaller ask, like the Life You Can Save pledge or Try Out Giving.

Reconsider effective altruism's clustering of beliefs.  Right now, effective altruism is associated strongly with donating a lot of money and donating effectively, less strongly with impact in career choice, veganism, and existential risk.  Of course, I'm not saying that we should drop some of these memes completely.  But maybe EA should disconnect a bit more and compartmentalize -- for example, leaving AI risk to MIRI and not talking about it much, say, on 80,000 Hours.  And maybe instead of asking people to both give more AND give more effectively, we could focus more exclusively on asking people to donate what they already do more effectively.

Evaluate the above with more research.  While I think the evidence base behind this is decent, it's not great and I haven't spent that much time developing it.  I think we should look into this more with a review of the relevant literature and some careful, targeted, market research on the individual beliefs within effective altruism (how weird are they?) and how they should be connected or left disconnected.  Maybe this has already been done some?


Also discussed on the EA Forum and EA Facebook group.

The Hostile Arguer

31 Error 27 November 2014 12:30AM

“Your instinct is to talk your way out of the situation, but that is an instinct born of prior interactions with reasonable people of good faith, and inapplicable to this interaction…” Ken White

One of the Less Wrong Study Hall denizens has been having a bit of an issue recently. He became an atheist some time ago. His family was in denial about it for a while, but in recent days they have 1. stopped with the denial bit, and 2. been less than understanding about it. In the course of discussing the issue during break, this line jumped out at me:

“I can defend my views fine enough, just not to my parents.”

And I thought: Well, of course you can’t, because they’re not interested in your views. At all.

I never had to deal with the Religion Argument with my parents, but I did spend my fair share of time failing to argumentatively defend myself. I think I have some useful things to say to those younger and less the-hell-out-of-the-house than me.

A clever arguer is someone that has already decided on their conclusion and is making the best case they possibly can for it. A clever arguer is not necessarily interested in what you currently believe; they are arguing for proposition A and against proposition B. But there is a specific sort of clever arguer, one that I have difficulty defining explicitly but can characterize fairly easily. I call it, as of today, the Hostile Arguer.

It looks something like this:

When your theist parents ask you, “What? Why would you believe that?! We should talk about this,” they do not actually want to know why you believe anything, despite the form of the question. There is no genuine curiosity there. They are instead looking for ammunition. Which, if they are cleverer arguers than you, you are likely to provide. Unless you are epistemically perfect, you believe things that you cannot, on demand, come up with an explicit defense for. Even important things.

In accepting that the onus is solely on you to defend your position – which is what you are implicitly doing, in engaging the question – you are putting yourself at a disadvantage. That is the real point of the question: to bait you into an argument that your interlocutor knows you will lose, whereupon they will expect you to acknowledge defeat and toe the line they define.

Someone in the chat compared this to politics, which makes sense, but I don’t think it’s the best comparison. Politicians usually meet each other as equals. So do debate teams. This is more like a cop asking a suspect where they were on the night of X, or an employer asking a job candidate how much they made at their last job. Answering can hurt you, but can never help you. The question is inherently a trap.

The central characteristic of a hostile arguer is the insincere question. “Why do you believe there is/isn’t a God?” may be genuine curiosity from an impartial friend, or righteous fury from a zealous authority, even though the words themselves are the same. What separates them is the response to answers. The curious friend updates their model of you with your answers; the Hostile Arguer instead updates their battle plan.[1]

So, what do you do about it?

Advice often fails to generalize, so take this with a grain of salt. It seems to me that argument in this sense has at least some of the characteristics of the Prisoner’s Dilemma. Cooperation represents the pursuit of mutual understanding; defection represents the pursuit of victory in debate. Once you are aware that they are defecting, cooperating in return is highly non-optimal. On the other hand, mutual defection – a flamewar online, perhaps, or a big fight in real life in which neither party learns much of anything except how to be pissed off – kind of sucks, too. Especially if you have reason to care, on a personal level, about your opponent. If they’re family, you probably do.

It seems to me that getting out of the game is the way to go, if you can do it.
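The Prisoner's Dilemma framing above can be made concrete with a toy payoff table. This is only a sketch: the numbers, the `EXIT_PAYOFF` value, and the function name `best_response` are assumptions chosen for illustration, not anything measured or taken from the discussion. The one substantive assumption is that walking away is worth more to you than a mutual flamewar but less than genuine mutual understanding.

```python
# Illustrative payoffs only; every number here is an assumption.
# Each entry maps (your_move, their_move) to your payoff.
PAYOFF = {
    ("cooperate", "cooperate"): 3,  # both pursue mutual understanding
    ("cooperate", "defect"): 0,     # you explain yourself, they collect ammunition
    ("defect", "cooperate"): 5,     # you argue to win against a sincere partner
    ("defect", "defect"): 1,        # flamewar: nobody learns anything
}

# Hypothetical value of declining to argue at all ("no comment").
EXIT_PAYOFF = 2


def best_response(their_move, allow_exit=False):
    """Return the move with the highest payoff against a known opponent move."""
    options = {move: PAYOFF[(move, their_move)] for move in ("cooperate", "defect")}
    if allow_exit:
        options["exit"] = EXIT_PAYOFF
    return max(options, key=options.get)
```

Under these assumed numbers, `best_response("defect")` returns `"defect"`, which is the mutual-flamewar outcome, while `best_response("defect", allow_exit=True)` returns `"exit"`. The conclusion depends entirely on where you place the exit payoff, so treat it as a way of stating the argument rather than evidence for it.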

Never try to defend a proposition against a hostile arguer.[2] They do not care. Your best arguments will fall on deaf ears. Your worst will be picked apart by people who are much better at this than you. Your insecurities will be exploited. If they have direct power over you, it will be abused.

This is especially true for parents, where obstinate disagreement can be viewed as disrespect, and where their power over you is close to absolute. I’m sort of of the opinion that all parents should be considered epistemically hostile until one moves out, as a practical application of the SNAFU Principle. If you find yourself wanting to acknowledge defeat in order to avoid imminent punishment, this is what is going on.

If you have some disagreement important enough for this advice to be relevant, you probably genuinely care about what you believe, and you probably genuinely want to be understood. On some level, you want the other party to “see things your way.” So my second piece of advice is this: Accept that they won’t, and especially accept that it will not happen as a result of anything you say in an argument. If you must explain yourself, write a blog or something and point them to it a few years later. If it’s a religious argument, maybe write the Atheist Sequences. Or the Theist Sequences, if that’s your bent. But don’t let them make you defend yourself on the spot.

The previous point, incidentally, was my personal failure through most of my teenage years (although my difficulties stemmed from school, not religion). I really want to be understood, and I really approach discussion as a search for mutual understanding rather than an attempt at persuasion, by default. I expect most here do the same, which is one reason I feel so at home here. The failure mode I’m warning against is adopting this approach with people who will not respect it and will, in fact, punish your use of it.[3]

It takes two to have an argument, so don’t be the second party, ever, and they will eventually get tired of talking to a wall. You are not morally obliged to justify yourself to people who have pre-judged your justifications. You are not morally obliged to convince the unconvinceable. Silence is always an option. “No comment” also works well, if repeated enough times.

There is the possibility that the other party is able and willing to punish you for refusing to engage. Aside from promoting them from “treat as Hostile Arguer” to “treat as hostile, period”, I’m not sure what to do about this. Someone in the Hall suggested supplying random, irrelevant justifications, as requiring minimal cognitive load while still subverting the argument. I’m not certain how well that will work. It sounds plausible, but I suspect that if someone is running the algorithm “punish all responses that are not ‘yes, I agree and I am sorry and I will do or believe as you say’”, then you’re probably screwed (and should get out sooner rather than later if at all possible).

None of the above advice implies that you are right and they are wrong. You may still be incorrect on whatever factual matter the argument is about. The point I’m trying to make is that, in arguments of this form, the argument is not really about correctness. So if you care about correctness, don’t have it.

Above all, remember this: Tapping out is not just for Less Wrong.

(thanks to all LWSH people who offered suggestions on this post)

After reading the comments and thinking some more about this, I think I need to revise my position a bit. I’m really talking about three different characteristics here:

  1. People who have already made up their mind.
  2. People who are personally invested in making you believe as they do.
  3. People who have power over you.

For all three together, I think my advice still holds. MrMind puts it very concisely in the comments. In the absence of 3, though, JoshuaZ notes some good reasons one might argue anyway; to which I think one ought to add everything mentioned under the Fifth Virtue of Argument.

But one thing that ought not to be added to it is the hope of convincing the other party – either of your position, or of the proposition that you are not stupid or insane for holding it. These are cases where you are personally invested in what they believe, and all I can really say is “don’t do that; it will hurt.” Even if you are correct, you will fail for the reasons given above and more besides. It’s very much a case of Just Lose Hope Already.

  1. I’m using religious authorities harshing on atheists as the example here because that was the immediate cause of this post, but atheists take caution: If you’re asking someone “why do you believe in God?” with the primary intent of cutting their answer down, you’re guilty of this, too.  ↩

  2. Someone commenting on a draft of this post asked how to tell when you’re dealing with a Hostile Arguer. This is the sort of micro-social question that I’m not very good at and probably shouldn’t opine on. Suggestions requested in the comments.  ↩

  3. It occurs to me that the Gay Talk might have a lot in common with this as well. For those who’ve been on the wrong side of that: Did that also feel like a mismatched battle, with you trying to be understood, and them trying to break you down?  ↩

Stuart Russell: AI value alignment problem must be an "intrinsic part" of the field's mainstream agenda

24 RobbBB 26 November 2014 11:02AM

Edge.org has recently been discussing "the myth of AI". Unfortunately, although Superintelligence is cited in the opening, most of the participants don't seem to have looked into Bostrom's arguments. (Luke has written a brief response to some of the misunderstandings Pinker and others exhibit.) The most interesting comment is Stuart Russell's, at the very bottom:

Of Myths and Moonshine

"We switched everything off and went home. That night, there was very little doubt in my mind that the world was headed for grief."

So wrote Leo Szilard, describing the events of March 3, 1939, when he demonstrated a neutron-induced uranium fission reaction. According to the historian Richard Rhodes, Szilard had the idea for a neutron-induced chain reaction on September 12, 1933, while crossing the road next to Russell Square in London. The previous day, Ernest Rutherford, a world authority on radioactivity, had given a "warning…to those who seek a source of power in the transmutation of atoms – such expectations are the merest moonshine."

Thus, the gap between authoritative statements of technological impossibility and the "miracle of understanding" (to borrow a phrase from Nathan Myhrvold) that renders the impossible possible may sometimes be measured not in centuries, as Rod Brooks suggests, but in hours.

None of this proves that AI, or gray goo, or strangelets, will be the end of the world. But there is no need for a proof, just a convincing argument pointing to a more-than-infinitesimal possibility. There have been many unconvincing arguments – especially those involving blunt applications of Moore's law or the spontaneous emergence of consciousness and evil intent. Many of the contributors to this conversation seem to be responding to those arguments and ignoring the more substantial arguments proposed by Omohundro, Bostrom, and others.

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.

2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer's apprentice, or King Midas: you get exactly what you ask for, not what you want. A highly capable decision maker – especially one connected through the Internet to all the world's information and billions of screens and most of our infrastructure – can have an irreversible impact on humanity.

This is not a minor difficulty. Improving decision quality, irrespective of the utility function chosen, has been the goal of AI research – the mainstream goal on which we now spend billions per year, not the secret plot of some lone evil genius. AI research has been accelerating rapidly as pieces of the conceptual framework fall into place, the building blocks gain in size and strength, and commercial investment outstrips academic research activity. Senior AI researchers express noticeably more optimism about the field's prospects than was the case even a few years ago, and correspondingly greater concern about the potential risks.

No one in the field is calling for regulation of basic research; given the potential benefits of AI for humanity, that seems both infeasible and misdirected. The right response seems to be to change the goals of the field itself; instead of pure intelligence, we need to build intelligence that is provably aligned with human values. For practical reasons, we will need to solve the value alignment problem even for relatively unintelligent AI systems that operate in the human environment. There is cause for optimism, if we understand that this issue is an intrinsic part of AI, much as containment is an intrinsic part of modern nuclear fusion research. The world need not be headed for grief.

I'd quibble with a point or two, but this strikes me as an extraordinarily good introduction to the issue. I hope it gets reposted somewhere it can stand on its own.

Russell has previously written on this topic in Artificial Intelligence: A Modern Approach and the essays "The long-term future of AI," "Transcending complacency on superintelligent machines," and "An AI researcher enjoys watching his own execution." He's also been interviewed by GiveWell.
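Russell's point about optimizing a function of n variables when the objective depends on only k<n of them can be illustrated with a toy problem. This is a minimal sketch, not Russell's own formalization: the shared budget, the variable names `task_effort` and `left_for_us`, and the brute-force search are all assumptions made for the example. Here n = 2 and k = 1, and the two variables are coupled only through a resource constraint.

```python
from itertools import product

BUDGET = 10  # hypothetical shared resource pool


def objective(task_effort, left_for_us):
    """Score depends only on task_effort; left_for_us is ignored entirely."""
    return task_effort


def optimize():
    """Brute-force search over integer allocations that fit within the budget."""
    feasible = [
        (task_effort, left_for_us)
        for task_effort, left_for_us in product(range(BUDGET + 1), repeat=2)
        if task_effort + left_for_us <= BUDGET
    ]
    # Pick the feasible allocation with the highest objective value.
    return max(feasible, key=lambda point: objective(*point))
```

Calling `optimize()` returns `(10, 0)`: the optimizer takes the whole budget for its task and drives the variable it was never told about to its extreme value of zero. Nothing in the search is hostile to `left_for_us`; the variable simply does not appear in the objective, so any resources spent on it look like waste.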
