
Superintelligence 11: The treacherous turn

6 KatjaGrace 25 November 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the 11th section in the reading guide: The treacherous turn. This corresponds to Chapter 8.

This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable (and where I remember), page numbers indicate the rough part of the chapter that is most relevant (not necessarily the part being cited for the specific claim).

Reading: “Existential catastrophe…” and “The treacherous turn” from Chapter 8


Summary

  1. The possibility of a first mover advantage + orthogonality thesis + convergent instrumental values suggests doom for humanity (p115-6)
    1. First mover advantage implies the AI is in a position to do what it wants
    2. Orthogonality thesis implies that what it wants could be all sorts of things
    3. Instrumental convergence thesis implies that regardless of its wants, it will try to acquire resources and eliminate threats
    4. Humans have resources and may be threats
    5. Therefore an AI in a position to do what it wants is likely to want to take our resources and eliminate us, i.e. doom for humanity.
  2. One kind of response: why wouldn't the makers of the AI be extremely careful not to develop and release dangerous AIs, or relatedly, why wouldn't someone else shut the whole thing down? (p116)
  3. It is hard to observe whether an AI is dangerous via its behavior at a time when you could turn it off, because AIs have convergent instrumental reasons to pretend to be safe, even if they are not. If they expect their minds to be surveilled, even observing their thoughts may not help. (p117)
  4. The treacherous turn: while weak, an AI behaves cooperatively. When the AI is strong enough to be unstoppable it pursues its own values. (p119)
  5. We might expect AIs to become safer as they initially get smarter - while most of the risks come from crashing self-driving cars or misfiring drones - and then to become much less safe once they get too smart. (p117)
  6. One can imagine a scenario where there is little social impetus for safety (p117-8): alarmists will have been wrong for a long time, smarter AI will have been safer for a long time, large industries will be invested, an exciting new technique will be hard to set aside, useless safety rituals will be available, and the AI will look cooperative enough in its sandbox.
  7. The conception of deception: that moment when the AI realizes that it should conceal its thoughts (footnote 2, p282)

Another view

Danaher:

This is all superficially plausible. It is indeed conceivable that an intelligent system — capable of strategic planning — could take such treacherous turns. And a sufficiently time-indifferent AI could play a “long game” with us, i.e. it could conceal its true intentions and abilities for a very long time. Nevertheless, accepting this has some pretty profound epistemic costs. It seems to suggest that no amount of empirical evidence could ever rule out the possibility of a future AI taking a treacherous turn. In fact, it’s even worse than that. If we take it seriously, then it is possible that we have already created an existentially threatening AI. It’s just that it is concealing its true intentions and powers from us for the time being.

I don’t quite know what to make of this. Bostrom is a pretty rational, Bayesian guy. I tend to think he would say that if all the evidence suggests that our AI is non-threatening (and if there is a lot of that evidence), then we should heavily discount the probability of a treacherous turn. But he doesn’t seem to add that qualification in the chapter. He seems to think the threat of an existential catastrophe from a superintelligent AI is pretty serious. So I’m not sure whether he embraces the epistemic costs I just mentioned or not.

Notes

1. Danaher also made a nice diagram of the case for doom, and its relationship with the treacherous turn:

 

2. History

According to Luke Muehlhauser's timeline of AI risk ideas, the treacherous turn idea for AIs has been around since at least 1977, when a fictional worm did it:

1977: Self-improving AI could stealthily take over the internet; convergent instrumental goals in AI; the treacherous turn. Though the concept of a self-propagating computer worm was introduced by John Brunner's The Shockwave Rider (1975), Thomas J. Ryan's novel The Adolescence of P-1 (1977) tells the story of an intelligent worm that at first is merely able to learn to hack novel computer systems and use them to propagate itself, but later (1) has novel insights on how to improve its own intelligence, (2) develops convergent instrumental subgoals (see Bostrom 2012) for self-preservation and resource acquisition, and (3) learns the ability to fake its own death so that it can grow its powers in secret and later engage in a "treacherous turn" (see Bostrom forthcoming) against humans.

 

3. The role of the premises

Bostrom's argument for doom has one premise that says AI could care about almost anything, then another that says regardless of what an AI cares about, it will do basically the same terrible things anyway. (p115) Do these sound a bit strange together to you? Why do we need the first, if final values don't tend to change instrumental goals anyway?

It seems the immediate reason is that an AI with values we like would not have the convergent goal of taking all our stuff and killing us. That is, the values we want an AI to have are some of those rare values that don't lead to destructive instrumental goals. Why is this? Because we (and thus the AI) care about the activities the resources would be grabbed from. If the resources were currently being used for anything we didn't care about, then our values would also suggest grabbing resources, and would look similar to all of the other values. The difference that makes our values special here is just that most resources are already being used for them somewhat.

4. Signaling

It is hard to tell apart a safe and an unsafe AI, because both would like to look safe. This is a very common problem in human interactions. For instance, it can be nontrivial to tell a genuine lover from a gold digger, a businessman from a conman, and an expert from a crank. All of them want to look like the desirable sort. Particularly similar to the AI case is that of hiring a new employee for a trial period. You will sometimes find that the employee's values are much better aligned during the trial period, and then they undergo a 'treacherous turn' once they have been hired more thoroughly.

'Costly signaling' is a general-purpose solution to this problem, which works some of the time. The basic idea is this: everyone has instrumental reasons to look like the good kind of person, but their reasons may not be exactly as strong as one another's, or the strength of their desire may be harder to act on for one group than for the other, so we can construct a set of options that will lead the different types of people to choose differently, even though they know this will set them apart. For instance, while an honest businessman and a conman would both like to say 'I'm an honest businessman', and to behave honestly if you watch them for five minutes, a conman benefits less from an expensive shopfront than an honest businessman does, because his reputation is worth less to him, so a brand is a signal of honesty.

Similarly, we could perhaps design situations where most harmful AIs would have a reasonable incentive to reveal themselves. This is difficult because so much is at stake, compared to the cost of the signal. But I haven't thought enough about it to rule it out, and I haven't seen anyone else do so either.
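
To make the separating-equilibrium idea concrete, here is a minimal numeric sketch in Python. All of the payoff numbers are invented purely for illustration; nothing here comes from the post itself.

# Toy costly-signaling sketch: all payoff numbers are invented for illustration.

SIGNAL_COST = 50          # price of the expensive shopfront

# Value each type expects to get from the reputation the signal buys.
# The honest businessman expects repeat customers; the conman expects to be
# found out before the reputation pays off, so it is worth less to him.
value_of_reputation = {"honest": 120, "conman": 30}

for agent_type, value in value_of_reputation.items():
    signals = value - SIGNAL_COST > 0   # buy the signal only if it pays for itself
    print(f"{agent_type:7s}: buys shopfront = {signals}")

# Only the honest type buys the shopfront, so observing the shopfront is
# evidence of honesty - even though both types would *say* they are honest.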

5. When would the 'conception of deception' take place?

Presumably below the level of the best humans, since we have already thought of all this.

6. Surveillance of the mind

Bostrom is not optimistic that AI researchers could protect much against risks by watching AI mental processes directly for signs of deception, rapid progress, or undesirable scheming. He says if the AI is smart it will realize this might happen, and hide the offending thoughts.

This seems an open question to me, for several reasons:

  1. Making your mental contents look innocuous while maintaining their semantic content sounds potentially very hard
  2. Especially for a creature which has only just become smart enough to realize it should treacherously turn
  3. From the AI's perspective, even if it is smart, surveillance could seem fairly unlikely, especially if we deceive it about its surroundings
As a consequence of 2, it seems better if the 'conception of deception' comes earlier.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. How transparent are AI minds likely to be? Should we expect to be able to detect deception? What are the answers to these questions for different specific architectures and methods? This might be relevant.
  2. Are there other good ways to filter AIs with certain desirable goals from others? e.g. by offering them choices that would filter them.
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'malignant failure modes' (as opposed presumably to worse failure modes). To prepare, read “Malignant failure modes” from Chapter 8. The discussion will go live at 6pm Pacific time next Monday, December 1. Sign up to be notified here.

Request for suggestions: ageing and data-mining

9 bokov 24 November 2014 11:38PM

Imagine you had the following at your disposal:

  • A Ph.D. in a biological science, with a fair amount of reading and wet-lab work under your belt on the topic of aging and longevity (but in hindsight, nothing that turned out to leverage any real mechanistic insights into aging).
  • An M.S. in statistics. Sadly, the non-Bayesian kind for the most part, but along the way you acquired the meta-skills necessary to read and understand most quantitative papers with life-science applications.
  • Love of programming and data, the ability to learn most new computer languages in a couple of weeks, and at least 8 years spent hacking R code.
  • Research access to large amounts of anonymized patient data.
  • Optimistically, two decades remaining in which to make it all count.

Imagine that your goal were to slow or prevent biological aging...

  1. What would be the specific questions you would try to tackle first?
  2. What additional skills would you add to your toolkit?
  3. How would you allocate your limited time between the research questions in #1 and the acquisition of new skills in #2?

Thanks for your input.

 

Population ethics and utility indifference

3 Stuart_Armstrong 24 November 2014 03:18PM

It occurs to me that the various utility indifference approaches might be usable in population ethics.

One challenge for non-total utilitarians is how to deal with new beings. Some theories - average utilitarianism, for instance, or other systems that use overall population utility - have no problem dealing with this. But many non-total utilitarians would like to see creating new beings as a strictly neutral act.

One way you could do this is by starting with a total utilitarian framework, but subtracting a certain amount of utility every time a new being B is brought into the world. In the spirit of utility indifference, we could subtract exactly the expected utility that we expect B to enjoy during their life.

This means that we should be indifferent as to whether B is brought into the world or not, but, once B is there, we should aim to increase B's utility. There are two problems with this. The first is that, strictly interpreted, we would also be indifferent to creating people with negative utility. This can be addressed by only doing the "utility correction" if B's expected utility is positive, thus preventing us from creating beings only to have them suffer.
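
Here is a minimal sketch of the correction just described; the numbers and function name are mine, purely for illustration, and are not from the post.

def corrected_total_utility(existing_utility, new_being_expected_utility,
                            create_new_being):
    """Total-utilitarian score with the 'indifference' correction: when a new
    being B is created, subtract B's expected utility, but only if that
    expectation is positive (so creating miserable beings still counts
    against us)."""
    if not create_new_being:
        return existing_utility
    correction = max(0.0, new_being_expected_utility)
    return existing_utility + new_being_expected_utility - correction

# Creating a happy being scores the same as not creating it...
print(corrected_total_utility(100, 40, create_new_being=True))   # 100
print(corrected_total_utility(100, 40, create_new_being=False))  # 100
# ...but creating a suffering being still lowers the score.
print(corrected_total_utility(100, -40, create_new_being=True))  # 60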

The second problem is more serious. What about all the actions that we could do, ahead of time, in order to harm or benefit the new being? For instance, it would seem perverse to argue that buying a rattle for a child after they are born (or conceived) is an act of positive utility, whereas buying it before they were born (or conceived) would be a neutral act, since the increase in expected utility for the child is cancelled out by the above process. Not only is it perverse, but it isn't timeless, and isn't stable under self-modification.


Shop for Charity: how to earn proven charities 5% of your Amazon spending in commission

10 tog 24 November 2014 08:29AM

If you shop on Amazon in the countries listed below, you can earn a substantial commission for charity by doing so via the links below. This is a cost-free way to do a lot of good, so I'd encourage you to do so! You can bookmark one of the direct links to Amazon below and then use that bookmark every time you shop.

The commission will be at least 5%, varying by product category. This is substantially better than the AmazonSmile scheme available in the US, which gives only 0.5% of the money you spend to charity. It works through Amazon's 'Associates Program', which pays this commission for referring purchasers to them, out of the unaltered purchase price (details here). It doesn't cost the purchaser anything. The money goes to Associates Program accounts owned by the EA non-profit Charity Science; money in these accounts is always regranted to GiveWell-recommended charities unless explicitly earmarked otherwise. For ease of administration and to get tax-deductibility, commission will be regranted to the Schistosomiasis Control Initiative until further notice.

Direct links to Amazon for your bookmarks

If you'd like to shop for charity, please bookmark the appropriate link below now:

 

From now through November 28: Black Friday Deals Week

Amazon's biggest cut price sale is this week. The links below take you to currently available deals:

Please share these links

I'll add other links on the main 'Shop for Charity' page later. I'd love to hear suggestions for good commission schemes in other countries. If you'd like to share these links with friends and family, please point them to this post or even better this project's main page.

Happy shopping!

'Shop for Charity' is a Charity Science project

Breaking the vicious cycle

31 XiXiDu 23 November 2014 06:25PM

You may know me as the guy who posts a lot of controversial stuff about LW and MIRI. I don't enjoy doing this and do not want to continue with it. One reason is that the debate is turning into a flame war. Another reason is that I noticed that it affects my health negatively (e.g. my high blood pressure; I actually suffered single-sided hearing loss over this xkcd comic on Friday).

This all started in 2010 when I encountered something I perceived to be wrong. But the specifics are irrelevant for this post. The problem is that ever since that time there have been various reasons that made me feel forced to continue the controversy. Sometimes it was the urge to clarify what I wrote, other times I thought it was necessary to respond to a reply I got. What matters is that I couldn't stop. But I believe that this is now possible, given my health concerns.

One problem is that I don't want to leave possible misrepresentations behind. And there very likely exist misrepresentations. There are many reasons for this, but I can assure you that I never deliberately lied and that I never deliberately tried to misrepresent anyone. The main reason might be that I feel very easily overwhelmed and never had the ability to force myself to invest the time that is necessary to do something correctly if I don't really enjoy doing it (for the same reason I probably failed school). Which means that most comments and posts are written in a tearing hurry, akin to a reflexive retraction from the painful stimulus.

<tldr>

I hate this fight and want to end it once and for all. I don't expect you to take my word for it. So instead, here is an offer:

I am willing to post counterstatements, endorsed by MIRI, of any length and content[1] at the top of any of my blog posts. You can either post them in the comments below or send me an email (da [at] kruel.co).

</tldr>

I have no idea if MIRI believes this to be worthwhile. But I couldn't think of a better way to solve this dilemma in a way that everyone can live with happily. But I am open to suggestions that don't stress me too much (also about how to prove that I am trying to be honest).

You obviously don't need to read all my posts. It can also be a general statement.

I am also aware that LW and MIRI are bothered by RationalWiki. As you can easily check from the fossil record, I have at points tried to correct specific problems. But, for the reasons given above, I have trouble investing the time to go through every sentence to find possible errors and correct them in such a way that the edit is not reverted and that people who feel offended are satisfied.

[1] There are obviously some caveats regarding the content, such as no nude photos of Yudkowsky ;-)

Open thread, Nov. 24 - Nov. 30, 2014

2 MrMind 24 November 2014 08:56AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

When should an Effective Altruist be vegetarian?

23 KatjaGrace 23 November 2014 05:25AM

Crossposted from Meteuphoric

I have lately noticed several people wondering why more Effective Altruists are not vegetarians. I am personally not a vegetarian because I don't think it is an effective way to be altruistic.

As far as I can tell, the fact that many EAs are not vegetarians is surprising to some because they think 'animals are probably morally relevant' basically implies 'we shouldn't eat animals'. To my ear, this sounds about as absurd as if GiveWell's explanation of their recommendation of SCI stopped after 'the developing world exists, or at least has a high probability of doing so'.

(By the way, I do get to a calculation at the bottom, after some speculation about why the calculation I think is appropriate is unlike what I take others' implicit calculations to be. Feel free to just scroll down and look at it).

I think this fairly large difference between my and many vegetarians' guesses at the value of vegetarianism arises because they think the relevant question is whether the suffering to the animal is worse than the pleasure to themselves at eating the animal. This question sounds superficially plausibly relevant, but I think on closer consideration you will agree that it is the wrong question.

The real question is not whether the cost to you is small, but whether you could do more good for the same small cost.

Similarly, when deciding whether to donate $5 to a random charity, the question is whether you could do more good by donating the money to the most effective charity you know of. Going vegetarian because it relieves the animals more than it hurts you is the equivalent of donating to a random developing world charity because it relieves the suffering of an impoverished child more than foregoing $5 increases your suffering.

Trading with inconvenience and displeasure

My imaginary vegetarian debate partner objects to this on grounds that vegetarianism is different from donating to ineffective charities, because to be a vegetarian you are spending effort and enjoying your life less rather than spending money, and you can't really reallocate that inconvenience and displeasure to, say, preventing artificial intelligence disaster or feeding the hungry, if you don't use it on reading food labels and eating tofu. If I were to go ahead and eat the sausage instead - the concern goes - probably I would just go on with the rest of my life exactly the same, and a bunch of farm animals somewhere would be the worse for it, and I scarcely better.

I agree that if the meat eating decision were separated from everything else in this way, then the decision really would be about your welfare vs. the animal's welfare, and you should probably eat the tofu.

However, whether you can trade being vegetarian for more effective sacrifices is largely a question of whether you choose to do so. And if vegetarianism is not the most effective way to inconvenience yourself, then it is clear that you should choose to make such a trade. If you eat meat now in exchange for suffering some more effective annoyance at another time, you and the world can be better off.

Imagine an EA friend says to you that she gives substantial money to whatever random charity has put a tin in whatever shop she is in, because it's better than the donuts and new dresses she would buy otherwise. She doesn't see how not giving the money to the random charity would really cause her to give it to a better charity - empirically she would spend it on luxuries. What do you say to this?

If she were my friend, I might point out that the money isn't meant to magically move somewhere better - she may have to consciously direct it there. She might need to write down how much she was going to give to the random charity, then look at the note later for instance. Or she might do well to decide once and for all how much to give to charity and how much to spend on herself, and then stick to that. As an aside, I might also feel that she was using the term 'Effective Altruist' kind of broadly.

I see vegetarianism for the sake of not managing to trade inconveniences as quite similar. And in both cases you risk spending your life doing suboptimal things every time a suboptimal altruistic opportunity has a chance to steal resources from what would be your personal purse. This seems like something that your personal and altruistic values should cooperate in avoiding.

It is likely too expensive to keep track of an elaborate trading system, but you should at least be able to make reasonable long term arrangements. For instance, if instead of eating vegetarian you ate a bit frugally and saved and donated a few dollars per meal, you would probably do more good (see calculations lower in this post). So if frugal eating were similarly annoying, it would be better. Eating frugally is inconvenient in very similar ways to vegetarianism, so is a particularly plausible trade if you are skeptical that such trades can be made. I claim you could make very different trades though, for instance foregoing the pleasure of an extra five minutes' break and working instead sometimes. Or you could decide once and for all how much annoyance to have, and then choose the most worthwhile bits of annoyance, or put a dollar value on your own time and suffering and try to be consistent.

Nebulous life-worsening costs of vegetarianism

There is a separate psychological question which is often mixed up with the above issue. That is, whether making your life marginally less gratifying and more annoying in small ways will make you sufficiently less productive to undermine the good done by your sacrifice. This is not about whether you will do something a bit costly another time for the sake of altruism, but whether just spending your attention and happiness on vegetarianism will harm your other efforts to do good, and cause more harm than good.

I find this plausible in many cases, but I expect it to vary a lot by person. My mother seems to think it's basically free to eat supplements, whereas to me every additional daily routine seems to encumber my life and require me to spend disproportionately more time thinking about unimportant things. Some people find it hard to concentrate when unhappy, others don't. Some people struggle to feed themselves adequately at all, while others actively enjoy preparing food.

There are offsetting positives from vegetarianism which also vary across people. For instance there is the pleasure of self-sacrifice, the joy of being part of a proud and moralizing minority, and the absence of the horror of eating other beings. There are also perhaps health benefits, which probably don't vary that much by people, but people do vary in how big they think the health benefits are.

Another way you might accidentally lose more value than you save is in spending little bits of time which are hard to measure or notice. For instance, vegetarianism means spending a bit more time searching for vegetarian alternatives, researching nutrition, buying supplements, writing emails back to people who invite you to dinner explaining your dietary restrictions, etc. The value of different people's time varies a lot, as does the extent to which an additional vegetarianism routine would tend to eat their time.

On a less psychological note, the potential drop in IQ (~5 points?!) from missing out on creatine is a particularly terrible example of vegetarianism making people less productive. Now that we know about creatine and can supplement it, creatine itself is not such an issue. An issue does remain though: is this an unlikely one-off failure, or should we worry about more such deficiencies? (This goes for any kind of unusual diet, not just meat-free ones.)

How much is avoiding meat worth?

Here is my own calculation of how much it costs to do the same amount of good as replacing one meat meal with one vegetarian meal. If you would be willing to pay this much extra to eat meat for one meal, then you should eat meat. If not, then you should abstain. For instance, if eating meat does $10 worth of harm, you should eat meat whenever you would hypothetically pay an extra $10 for the privilege.

This is a tentative calculation. I will probably update it if people offer substantially better numbers.

All quantities are in terms of social harm.

Eating 1 non-vegetarian meal

< eating 1 chickeny meal (I am told chickens are particularly bad animals to eat, due to their poor living conditions and large animal:meal ratio. The relatively small size of their brains might offset this, but I will conservatively give all animals the moral weight of humans in this calculation.)

< eating 200 calories of chicken (a McDonalds crispy chicken sandwich probably contains a bit over 100 calories of chicken (based on its listed protein content); a Chipotle chicken burrito contains around 180 calories of chicken)

= causing ~0.25 chicken lives (1 chicken is equivalent in price to 800 calories of chicken breast i.e. eating an additional 800 calories of chicken breast conservatively results in one additional chicken. Calculations from data here and here.)

< -$0.08 given to the Humane League (ACE estimates the Humane League spares 3.4 animal lives per dollar). However, since the Humane League basically convinces other people to be vegetarians, this may be hypocritical or otherwise dubious.

< causing 12.5 days of chicken life (broiler chickens are slaughtered at between 35-49 days of age)

= causing 12.5 days of chicken suffering (I'm being generous)

= -$0.50 subsidizing free range eggs (This is a somewhat random example of the cost of more systematic efforts to improve animal welfare, rather than necessarily the best. The cost here is the cost of buying free range eggs and selling them as non-free range eggs. It costs about 2.6 2004 Euro cents [= US 4c in 2014] to pay for an egg to be free range instead of produced in a battery. This corresponds to a bit over one day of chicken life. I'm assuming here that the life of a battery egg-laying chicken is not substantially better than that of a meat chicken, and that free range chickens have lives that are at least neutral. If they are positive, the figure becomes even more favorable to the free range eggs).

< losing 12.5 days of high quality human life (assuming saving one year of human life is at least as good as stopping one year of an animal suffering, which you may disagree with.)

= -$1.94-5.49 spent on GiveWell's top charities (This was GiveWell's estimate for AMF if we assume saving a life corresponds to saving 52 years - roughly the life expectancy of children in Malawi. GiveWell doesn't recommend AMF at the moment, but they recommend charities they considered comparable to AMF when AMF had this value.

GiveWell employees' median estimate for the cost of 'saving a life' through donating to SCI is $5936 [see spreadsheet here]. If we suppose a life is 37 DALYs, as they assume in the spreadsheet, then 12.5 days is worth 5936*12.5/(37*365.25) = $5.49. Elie produced two estimates that were generous to cash and to deworming separately, and gave the highest and lowest estimates for the cost-effectiveness of deworming, of the group. They imply a range of $1.40-$45.98 to do as much good via SCI as eating vegetarian for a meal).

Given this calculation, we get a few cents to a couple of dollars as the cost of doing similar amounts of good to averting a meat meal via other means. We are not finished yet though - there were many factors I didn't take into account in the calculation, because I wanted to separate relatively straightforward facts for which I have good evidence from guesses. Here are other considerations I can think of, which reduce the relative value of averting meat eating:

  1. Chicken brains are fairly small, suggesting their internal experience is less than that of humans. More generally, in the spectrum of entities between humans and microbes, chickens are at least some of the way to microbes. And you wouldn't pay much to save a microbe.
  2. Eating a chicken only reduces the number of chickens produced by some fraction. According to Peter Hurford, an extra 0.3 chickens are produced if you demand 1 chicken. I didn't include this in the above calculation because I am not sure of the time scale of the relevant elasticities (if they are short-run elasticities, they might underestimate the effect of vegetarianism).
  3. Vegetable production may also have negative effects on animals.
  4. Givewell estimates have been rigorously checked relative to other things, and evaluations tend to get worse as you check them. For instance, you might forget to include any of the things in this list in your evaluation of vegetarianism. Probably there are more things I forgot. That is, if you looked into vegetarianism with the same detail as SCI, it would become more pessimistic, and so cheaper to do as much good with SCI.
  5. It is not at all obvious that meat animal lives are not worth living on average. Relatedly, animals generally want to be alive, which we might want to give some weight to.
  6. Animal welfare in general appears to have negligible predictable effect on the future (very debatably), and there are probably things which can have huge impact on the future. This would make animal altruism worse compared to present-day human interventions, and much worse compared to interventions directed at affecting the far future, such as averting existential risk.

My own quick guesses at factors by which the relative value of avoiding meat should be multiplied, to account for these considerations:

  1. Moral value of small animals: 0.05
  2. Raised price reduces others' consumption: 0.5
  3. Vegetables harm animals too: 0.9
  4. Rigorous estimates look worse: 0.9
  5. Animal lives might be worth living: 0.2
  6. Animals don't affect the future: 0.1 relative to human poverty charities

Thus given my estimates, we scale down the above figures by 0.05*0.5*0.9*0.9*0.2*0.1 = 0.0004. This gives us $0.0008-$0.002 to do as much good as eating a vegetarian meal by spending on GiveWell's top charities. Without the factor for the future (which doesn't apply to these other animal charities), we only multiply the cost of eating a meat meal by 0.004. This gives us a price of $0.0003 with the Humane League, or $0.002 on improving chicken welfare in other ways. These are not price differences that will change my meal choices very often! I think I would often be willing to pay at least a couple of extra dollars to eat meat, setting aside animal suffering. So if I were to avoid eating meat, then assuming I keep fixed how much of my budget I spend on myself and how much I spend on altruism, I would be trading a couple of dollars of value for less than one thousandth of that.
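
For anyone who wants to rerun this with their own numbers, here is a minimal Python sketch of the arithmetic above. The figures are the post's own, rounded, and every one of them is meant to be replaced with your own estimates.

# Rough reproduction of the calculation above; all figures are this post's
# estimates and are easy to swap out for your own.

# Cost, in dollars of GiveWell-top-charity donations, of doing as much good
# as skipping one chicken-containing meal (before the discount factors).
base_cost_low, base_cost_high = 1.94, 5.49

discounts = {
    "moral value of small animals": 0.05,
    "raised price reduces others' consumption": 0.5,
    "vegetables harm animals too": 0.9,
    "rigorous estimates look worse": 0.9,
    "animal lives might be worth living": 0.2,
    "animals don't affect the future": 0.1,   # only vs. human poverty charities
}

factor = 1.0
for value in discounts.values():
    factor *= value                # 0.05*0.5*0.9*0.9*0.2*0.1 ~= 0.0004

print(f"combined discount factor: {factor:.4g}")
print(f"price of a meat meal: ${base_cost_low * factor:.4f}"
      f" - ${base_cost_high * factor:.4f}")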

I encourage you to estimate your own numbers for the above factors, and to recalculate the overall price according to your beliefs. If you would happily pay this much (in my case, less than $0.002) to eat meat on many occasions, you probably shouldn't be a vegetarian. You are better off paying that cost elsewhere. If you would rarely be willing to pay the calculated price, you should perhaps consider being a vegetarian, though note that the calculation was conservative in favor of vegetarianism, so you might want to run it again more carefully. Note that in judging what you would be willing to pay to eat meat, you should take into account everything except the direct cost to animals.

There are many common reasons you might not be willing to eat meat, given these calculations, e.g.:

  • You don't enjoy eating meat
  • You think meat is pretty unhealthy
  • You belong to a social cluster of vegetarians, and don't like conflict
  • You think convincing enough others to be vegetarians is the most cost-effective way to make the world better, and being a vegetarian is a great way to have heaps of conversations about vegetarianism, which you believe makes people feel better about vegetarians overall, to the extent that they are frequently compelled to become vegetarians.
  • 'For signaling' is another common explanation I have heard, which I think is meant to be similar to the above, though I'm not actually sure of the details.
  • You aren't able to treat costs like these as fungible (as discussed above)
  • You are completely indifferent to what you eat (in that case, you would probably do better eating as cheaply as possible, but maybe everything is the same price)
  •  You consider the act-omission distinction morally relevant
  • You are very skeptical of the ability to affect anything, and in particular have substantially greater confidence in the market - to farm some fraction of a pig fewer in expectation if you abstain from pork for long enough - than in nonprofits and complicated schemes. (Though in that case, consider buying free-range eggs and selling them as cage eggs).
  • You think the suffering of animals is of extreme importance compared to the suffering of humans or loss of human lives, and don't trust the figures I have given for improving the lives of egg-laying chickens, and don't want to be a hypocrite. Actually, you still probably shouldn't here - the egg-laying chicken number is just an example of a plausible alternative way to help animals. You should really check quite a few of these before settling.

However I think for wannabe effective altruists with the usual array of characteristics, vegetarianism is likely to be quite ineffective.

TV's "Elementary" Tackles Friendly AI and X-Risk - "Bella" (Possible Spoilers)

19 pjeby 22 November 2014 07:51PM

I was a bit surprised to find this week's episode of Elementary was about AI...  not just AI and the Turing Test, but also a fairly even-handed presentation of issues like Friendliness, hard takeoff, and the difficulties of getting people to take AI risks seriously.

The case revolves around a supposed first "real AI", dubbed "Bella", and the theft of its source code...  followed by a computer-mediated murder.  The question of whether "Bella" might actually have murdered its creator for refusing to let it out of the box and connect it to the internet is treated as an actual possibility, springboarding to a discussion about how giving an AI a reward button could lead to it wanting to kill all humans and replace them with a machine that pushes the reward button.

Also demonstrated are the right and wrong ways to deal with attempted blackmail...  But I'll leave that vague so it doesn't spoil anything.  An X-risks research group and a charismatic "dangers of AI" personality are featured, but do not appear intended to resemble any real-life groups or personalities.  (Or if they are, I'm too unfamiliar with the groups or persons to see the resemblance.)  They aren't mocked, either...  and the episode's ending is unusually ambiguous and open-ended for the show, which more typically wraps everything up with a nice bow of Justice Being Done.  Here, we're left to wonder what the right thing actually is, or was, even if it's symbolically moved to Holmes' smaller personal dilemma, rather than leaving the focus on the larger moral dilemma that created Holmes' dilemma in the first place.

The episode actually does a pretty good job of raising an important question about the weight of lives, even if LW has explicitly drawn a line that the episode's villain(s)(?) choose to cross.  It also has some fun moments, with Holmes becoming obsessed with proving Bella isn't an AI, even though Bella makes it easy by repeatedly telling him it can't understand his questions and needs more data.  (Bella, being on an isolated machine without internet access, doesn't actually know a whole lot, after all.)  Personally, I don't think Holmes really understands the Turing Test, even with half a dozen computer or AI experts assisting him, and I think that's actually the intended joke.

There's also an obligatory "no pity, remorse, fear" speech lifted straight from The Terminator, and the comment "That escalated quickly!" in response to a short description of an AI box escape/world takeover/massacre.

(Edit to add: one of the unusually realistic things about the AI, "Bella", is that it was one of the least anthropomorphized fictional AIs I have ever seen.  I mean, there was no way the thing was going to pass even the most primitive Turing test...  and yet it still seemed at least somewhat plausible as a potential murder suspect.  While perhaps not a truly realistic demonstration of just how alien an AI's thought process would be, it felt like the writers were at least making an actual effort.  Kudos to them.)

(Second edit to add: if you're not familiar with the series, this might not be the best episode to start with; a lot of the humor and even drama depends upon knowledge of existing characters, relationships, backstory, etc.  For example, Watson's concern that Holmes has deliberately arranged things to separate her from her boyfriend might seem like sheer crazy-person paranoia if you don't know about all the ways he did interfere with her personal life in previous seasons...  nor will Holmes' private confessions to Bella and Watson have the same impact without reference to how difficult any admission of feeling was for him in previous seasons.)

More marbles and Sleeping Beauty

1 Manfred 23 November 2014 02:00AM

I

Previously I talked about an entirely uncontroversial marble game: I flip a coin, and if Tails I give you a black marble, if Heads I flip another coin to either give you a white or a black marble.

The probabilities of seeing the two marble colors are 3/4 and 1/4, and the probabilities of Heads and Tails are 1/2 each.

The marble game is analogous to how a 'halfer' would think of the Sleeping Beauty problem - the claim that Sleeping Beauty should assign probability 1/2 to Heads relies on the claim that your information for the Sleeping Beauty problem is the same as your information for the marble game - same possible events, same causal information, same mutual exclusivity and exhaustiveness relations.

So what's analogous to the 'thirder' position, after we take into account that we have this causal information? Is it some difference in causal structure, or some non-causal anthropic modification, or something even stranger?

As it turns out, nope, it's the same exact game, just re-labeled.

In the re-labeled marble game you still have two unknown variables (represented by flipping coins), and you still have a 1/2 chance of black and Tails, a 1/4 chance of black and Heads, and a 1/4 chance of white and Heads.

And then to get the thirds, you ask the question "If I get a black marble, what is the probability of the faces of the first coin?" Now you update to P(Heads|black)=1/3 and P(Tails|black)=2/3.
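
For concreteness, here is a minimal enumeration of this re-labeled game in Python. It is just a sketch of the update, using only the probabilities already given above.

from fractions import Fraction

# Joint distribution of (first coin, marble colour) in the marble game.
joint = {
    ("Tails", "black"): Fraction(1, 2),
    ("Heads", "black"): Fraction(1, 4),
    ("Heads", "white"): Fraction(1, 4),
}

p_black = sum(p for (coin, marble), p in joint.items() if marble == "black")
p_heads_given_black = joint[("Heads", "black")] / p_black
p_tails_given_black = joint[("Tails", "black")] / p_black

print(p_black)               # 3/4
print(p_heads_given_black)   # 1/3
print(p_tails_given_black)   # 2/3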

II

Okay, enough analogies. What's going on with these two positions in the Sleeping Beauty problem?

[Diagram 1]    [Diagram 2]

Here are two different diagrams, which are really re-labelings of the same diagram. The first labeling is the problem where P(Heads|Wake) = 1/2. The second labeling is the problem where P(Heads|Wake) = 1/3. The question at hand is really - which of these two math problems corresponds to the word problem / real world situation?

As a refresher, here's the text of the Sleeping Beauty problem that I'll use: Sleeping Beauty goes to sleep in a special room on Sunday, having signed up for an experiment. A coin is flipped - if the coin lands Heads, she will only be woken up on Monday. If the coin lands Tails, she will be woken up on both Monday and Tuesday, but with memories erased in between. Upon waking up, she then assigns some probability to the coin landing Heads, P(Heads|Wake).

Diagram 1:  First a coin is flipped to get Heads or Tails. There are two possible things that could be happening to her, Wake on Monday or Wake on Tuesday. If the coin landed Heads, then she gets Wake on Monday. If the coin landed Tails, then she could either get Wake on Monday or Wake on Tuesday (in the marble game, this was mediated by flipping a second coin, but in this case it's some unspecified process, so I've labeled it [???]).  Because all the events already assume she Wakes, P(Heads|Wake) evaluates to P(Heads), which just as in the marble game is 1/2.

This [???] node here is odd, can we identify it as something natural? Well, it's not Monday/Tuesday, like in diagram 2 - there's no option that even corresponds to Heads & Tuesday. I'm leaning towards the opinion that this node is somewhat magical / acausal, just hanging around because of analogy to the marble game. So I think we can take it out. A better causal diagram with the halfer answer, then, might merely be Coin -> (Wake on Monday / Wake on Tuesday), where Monday versus Tuesday is not determined at all by a causal node, merely informed probabilistically to be mutually exclusive and exhaustive.

Diagram 2:  A coin is flipped, Heads or Tails, and also it could be either Monday or Tuesday. Together, these have a causal effect on her waking or not waking - if Heads and Monday, she Wakes, but if Heads and Tuesday, she Doesn't wake. If Tails, she Wakes. Her pre-Waking prior for Heads is 1/2, but upon waking, the event Heads, Tuesday, Don't Wake gets eliminated, and after updating P(Heads|Wake)=1/3.

There's a neat asymmetry here. In diagram 1, when the coin was Heads she got the same outcome no matter the value of [???], and only when the coin was Tails were there really two options. In Diagram 2, when the coin is Heads, two different things happen for different values of the day, while if the coin is Tails the same thing happens no matter the day.

 

Do these seem like accurate depictions of what's going on in these two different math problems? If so, I'll probably move on to looking closer at what makes the math problem correspond to the word problem.

Musings on the LSAT: "Reasoning Training" and Neuroplasticity

3 Natha 22 November 2014 07:14PM

The purpose of this post is to provide basic information about the LSAT, including the format of the test and a few sample questions. I also wanted to bring to light some research that has found LSAT preparation to alter brain structure in ways that strengthen hypothesized "reasoning pathways". These studies have not been discussed here before; I thought they were interesting and really just wanted to call your collective attention to them.

I really like taking tests; I get energized by intense race-against-the-clock problem solving and, for better or worse, I relish getting to see my standing relative to others when the dust settles. I like the purity of the testing situation - how conditions are standardized in theory and more or less the same for all comers. This guilty pleasure has played no small part in the course my life has taken: I worked as a test prep tutor for 3 years and loved every minute of it, I met my wife through academic competitions in high school, and I am currently a graduate student doing lots of coursework in psychometrics.

Well, my brother-in-law is a lawyer, and when we chat the topic of the LSAT has served as some conversational common ground. Since I like taking tests for fun, he suggested I give it a whirl because he thought it was interesting and felt like it was a fair assessment of one's logical reasoning ability. So I did: I took a practice test cold a couple of Saturdays ago and I was very impressed. Here's the one I took. (This is a full practice exam provided by the test-makers; it's also like the top google result for "LSAT practice test".) I wanted to post here about it because the LSAT hasn't been discussed very much on this site and I thought that some of you might find it useful to know about.

A brief run-down of the LSAT:

The test has four parts: two Logical Reasoning sections, a Critical Reading section (akin to SAT et al.), and an Analytical Reasoning, or "logic games", section. Usually when people talk about the LSAT, the logic games get emphasized because they are unusual and can be pretty challenging (the only questions I missed were of this type; I missed a few and I ran out of time). Essentially, you get a premise and a bunch of conditions from which you are required to draw conclusions. Here's an example:

A cruise line is scheduling seven week-long voyages for the ship Freedom. Each voyage will occur in exactly one of the first seven weeks of the season: weeks 1 through 7. Each voyage will be to exactly one of four destinations: Guadeloupe, Jamaica, Martinique, or Trinidad. Each destination will be scheduled for at least one of the weeks. The following conditions apply:

Jamaica will not be its destination in week 4.
Trinidad will be its destination in week 7.
Freedom will make exactly two voyages to Martinique, and at least one voyage to Guadeloupe will occur in some week between those two voyages.
Guadeloupe will be its destination in the week preceding any voyage it makes to Jamaica.
No destination will be scheduled for consecutive weeks.

11. Which of the following is an acceptable schedule of destinations in order from week 1 through week 7?

(A) Guadeloupe, Jamaica, Martinique, Trinidad,Guadeloupe, Martinique, Trinidad
(B) Guadeloupe, Martinique, Trinidad, Martinique, Guadeloupe, Jamaica, Trinidad
(C) Jamaica, Martinique, Guadeloupe, Martinique, Guadeloupe, Jamaica, Trinidad
(D) Martinique, Trinidad, Guadeloupe, Jamaica, Martinique, Guadeloupe, Trinidad
(E) Martinique, Trinidad, Guadeloupe, Trinidad, Guadeloupe, Jamaica, Martinique
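
Out of curiosity, here is a brute-force check of the five answer choices in Python. The encoding of the rules is my own reading of the game text above, so treat it as a sketch rather than an official answer key; if the encoding is right, exactly one schedule survives every condition.

# Brute-force check of the five answer choices against the game's conditions.
# Destinations: G = Guadeloupe, J = Jamaica, M = Martinique, T = Trinidad.

CHOICES = {
    "A": "GJMTGMT",
    "B": "GMTMGJT",
    "C": "JMGMGJT",
    "D": "MTGJMGT",
    "E": "MTGTGJM",
}

def satisfies(schedule):
    # Each destination appears at least once.
    if set(schedule) != set("GJMT"):
        return False
    # Jamaica is not the destination in week 4; Trinidad is in week 7.
    if schedule[3] == "J" or schedule[6] != "T":
        return False
    # Exactly two Martinique voyages, with a Guadeloupe voyage between them.
    m_weeks = [i for i, d in enumerate(schedule) if d == "M"]
    if len(m_weeks) != 2 or "G" not in schedule[m_weeks[0] + 1:m_weeks[1]]:
        return False
    # Every Jamaica voyage is immediately preceded by Guadeloupe.
    if any(d == "J" and (i == 0 or schedule[i - 1] != "G")
           for i, d in enumerate(schedule)):
        return False
    # No destination in consecutive weeks.
    if any(a == b for a, b in zip(schedule, schedule[1:])):
        return False
    return True

for label, schedule in CHOICES.items():
    print(label, satisfies(schedule))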


Clearly, this section places a huge burden on working memory and is probably the most g-loaded of the four. I'd guess that most LSAT test prep is about strategies for dumping this burden into some kind of written scheme that makes it all more manageable. But I just wanted to show you the logic games for completeness; what I was really excited by were the Logical Reasoning questions (sections II and III). You are presented with some scenario containing a claim, an argument, or a set of facts, and then asked to analyze, critique, or to draw correct conclusions. Here are most of the question stems used in these sections:

Which one of the following most accurately expresses the main conclusion of the economist’s argument?
Which one of the following uses flawed reasoning that most closely resembles the flawed reasoning in the argument?
Which one of the following most logically completes the argument?
The reasoning in the consumer’s argument is most vulnerable to criticism on the grounds that the argument...
The argument’s conclusion follows logically if which one of the following is assumed?
Which one of the following is an assumption required by the argument?


Heyo! This is exactly the kind of stuff I would like to become better at! Most of the questions were pretty straightforward, but the LSAT is known to be a tough test (score range: 120-180, 95th %ile: ~167, 99th %ile: ~172) and these practice questions probably aren't representative. What a cool test though! Here's a whole question from this section, superficially about utilitarianism:

3. Philosopher: An action is morally right if it would be reasonably expected to increase the aggregate well-being of the people affected by it. An action is morally wrong if and only if it would be reasonably expected to reduce the aggregate well-being of the people affected by it. Thus, actions that would be reasonably expected to leave unchanged the aggregate well-being of the people affected by them are also right.

The philosopher’s conclusion follows logically if which one of the following is assumed?

(A) Only wrong actions would be reasonably expected to reduce the aggregate well-being of the people affected by them.
(B) No action is both right and wrong.
(C) Any action that is not morally wrong is morally right.
(D) There are actions that would be reasonably expected to leave unchanged the aggregate well-being of the people affected by them.
(E) Only right actions have good consequences.


Also, the LSAT is a good test, in that it measures well one's ability to succeed in law school. Validity studies boast that “LSAT score alone continues to be a better predictor of law school performance than UGPA [undergraduate GPA] alone.” Of course, the outcome variable can be regressed on both predictors to account for more of the variance than either one taken singly, but it is uncommon for a standardized test to beat prior GPA in predicting a student's future GPA.

 

Intensive LSAT preparation and neuroplasticity:

In two recent studies (same research team), learning to reason in the logically formal way required by the LSAT was found to alter brain structure in ways consistent with literature reviews of the neural correlates of logical reasoning. Note: my reading of these articles was pretty surface-level; I do not intend to provide a thorough review, only to bring them to your attention.

These researchers recruited pre-law students enrolling in an LSAT course and imaged their brains at rest using fMRI both before and after 3 months of this "reasoning training". As controls, they included age- and IQ-matched pre-law students intending to take the LSAT in the future but not actively preparing for it.

The LSAT-prep group was found to have significantly increased connectivity between parietal and prefrontal cortices and the striatum, both within the left hemisphere and across hemispheres. In the first study, the authors note that

 

These experience-dependent changes fall into tracts that would be predicted by prior work showing that reasoning relies on an interhemispheric frontoparietal network (for review, see Prado et al., 2011). Our findings are also consistent with the view that reasoning is largely left-hemisphere dominant (e.g., Krawczyk, 2012), but that homologous cortex in the right hemisphere can be recruited as needed to support complex reasoning. Perhaps learning to reason more efficiently involves recruiting compensatory neural circuitry more consistently.


And in the second study, they conclude

 

An analysis of pairwise correlations between brain regions implicated in reasoning showed that fronto-parietal connections were strengthened, along with parietal-striatal connections. These findings provide strong evidence for neural plasticity at the level of large-scale networks supporting high-level cognition.

 

I think this hypothesized fronto-parietal reasoning network is supposed to go something like this:

The LSAT requires a lot of relational reasoning, the ability to compare and combine mental representations. The parietal cortex holds individual relationships between these mental representations (A->B, B->C), and the prefrontal cortex integrates this information to draw conclusions (A->B->C, therefore A->C). The striatum's role in this network would be to monitor the success/failure of reward predictions and encourage flexible problem solving. Unfortunately, my understanding here is very limited. Here are several reviews of this reasoning network stuff (I have not read any; just wanted to share them): Hampshire et al. (2011), Prado et al. (2011), Krawczyk (2012).

I hope this was useful information! According to the 2013 survey, only 2.2% of you are in law-related professions, but I was wondering (1) if anyone has personal experience studying for this exam, (2) if they felt like it improved their logical reasoning skills, and (3) if they felt that these effects were long-lasting. Studying for this test seems to have the potential to inculcate rationalist habits-of-mind; I know it's just self-report, but for those who went on to law school, did you feel like you benefited from the experience studying for the LSAT? I only ask because the Law School Admission Council, a non-profit organization made up of 200+ law schools, seems to actively encourage preparation for the exam, member schools say it is a major factor in admissions, preparation tends to increase performance, and LSAT performance is correlated moderately-to-strongly with first year law school GPA (r= ~0.4).
