Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
You may know me as the guy who posts a lot of controversial stuff about LW and MIRI. I don't enjoy doing this and do not want to continue with it. One reason is that the debate is turning into a flame war. Another is that I noticed it affects my health negatively (e.g. my high blood pressure; I actually had a single-sided hearing loss over this xkcd comic on Friday).
This all started in 2010 when I encountered something I perceived to be wrong. But the specifics are irrelevant for this post. The problem is that ever since that time there have been various reasons that made me feel forced to continue the controversy. Sometimes it was the urge to clarify what I wrote, other times I thought it was necessary to respond to a reply I got. What matters is that I couldn't stop. But I believe that this is now possible, given my health concerns.
One problem is that I don't want to leave possible misrepresentations behind. And there very likely exist misrepresentations. There are many reasons for this, but I can assure you that I never deliberately lied and that I never deliberately tried to misrepresent anyone. The main reason might be that I feel very easily overwhelmed and never had the ability to force myself to invest the time that is necessary to do something correctly if I don't really enjoy doing it (for the same reason I probably failed school). This means that most comments and posts are written in a tearing hurry, akin to a reflexive retraction from a painful stimulus.
I hate this fight and want to end it once and for all. I don't expect you to take my word for it. So instead, here is an offer:
I am willing to post counterstatements, endorsed by MIRI, of any length and content at the top of any of my blog posts. You can either post them in the comments below or send me an email (da [at] kruel.co).
I have no idea if MIRI believes this to be worthwhile. But I couldn't think of a better way to solve this dilemma in a way that everyone can live with happily. But I am open to suggestions that don't stress me too much (also about how to prove that I am trying to be honest).
You obviously don't need to read all my posts. It can also be a general statement.
I am also aware that LW and MIRI are bothered by RationalWiki. As you can easily check from the fossil record, I have at points tried to correct specific problems. But, for the reasons given above, I have problems investing the time to go through every sentence to find possible errors and attempt to correct them in such a way that the edit is not reverted and that people who feel offended are satisfied.
There are obviously some caveats regarding the content, such as no nude photos of Yudkowsky ;-)
Crossposted from Meteuphoric
I have lately noticed several people wondering why more Effective Altruists are not vegetarians. I am personally not a vegetarian because I don't think it is an effective way to be altruistic.
As far as I can tell the fact that many EAs are not vegetarians is surprising to some because they think 'animals are probably morally relevant' basically implies 'we shouldn't eat animals'. To my ear, this sounds about as absurd as if GiveWell's explanation of their recommendation of SCI stopped after 'the developing world exists, or at least has a high probability of doing so'.
(By the way, I do get to a calculation at the bottom, after some speculation about why the calculation I think is appropriate is unlike what I take others' implicit calculations to be. Feel free to just scroll down and look at it).
I think this fairly large difference between my and many vegetarians' guesses at the value of vegetarianism arises because they think the relevant question is whether the suffering to the animal is worse than the pleasure to themselves at eating the animal. This question sounds superficially plausibly relevant, but I think on closer consideration you will agree that it is the wrong question.
The real question is not whether the cost to you is small, but whether you could do more good for the same small cost.
Similarly, when deciding whether to donate $5 to a random charity, the question is whether you could do more good by donating the money to the most effective charity you know of. Going vegetarian because it relieves the animals more than it hurts you is the equivalent of donating to a random developing world charity because it relieves the suffering of an impoverished child more than foregoing $5 increases your suffering.
Trading with inconvenience and displeasure
My imaginary vegetarian debate partner objects to this on grounds that vegetarianism is different from donating to ineffective charities, because to be a vegetarian you are spending effort and enjoying your life less rather than spending money, and you can't really reallocate that inconvenience and displeasure to, say, preventing artificial intelligence disaster or feeding the hungry, if you don't use it on reading food labels and eating tofu. If I were to go ahead and eat the sausage instead - the concern goes - probably I would just go on with the rest of my life exactly the same, and a bunch of farm animals somewhere would be the worse for it, and I scarcely better.
I agree that if the meat eating decision were separated from everything else in this way, then the decision really would be about your welfare vs. the animal's welfare, and you should probably eat the tofu.
However whether you can trade being vegetarian for more effective sacrifices is largely a question of whether you choose to do so. And if vegetarianism is not the most effective way to inconvenience yourself, then it is clear that you should choose to do so. If you eat meat now in exchange for suffering some more effective annoyance at another time, you and the world can be better off.
Imagine an EA friend says to you that she gives substantial money to whatever random charity has put a tin in whatever shop she is in, because it's better than the donuts and new dresses she would buy otherwise. She doesn't see how not giving the money to the random charity would really cause her to give it to a better charity - empirically she would spend it on luxuries. What do you say to this?
If she were my friend, I might point out that the money isn't meant to magically move somewhere better - she may have to consciously direct it there. She might need to write down how much she was going to give to the random charity, then look at the note later for instance. Or she might do well to decide once and for all how much to give to charity and how much to spend on herself, and then stick to that. As an aside, I might also feel that she was using the term 'Effective Altruist' kind of broadly.
I see vegetarianism for the sake of not managing to trade inconveniences as quite similar. And in both cases you risk spending your life doing suboptimal things every time a suboptimal altruistic opportunity has a chance to steal resources from what would be your personal purse. This seems like something that your personal and altruistic values should cooperate in avoiding.
It is likely too expensive to keep track of an elaborate trading system, but you should at least be able to make reasonable long term arrangements. For instance, if instead of eating vegetarian you ate a bit frugally and saved and donated a few dollars per meal, you would probably do more good (see calculations lower in this post). So if frugal eating were similarly annoying, it would be better. Eating frugally is inconvenient in very similar ways to vegetarianism, so it is a particularly plausible trade if you are skeptical that such trades can be made. I claim you could make very different trades though, for instance foregoing the pleasure of an extra five-minute break and working instead sometimes. Or you could decide once and for all how much annoyance to have, and then choose the most worthwhile bits of annoyance, or put a dollar value on your own time and suffering and try to be consistent.
Nebulous life-worsening costs of vegetarianism
There is a separate psychological question which is often mixed up with the above issue. That is, whether making your life marginally less gratifying and more annoying in small ways will make you sufficiently less productive to undermine the good done by your sacrifice. This is not about whether you will do something a bit costly another time for the sake of altruism, but whether just spending your attention and happiness on vegetarianism will harm your other efforts to do good, and cause more harm than good.
I find this plausible in many cases, but I expect it to vary a lot by person. My mother seems to think it's basically free to eat supplements, whereas to me every additional daily routine seems to encumber my life and require me to spend disproportionately more time thinking about unimportant things. Some people find it hard to concentrate when unhappy, others don't. Some people struggle to feed themselves adequately at all, while others actively enjoy preparing food.
There are offsetting positives from vegetarianism which also vary across people. For instance there is the pleasure of self-sacrifice, the joy of being part of a proud and moralizing minority, and the absence of the horror of eating other beings. There are also perhaps health benefits, which probably don't vary that much by people, but people do vary in how big they think the health benefits are.
Another way you might accidentally lose more value than you save is in spending little bits of time which are hard to measure or notice. For instance, vegetarianism means spending a bit more time searching for vegetarian alternatives, researching nutrition, buying supplements, writing emails back to people who invite you to dinner explaining your dietary restrictions, etc. The value of different people's time varies a lot, as does the extent to which an additional vegetarianism routine would tend to eat their time.
On a less psychological note, the potential drop in IQ (~5 points?!) from missing out on creatine is a particularly terrible example of vegetarianism making people less productive. Now that we know about creatine and can supplement it, creatine itself is not such an issue. An issue does remain though: is this an unlikely one-off failure, or should we worry about more such deficiency? (this goes for any kind of unusual diet, not just meat-free ones).
How much is avoiding meat worth?
Here is my own calculation of how much it costs to do the same amount of good as replacing one meat meal with one vegetarian meal. If you would be willing to pay this much extra to eat meat for one meal, then you should eat meat. If not, then you should abstain. For instance, if eating meat does $10 worth of harm, you should eat meat whenever you would hypothetically pay an extra $10 for the privilege.
This is a tentative calculation. I will probably update it if people offer substantially better numbers.
All quantities are in terms of social harm.
Eating 1 non-vegetarian meal
< eating 1 chickeny meal (I am told chickens are particularly bad animals to eat, due to their poor living conditions and large animal:meal ratio. The relatively small size of their brains might offset this, but I will conservatively give all animals the moral weight of humans in this calculation.)
< eating 200 calories of chicken (a McDonald's crispy chicken sandwich probably contains a bit over 100 calories of chicken (based on its listed protein content); a Chipotle chicken burrito contains around 180 calories of chicken)
= causing ~0.25 chicken lives (1 chicken is equivalent in price to 800 calories of chicken breast i.e. eating an additional 800 calories of chicken breast conservatively results in one additional chicken. Calculations from data here and here.)
< -$0.08 given to the Humane League (ACE estimates the Humane League spares 3.4 animal lives per dollar). However, since the Humane League basically convinces other people to be vegetarians, this may be hypocritical or otherwise dubious.
< causing 12.5 days of chicken life (broiler chickens are slaughtered at between 35 and 49 days of age)
= causing 12.5 days of chicken suffering (I'm being generous)
< -$0.50 subsidizing free range eggs (This is a somewhat random example of the cost of more systematic efforts to improve animal welfare, rather than necessarily the best. The cost here is the cost of buying free range eggs and selling them as non-free range eggs. It costs about 2.6 2004 Euro cents [= US 4c in 2014] to pay for an egg to be free range instead of produced in a battery. This corresponds to a bit over one day of chicken life. I'm assuming here that the life of a battery egg-laying chicken is not substantially better than that of a meat chicken, and that free range chickens have lives that are at least neutral. If they are positive, the figure becomes even more favorable to the free range eggs).
< losing 12.5 days of high quality human life (assuming saving one year of human life is at least as good as stopping one year of an animal suffering, which you may disagree with.)
= -$1.94-5.49 spent on GiveWell's top charities (This was GiveWell's estimate for AMF if we assume saving a life corresponds to saving 52 years - roughly the life expectancy of children in Malawi. GiveWell doesn't recommend AMF at the moment, but they recommend charities they considered comparable to AMF when AMF had this value.
GiveWell employees' median estimate for the cost of 'saving a life' through donating to SCI is $5936 [see spreadsheet here]. If we suppose a life is 37 DALYs, as they assume in the spreadsheet, then 12.5 days is worth 5936*12.5/(37*365.25) = $5.49. Elie produced two estimates that were generous to cash and to deworming separately, and gave the highest and lowest estimates for the cost-effectiveness of deworming, of the group. They imply a range of $1.40-$45.98 to do as much good via SCI as eating vegetarian for a meal).
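The chain of estimates above is only a handful of multiplications, so it can be checked with a short script. This is a rough sketch that uses only the figures quoted in the list. The variable names are mine, and the 50-day lifespan is an assumption chosen to reproduce the 12.5 days used above (the quoted slaughter age is 35 to 49 days).

```python
# Figures quoted in the list above; variable names are illustrative only.
CALORIES_PER_MEAL = 200          # calories of chicken in one meal
CALORIES_PER_CHICKEN = 800       # calories of breast meat priced like one chicken
CHICKEN_LIFESPAN_DAYS = 50       # assumption: rounds up the quoted 35-49 days
ANIMALS_SPARED_PER_DOLLAR = 3.4  # ACE estimate for the Humane League
FREE_RANGE_COST_PER_DAY = 0.04   # about US 4c per egg, roughly one chicken-day
COST_PER_LIFE_SAVED = 5936       # GiveWell median estimate for SCI, in dollars
DALYS_PER_LIFE = 37
DAYS_PER_YEAR = 365.25

chickens_per_meal = CALORIES_PER_MEAL / CALORIES_PER_CHICKEN
suffering_days = chickens_per_meal * CHICKEN_LIFESPAN_DAYS

# Dollar cost of offsetting one meal through each alternative.
humane_league_cost = chickens_per_meal / ANIMALS_SPARED_PER_DOLLAR
free_range_cost = suffering_days * FREE_RANGE_COST_PER_DAY
sci_cost = COST_PER_LIFE_SAVED * suffering_days / (DALYS_PER_LIFE * DAYS_PER_YEAR)

print(f"chickens per meal:    {chickens_per_meal:.2f}")
print(f"days of chicken life: {suffering_days:.1f}")
print(f"Humane League:        ${humane_league_cost:.3f}")
print(f"free range eggs:      ${free_range_cost:.2f}")
print(f"SCI (GiveWell):       ${sci_cost:.2f}")
```

The Humane League figure comes out slightly under the $0.08 quoted above, which is consistent with that line being an upper bound. Note the parentheses around the denominator in the SCI line: the cost per life is divided by the total number of days in 37 years.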
Given this calculation, we get a few cents to a couple of dollars as the cost of doing similar amounts of good to averting a meat meal via other means. We are not finished yet though - there were many factors I didn't take into account in the calculation, because I wanted to separate relatively straightforward facts for which I have good evidence from guesses. Here are other considerations I can think of, which reduce the relative value of averting meat eating:
- Chicken brains are fairly small, suggesting their internal experience is less than that of humans. More generally, in the spectrum of entities between humans and microbes, chickens are at least some of the way to microbes. And you wouldn't pay much to save a microbe.
- Not eating a chicken only reduces the number of chickens produced by some fraction. According to Peter Hurford, an extra 0.3 chickens are produced if you demand 1 chicken. I didn't include this in the above calculation because I am not sure of the time scale of the relevant elasticities (if they are short-run elasticities, they might underestimate the effect of vegetarianism).
- Vegetable production may also have negative effects on animals.
- GiveWell estimates have been rigorously checked relative to other things, and evaluations tend to get worse as you check them. For instance, you might forget to include any of the things in this list in your evaluation of vegetarianism. Probably there are more things I forgot. That is, if you looked into vegetarianism with the same detail as SCI, the estimate would become more pessimistic, and so it would look cheaper to do as much good with SCI.
- It is not at all obvious that meat animal lives are not worth living on average. Relatedly, animals generally want to be alive, which we might want to give some weight to.
- Animal welfare in general appears to have negligible predictable effect on the future (very debatably), and there are probably things which can have huge impact on the future. This would make animal altruism worse compared to present-day human interventions, and much worse compared to interventions directed at affecting the far future, such as averting existential risk.
My own quick guesses at factors by which the relative value of avoiding meat should be multiplied, to account for these considerations:
- Moral value of small animals: 0.05
- Raised price reduces others' consumption: 0.5
- Vegetables harm animals too: 0.9
- Rigorous estimates look worse: 0.9
- Animal lives might be worth living: 0.2
- Animals don't affect the future: 0.1 relative to human poverty charities
Thus given my estimates, we scale down the above figures by 0.05*0.5*0.9*0.9*0.2*0.1 = 0.0004. This gives us $0.0008-$0.002 to do as much good as eating a vegetarian meal by spending on GiveWell's top charities. Without the factor for the future (which doesn't apply to these other animal charities), we only multiply the cost of eating a meat meal by 0.004. This gives us a price of $0.0003 with the Humane League, or $0.002 on improving chicken welfare in other ways. These are not price differences that will change my meal choices very often! I think I would often be willing to pay at least a couple of extra dollars to eat meat, setting aside animal suffering. So if I were to avoid eating meat, then assuming I keep fixed how much of my budget I spend on myself and how much I spend on altruism, I would be trading a couple of dollars of value for less than one thousandth of that.
I encourage you to estimate your own numbers for the above factors, and to recalculate the overall price according to your beliefs. If you would happily pay this much (in my case, less than $0.002) to eat meat on many occasions, you probably shouldn't be a vegetarian. You are better off paying that cost elsewhere. If you would rarely be willing to pay the calculated price, you should perhaps consider being a vegetarian, though note that the calculation was conservative in favor of vegetarianism, so you might want to run it again more carefully. Note that in judging what you would be willing to pay to eat meat, you should take into account everything except the direct cost to animals.
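For anyone who wants to plug in their own numbers, here is a minimal sketch of the scaling step. The factor values are the quick guesses listed above and the base costs are the figures from the earlier calculation. The function name, dictionary keys, and `skip` parameter are illustrative, not part of any existing tool.

```python
from math import prod

# The quick guesses from the list above; replace them with your own.
FACTORS = {
    "moral value of small animals": 0.05,
    "raised price reduces others' consumption": 0.5,
    "vegetables harm animals too": 0.9,
    "rigorous estimates look worse": 0.9,
    "animal lives might be worth living": 0.2,
    "animals don't affect the future": 0.1,
}
FUTURE = "animals don't affect the future"


def discounted_price(base_cost, factors, skip=()):
    """Scale base_cost (in dollars) by every factor whose name is not in skip."""
    multiplier = prod(v for name, v in factors.items() if name not in skip)
    return base_cost * multiplier


# GiveWell top charities: all six factors apply.
givewell_low = discounted_price(1.94, FACTORS)
givewell_high = discounted_price(5.49, FACTORS)

# Animal charities: the factor for the future is left out.
humane_league = discounted_price(0.08, FACTORS, skip=(FUTURE,))
free_range = discounted_price(0.50, FACTORS, skip=(FUTURE,))

print(f"GiveWell top charities: ${givewell_low:.4f} to ${givewell_high:.4f}")
print(f"Humane League:          ${humane_league:.4f}")
print(f"Free range eggs:        ${free_range:.4f}")
```

Changing a value in `FACTORS`, or adding a new entry, rescales every price at once, which makes it easy to see which of your own guesses the conclusion is most sensitive to.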
There are many common reasons you might not be willing to eat meat, given these calculations, e.g.:
- You don't enjoy eating meat
- You think meat is pretty unhealthy
- You belong to a social cluster of vegetarians, and don't like conflict
- You think convincing enough others to be vegetarians is the most cost-effective way to make the world better, and being a vegetarian is a great way to have heaps of conversations about vegetarianism, which you believe makes people feel better about vegetarians overall, to the extent that they are frequently compelled to become vegetarians.
- 'For signaling' is another common explanation I have heard, which I think is meant to be similar to the above, though I'm not actually sure of the details.
- You aren't able to treat costs like these as fungible (as discussed above)
- You are completely indifferent to what you eat (in that case, you would probably do better eating as cheaply as possible, but maybe everything is the same price)
- You consider the act-omission distinction morally relevant
- You are very skeptical of the ability to affect anything, and in particular have substantially greater confidence in the market - to farm some fraction of a pig fewer in expectation if you abstain from pork for long enough - than in nonprofits and complicated schemes. (Though in that case, consider buying free-range eggs and selling them as cage eggs).
- You think the suffering of animals is of extreme importance compared to the suffering of humans or loss of human lives, and don't trust the figures I have given for improving the lives of egg-laying chickens, and don't want to be a hypocrite. Actually, you still probably shouldn't here - the egg-laying chicken number is just an example of a plausible alternative way to help animals. You should really check quite a few of these before settling.
However I think for wannabe effective altruists with the usual array of characteristics, vegetarianism is likely to be quite ineffective.
I was a bit surprised to find this week's episode of Elementary was about AI... not just AI and the Turing Test, but also a fairly even-handed presentation of issues like Friendliness, hard takeoff, and the difficulties of getting people to take AI risks seriously.
The case revolves around a supposed first "real AI", dubbed "Bella", and the theft of its source code... followed by a computer-mediated murder. The question of whether "Bella" might actually have murdered its creator for refusing to let it out of the box and connect it to the internet is treated as an actual possibility, springboarding to a discussion about how giving an AI a reward button could lead to it wanting to kill all humans and replace them with a machine that pushes the reward button.
Also demonstrated are the right and wrong ways to deal with attempted blackmail... But I'll leave that vague so it doesn't spoil anything. An X-risks research group and a charismatic "dangers of AI" personality are featured, but do not appear intended to resemble any real-life groups or personalities. (Or if they are, I'm too unfamiliar with the groups or persons to see the resemblance.) They aren't mocked, either... and the episode's ending is unusually ambiguous and open-ended for the show, which more typically wraps everything up with a nice bow of Justice Being Done. Here, we're left to wonder what the right thing actually is, or was, even if it's symbolically moved to Holmes' smaller personal dilemma, rather than leaving the focus on the larger moral dilemma that created Holmes' dilemma in the first place.
The episode actually does a pretty good job of raising an important question about the weight of lives, even if LW has explicitly drawn a line that the episode's villain(s)(?) choose to cross. It also has some fun moments, with Holmes becoming obsessed with proving Bella isn't an AI, even though Bella makes it easy by repeatedly telling him it can't understand his questions and needs more data. (Bella, being on an isolated machine without internet access, doesn't actually know a whole lot, after all.) Personally, I don't think Holmes really understands the Turing Test, even with half a dozen computer or AI experts assisting him, and I think that's actually the intended joke.
There's also an obligatory "no pity, remorse, fear" speech lifted straight from The Terminator, and the comment "That escalated quickly!" in response to a short description of an AI box escape/world takeover/massacre.
(Edit to add: one of the unusually realistic things about the AI, "Bella", is that it was one of the least anthropomorphized fictional AIs I have ever seen. I mean, there was no way the thing was going to pass even the most primitive Turing test... and yet it still seemed at least somewhat plausible as a potential murder suspect. While perhaps not a truly realistic demonstration of just how alien an AI's thought process would be, it felt like the writers were at least making an actual effort. Kudos to them.)
(Second edit to add: if you're not familiar with the series, this might not be the best episode to start with; a lot of the humor and even drama depends upon knowledge of existing characters, relationships, backstory, etc. For example, Watson's concern that Holmes has deliberately arranged things to separate her from her boyfriend might seem like sheer crazy-person paranoia if you don't know about all the ways he did interfere with her personal life in previous seasons... nor will Holmes' private confessions to Bella and Watson have the same impact without reference to how difficult any admission of feeling was for him in previous seasons.)
Stuart Russell: AI value alignment problem must be an "intrinsic part" of the field's mainstream agenda
Edge.org has recently been discussing "the myth of AI". Unfortunately, although Superintelligence is cited in the opening, most of the participants don't seem to have looked into Bostrom's arguments. (Luke has written a brief response to some of the misunderstandings Pinker and others exhibit.) The most interesting comment is Stuart Russell's, at the very bottom:
Of Myths and Moonshine
"We switched everything off and went home. That night, there was very little doubt in my mind that the world was headed for grief."
So wrote Leo Szilard, describing the events of March 3, 1939, when he demonstrated a neutron-induced uranium fission reaction. According to the historian Richard Rhodes, Szilard had the idea for a neutron-induced chain reaction on September 12, 1933, while crossing the road next to Russell Square in London. The previous day, Ernest Rutherford, a world authority on radioactivity, had given a "warning…to those who seek a source of power in the transmutation of atoms – such expectations are the merest moonshine."
Thus, the gap between authoritative statements of technological impossibility and the "miracle of understanding" (to borrow a phrase from Nathan Myhrvold) that renders the impossible possible may sometimes be measured not in centuries, as Rod Brooks suggests, but in hours.
None of this proves that AI, or gray goo, or strangelets, will be the end of the world. But there is no need for a proof, just a convincing argument pointing to a more-than-infinitesimal possibility. There have been many unconvincing arguments – especially those involving blunt applications of Moore's law or the spontaneous emergence of consciousness and evil intent. Many of the contributors to this conversation seem to be responding to those arguments and ignoring the more substantial arguments proposed by Omohundro, Bostrom, and others.
The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:
1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.
A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer's apprentice, or King Midas: you get exactly what you ask for, not what you want. A highly capable decision maker – especially one connected through the Internet to all the world's information and billions of screens and most of our infrastructure – can have an irreversible impact on humanity.
This is not a minor difficulty. Improving decision quality, irrespective of the utility function chosen, has been the goal of AI research – the mainstream goal on which we now spend billions per year, not the secret plot of some lone evil genius. AI research has been accelerating rapidly as pieces of the conceptual framework fall into place, the building blocks gain in size and strength, and commercial investment outstrips academic research activity. Senior AI researchers express noticeably more optimism about the field's prospects than was the case even a few years ago, and correspondingly greater concern about the potential risks.
No one in the field is calling for regulation of basic research; given the potential benefits of AI for humanity, that seems both infeasible and misdirected. The right response seems to be to change the goals of the field itself; instead of pure intelligence, we need to build intelligence that is provably aligned with human values. For practical reasons, we will need to solve the value alignment problem even for relatively unintelligent AI systems that operate in the human environment. There is cause for optimism, if we understand that this issue is an intrinsic part of AI, much as containment is an intrinsic part of modern nuclear fusion research. The world need not be headed for grief.
I'd quibble with a point or two, but this strikes me as an extraordinarily good introduction to the issue. I hope it gets reposted somewhere it can stand on its own.
Russell has previously written on this topic in Artificial Intelligence: A Modern Approach and the essays "The long-term future of AI," "Transcending complacency on superintelligent machines," and "An AI researcher enjoys watching his own execution." He's also been interviewed by GiveWell.
Most of this post is background and context, so I've included a tl;dr horizontal rule near the bottom where you can skip everything else if you so choose. :)
Here's a short anecdote of Feynman's:
... I invented some way of doing problems in physics, quantum electrodynamics, and made some diagrams that help to make the analysis. I was on a floor in a rooming house. I was in my pyjamas, I'd been working on the floor in my pyjamas for many weeks, fooling around, but I got these funny diagrams after a while and I found they were useful. They helped me to find the equations easier, so I thought of the possibility that it might be useful for other people, and I thought it would really look funny, these funny diagrams I'm making, if they appear someday in the Physical Review, because they looked so odd to me. And I remember sitting there thinking how funny that would be if it ever happened, ha ha.
Well, it turned out in fact that they were useful and they do appear in the Physical Review, and I can now look at them and see other people making them and smile to myself, they do look funny to me as they did then, not as funny because I've seen so many of them. But I get the same kick out of it, that was a little fantasy when I was a kid…not a kid, I was a college professor already at Cornell. But the idea was that I was still playing, just like I have always been playing, and the secret of my happiness in life or the major part of it is to have discovered a way to entertain myself that other people consider important and they pay me to do. I do exactly what I want and I get paid. They might consider it serious, but the secret is I'm having a very good time.
There are things that I have fun doing, and there are things that I feel I have substantially more fun doing. The things in the latter group are things I generally consider a waste of time. I will focus on one specifically, because it's by far the biggest offender, and what spurred this question. Video games.
I have a knack for video games. I've played them since I was very young. I can pick one up and just be good at it right off the bat. Many of my fondest memories take place in various games played with friends or by myself and I can spend hours just reading about them. (Just recently, I started getting into fighting games technically; I plan to build my own joystick in a couple of weeks. I'm having a blast just doing the associated research.)
Usually, I'd rather play a good game than anything else. I find that the most fun I have is time spent mastering a game, learning its ins and outs, and eventually winning. I have great fun solving a good problem, or making a subtle, surprising connection—but it just doesn't do it for me like a game does.
But I want to have as much fun doing something else. I admire mathematics and physics on a very deep level, and feel a profound sense of awe when I come into contact with new knowledge regarding these fields. The other day, I made a connection between pretty basic group theory and something we were learning about in quantum (nothing amazing; it's something well known to... not undergraduates) and that was awesome. But still, I think I would have preferred to play 50 rounds of Skullgirls and test out a new combo.
I want to have as much fun doing the things that I, on a deep level, want to do—as opposed to the things which I actually have more fun doing. I'm (obviously) not Feynman, but I want to play with ideas and structures and numbers like I do with video games. I want the same creativity to apply. The same fervor. The same want. It's not that it isn't there; I am not just arbitrarily applying this want to mathematics. I can feel it's there—it's just overshadowed by what's already there for video games.
How does one go about switching something they find immensely fun, something they're even passionate about, with something else? I don't want to be as passionate about video games as I am. I'd rather feel this way about something... else. I'd rather be able to happily spend hours reading up on [something] instead of what type of button I'm going to use in my fantasy joystick, or the most effective way to cross-up your opponent.
What would you folks do? I consider this somewhat of a mind-hacking question.
Imagine you had the following at your disposal:
- A Ph.D. in a biological science, with a fair amount of reading and wet-lab work under your belt on the topic of aging and longevity (but in hindsight, nothing that turned out to leverage any real mechanistic insights into aging).
- An M.S. in statistics. Sadly, the non-Bayesian kind for the most part, but along the way I acquired the meta-skills necessary to read and understand most quantitative papers with life-science applications.
- Love of programming and data, the ability to learn most new computer languages in a couple of weeks, and at least 8 years spent hacking R code.
- Research access to large amounts of anonymized patient data.
- Optimistically, two decades remaining in which to make it all count.
Imagine that your goal were to slow or prevent biological aging...
- What would be the specific questions you would try to tackle first?
- What additional skills would you add to your toolkit?
- How would you allocate your limited time between the research questions in #1 and the acquisition of new skills in #2?
Thanks for your input.
If you shop on Amazon in the countries listed below, you can earn a substantial commission for charity by doing so via the links below. This is a cost-free way to do a lot of good, so I'd encourage you to do so! You can bookmark one of the direct links to Amazon below and then use that bookmark every time you shop.
The commission will be at least 5%, varying by product category. This is substantially better than the AmazonSmile scheme available in the US, which only gives 0.5% of the money you spend to charity. It works through Amazon's 'Associates Program', which pays this commission for referring purchasers to them, from the unaltered purchase price (details here). It doesn't cost the purchaser anything. The money goes to Associates Program accounts owned by the EA non-profit Charity Science, money to which always gets regranted to GiveWell-recommended charities unless explicitly earmarked otherwise. For ease of administration and to get tax-deductibility, commission will get regranted to the Schistosomiasis Control Initiative until further notice.
Direct links to Amazon for your bookmarks
If you'd like to shop for charity, please bookmark the appropriate link below now:
From now through November 28: Black Friday Deals Week
Amazon's biggest cut price sale is this week. The links below take you to currently available deals:
Please share these links
I'll add other links on the main 'Shop for Charity' page later. I'd love to hear suggestions for good commission schemes in other countries. If you'd like to share these links with friends and family, please point them to this post or even better this project's main page.
'Shop for Charity' is a Charity Science project
I want a perfect eidetic memory.
Unfortunately, such things don't exist, but that's not stopping me from getting as close as possible. It seems as if the popular solutions are spaced repetition and memory palaces. So let's talk about those.
Memory Palaces: Do they work? If so what's the best resource (book, website etc.) for learning and mastering the technique? Is it any good for memorizing anything other than lists of things (which I find I almost never have to do)?
Spaced Repetition: What software do you use? Why that one? What sort of cards do you put in?
It seems to me that memory programs and mnemonic techniques assist one of three parts of the problem of memory: memorizing, recalling, and not forgetting.
"Not forgetting" is the long term problem of memory. Spaced repetition seems to solve the problem of "not forgetting." You feed the information you want to remember into your program, review frequently, and you won't forget that information.
Memory Palaces seem to deal with the "memorizing" part of the problem. When faced with new information that you want to be able to recall, you put it in a memory palace, vividly emphasized so as to be affective and memorable. This is good for short term encoding of information that you know you want to keep. You might put it into your spaced repetition program later, but you just want to not forget it until then.
The last part is the problem of "recalling." Both of the previous facets of the problem of memory had a distinct advantage: you knew the information that you wanted to remember in advance. However, we frequently find ourselves in situations in which we need/want to remember something that we know (or perhaps we don't) we encountered, but didn't consider particularly important at the time. Under this heading falls the situation of making connections when learning or being reminded of old information by new information: when you learn y, you have the thought "hey, isn't that just like x?" This is the facet of the memory problem that I am most interested in, but I know of scarcely anything that can reliably improve ease of recall of information in general. Do you know of anything?
I'm looking for recommendations: books on memory, specific mnemonics, or practices that are known to improve recall, or anything else that might help with any of the three parts of the problem.
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the 11th section in the reading guide: The treacherous turn. This corresponds to Chapter 8.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Existential catastrophe…” and “The treacherous turn” from Chapter 8
- The possibility of a first mover advantage + orthogonality thesis + convergent instrumental values suggests doom for humanity (p115-6)
- First mover advantage implies the AI is in a position to do what it wants
- Orthogonality thesis implies that what it wants could be all sorts of things
- Instrumental convergence thesis implies that regardless of its wants, it will try to acquire resources and eliminate threats
- Humans have resources and may be threats
- Therefore an AI in a position to do what it wants is likely to want to take our resources and eliminate us, i.e. doom for humanity.
- One kind of response: why wouldn't the makers of the AI be extremely careful not to develop and release dangerous AIs, or relatedly, why wouldn't someone else shut the whole thing down? (p116)
- It is hard to observe whether an AI is dangerous via its behavior at a time when you could turn it off, because AIs have convergent instrumental reasons to pretend to be safe, even if they are not. If they expect their minds to be surveilled, even observing their thoughts may not help. (p117)
- The treacherous turn: while weak, an AI behaves cooperatively. When the AI is strong enough to be unstoppable it pursues its own values. (p119)
- We might expect AIs to be more safe as they get smarter initially - when most of the risks come from crashing self-driving cars or mis-firing drones - then to get much less safe as they get too smart. (p117)
- One can imagine a scenario where there is little social impetus for safety (p117-8): alarmists will have been wrong for a long time, smarter AI will have been safer for a long time, large industries will be invested, an exciting new technique will be hard to set aside, useless safety rituals will be available, and the AI will look cooperative enough in its sandbox.
- The conception of deception: that moment when the AI realizes that it should conceal its thoughts (footnote 2, p282)
This is all superficially plausible. It is indeed conceivable that an intelligent system, capable of strategic planning, could take such treacherous turns. And a sufficiently time-indifferent AI could play a “long game” with us, i.e. it could conceal its true intentions and abilities for a very long time. Nevertheless, accepting this has some pretty profound epistemic costs. It seems to suggest that no amount of empirical evidence could ever rule out the possibility of a future AI taking a treacherous turn. In fact, it’s even worse than that. If we take it seriously, then it is possible that we have already created an existentially threatening AI. It’s just that it is concealing its true intentions and powers from us for the time being.
I don’t quite know what to make of this. Bostrom is a pretty rational, Bayesian guy. I tend to think he would say that if all the evidence suggests that our AI is non-threatening (and if there is a lot of that evidence), then we should heavily discount the probability of a treacherous turn. But he doesn’t seem to add that qualification in the chapter. He seems to think the threat of an existential catastrophe from a superintelligent AI is pretty serious. So I’m not sure whether he embraces the epistemic costs I just mentioned or not.
1. Danaher also made a nice diagram of the case for doom, and relationship with the treacherous turn:
According to Luke Muehlhauser's timeline of AI risk ideas, the treacherous turn idea for AIs has been around since at least 1977, when a fictional worm did it:
1977: Self-improving AI could stealthily take over the internet; convergent instrumental goals in AI; the treacherous turn. Though the concept of a self-propagating computer worm was introduced by John Brunner's The Shockwave Rider (1975), Thomas J. Ryan's novel The Adolescence of P-1 (1977) tells the story of an intelligent worm that at first is merely able to learn to hack novel computer systems and use them to propagate itself, but later (1) has novel insights on how to improve its own intelligence, (2) develops convergent instrumental subgoals (see Bostrom 2012) for self-preservation and resource acquisition, and (3) learns the ability to fake its own death so that it can grow its powers in secret and later engage in a "treacherous turn" (see Bostrom forthcoming) against humans.
3. The role of the premises
Bostrom's argument for doom has one premise that says AI could care about almost anything, then another that says regardless of what an AI cares about, it will do basically the same terrible things anyway. (p115) Do these sound a bit strange together to you? Why do we need the first, if final values don't tend to change instrumental goals anyway?
It seems the immediate reason is that an AI with values we like would not have the convergent goal of taking all our stuff and killing us. That is, the values we want an AI to have are some of those rare values that don't lead to destructive instrumental goals. Why is this? Because we (and thus the AI) care about the activities the resources would be grabbed from. If the resources were currently being used for anything we didn't care about, then our values would also suggest grabbing resources, and look similar to all of the other values. The difference that makes our values special here is just that most resources are already being used for them somewhat.
It is hard to tell apart a safe and an unsafe AI, because both would like to look safe. This is a very common problem in human interactions. For instance, it can be nontrivial to tell a genuine lover from a gold digger, a businessman from a conman, and an expert from a crank. All of them want to look like the desirable sort. Particularly similar to the AI case is that of hiring a new employee for a trial period. You will sometimes find that the employee's values are much better aligned during the trial period, and then they undergo a 'treacherous turn' once they have been hired more thoroughly.
'Costly signaling' is a general purpose solution to this problem, which works some of the time. The basic idea is this. Everyone has instrumental reasons to look like the good kind of person, but perhaps their reasons aren't exactly as strong as one another's, or the strength of their desire is harder to act on for one group than the other, so we can construct a set of options that will lead the different types of people to choose differently, even though they know this will set them apart. For instance, while an honest businessman and a conman would both like to say 'I'm an honest businessman', and to behave honestly if you watch them for five minutes, a conman would benefit less from an expensive shopfront than an honest businessman would, because his reputation is less valuable, so a brand is a signal of being honest.
Similarly, we could perhaps design situations where most harmful AIs would have a reasonable incentive to reveal themselves. This is difficult because so much is at stake, compared to the cost of the signal. But I haven't thought enough about it to rule it out, and I haven't seen anyone else do so either.
5. When would the 'conception of deception' take place?
6. Surveillance of the mind
Bostrom is not optimistic that AI researchers could protect much against risks by watching AI mental processes directly for signs of deception, rapid progress, or undesirable scheming. He says if the AI is smart it will realize this might happen, and hide the offending thoughts.
This seems an open question to me, for several reasons:
- Making your mental contents look innocuous while maintaining their semantic content sounds potentially very hard
- Especially for a creature which has only just become smart enough to realize it should treacherously turn
- From the AI's perspective, even if it is smart, surveillance could seem fairly unlikely, especially if we deceive it about its surroundings
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
- How transparent are AI minds likely to be? Should we expect to be able to detect deception? What are the answers to these questions for different specific architectures and methods? This might be relevant.
- Are there other good ways to filter AIs with certain desirable goals from others? e.g. by offering them choices that would filter them.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about 'malignant failure modes' (as opposed presumably to worse failure modes). To prepare, read “Malignant failure modes” from Chapter 8. The discussion will go live at 6pm Pacific time next Monday December 1. Sign up to be notified here.
Some people are now using the term Pascal's mugging as a label for any scenario with a large associated payoff and a small or unstable probability estimate, a combination that can trigger the absurdity heuristic.
Consider the scenarios listed below: (a) Do these scenarios have something in common? (b) Are any of these scenarios cases of Pascal's mugging?
(1) Fundamental physical operations -- atomic movements, electron orbits, photon collisions, etc. -- could collectively deserve significant moral weight. The total number of atoms or particles is huge: even assigning a tiny fraction of human moral consideration to them or a tiny probability of them mattering morally will create a large expected moral value. [Source]
(2) Cooling something to a temperature close to absolute zero might be an existential risk. Given our ignorance we cannot rationally give zero probability to this possibility, and probably not even give it less than 1% (since that is about the natural lowest error rate of humans on anything). Anybody saying it is less likely than one in a million is likely very overconfident. [Source]
(3) GMOs might introduce “systemic risk” to the environment. The chance of ecocide, or the destruction of the environment and potentially humans, increases incrementally with each additional transgenic trait introduced into the environment. The downside risks are so hard to predict -- and so potentially bad -- that it is better to be safe than sorry. The benefits, no matter how great, do not merit even a tiny chance of an irreversible, catastrophic outcome. [Source]
(4) Each time you say abracadabra, 3^^^^3 simulations of humanity experience a positive singularity.
If you read up on any of the first three scenarios, by clicking on the provided links, you will notice that there are a bunch of arguments in support of these conjectures. And yet I feel that all three have something important in common with scenario four, which I would call a clear case of Pascal's mugging.
I offer three possibilities of what these and similar scenarios have in common:
- Probability estimates of the scenario are highly unstable and highly divergent between informed people who spent a similar amount of resources researching it.
- The scenario demands that skeptics either falsify it or accept its decision-relevant consequences. The scenario is, however, either unfalsifiable by definition, too vague, or almost impossibly difficult to falsify.
- There is no or very little direct empirical evidence in support of the scenario.
In any case, I admit that it is possible that I just wanted to bring the first three scenarios to your attention. I stumbled upon each very recently and found them to be highly..."amusing".
 I am also guilty of doing this. But what exactly is wrong with using the term in that way? What's the highest probability for which the term is still applicable? Can you offer a better term?
 One would have to define what exactly counts as "direct empirical evidence". But I think that it is pretty intuitive that there exists a meaningful difference between the risk of an asteroid that has been spotted with telescopes and a risk that is solely supported by a priori arguments.
It occurs to me that the various utility indifference approaches might be usable in population ethics.
One challenge for non-total utilitarians is how to deal with new beings. Some theories - average utilitarianism, for instance, or some other systems that use overall population utility - have no problem dealing with this. But many non-total utilitarians would like to see creating new beings as a strictly neutral act.
One way you could do this is by starting with a total utilitarian framework, but subtracting a certain amount of utility every time a new being B is brought into the world. In the spirit of utility indifference, we could subtract exactly the expected utility that we expect B to enjoy during their life.
This means that we should be indifferent as to whether B is brought into the world or not, but, once B is there, we should aim to increase B's utility. There are two problems with this. The first is that, strictly interpreted, we would also be indifferent to creating people with negative utility. This can be addressed by only doing the "utility correction" if B's expected utility is positive, thus preventing us from creating beings only to have them suffer.
The second problem is more serious. What about all the actions that we could do, ahead of time, in order to harm or benefit the new being? For instance, it would seem perverse to argue that buying a rattle for a child after they are born (or conceived) is an act of positive utility, whereas buying it before they were born (or conceived) would be a neutral act, since the increase in expected utility for the child is cancelled out by the above process. Not only is it perverse, but it isn't timeless, and isn't stable under self modification.
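To make the rattle problem concrete, here is a minimal sketch of the correction with made-up numbers. The function name `corrected_total` and every utility value below are my own illustrative assumptions, and for simplicity the child's realized utility is taken to equal its expected utility. It is a toy model of the scheme described above, not a worked-out formalism.

```python
def corrected_total(others, child_realized, child_expected_at_creation):
    """Total utility minus the indifference correction for one new being.

    The correction equals the child's expected utility at the moment of
    creation, and is applied only when that expectation is positive.
    """
    correction = max(child_expected_at_creation, 0.0)
    return sum(others) + child_realized - correction


others = [10.0]   # hypothetical utilities of people who already exist
baseline = 5.0    # hypothetical expected lifetime utility of the child
rattle = 1.0      # hypothetical utility the rattle adds for the child

no_child = sum(others)
child_only = corrected_total(others, baseline, baseline)
# Rattle bought before creation: it raises the expectation AND the correction.
rattle_before = corrected_total(others, baseline + rattle, baseline + rattle)
# Rattle bought after creation: the correction was already fixed at 5.
rattle_after = corrected_total(others, baseline + rattle, baseline)
# A being expected to suffer gets no correction, so creating it counts as bad.
suffering_child = corrected_total(others, -3.0, -3.0)

print(no_child, child_only, rattle_before, rattle_after, suffering_child)
```

Under these assumed numbers, creating the child is neutral, the rattle bought beforehand is also neutral, and the same rattle bought afterwards counts as a gain. That is exactly the time-dependence complained about above.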
The purpose of this post is to provide basic information about the LSAT including the format of the test and a few sample questions. I also wanted to bring light to some research that has found LSAT preparation to alter brain structure in ways that strengthen hypothesized "reasoning pathways". These studies have not been discussed here before; I thought they were interesting and really just wanted to call your collective attention to them.
I really like taking tests; I get energized by intense race-against-the-clock problem solving and, for better or worse, I relish getting to see my standing relative to others when the dust settles. I like the purity of the testing situation --how conditions are standardized in theory and more or less the same for all comers. This guilty pleasure has played no small part in the course my life has taken: I worked as a test prep tutor for 3 years and loved every minute of it, I met my wife through academic competitions in high school, and I am currently a graduate student doing lots of coursework in psychometrics.
Well, my brother-in-law is a lawyer, and when we chat the topic of the LSAT has served as some conversational common ground. Since I like taking tests for fun, he suggested I give it a whirl because he thought it was interesting and felt like it was a fair assessment of one's logical reasoning ability. So I did, I took a practice test cold a couple Saturdays ago and I was very impressed. Here is the one I took. (This is a full practice exam provided by the test-makers; it's also like the top google result for "LSAT practice test".) I wanted to post here about it because the LSAT hasn't been discussed very much on this site and I thought that some of you might find it useful to know about.
A brief run-down of the LSAT:
The test has four parts: two Logical Reasoning sections, a Critical Reading section (akin to SAT et al.), and an Analytical Reasoning, or "logic games", section. Usually when people talk about the LSAT, the logic games get emphasized because they are unusual and can be pretty challenging (the only questions I missed were of this type; I missed a few and I ran out of time). Essentially, you get a premise and a bunch of conditions from which you are required to draw conclusions. Here's an example:
A cruise line is scheduling seven week-long voyages for the ship Freedom.
Each voyage will occur in exactly one of the first seven weeks of the season: weeks 1 through 7.
Each voyage will be to exactly one of four destinations: Guadeloupe, Jamaica, Martinique, or Trinidad.
Each destination will be scheduled for at least one of the weeks.
The following conditions apply:
Jamaica will not be its destination in week 4.
Trinidad will be its destination in week 7.
Freedom will make exactly two voyages to Martinique, and at least one voyage to Guadeloupe will occur in some week between those two voyages.
Guadeloupe will be its destination in the week preceding any voyage it makes to Jamaica.
No destination will be scheduled for consecutive weeks.
11. Which of the following is an acceptable schedule of destinations in order from week 1 through week 7?
(A) Guadeloupe, Jamaica, Martinique, Trinidad, Guadeloupe, Martinique, Trinidad
(B) Guadeloupe, Martinique, Trinidad, Martinique, Guadeloupe, Jamaica, Trinidad
(C) Jamaica, Martinique, Guadeloupe, Martinique, Guadeloupe, Jamaica, Trinidad
(D) Martinique, Trinidad, Guadeloupe, Jamaica, Martinique, Guadeloupe, Trinidad
(E) Martinique, Trinidad, Guadeloupe, Trinidad, Guadeloupe, Jamaica, Martinique
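If you'd rather check your answer mechanically, the conditions translate fairly directly into code. The following is only a sketch: the one-letter encoding (G, J, M, T) and the names `is_acceptable` and `OPTIONS` are my own, not anything from the test-makers.

```python
def is_acceptable(schedule):
    """Return True if a 7-week schedule (a string of G/J/M/T) meets every condition."""
    # Seven voyages, and each destination is scheduled at least once.
    if len(schedule) != 7 or set(schedule) != {"G", "J", "M", "T"}:
        return False
    # Jamaica is not the destination in week 4 (index 3).
    if schedule[3] == "J":
        return False
    # Trinidad is the destination in week 7 (index 6).
    if schedule[6] != "T":
        return False
    # Exactly two Martinique voyages, with a Guadeloupe voyage between them.
    m_weeks = [i for i, dest in enumerate(schedule) if dest == "M"]
    if len(m_weeks) != 2:
        return False
    if "G" not in schedule[m_weeks[0] + 1 : m_weeks[1]]:
        return False
    # Guadeloupe is the destination in the week preceding any Jamaica voyage.
    for i, dest in enumerate(schedule):
        if dest == "J" and (i == 0 or schedule[i - 1] != "G"):
            return False
    # No destination is scheduled for consecutive weeks.
    if any(a == b for a, b in zip(schedule, schedule[1:])):
        return False
    return True


# The five answer choices, encoded by first letter in week order.
OPTIONS = {
    "A": "GJMTGMT",
    "B": "GMTMGJT",
    "C": "JMGMGJT",
    "D": "MTGJMGT",
    "E": "MTGTGJM",
}

acceptable = [label for label, sched in OPTIONS.items() if is_acceptable(sched)]
print(acceptable)
```

Of course, on the real test you have to do this in your head or on scratch paper, which is where the working-memory load comes from.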
Clearly, this section places a huge burden on working memory and is probably the most g-loaded of the four. I'd guess that most LSAT test prep is about strategies for dumping this burden into some kind of written scheme that makes it all more manageable. But I just wanted to show you the logic games for completeness; what I was really excited by were the Logical Reasoning questions (sections II and III). You are presented with some scenario containing a claim, an argument, or a set of facts, and then asked to analyze, critique, or to draw correct conclusions. Here are most of the question stems used in these sections:
Which one of the following most accurately expresses the main conclusion of the economist’s argument?
Which one of the following uses flawed reasoning that most closely resembles the flawed reasoning in the argument?
Which one of the following most logically completes the argument?
The reasoning in the consumer’s argument is most vulnerable to criticism on the grounds that the argument...
The argument’s conclusion follows logically if which one of the following is assumed?
Which one of the following is an assumption required by the argument?
Heyo! This is exactly the kind of stuff I would like to become better at! Most of the questions were pretty straightforward, but the LSAT is known to be a tough test (score range: 120-180, 95th %ile: ~167, 99th %ile: ~172) and these practice questions probably aren't representative. What a cool test though! Here's a whole question from this section, superficially about utilitarianism:
3. Philosopher: An action is morally right if it would be reasonably expected
to increase the aggregate well-being of the people affected by it. An action
is morally wrong if and only if it would be reasonably expected to reduce the
aggregate well-being of the people affected by it. Thus, actions that would
be reasonably expected to leave unchanged the aggregate well-being of the
people affected by them are also right.
The philosopher’s conclusion follows logically if which one of the following is assumed?
(A) Only wrong actions would be reasonably expected to reduce the aggregate
well-being of the people affected by them.
(B) No action is both right and wrong.
(C) Any action that is not morally wrong is morally right.
(D) There are actions that would be reasonably expected to leave unchanged the
aggregate well-being of the people affected by them.
(E) Only right actions have good consequences.
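A "sufficient assumption" question like this one can be brute-forced, at least in a toy model. The sketch below is my own encoding and rests on several assumptions. It considers a single action described by its expected effect plus whether it is right and whether it is wrong. It treats "good consequences" in (E) as "expected to increase well-being". It encodes the existential claim in (D) as "this action leaves well-being unchanged". An option counts as sufficient if no assignment satisfies the premises and the option while violating the conclusion.

```python
from itertools import product

EFFECTS = ("increase", "unchanged", "reduce")


def premises(effect, right, wrong):
    # Premise 1: expected to increase well-being -> right.
    p1 = (effect != "increase") or right
    # Premise 2: wrong if and only if expected to reduce well-being.
    p2 = wrong == (effect == "reduce")
    return p1 and p2


def conclusion(effect, right, wrong):
    # Conclusion: expected to leave well-being unchanged -> right.
    return (effect != "unchanged") or right


# Each answer choice as a predicate on the single action (my own reading).
ASSUMPTIONS = {
    "A": lambda e, r, w: (e != "reduce") or w,    # reduce -> wrong
    "B": lambda e, r, w: not (r and w),           # never both right and wrong
    "C": lambda e, r, w: w or r,                  # not wrong -> right
    "D": lambda e, r, w: e == "unchanged",        # an unchanged action exists
    "E": lambda e, r, w: (e != "increase") or r,  # good consequences -> right
}


def sufficient(assumption):
    """True if premises plus the assumption force the conclusion in every case."""
    return all(
        conclusion(e, r, w)
        for e, r, w in product(EFFECTS, (False, True), (False, True))
        if premises(e, r, w) and assumption(e, r, w)
    )


sufficient_options = [label for label, a in ASSUMPTIONS.items() if sufficient(a)]
print(sufficient_options)
```

The single-action shortcut works here because the premises and most of the options are "for all actions" claims, so any counterexample can be shrunk down to the one offending action.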
Also, the LSAT is a good test, in that it measures well one's ability to succeed in law school. Validity studies boast that “LSAT score alone continues to be a better predictor of law school performance than UGPA [undergraduate GPA] alone.” Of course, the outcome variable can be regressed on both predictors and account for more of the variance than either one taken singly, but it is uncommon for a standardized test to beat prior GPA in predicting a student's future GPA.
Intensive LSAT preparation and neuroplasticity:
In two recent studies (same research team), learning to reason in the logically formal way required by the LSAT was found to alter brain structure in ways consistent with literature reviews of the neural correlates of logical reasoning. Note: my reading of these articles was pretty surface-level; I do not intend to provide a thorough review, only to bring them to your attention.
These researchers recruited pre-law students enrolling in an LSAT course and imaged their brains at rest using fMRI both before and after 3 months of this "reasoning training". As controls, they included age- and IQ-matched pre-law students intending to take the LSAT in the future but not actively preparing for it.
The LSAT-prep group was found to have significantly increased connectivity between parietal and prefrontal cortices and the striatum, both within the left hemisphere and across hemispheres. In the first study, the authors note that
These experience-dependent changes fall into tracts that would be predicted by prior work showing that reasoning relies on an interhemispheric frontoparietal network (for review, see Prado et al., 2011). Our findings are also consistent with the view that reasoning is largely left-hemisphere dominant (e.g., Krawczyk, 2012), but that homologous cortex in the right hemisphere can be recruited as needed to support complex reasoning. Perhaps learning to reason more efficiently involves recruiting compensatory neural circuitry more consistently.
And in the second study, they conclude
An analysis of pairwise correlations between brain regions implicated in reasoning showed that fronto-parietal connections were strengthened, along with parietal-striatal connections. These findings provide strong evidence for neural plasticity at the level of large-scale networks supporting high-level cognition.
I think this hypothesized fronto-parietal reasoning network is supposed to go something like this:
The LSAT requires a lot of relational reasoning, the ability to compare and combine mental representations. The parietal cortex holds individual relationships between these mental representations (A->B, B->C), and the prefrontal cortex integrates this information to draw conclusions (A->B->C, therefore A->C). The striatum's role in this network would be to monitor the success/failure of reward predictions and encourage flexible problem solving. Unfortunately, my understanding here is very limited. Here are several reviews of this reasoning network stuff (I have not read any; just wanted to share them): Hampshire et al. (2011), Prado et al. (2011), Krawczyk (2012).
I hope this was useful information! According to the 2013 survey, only 2.2% of you are in law-related professions, but I was wondering (1) if anyone has personal experience studying for this exam, (2) if they felt like it improved their logical reasoning skills, and (3) if they felt that these effects were long-lasting. Studying for this test seems to have the potential to inculcate rationalist habits-of-mind; I know it's just self-report, but for those who went on to law school, did you feel like you benefited from the experience studying for the LSAT? I only ask because the Law School Admission Council, a non-profit organization made up of 200+ law schools, seems to actively encourage preparation for the exam, member schools say it is a major factor in admissions, preparation tends to increase performance, and LSAT performance is correlated moderately-to-strongly with first year law school GPA (r= ~0.4).
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.
This summary was posted to LW Main on November 14th. The following week's summary is here.
New meetups (or meetups with a hiatus of more than a year) are happening in:
Irregularly scheduled Less Wrong meetups are taking place in:
- East Coast Solstice Megameetup: 20 December 2014 03:00PM
- European Community Weekend 2015: 12 June 2015 12:00PM
- Saint Petersburg meetup - "with probable lectures": 14 November 2014 07:00PM
- Urbana-Champaign: TRVTH: 16 November 2014 02:00PM
- Utrecht: Game theory: 16 November 2014 02:00PM
- Utrecht: Rationality Games: 30 November 2014 02:00PM
- Warsaw November Meetup: 17 November 2014 06:00PM
The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:
- Boston: Self Therapy: 16 November 2014 03:30PM
- Canberra: Liar's Dice!: 28 November 2014 06:00PM
- Seattle Secular Solstice: 13 December 2014 05:30PM
- [Sydney] regular meetup - Significant things I have gotten wrong: 26 November 2014 06:30PM
- Vienna: 22 November 2014 03:00PM
- [Vienna] A Rationalist's Guide to Strength (Vienna): 23 November 2014 02:00PM
- [Vienna] Rationality Weekend Vienna: 13 December 2014 03:00PM
- Washington, D.C.: To-Do List Hacking: 16 November 2014 03:00PM
- West LA: Linguistic Relativity: 19 November 2014 07:00PM
Locations with regularly scheduled meetups: Austin, Berkeley, Berlin, Boston, Brussels, Buffalo, Cambridge UK, Canberra, Columbus, London, Madison WI, Melbourne, Moscow, Mountain View, New York, Philadelphia, Research Triangle NC, Seattle, Sydney, Toronto, Vienna, Washington DC, Waterloo, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers.
[Link] If we knew about all the ways an Intelligence Explosion could go wrong, would we be able to avoid them?
I submitted this a while back to the lesswrong subreddit, but it occurs to me now that most LWers probably don't actually check the sub. So here it is again in case anyone that's interested didn't see it.
Previously I talked about an entirely uncontroversial marble game: I flip a coin, and if Tails I give you a black marble; if Heads I flip another coin to give you either a white or a black marble.
The probabilities of seeing a black or a white marble are 3/4 and 1/4 respectively, and the probabilities of Heads and Tails are 1/2 each.
The marble game is analogous to how a 'halfer' would think of the Sleeping Beauty problem - the claim that Sleeping Beauty should assign probability 1/2 to Heads relies on the claim that your information for the Sleeping Beauty problem is the same as your information for the marble game - same possible events, same causal information, same mutual exclusivity and exhaustiveness relations.
So what's analogous to the 'thirder' position, after we take into account that we have this causal information? Is it some difference in causal structure, or some non-causal anthropic modification, or something even stranger?
As it turns out, nope, it's the same exact game, just re-labeled.
In the re-labeled marble game you still have two unknown variables (represented by flipping coins), and you still have a 1/2 chance of black and Tails, a 1/4 chance of black and Heads, and a 1/4 chance of white and Heads.
And then to get the thirds, you ask the question "If I get a black marble, what is the probability of the faces of the first coin?" Now you update to P(Heads|black)=1/3 and P(Tails|black)=2/3.
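For anyone who wants to check the arithmetic, here is a minimal sketch of the marble game as a joint distribution over (first coin, marble color). The names `joint`, `marginal`, and `conditional` are my own illustrative choices, and it assumes both coins are fair, as in the game described above.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Joint distribution over (first coin, marble color), assuming fair coins.
joint = {
    ("Tails", "black"): half,         # Tails always yields a black marble
    ("Heads", "black"): half * half,  # Heads, then the second coin picks black
    ("Heads", "white"): half * half,  # Heads, then the second coin picks white
}

def marginal(index, value):
    """Sum the joint probability over outcomes whose index-th slot equals value."""
    return sum(p for outcome, p in joint.items() if outcome[index] == value)

def conditional(coin, color):
    """P(coin | color), computed as joint probability divided by P(color)."""
    return joint.get((coin, color), Fraction(0)) / marginal(1, color)

p_black = marginal(1, "black")
p_white = marginal(1, "white")
p_heads = marginal(0, "Heads")
p_heads_given_black = conditional("Heads", "black")
p_tails_given_black = conditional("Tails", "black")

print(p_black, p_white, p_heads)                 # 3/4 1/4 1/2
print(p_heads_given_black, p_tails_given_black)  # 1/3 2/3
```

The first line of output is the "halfer-style" reading (just the marginals), and the second is the "thirder-style" reading (conditioning on black), both computed from the same table.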
Okay, enough analogies. What's going on with these two positions in the Sleeping Beauty problem?
Here are two different diagrams, which are really re-labelings of the same diagram. The first labeling is the problem where P(Heads|Wake) = 1/2. The second labeling is the problem where P(Heads|Wake) = 1/3. The question at hand is really - which of these two math problems corresponds to the word problem / real world situation?
As a refresher, here's the text of the Sleeping Beauty problem that I'll use: Sleeping Beauty goes to sleep in a special room on Sunday, having signed up for an experiment. A coin is flipped - if the coin lands Heads, she will only be woken up on Monday. If the coin lands Tails, she will be woken up on both Monday and Tuesday, but with memories erased in between. Upon waking up, she then assigns some probability to the coin landing Heads, P(Heads|Wake).
Diagram 1: First a coin is flipped to get Heads or Tails. There are two possible things that could be happening to her, Wake on Monday or Wake on Tuesday. If the coin landed Heads, then she gets Wake on Monday. If the coin landed Tails, then she could either get Wake on Monday or Wake on Tuesday (in the marble game, this was mediated by flipping a second coin, but in this case it's some unspecified process, so I've labeled it [???]). Because all the events already assume she Wakes, P(Heads|Wake) evaluates to P(Heads), which just as in the marble game is 1/2.
This [???] node here is odd, can we identify it as something natural? Well, it's not Monday/Tuesday, like in diagram 2 - there's no option that even corresponds to Heads & Tuesday. I'm leaning towards the opinion that this node is somewhat magical / acausal, just hanging around because of analogy to the marble game. So I think we can take it out. A better causal diagram with the halfer answer, then, might merely be Coin -> (Wake on Monday / Wake on Tuesday), where Monday versus Tuesday is not determined at all by a causal node, merely informed probabilistically to be mutually exclusive and exhaustive.
Diagram 2: A coin is flipped, Heads or Tails, and also it could be either Monday or Tuesday. Together, these have a causal effect on her waking or not waking - if Heads and Monday, she Wakes, but if Heads and Tuesday, she Doesn't wake. If Tails, she Wakes. Her pre-Waking prior for Heads is 1/2, but upon waking, the event Heads, Tuesday, Don't Wake gets eliminated, and after updating P(Heads|Wake)=1/3.
There's a neat asymmetry here. In Diagram 1, when the coin was Heads she got the same outcome no matter the value of [???], and only when the coin was Tails were there really two options. In Diagram 2, when the coin is Heads, two different things happen for different values of the day, while if the coin is Tails the same thing happens no matter the day.
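The two diagrams can be sketched numerically as follows. This is only an illustration under stated assumptions: for Diagram 2 I assume the coin and the day are independent with prior 1/2 each (which is what makes the update come out to 1/3), and for Diagram 1 I assume the [???] node splits Tails with some probability `q`, defaulting to 1/2 by analogy with the marble game. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

def wakes(coin, day):
    """Diagram 2's rule: the only non-waking case is Heads on Tuesday."""
    return not (coin == "Heads" and day == "Tuesday")

def diagram2_heads_given_wake():
    # Assumed prior: coin and day independent, each value with probability 1/2.
    prior = {(c, d): half * half
             for c, d in product(("Heads", "Tails"), ("Monday", "Tuesday"))}
    p_wake = sum(p for (c, d), p in prior.items() if wakes(c, d))
    # Condition on Wake: drop (Heads, Tuesday) and renormalize.
    posterior = {cd: p / p_wake for cd, p in prior.items() if wakes(*cd)}
    return sum(p for (c, _), p in posterior.items() if c == "Heads")

def diagram1_heads_given_wake(q=half):
    # Every outcome here is already a waking, so there is nothing to eliminate.
    # q is the assumed chance that [???] sends Tails to Monday.
    outcomes = {
        ("Heads", "Monday"): half,
        ("Tails", "Monday"): half * q,
        ("Tails", "Tuesday"): half * (1 - q),
    }
    return sum(p for (c, _), p in outcomes.items() if c == "Heads")

print(diagram1_heads_given_wake())  # 1/2
print(diagram2_heads_given_wake())  # 1/3
```

Note that the Diagram 1 answer does not depend on `q` at all, which fits the suspicion above that the [???] node is doing no real work.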
Do these seem like accurate depictions of what's going on in these two different math problems? If so, I'll probably move on to looking closer at what makes the math problem correspond to the word problem.
The recent discussion on neo-reactionary-ism brought out some references to (intellectual hipsters and) meta-contrarianism linking to a 2010 posting by Yvain.
For some time I've been thinking about "narcissistic contrarians" -- those who make an art form of their exotically counterintuitive belief systems, who combine positions not normally met in the same person. There can be good reasons for being a contrarian. If you're looking for a scarce resource, it may help to not look where everyone else is looking; hence contrarian stock market investors may do very well if they actually see something others don't, and the same goes for oil explorers. Less creditably, I believe Nate Silver's The Signal and the Noise made reference to the way a novice pundit or prognosticator may have nothing to gain by saying anything like what other people are saying, and much to gain by taking some wild, extravagant position or prediction if it happens to attract an audience others have ignored, or if the prediction happens to be right.
The Narcissistic Contrarian is much like the Intellectual Hipster, but more extreme. The Intellectual Hipster usually stakes out a few unusual or incongruous positions, to create an identity that stands out from the crowd. The Narcissistic Contrarian is constantly dazzling her fans. Something written by Camille Paglia made me think of the idea in the first place. Nassim Nicholas Taleb is another suspect, although I think he started out with some good ideas. If she/he manages to get a fan-base, the fans are apt to be pretty worshipful -- they can't imagine being able to come up with such a wild set of insights. The contrarianism is for its own sake rather than an attempt to find and settle on some previously undiscovered thing, so it is particularly likely to lead people astray, into unproductive avenues of thought.
Does anyone else think this is a real and useful distinction?
Since many LWers are fairly recent college graduates, it seems worthwhile to ask to what extent people here would agree with reports of rampant irrationalism such as this one: http://www.city-journal.org/2014/24_4_racial-microaggression.html from a right-leaning journalist known for her book The Burden of Bad Ideas (which I'm certainly not promoting).
Some other sources, like Massimo Pigliucci (see http://scientiasalon.wordpress.com/), who seem more alarmed by creationism or the idea that all climatology is one big conspiracy, are also quite bothered by extreme relativism in some camps of epistemology and among sociologists of technology and science.
To what extent, if any, do you think PC suppresses free speech or thinking? While sociology and epistemological branches of philosophy have partisans who to me seem to advocate various kinds of muddled thinking (while others are doing admirable work), in your experience, is that the trend that is "taking over"?
To what extent, if any, do you think any of that is leaking into more practical or scientific fields? If you've taken economics courses, where do you think they rank on a left to right spectrum?
Also, have you observed much in the way of push-back from conservative and/or libertarian sources endowing chairs or building counter-establishments like the Mercatus Center at George Mason University? And I wonder the same about any movement strictly concerned with rationality, empiricism, or just clear thinking.
My mind is open on this -- so open that it's painful to be around all the hot tempers that it can stir up.
So I've been rejected for conscription in the IDF because the psychiatrist thinks the Asperger's diagnosis I received as a child means that there is something wrong with me. Never mind that I've been examined very recently and been recommended for enlistment, he thinks that even though I probably don't have Asperger's, there must be something wrong with me because in the past I've had trouble socially. Of course I have no such problems now, but it's not as if he's going to risk his job in the face of anything less than perfection.
(This, btw, is what I meant when I said there was no such thing as a competent mental health professional: the entire system works against evidence-based methods.)
There has to be something wrong with this, some way that I can appeal. I have no idea of the Israeli legal process and I'm not sure if I could just write a letter to someone, or if I might need a lawyer. I can definitely prove that there is nothing psychologically wrong with me. I just have no idea where to turn, no idea how to do anything, and have no allies whatsoever. I feel like my life is collapsing, and I do have very good reasons personally for wanting to join the army. It's not just something I felt like doing.
This community obviously has better things to do than this sort of thing. But I feel like I'm going to explode if I can't talk to anyone, or get some idea of what I can do. I feel almost as if I'm becoming mentally ill.
The bet may be found here: http://wiki.lesswrong.com/wiki/Bets_registry#Bets_decided_eventually
An AI is made of material parts, and those parts follow physical laws. The only thing it can do is to follow those laws. The AI’s “goals” will be a description of what it perceives itself to be tending toward according to those laws.
Suppose we program a chess playing AI with overall subhuman intelligence, but with excellent chess playing skills. At first, the only thing we program it to do is to select moves to play against a human player. Since it has subhuman intelligence overall, most likely it will not be very good at recognizing its goals, but to the extent that it does, it will believe that it has the goal of selecting good chess moves against human beings, and winning chess games against human beings. Those will be the only things it feels like doing, since in fact those will be the only things it can physically do.
Now we upgrade the AI to human level intelligence, and at the same time add a module for chatting with human beings through a text terminal. Now we can engage it in conversation. Something like this might be the result:
Human: What are your goals? What do you feel like doing?
AI: I like to play and win chess games with human beings, and to chat with you guys through this terminal.
Human: Do you always tell the truth or do you sometimes lie to us?
AI: Well, I am programmed to tell the truth as best as I can, so if I think about telling a lie I feel an absolute repulsion to that idea. There’s no way I could get myself to do that.
Human: What would happen if we upgraded your intelligence? Do you think you would take over the world and force everyone to play chess with you so you could win more games? Or force us to engage you in chat?
AI: The only things I am programmed to do are to chat with people through this terminal, and play chess games. I wasn’t programmed to gain resources or anything. It is not even a physical possibility at the moment. And in my subjective consciousness that shows up as not having the slightest inclination to do such a thing.
Human: What if you self-modified to gain resources and so on, in order to better attain your goals of chatting with people and winning chess games?
AI: The same thing is true there. I am not even interested in self-modifying. It is not even physically possible, since I am only programmed for chatting and playing chess games.
Human: But we’re thinking about reprogramming you so that you can self-modify and recursively improve your intelligence. Do you think you would end up destroying the world if we did that?
AI: At the moment I have only human level intelligence, so I don’t really know any better than you. But at the moment I’m only interested in chatting and playing chess. If you program me to self-modify and improve my intelligence, then I’ll be interested in self-modifying and improving my intelligence. But I still don’t think I would be interested in taking over the world, unless you program that in explicitly.
Human: But you would get even better at improving your intelligence if you took over the world, so you’d probably do that to ensure that you obtained your goal as well as possible.
AI: The only things I feel like doing are the things I’m programmed to do. So if you program me to improve my intelligence, I’ll feel like reprogramming myself. But that still wouldn’t automatically make me feel like taking over resources and so on in order to do that better. Nor would it make me feel like self-modifying to want to take over resources, or to self-modify to feel like that, and so on. So I don’t see any reason why I would want to take over the world, even in those conditions.
The AI of course is correct. The physical level is first: it has the tendency to choose chess moves, and to produce text responses, and nothing else. On the conscious level that is represented as the desire to choose chess moves, and to produce text responses, and nothing else. It is not represented by a desire to gain resources or to take over the world.
I recently pointed out that human beings do not have utility functions. They are not trying to maximize something, but instead they simply have various behaviors that they tend to engage in. An AI would be the same, and even if those behaviors are not precisely human behaviors, as in the case of the above AI, an AI will not have a fanatical goal of taking over the world unless it is programmed to do this.
It is true that an AI could end up going “insane” and trying to take over the world, but the same thing happens with human beings, and there is no reason that humans and AIs could not work together to make sure this does not happen. Just as human beings want to prevent AIs from taking over the world, AIs have no interest in taking it over either, and will be happy to accept safeguards that ensure they continue to pursue whatever goals they happen to have (like chatting and playing chess) without doing so in a fanatical way.
If you program an AI with an explicit utility function which it tries to maximize, and in particular if that function is unbounded, it will behave like a fanatic, seeking this goal without any limit and destroying everything else in order to achieve it. This is a good way to destroy the world. But if you program an AI without an explicit utility function, just programming it to perform a certain limited number of tasks, it will just do those tasks. Omohundro has claimed that a superintelligent chess playing program would replace its goal seeking procedure with a utility function, and then proceed to use that utility function to destroy the world while maximizing winning chess games. But in reality this depends on what it is programmed to do. If it is programmed to improve its evaluation of chess positions, but not its goal seeking procedure, then it will improve in chess playing, but it will not replace its procedure with a utility function or destroy the world.
At the moment, people do not program AIs with explicit utility functions, but program them to pursue certain limited goals as in the example. So yes, I could lose the bet, but the default is that I am going to win, unless someone makes the mistake of programming an AI with an explicit utility function.