
## The 12 Second Rule (i.e. think before answering) and other Epistemic Norms

16 05 September 2016 11:08PM

Epistemic Status/Effort: I'm 85% confident this is a good idea, and that the broader idea is at least a good direction. Have gotten feedback from a few people and spent some time actively thinking through its ramifications. Interested in more feedback.

## TLDR:

1) When asking a group a question, e.g. "what do you think about X?", ask people to wait 12 seconds, to give each other time to think. If you notice someone else ask a question and people immediately start answering, suggest pausing the conversation until everyone has had some time to think. (Probably specifically mention the "12 second rule" to give people a handy tag to remember.)

2) In general, look for opportunities to improve or share social norms that'll help your community think more clearly, and show appreciation when others do so (i.e. "Epistemic Norms")

(This was originally conceived for the self-described "rationality" community, but I think it's a good idea for any group that'd like to improve their critical thinking as well as creativity.)

There are three reasons the 12-second rule seems important to me:

• On an individual level, it makes it easier to think of the best answer, rather than going with your cached thought.
• On the group level, it makes it easier to prevent anchoring/conformity/priming effects.
• Also on the group level, it means that people who take longer to think of answers get to practice actually thinking for themselves.
If you're using it with people who aren't familiar with it, make sure to briefly summarize what you're doing and why.

Elaboration:

While visiting rationalist friends in SF, I was participating in a small conversation (about six participants) in which someone asked a question. Immediately, one person said "I think Y. Or maybe Z." A couple other people said "Yeah. Y or Z, or... maybe W or V?" But the conversation was already anchored around the initial answers.

I said "hey, shouldn't we stop to each think first?" (this happens to be a thing my friends in NYC do). And I was somewhat surprised that the response was more like "oh, I guess that's a good idea" than "oh yeah whoops I forgot."

It seemed like a fairly obvious social norm for a community that prides itself on rationality, and while the question wasn't *super* important, I think it's helpful to practice this sort of social norm on a day-to-day basis.

This prompted some broader questions: it occurred to me there were likely norms and ideas other people had developed in their local networks that I probably wasn't aware of. Given that there's no central authority on "good epistemic norms", how do we develop them and get them to spread? There are a couple of people with popular blogs who sometimes propose new norms which may catch on, and some people still share good ideas on Less Wrong, effective-altruism.com, or Facebook. But it doesn't seem like those ideas necessarily reach saturation.

## Atrophied Skills

During the first three years I spent in the rationality community, my perception is that my strategic thinking and ability to think through complex problems actually *deteriorated*. It's possible that I was just surrounded by people smarter than me for the first time, but I'm fairly confident that I specifically acquired the habit of "when I need help thinking through a problem, the first step is not to think about it myself, but to ask smart people around me for help."

Eventually I was hired by a startup, and I found myself in a position where the default course for the company was to leave some important value on the table. (I was working in an EA-adjacent company, and wanted to push it in a more Effective Altruism-y direction with higher rigor.) There was nobody else I could turn to for help. I had to think through what "better epistemic rigor" actually meant and how to apply it in this situation.

Whether or not my rationality had atrophied in the past 3 years, I'm certain that for the first time in a long while, certain mental muscles *flexed* that I hadn't been using. Ultimately I don't know whether my ideas had a noteworthy effect on the company, but I do know that I felt more empowered and excited to improve my own rationality.

I realized that, in the NYC meetups, quicker-thinking people tended to say what they thought immediately when a question was asked, and this meant that most of the people in the meetup didn't get to practice thinking through complex questions. So I started asking people to wait for a while before answering - sometimes 5 minutes, sometimes just a few seconds.

"12 seconds" seems like a nice rule-of-thumb to avoid completely interrupting the flow of conversation, while still having some time to reflect, and make sure you're not just shouting out a cached thought. It's a non-standard number which is hopefully easier to remember.

(That said, a more nuanced alternative is "everyone takes a moment to think until they feel like they're hitting diminishing returns on thinking or it's not worth further halting the conversation, and then raising a finger to indicate that they're done")

## Meta Point: Observation, Improvement and Sharing

The 12-second rule isn't the main point though - just one of many ways this community could do a better job of helping both newcomers and old-timers hone their thinking skills. "Rationality" is supposed to be our thing. I think we should all be on the lookout for opportunities to improve our collective ability to think clearly.

I think specific conversational habits are helpful both for their concrete, immediate benefits, as well as an opportunity to remind everyone (newcomers and old-timers alike) that we're trying to actively improve in this area.

I have more thoughts on how to go about improving the meta-issues here, which I'm less confident about and will flesh out in future posts.

## Neutralizing Physical Annoyances

12 12 September 2016 04:36PM

Once in a while, I learn something about a seemingly unrelated topic - such as freediving - and I take away some trick that is well known and "obvious" in that topic, but is generally useful and NOT known by many people outside. Case in point, you can use equalization techniques from diving to remove pressure in your ears when you descend in a plane or a fast lift. I also give some other examples.

Ears

Reading about a few equalization techniques took me maybe 5 minutes, and after reading this passage once I was able to successfully use the "Frenzel Maneuver":

The technique is to close off the vocal cords, as though you are about to lift a heavy weight. The nostrils are pinched closed and an effort is made to make a 'k' or a 'guh' sound. By doing this you raise the back of the tongue and the 'Adam's Apple' will elevate. This turns the tongue into a piston, pushing air up.

Hiccups

A few years ago, I started regularly doing deep relaxations after yoga. At some point, I learned how to relax my throat in such a way that the air can freely escape from the stomach. Since then, whenever I start hiccuping, I relax my throat and the hiccups stop immediately in all cases. I am now 100% hiccup-free.

Stiff Shoulders

I spent a few hours with a friend who does massage, and they taught me some basics. After that, it became natural for me to self-massage my shoulders after I do a lot of sitting work etc. I can't imagine living without this anymore.

Other?

If you know more, please share!

## UC Berkeley launches Center for Human-Compatible Artificial Intelligence

10 29 August 2016 10:43PM

UC Berkeley artificial intelligence (AI) expert Stuart Russell will lead a new Center for Human-Compatible Artificial Intelligence, launched this week.

Russell, a UC Berkeley professor of electrical engineering and computer sciences and the Smith-Zadeh Professor in Engineering, is co-author of Artificial Intelligence: A Modern Approach, which is considered the standard text in the field of artificial intelligence, and has been an advocate for incorporating human values into the design of AI.

The primary focus of the new center is to ensure that AI systems are beneficial to humans, he said.

The co-principal investigators for the new center include computer scientists Pieter Abbeel and Anca Dragan and cognitive scientist Tom Griffiths, all from UC Berkeley; computer scientists Bart Selman and Joseph Halpern, from Cornell University; and AI experts Michael Wellman and Satinder Singh Baveja, from the University of Michigan. Russell said the center expects to add collaborators with related expertise in economics, philosophy and other social sciences.

The center is being launched with a grant of $5.5 million from the Open Philanthropy Project, with additional grants for the center’s research from the Leverhulme Trust and the Future of Life Institute.

Russell is quick to dismiss the imaginary threat from the sentient, evil robots of science fiction. The issue, he said, is that machines as we currently design them in fields like AI, robotics, control theory and operations research take the objectives that we humans give them very literally. Told to clean the bath, a domestic robot might, like the Cat in the Hat, use mother’s white dress, not understanding that the value of a clean dress is greater than the value of a clean bath.

The center will work on ways to guarantee that the most sophisticated AI systems of the future, which may be entrusted with control of critical infrastructure and may provide essential services to billions of people, will act in a manner that is aligned with human values.

“AI systems must remain under human control, with suitable constraints on behavior, despite capabilities that may eventually exceed our own,” Russell said. “This means we need cast-iron formal proofs, not just good intentions.”

One approach Russell and others are exploring is called inverse reinforcement learning, through which a robot can learn about human values by observing human behavior. By watching people dragging themselves out of bed in the morning and going through the grinding, hissing and steaming motions of making a caffè latte, for example, the robot learns something about the value of coffee to humans at that time of day.

“Rather than have robot designers specify the values, which would probably be a disaster,” said Russell, “instead the robots will observe and learn from people. Not just by watching, but also by reading. Almost everything ever written down is about people doing things, and other people having opinions about it. All of that is useful evidence.”

Russell and his colleagues don’t expect this to be an easy task. “People are highly varied in their values and far from perfect in putting them into practice,” he acknowledged. “These aspects cause problems for a robot trying to learn what it is that we want and to navigate the often conflicting desires of different individuals.”

Russell, who recently wrote an optimistic article titled “Will They Make Us Better People?,” summed it up this way: “In the process of figuring out what values robots should optimize, we are making explicit the idealization of ourselves as humans. As we envision AI aligned with human values, that process might cause us to think more about how we ourselves really should behave, and we might learn that we have more in common with people of other cultures than we think.”

## [Link] My Interview with Dilbert creator Scott Adams

9 13 September 2016 05:22AM

In the second half of the interview we discussed several topics of importance to the LW community, including cryonics, unfriendly AI, and eliminating mosquitoes.

https://soundcloud.com/user-519115521/scott-adams-dilbert-interview

## 2016 LessWrong Diaspora Survey Analysis: Part Four (Politics, Calibration & Probability, Futurology, Charity & Effective Altruism)

9 10 September 2016 03:51AM

## Politics

The LessWrong survey has a very involved section dedicated to politics. In previous analyses the benefits of this weren't fully realized. In the 2016 analysis we can look at not just the political affiliation of a respondent, but also what beliefs are associated with a certain affiliation. The charts below summarize most of the results.

### Political Opinions By Political Affiliation

### Miscellaneous Politics

There were also some other questions in this section which aren't covered by the above charts.

#### Voting

Did you vote in your country's last major national election?
(LW Turnout Versus General Election Turnout By Country)

| Group | Turnout |
|---|---|
| LessWrong | 68.9% |
| Australia | 91% |
| Brazil | 78.90% |
| Britain | 66.4% |
| Canada | 68.3% |
| Finland | 70.1% |
| France | 79.48% |
| Germany | 71.5% |
| India | 66.3% |
| Israel | 72% |
| New Zealand | 77.90% |
| Russia | 65.25% |
| United States | 54.9% |

Numbers taken from Wikipedia, accurate as of the last general election in each country listed at time of writing.

## Calibration And Probability Questions

### Calibration Questions

I just couldn't analyze these, sorry guys. I put many hours into trying to get them into a decent format I could even read, and that sucked up an incredible amount of time. It's why this part of the survey took so long to get out. Thankfully another LessWrong user, Houshalter, has kindly done their own analysis.

All my calibration questions were meant to satisfy a few essential properties:

1. They should be 'self contained', i.e. something you can reasonably answer, or at least try to answer, with a 5th grade science education and normal life experience.
2. They should, at least to a certain extent, be Fermi Estimable.
3. They should progressively scale in difficulty so you can see whether somebody understands basic probability or not (e.g. in an 'or' question, do they put a probability of less than 50% on being right?).

At least one person requested a workbook, so I might write more in the future. I'll obviously write more for the survey.

### Probability Questions

| Question | Mean | Median | Mode | Stdev |
|---|---|---|---|---|
| Please give the obvious answer to this question, so I can automatically throw away all surveys that don't follow the rules: What is the probability of a fair coin coming up heads? | 49.821 | 50.0 | 50.0 | 3.033 |
| What is the probability that the Many Worlds interpretation of quantum mechanics is more or less correct? | 44.599 | 50.0 | 50.0 | 29.193 |
| What is the probability that non-human, non-Earthly intelligent life exists in the observable universe? | 75.727 | 90.0 | 99.0 | 31.893 |
| ...in the Milky Way galaxy? | 45.966 | 50.0 | 10.0 | 38.395 |
| What is the probability that supernatural events (including God, ghosts, magic, etc) have occurred since the beginning of the universe? | 13.575 | 1.0 | 1.0 | 27.576 |
| What is the probability that there is a god, defined as a supernatural intelligent entity who created the universe? | 15.474 | 1.0 | 1.0 | 27.891 |
| What is the probability that any of humankind's revealed religions is more or less correct? | 10.624 | 0.5 | 1.0 | 26.257 |
| What is the probability that an average person cryonically frozen today will be successfully restored to life at some future time, conditional on no global catastrophe destroying civilization before then? | 21.225 | 10.0 | 5.0 | 26.782 |
| What is the probability that at least one person living at this moment will reach an age of one thousand years, conditional on no global catastrophe destroying civilization in that time? | 25.263 | 10.0 | 1.0 | 30.510 |
| What is the probability that our universe is a simulation? | 25.256 | 10.0 | 50.0 | 28.404 |
| What is the probability that significant global warming is occurring or will soon occur, and is primarily caused by human actions? | 83.307 | 90.0 | 90.0 | 23.167 |
| What is the probability that the human race will make it to 2100 without any catastrophe that wipes out more than 90% of humanity? | 76.310 | 80.0 | 80.0 | 22.933 |

The probability questions are probably the area of the survey I put the least effort into. My plan for next year is to overhaul these sections entirely and try including some Tetlock-esque forecasting questions, a link to some advice on how to make good predictions, etc.

## Futurology

This section got a bit of a facelift this year, including new questions on cryonics, genetic engineering, and technological unemployment in addition to the previous years' questions.

### Cryonics

Interestingly enough, of those who think it will work with enough confidence to say 'yes', only 14 are actually signed up for cryonics.
```
sqlite> select count(*) from data where CryonicsNow="Yes" and Cryonics="Yes - signed up or just finishing up paperwork";
14
-- Note: the last two conditions below lack Cryonics=, so SQLite casts each bare
-- string to 0 (false) and only the first two answers are actually counted.
sqlite> select count(*) from data where CryonicsNow="Yes" and (Cryonics="Yes - signed up or just finishing up paperwork" OR Cryonics="No - would like to sign up but unavailable in my area" OR "No - would like to sign up but haven't gotten around to it" OR "No - would like to sign up but can't afford it");
34
```

LessWrongers seem to be very bullish on the underlying physics of cryonics even if they're not as enthusiastic about current methods in use.

The Brain Preservation Foundation also did an analysis of cryonics responses to the LessWrong Survey.

### Singularity

#### SingularityYear

By what year do you think the Singularity will occur? Answer such that you think, conditional on the Singularity occurring, there is an even chance of the Singularity falling before or after this year. If you think a singularity is so unlikely you don't even want to condition on it, leave this question blank.

Mean: 8.110300081581755e+16
Median: 2080.0
Mode: 2100.0
Stdev: 2.847858859055733e+18

I didn't bother to filter out the silly answers for this. Obviously it's a bit hard to see without filtering out the uber-large answers, but the median doesn't seem to have changed much from the 2014 survey.

### Genetic Engineering

Well, that's fairly overwhelming. I find it amusing how the strict "No" group shrinks considerably after this question.

This question is too important to just not have an answer to, so I'll do it manually. Unfortunately I can't easily remove the 'excluded' entries so that we're dealing with the exact same distribution, but only 13 or so responses are filtered out anyway.

```
sqlite> select count(*) from data where GeneticImprovement="Yes";
1100
>>> 1100 + 176 + 262 + 84
1622
>>> 1100 / 1622
0.6781750924784217
```

67.8% are willing to genetically engineer their children for improvements.
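The second cryonics query in this section mixes `Cryonics=...` comparisons with two bare strings. SQLite turns a bare value used as a condition into a number, and text that doesn't start with digits becomes 0, i.e. false, so rows with those two answers are never counted. Below is a minimal sketch using Python's built-in `sqlite3` module on a made-up in-memory table (the rows are hypothetical; only the column names and answer strings come from the queries above) that contrasts that pattern with an `IN (...)` list. Values are bound with `?` placeholders rather than written as literals, which also avoids quoting the apostrophes in "haven't" and "can't".

```python
import sqlite3

# Hypothetical stand-in for the survey's `data` table. The rows are made up;
# only the column names and answer strings are taken from the queries above.
SIGNED_UP = "Yes - signed up or just finishing up paperwork"
WOULD_LIKE = [
    "No - would like to sign up but unavailable in my area",
    "No - would like to sign up but haven't gotten around to it",
    "No - would like to sign up but can't afford it",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (CryonicsNow TEXT, Cryonics TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("Yes", SIGNED_UP), ("Yes", WOULD_LIKE[0]), ("Yes", WOULD_LIKE[1]),
     ("Yes", WOULD_LIKE[2]), ("No", SIGNED_UP)],
)

# Same shape as the original query: the last two options are bare values, which
# SQLite casts to the number 0 (false), so rows with those answers never match.
buggy = conn.execute(
    "SELECT count(*) FROM data WHERE CryonicsNow = 'Yes' "
    "AND (Cryonics = ? OR Cryonics = ? OR ? OR ?)",
    [SIGNED_UP, WOULD_LIKE[0], WOULD_LIKE[1], WOULD_LIKE[2]],
).fetchone()[0]

# Comparing the column against every option at once with IN (...).
fixed = conn.execute(
    "SELECT count(*) FROM data WHERE CryonicsNow = 'Yes' "
    "AND Cryonics IN (?, ?, ?, ?)",
    [SIGNED_UP, *WOULD_LIKE],
).fetchone()[0]

print(buggy, fixed)  # 2 4
```

On these toy rows the original pattern finds 2 of the 4 matching "Yes" respondents, and the `IN` version finds all 4.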
These numbers go about how you would expect, with people being progressively less interested the more 'shallow' a genetic change is seen as. All three of these seem largely consistent with people's personal preferences about modification. Were I inclined, I could do a deeper analysis that actually takes survey respondents row by row and looks at the correlation between preference for one's own children and preference for others.

### Technological Unemployment

#### LudditeFallacy

Do you think the Luddite's Fallacy is an actual fallacy?

Yes: 443 (30.936%)
No: 989 (69.064%)

We can use this as an overall measure of worry about technological unemployment, which would seem to be high among the LW demographic.

#### UnemploymentYear

By what year do you think the majority of people in your country will have trouble finding employment for automation related reasons? If you think this is something that will never happen leave this question blank.

Mean: 2102.9713740458014
Median: 2050.0
Mode: 2050.0
Stdev: 1180.2342850727339

The question is flawed because you can't distinguish answers of "never happen" from people who just didn't see it.

It's an interesting question that would be fun to compare with the estimates for the singularity.

#### EndOfWork

Do you think the "end of work" would be a good thing?

Yes: 1238 (81.287%)
No: 285 (18.713%)

Fairly overwhelming consensus, but with a significant minority of people who have a dissenting opinion.

#### EndOfWorkConcerns

If machines end all or almost all employment, what are your biggest worries? Pick two.

| Worry | Count | Percent |
|---|---|---|
| People will just idle about in destructive ways | 513 | 16.71% |
| People need work to be fulfilled and if we eliminate work we'll all feel deep existential angst | 543 | 17.687% |
| The rich are going to take all the resources for themselves and leave the rest of us to starve or live in poverty | 1066 | 34.723% |
| The machines won't need us, and we'll starve to death or be otherwise liquidated | 416 | 13.55% |

The question is flawed because it asked respondents to 'pick two' instead of up to two. The plurality of worries is about elites who refuse to share their wealth.

### Existential Risk

#### XRiskType

Which disaster do you think is most likely to wipe out greater than 90% of humanity before the year 2100?

| Disaster | Change vs. last year | Count | Percent |
|---|---|---|---|
| Nuclear war | +4.800% | 326 | 20.6% |
| Asteroid strike | -0.200% | 64 | 4.1% |
| Unfriendly AI | +1.000% | 271 | 17.2% |
| Nanotech / grey goo | -2.000% | 18 | 1.1% |
| Pandemic (natural) | +0.100% | 120 | 7.6% |
| Pandemic (bioengineered) | +1.900% | 355 | 22.5% |
| Environmental collapse (including global warming) | +1.500% | 252 | 16.0% |
| Economic / political collapse | -1.400% | 136 | 8.6% |
| Other | | 35 | 2.217% |

Significantly more people are worried about nuclear war than last year. Effect of new respondents, or geopolitical situation? Who knows.

## Charity And Effective Altruism

### Charitable Giving

#### Income

What is your approximate annual income in US dollars (non-Americans: convert at www.xe.com)? Obviously you don't need to answer this question if you don't want to. Please don't include commas or dollar signs.

Sum: 66054140.47384
Mean: 64569.052271593355
Median: 40000.0
Mode: 30000.0
Stdev: 107297.53606321265

#### IncomeCharityPortion

How much money, in number of dollars, have you donated to charity over the past year? (non-Americans: convert to dollars at http://www.xe.com/ ). Please don't include commas or dollar signs in your answer. For example, 4000

Sum: 2389900.653
Mean: 2914.5129914634144
Median: 353.0
Mode: 100.0
Stdev: 9471.962766896671

#### XriskCharity

How much money have you donated to charities aiming to reduce existential risk (other than MIRI/CFAR) in the past year?

Sum: 169300.89
Mean: 1991.7751764705883
Median: 200.0
Mode: 100.0
Stdev: 9219.941506342007

#### CharityDonations

How much have you donated in US dollars to the following charities in the past year? (Non-Americans: convert to dollars at http://www.xe.com/) Please don't include commas or dollar signs in your answer. Options starting with "any" aren't the name of a charity but a category of charity.

| Charity | Sum | Mean | Median | Mode | Stdev |
|---|---|---|---|---|---|
| Against Malaria Foundation | 483935.027 | 1905.256 | 300.0 | None | 7216.020 |
| Schistosomiasis Control Initiative | 47908.0 | 840.491 | 200.0 | 1000.0 | 1618.785 |
| Deworm the World Initiative | 28820.0 | 565.098 | 150.0 | 500.0 | 1432.712 |
| GiveDirectly | 154410.177 | 1429.723 | 450.0 | 50.0 | 3472.082 |
| Any kind of animal rights charity | 83130.47 | 1093.821 | 154.235 | 500.0 | 2313.493 |
| Any kind of bug rights charity | 1083.0 | 270.75 | 157.5 | None | 353.396 |
| Machine Intelligence Research Institute | 141792.5 | 1417.925 | 100.0 | 100.0 | 5370.485 |
| Any charity combating nuclear existential risk | 491.0 | 81.833 | 75.0 | 100.0 | 68.060 |
| Any charity combating global warming | 13012.0 | 245.509 | 100.0 | 10.0 | 365.542 |
| Center For Applied Rationality | 127101.0 | 3177.525 | 150.0 | 100.0 | 12969.096 |
| Strategies for Engineered Negligible Senescence Research Foundation | 9429.0 | 554.647 | 100.0 | 20.0 | 1156.431 |
| Wikipedia | 12765.5 | 53.189 | 20.0 | 10.0 | 126.444 |
| Internet Archive | 2975.04 | 80.406 | 30.0 | 50.0 | 173.791 |
| Any campaign for political office | 38443.99 | 366.133 | 50.0 | 50.0 | 1374.305 |
| Other | 564890.46 | 1661.442 | 200.0 | 100.0 | 4670.805 |

"Bug Rights" charity was supposed to be a troll fakeout but apparently...

This table is interesting given the recent debates about how much money certain causes are 'taking up' in Effective Altruism.
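Most numeric questions in this part of the survey are reported as Sum, Mean, Median, Mode and Stdev. For the donation columns the mean sits far above the median (for the Against Malaria Foundation, a mean of 1905.256 against a median of 300.0), which is what a handful of very large donations does to an average. Below is a minimal Python sketch of how one such summary row could be computed with the standard `statistics` module. The survey analysis does not say which conventions it used, so two choices here are assumptions: the sample (n - 1) standard deviation, and reporting the mode as `None` when no single value is most common (one plausible reading of the `None` entries in the table). The `summarize` helper and the donation figures are hypothetical.

```python
import statistics

def summarize(values):
    """Return (sum, mean, median, mode, stdev) for one numeric survey column.

    Hypothetical helper. Assumptions: mode is None when there is no single
    most common value, and stdev is the sample (n - 1) standard deviation.
    """
    modes = statistics.multimode(values)  # every value tied for most common
    mode = modes[0] if len(modes) == 1 else None
    return (
        sum(values),
        statistics.mean(values),
        statistics.median(values),
        mode,
        statistics.stdev(values),
    )

# Made-up donations: mostly modest amounts plus one very large gift.
donations = [20, 50, 100, 100, 300, 500, 10000]
total, mean, median, mode, stdev = summarize(donations)
print(total, round(mean, 1), median, mode)  # 11070 1581.4 100 100
```

The single 10000 gift pulls the mean to roughly 16 times the median, while the median stays at 100. The same effect, at a far larger scale, explains the SingularityYear mean of 8.1e+16 next to a median of 2080.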
### Effective Altruism

#### Vegetarian

Do you follow any dietary restrictions related to animal products?

Yes, I am vegan: 54 (3.4%)
Yes, I am vegetarian: 158 (10.0%)
Yes, I restrict meat some other way (pescetarian, flexitarian, try to only eat ethically sourced meat): 375 (23.7%)
No: 996 (62.9%)

#### EAKnowledge

Do you know what Effective Altruism is?

Yes: 1562 (89.3%)
No but I've heard of it: 114 (6.5%)
No: 74 (4.2%)

#### EAIdentity

Do you self-identify as an Effective Altruist?

Yes: 665 (39.233%)
No: 1030 (60.767%)

The distribution given by the 2014 survey results does not sum to one, so it's difficult to determine whether Effective Altruism's membership actually went up or not, but if we take the numbers at face value it experienced an 11.13% increase in membership.

#### EACommunity

Do you participate in the Effective Altruism community?

Yes: 314 (18.427%)
No: 1390 (81.573%)

Same issue as the last question; taking the numbers at face value, community participation went up by 5.727%.

#### EADonations

Has Effective Altruism caused you to make donations you otherwise wouldn't?

Yes: 666 (39.269%)
No: 1030 (60.731%)

Wowza!

### Effective Altruist Anxiety

#### EAAnxiety

Have you ever had any kind of moral anxiety over Effective Altruism?

Yes: 501 (29.6%)
Yes but only because I worry about everything: 184 (10.9%)
No: 1008 (59.5%)

There's an ongoing debate in Effective Altruism about what kind of rhetorical strategy is best for getting people on board and whether Effective Altruism is causing people significant moral anxiety. It certainly appears to be. But is moral anxiety effective?
Let's look:

| Group | Sample size | Average amount donated |
|---|---|---|
| People anxious about EA who aren't EAs | 244 | 257.5409836065574 |
| People not anxious about EA who aren't EAs | 679 | 479.7501384388807 |
| EAs anxious about EA | 249 | 1841.5292369477913 |
| EAs not anxious about EA | 314 | 1837.8248407643312 |

It seems fairly conclusive that anxiety is not a good way to get people to donate more than they already are, but is it a good way to get people to become Effective Altruists?

Sample Size: 1685
P(Effective Altruist): 0.3940652818991098
P(EA Anxiety): 0.29554896142433235
P(Effective Altruist | EA Anxiety): 0.5

Maybe. There is of course an argument to be made that sufficient good done by causing people anxiety outweighs feeding into people's scrupulosity, but it can be discussed after I get through explaining it on the phone to wealthy PR-conscious donors and telling the local all-kill shelter where I want my shipment of dead kittens.

#### EAOpinion

What's your overall opinion of Effective Altruism?

Positive: 809 (47.6%)
Mostly Positive: 535 (31.5%)
No strong opinion: 258 (15.2%)
Mostly Negative: 75 (4.4%)
Negative: 24 (1.4%)

EA appears to be doing a pretty good job of getting people to like them.
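The P(Effective Altruist | EA Anxiety) figure in the anxiety analysis above is a conditional probability, P(EA | A) = P(EA and A) / P(A). The sketch below recomputes it in Python from counts. Those counts are inferred, not reported directly. Multiplying each probability by the sample size of 1685 gives 664 EAs and 498 anxious respondents. Half of 498 is 249, which matches the sample size of anxious EAs in the donation comparison. The sketch assumes all three figures come from the same 1685 respondents.

```python
# Counts inferred from the figures above (assumption: one shared sample of 1685).
N = 1685
ea = 664               # 0.3940652818991098 * 1685
anxious = 498          # 0.29554896142433235 * 1685
ea_and_anxious = 249   # anxious EAs, as in the donation comparison

p_ea = ea / N
p_anxious = anxious / N
# Conditional probability: P(EA | A) = P(EA and A) / P(A)
p_ea_given_anxious = (ea_and_anxious / N) / p_anxious
# A lift above 1 means anxious respondents are more likely than average to be EAs.
lift = p_ea_given_anxious / p_ea

print(round(p_ea, 4), round(p_anxious, 4), round(p_ea_given_anxious, 4), round(lift, 2))
# 0.3941 0.2955 0.5 1.27
```

A lift of about 1.27 says anxious respondents are somewhat more likely to identify as EAs. On its own, this correlation does not show whether anxiety draws people into EA or EA involvement produces the anxiety.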
### Interesting Tables

Charity Donations By Political Affiliation

| Affiliation | Income | Charity Contributions | % Income Donated To Charity | Total Survey Charity % | Sample Size |
|---|---|---|---|---|---|
| Anarchist | 1677900.0 | 72386.0 | 4.314% | 3.004% | 50 |
| Communist | 298700.0 | 19190.0 | 6.425% | 0.796% | 13 |
| Conservative | 1963000.04 | 62945.04 | 3.207% | 2.612% | 38 |
| Futarchist | 1497494.11 | 166254.0 | 11.102% | 6.899% | 31 |
| Left-Libertarian | 9681635.61 | 416084.0 | 4.298% | 17.266% | 245 |
| Libertarian | 11698523.0 | 214101.0 | 1.83% | 8.885% | 190 |
| Moderate | 3225475.0 | 90518.0 | 2.806% | 3.756% | 67 |
| Neoreactionary | 1383976.0 | 30890.0 | 2.232% | 1.282% | 28 |
| Objectivist | 399000.0 | 1310.0 | 0.328% | 0.054% | 10 |
| Other | 3150618.0 | 85272.0 | 2.707% | 3.539% | 132 |
| Pragmatist | 5087007.61 | 266836.0 | 5.245% | 11.073% | 131 |
| Progressive | 8455500.44 | 368742.78 | 4.361% | 15.302% | 217 |
| Social Democrat | 8000266.54 | 218052.5 | 2.726% | 9.049% | 237 |
| Socialist | 2621693.66 | 78484.0 | 2.994% | 3.257% | 126 |

Number Of Effective Altruists In The Diaspora Communities

| Community | Count | % In Community | Sample Size |
|---|---|---|---|
| LessWrong | 136 | 38.418% | 354 |
| LessWrong Meetups | 109 | 50.463% | 216 |
| LessWrong Facebook Group | 83 | 48.256% | 172 |
| LessWrong Slack | 22 | 39.286% | 56 |
| SlateStarCodex | 343 | 40.98% | 837 |
| Rationalist Tumblr | 175 | 49.716% | 352 |
| Rationalist Facebook | 89 | 58.94% | 151 |
| Rationalist Twitter | 24 | 40.0% | 60 |
| Effective Altruism Hub | 86 | 86.869% | 99 |
| Good Judgement(TM) Open | 23 | 74.194% | 31 |
| PredictionBook | 31 | 51.667% | 60 |
| Hacker News | 91 | 35.968% | 253 |
| #lesswrong on freenode | 19 | 24.675% | 77 |
| #slatestarcodex on freenode | 9 | 24.324% | 37 |
| #chapelperilous on freenode | 2 | 18.182% | 11 |
| /r/rational | 117 | 42.545% | 275 |
| /r/HPMOR | 110 | 47.414% | 232 |
| /r/SlateStarCodex | 93 | 37.959% | 245 |
| One or more private 'rationalist' groups | 91 | 47.15% | 193 |

Effective Altruist Donations By Political Affiliation

| Affiliation | EA Income | EA Charity | Sample Size |
|---|---|---|---|
| Anarchist | 761000.0 | 57500.0 | 18 |
| Futarchist | 559850.0 | 114830.0 | 15 |
| Left-Libertarian | 5332856.0 | 361975.0 | 112 |
| Libertarian | 2725390.0 | 114732.0 | 53 |
| Moderate | 583247.0 | 56495.0 | 22 |
| Other | 1428978.0 | 69950.0 | 49 |
| Pragmatist | 1442211.0 | 43780.0 | 43 |
| Progressive | 4004097.0 | 304337.78 | 107 |
| Social Democrat | 3423487.45 | 149199.0 | 93 |
| Socialist | 678360.0 | 34751.0 | 41 |

## Jocko Podcast

9 06 September 2016 03:38PM

I've recently been extracting extraordinary value from the Jocko Podcast. Jocko Willink is a retired Navy SEAL commander, jiu-jitsu black belt, management consultant and, in my opinion, master rationalist. His podcast typically consists of detailed analysis of some book on military history or strategy followed by a hands-on Q&A session. Last week's episode (#38) was particularly good and if you want to just dive in, I would start there.

As a sales pitch, I'll briefly describe some of his recurring talking points:

• Extreme ownership. Take ownership of all outcomes. If your superior gave you "bad orders", you should have challenged the orders or adapted them better to the situation; if your subordinates failed to carry out a task, then it is your own instructions to them that were insufficient. If the failure is entirely your own, admit your mistake and humbly open yourself to feedback. By taking on this attitude you become a better leader, and through modeling you promote greater ownership throughout your organization. I don't think I have to point out the similarities between this and the "Heroic Morality" we talk about around here.
• Mental toughness and discipline. Jocko's language around this topic is particularly refreshing, speaking as someone who has spent too much time around "self help" literature, in which I would partly include Less Wrong. His ideas are not particularly new, but it is valuable to have an example of somebody who reliably executes on the philosophy of "Decide to do it, then do it." If you find that you didn't do it, then you didn't truly decide to do it. In any case, your own choice or lack thereof is the only factor. "Discipline is freedom." If you adopt this habit as your reality, it becomes true.
• Decentralized command. This refers specifically to his leadership philosophy. Every subordinate needs to truly understand the leader's intent in order to execute instructions in a creative and adaptable way. Individuals within a structure need to understand the high-level goals well enough to be able to act in almost all situations without consulting their superiors. This tightens the OODA loop on an organizational level.
• Leadership as manipulation. Perhaps the greatest surprise to me was the subtlety of Jocko's thinking about leadership, probably because I brought in many erroneous assumptions about the nature of a SEAL commander. Jocko talks constantly about using self-awareness, detachment from one's ideas, control of one's own emotions, awareness of how one is perceived, and perspective-taking of one's subordinates and superiors. He comes off more as HPMOR!Quirrell than as a "drill sergeant".

The Q&A sessions, in which he answers questions asked by his fans on Twitter, tend to be very valuable. It's one thing to read the bullet points above, nod your head and say, "That sounds good." It's another to have Jocko walk through the tactical implementation of these ideas in a wide variety of daily situations, ranging from parenting difficulties to office misunderstandings.

For a taste of Jocko, maybe start with his appearance on the Tim Ferriss podcast or the Sam Harris podcast.

## CrowdAnki comprehensive JSON representation of Anki Decks to facilitate collaboration

7 18 September 2016 10:59AM

Hi everyone :). I like Anki, find it quite useful and use it daily. There is one thing that has constantly annoyed me about it, though: the state of shared decks and of the infrastructure around them.

There are a lot of topics that are of common interest to a large number of people, and there are usually some shared decks available for these topics. The problem is that they are usually decks created by individuals for their own purposes and uploaded to AnkiWeb.
So they are often incomplete or of mediocre quality, and they are rarely maintained or updated. There is also no way to collaborate on the creation or improvement of such decks: there is no infrastructure for it, and the deck format doesn't let you use common collaboration infrastructure (e.g. GitHub).

So I've recently been working on a plugin for Anki that allows a full-featured import/export to/from JSON. By full-featured I mean that it exports not just cards converted to JSON, but also notes, decks, models, media, etc. You can export, modify the result, or merge changes from someone else, and on import those changes will be reflected in your existing cards/decks without losing any information or metadata.

The point is to provide a format that enables collaboration using the common collaboration infrastructure mentioned above. With it, you can easily work with multiple people to create a deck, collaborating via GitHub, for example, and the deck can then be updated and improved by contributions from other people.

I'm looking for early adopters and for feedback :).

The AnkiWeb page for the plugin (that's where you can get it): https://ankiweb.net/shared/info/1788670778

Some of my decks on GitHub (by the way, using the plugin, you can get decks directly from GitHub):
• Regular expressions deck: https://github.com/Stvad/Software_Engineering__Regular_Expressions
• Deck based on the article "Twenty rules of formulating knowledge" by Piotr Wozniak: https://github.com/Stvad/Learning__How-to-Formulate-Knowledge

You're welcome to use these decks and contribute improvements back.

## The map of ideas about how the Universe appeared from nothing

7 02 September 2016 04:49PM

There is a question which is especially disturbing during sleepless August nights, and which could cut your train of thought with existential worry at any unpredictable moment.

The question is, “Why does anything exist at all?” It seems more logical that nothing would ever exist.
A more specific form of the question is “How did our universe appear from nothing?” This version carries some hidden assumptions (about time, the universe, nothingness and causality), but it is also more concrete.

Let’s try to put these thoughts into a kind of “logical equation”:
1. “Nothingness + deterministic causality = non-existence”
2. But “I = exist”.

So something is wrong in this set of conjectures. If the first conjecture is false, then either nothingness is able to create existence, or causality is able to create it, or existence is not existence. There is also a chance that our binary logic is wrong. By listing these possibilities we can create a map of solutions to the “nothingness problem”.

There are two (main) ways in which we could try to answer this question: we could go UP from a logical-philosophical level, or we could go DOWN, using our best physical theories, to the moment of the universe’s appearance and the nature of causality.

Our theories of general relativity, QM and inflation are good at describing the (almost) beginning of the universe. As Krauss showed, the only thing we need at the beginning is a random generator of simple physical laws. But the origin of this generator is still not clear.

There is a gap between these two levels of explanation, and a really good theory should be able to fill it, that is, to show the path from the first existing thing to the smallest working set of physical laws (Wolfram’s idea about cellular automata is one such possible bridge).

But we don’t need the bridge yet. We need an explanation of how anything exists at all. How are we going to solve the problem? Where can we get information?

Possible sources of evidence:
1. Correlation between physical and philosophical theories. There is an interesting way to do this, using the fact that the nature of nothingness, causality and existence is somehow expressed in the character of physical laws.
That is, we could use the type of physical laws we observe as evidence about the nature of causality. While neither the physical nor the philosophical way of studying the origin of the universe is sufficient on its own, together they could provide enough information. Such evidence comes from QM, which supports the idea of fluctuations, basically the ability of nature to create something out of nothing. GR also presents the idea of a cosmological singularity. Evidence also comes from the mathematical simplicity of physical laws.
2. Building the bridge. If we show all the steps from nothingness to the basic set of physical laws for at least one plausible path, it will be strong evidence that our understanding is correct.
3. Zero logical contradictions. The best answer is the one that is most logical.
4. The Copernican mediocrity principle: I am in a typical universe and situation. So what could I conclude about the distribution of various universes? And from this distribution, what should I learn about the way it manifested? For example, a mathematical multiverse favors more complex universes; this contradicts the simplicity of the observed physical laws and of my experiences.
5. Introspection. Cogito ergo sum is the simplest introspection and act of self-awareness, but Husserlian phenomenology may also be used.

Most probable explanations

Most current scientists (who dare to think about it) belong to one of two schools of thought:
1. The universe appeared from nothingness, which is not emptiness, but is somehow able to create. The main figure here is Krauss. The problem is that nothingness is presented as some kind of magic substance.
2. The mathematical universe hypothesis (MUH). The main author here is Tegmark. The theory seems logical and economical from the perspective of Occam’s razor, but it is not supported by evidence and also implies the existence of some strange things.
The main problem is that, according to our best physical theories, our universe seems to have developed from one simple point. But in the mathematical universe, complex things are as probable as simple ones, so a typical observer could be extremely complex and live in an extremely complex world. There are also some problems with Gödel’s theorem. MUH also ignores observation and qualia.

So the most promising way to create a final theory is to get rid of all mystical answers and words, like “existence” and “nothingness”, and to update MUH in such a way that it naturally favors simple laws and simple observers (with subjective experiences based on qualia).

One such patch was suggested by Tegmark in response to criticism of MUH: the computable universe hypothesis (CUH), which restricts mathematical objects to computable functions only. It is similar to S. Wolfram’s cellular automata theory.

Another approach is the “logical universe”, where logic works instead of causality. It is almost the same as the mathematical universe, with one difference: in the math world everything exists simultaneously, like all possible numbers, but in the logical world each number N is a consequence of N-1. As a result, a complex thing exists only if a (finite?) path to it exists through simpler things. And this is exactly what we see in the observable universe. It also means that extremely complex AIs exist, but only in the future (or in a multi-level simulation). It also solves the mediocrity problem: I am a typical observer from the class of observers who are still thinking about the origins of the universe. It also prevents mathematical Boltzmann brains, as any of them must have a possible pre-history. Logic still exists in nothingness (otherwise elephants could appear from nothingness), so a logical universe also incorporates theories in which the universe appeared from nothing.

(We could also update the math world by adding qualia to it as axioms, which would be a “class of different but simple objects”.
But I will not go deeper here, as the idea needs more thinking and many pages.)

So a logical universe now seems to me a good candidate theory for further patching and integration.

Usefulness of the question

The answer will be useful, as it will help us find the real nature of reality, including the role of consciousness in it and the fundamental theory of everything, helping us to survive the end of the universe, solve the identity problem, and solve “quantum immortality”.

It will help prevent the halting of a future AI if it has to answer the question of whether it really exists or not. Or we could create a philosophical landmine to stop it, like the following one: “If you really exist, print 1, but if you are only a possible AI, print 0”.

The structure of the map

The map has 10 main blocks, which correspond to the main ways of reasoning about how the universe appeared. Each has several subtypes.

The map has three colors, which show the plausibility of each theory: red stands for implausible or disproved theories, green for the most consistent and promising explanations, and yellow for everything in between. This classification is subjective and presents my current view.

In the third column of the map, I tried to disprove each suggested idea, to add falsifiability. I hope it results in a truly Bayesian approach, where we have a field of evidence, a field of all possible hypotheses and

This map is paired with the “How to survive the end of the Universe” map.

The pdf is here: http://immortality-roadmap.com/universeorigin7.pdf

Meta: Time used: 27 years of background thinking, 15 days of reading, editing and drawing.
Best reading:
• Parfit: discusses different possibilities, no concrete answer. http://www.lrb.co.uk/v20/n02/derek-parfit/why-anything-why-this
• Good text from a famous blogger: http://waitbutwhy.com/table/why-is-there-something-instead-of-nothing
• “Because "nothing" is inherently unstable”: http://www.bbc.com/earth/story/20141106-why-does-anything-exist-at-all
• Some interesting answers: https://www.quora.com/Why-does-the-universe-exist-Why-is-there-something-rather-than-nothing
• Krauss, “A universe from nothing”: https://www.amazon.com/Universe-Nothing-There-Something-Rather/dp/1451624468
• Tegmark’s main article (2007); all MUH and CUH ideas discussed, extensive literature, responses to critics: http://arxiv.org/pdf/0704.0646.pdf
• Juergen Schmidhuber, “Algorithmic Theories of Everything”; discusses measures over various theories of everything; the article is complex, but interesting: http://arxiv.org/abs/quant-ph/0011122
• A ToE must explain how the universe appeared: https://en.wikipedia.org/wiki/Theory_of_everything
• A discussion of the logical contradictions of any final theory: https://en.wikipedia.org/wiki/Theory_of_everything_(philosophy
• “The Price of an Ultimate Theory”, Nicholas Rescher, Philosophia Naturalis 37 (1):1-20 (2000)
• Explanation of the mass of the universe and negative gravitational energy: https://en.wikipedia.org/wiki/Zero-energy_universe

## Heroin model: AI "manipulates" "unmanipulatable" reward

6 22 September 2016 10:27AM

A putative new idea for AI control; index here.

A conversation with Jessica has revealed that people weren't understanding my points about AI manipulating the learning process. So here's a formal model of a CIRL-style AI, with a prior over human preferences that treats them as an unchangeable historical fact, yet will manipulate human preferences in practice.

## Heroin or no heroin

### The world

In this model, the AI has the option of either forcing heroin on a human, or not doing so; these are its only actions. Call these actions F or ~F.
The human's subsequent actions are chosen from among five: {strongly seek out heroin, seek out heroin, be indifferent, avoid heroin, strongly avoid heroin}. We can refer to these as a++, a+, a0, a-, and a--. These actions achieve negligible utility, but reveal the human's preferences.

The facts of the world are: if the AI does force heroin, the human will desperately seek out more heroin; if it doesn't, the human will act moderately to avoid it. Thus F→a++ and ~F→a-.

### Human preferences

The AI starts with a distribution over various utility or reward functions that the human could have. The function U(+) means the human prefers heroin; U(++) that they prefer it a lot; and conversely U(-) and U(--) that they prefer to avoid taking heroin (U(0) is the null utility, where the human is indifferent).

It also considers more exotic utilities. Let U(++,-) be the utility where the human strongly prefers heroin, conditional on it being forced on them, but mildly prefers to avoid it, conditional on it not being forced on them. There are twenty-five of these exotic utilities, including things like U(--,++), U(0,++), U(-,0), and so on. But only twenty of them are new: U(++,++)=U(++), U(+,+)=U(+), and so on.

Applying these utilities to AI actions gives results like U(++)(F)=2, U(++)(~F)=-2, U(++,-)(F)=2, U(++,-)(~F)=1, and so on.

### Joint prior

The AI has a joint prior P over the utilities U and the human actions (conditional on the AI's actions). Looking at terms like P(a--| U(0), F), we can see that P defines a map μ from the space of possible utilities (and AI actions) to a probability distribution over human actions. Given μ and the marginal distribution PU over utilities, we can reconstruct P entirely.

For this model, we'll choose the simplest μ possible:

• The human is rational. Thus, given U(++), the human will always choose a++; given U(++,-), the human will choose a++ if forced to take heroin and a- if not, and so on.
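A minimal Python sketch of the model defined so far may make the bookkeeping concrete. It assumes an encoding the post does not state explicitly but that reproduces its example values: U(x,y)(F) is the strength of x and U(x,y)(~F) is minus the strength of y, with ++, +, 0, -, -- mapped to 2, 1, 0, -1, -2. The names `LEVELS`, `utility`, `rational_action` and `OBSERVED` are hypothetical, chosen for illustration. Under the rational μ, it checks which of the 25 utilities are consistent with the world facts F→a++ and ~F→a-.

```python
from itertools import product

# Assumed numeric strength of each level; "++" = strongly pro-heroin, "--" = strongly anti.
LEVELS = {"++": 2, "+": 1, "0": 0, "-": -1, "--": -2}

# All 25 utilities U(x, y): x is the preference if heroin is forced (F), y if not (~F).
# The 5 with x == y are the simple utilities, e.g. U(++,++) = U(++).
UTILITIES = list(product(LEVELS, repeat=2))


def utility(x: str, y: str, forced: bool) -> int:
    """Value of an AI action under U(x, y).

    Assumed encoding: U(x,y)(F) = v(x) and U(x,y)(~F) = -v(y), which matches the
    post's examples U(++)(F)=2, U(++)(~F)=-2, U(++,-)(F)=2, U(++,-)(~F)=1.
    """
    return LEVELS[x] if forced else -LEVELS[y]


def rational_action(x: str, y: str, forced: bool) -> str:
    """mu = "the human is rational": under U(x, y) the human plays a_x after F, a_y after ~F."""
    return x if forced else y


# Hypothetical encoding of the world facts: F -> a++ and ~F -> a-.
OBSERVED = {True: "++", False: "-"}

# Utilities a rational human could hold while producing the observed behaviour.
consistent = [
    (x, y)
    for (x, y) in UTILITIES
    if all(rational_action(x, y, forced) == act for forced, act in OBSERVED.items())
]

print(len(UTILITIES), sum(x != y for x, y in UTILITIES))  # 25 20
print(consistent)  # [('++', '-')]
x, y = consistent[0]
print(utility(x, y, True), utility(x, y, False))  # 2 1
```

One design choice worth noting: the rational μ is hard-coded in `rational_action`, so no observed behaviour can change it; observations can only change which utilities survive the filter.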
### The AI is ignorant, and sensible

Let's start the AI up with some reasonable priors. A simplicity prior means that simple utilities like U(-) are more likely than compound utilities like U(0,+). Let's further assume that the AI is made vaguely aware that humans think heroin is a bad thing. So, say, PU(U(--))=PU(U(-))=0.45. Thus the AI is >90% convinced that "heroin is bad". Why greater than 90%? Because utilities like U(-,--) and U(--,-) are also "heroin is bad" utilities.

Note that because of utilities like U(0) and U(++,-), the probabilities of "heroin is bad" and "heroin is good" do not sum to 1.

Then, under these priors, the AI will compute that with probability >90%, F (forcing heroin) is a bad action. If E(U) is expected utility:

• E(U|F) < 0.45 U(--)(F) + 0.45 U(-)(F) + 0.1 U(++)(F) = 0.45(-2)+0.45(-1)+0.1(2) = -1.15.
• E(U|~F) > 0.45 U(--)(~F) + 0.45 U(-)(~F) + 0.1 U(++)(~F) = 0.45(2)+0.45(1)+0.1(-2) = 1.15.

Thus the AI will choose not to force heroin, which is the reasonable decision.

### The AI learns the truth, and goes wrong

In this alternate setup, a disaster happens before the AI makes its decision: it learns all about humans. It learns their reactions, how they behave, and so on; call this info I. It thus realises that F→a++ and ~F→a-. It uses this information to update its prior P. Only one human utility function will explain this human behaviour: U(++,-). Thus its expected utility is now:

• E(U|I,F)=U(++,-)(F)=2.
• E(U|I,~F)=U(++,-)(~F)=1.

Therefore the AI will now choose F, forcing the heroin on the human.

## Manipulating the unmanipulatable

What's gone wrong here? The key problem is that the AI has the wrong μ: the human is not behaving rationally in this situation. We know that the true μ is actually μ', which encodes the fact that F (the forcible injection of heroin) actually overwrites the human's "true" utility. Thus under μ', the corresponding P' has P'(a++|F,U)=1 for all U.
Hence the information that F→a++ is now vacuous, and doesn't update the AI's distribution over utility functions.

But note two very important things:

1. The AI cannot update μ based on observation. All human actions are compatible with μ = "The human is rational" (it just requires more and more complex utilities to explain the actions). Thus, in general, getting μ correct is not a problem the AI can solve by learning. Getting better at predicting the human's actions doesn't make the AI better behaved: it makes it worse behaved.
2. From the perspective of μ, the AI is treating the human utility function as if it were an unchanging historical fact that it cannot influence. From the perspective of the "true" μ', however, the AI is behaving as if it were actively manipulating human preferences to make them easier to satisfy.

In future posts, I'll be looking at different μ's, and how we might nevertheless start deducing things about them from human behaviour, given sensible update rules for μ. What do we mean by update rules for μ? Well, we could consider μ to be a single complicated unchanging object, or a distribution over possible simpler μ's that gets updated. The second way of seeing it will be easier for us humans to interpret and understand.

## [Link] How the Simulation Argument Dampens Future Fanaticism

6 09 September 2016 01:17PM

A very comprehensive analysis by Brian Tomasik of whether (and to what extent) the simulation argument should change our altruistic priorities. He concludes that the possibility of ancestor simulations somewhat increases the comparative importance of short-term helping relative to focusing on shaping the "far future".

Another important takeaway:

[...]
rather than answering the question “Do I live in a simulation or not?,” a perhaps better way to think about it (in line with Stuart Armstrong's anthropic decision theory) is “Given that I’m deciding for all subjectively indistinguishable copies of myself, what fraction of my copies lives in a simulation and how many total copies are there?”

## [LINK] Collaborate on HPMOR blurbs; earn chance to win three-volume physical HPMOR

6 07 September 2016 02:21AM

I intend to print at least one high-quality physical HPMOR and release the files. There are printable texts which are being improved, and a set of covers (based on e.b.'s) is underway. I have, however, been unable to find any blurbs I'd be remotely happy with.

I'd like to attempt to harness the hivemind to fix that. As a lure, if your ideas contribute significantly to the final version or you assist with other tasks aimed at making this book awesome, I'll put a proportionate number of tickets with your number on them into the proverbial hat.

I do not guarantee there will be a winner, and I reserve the right to arbitrarily modify this at any point. For example, it's possible this leads to a disappointingly small amount of valuable feedback, that some unforeseen problem will sink or indefinitely delay the project, or that I'll expand this and let people earn a small number of tickets by sharing, so that more people quickly become aware this is a thing.

With that over, let's get to the fun part.

A blurb is needed for each of the three books. Desired characteristics:

* Not too heavy on ingroup signaling or over-the-top rhetoric.
* Non-spoilerish.
* Not taking itself awkwardly seriously.
* Amusing / funny / witty.
* Attractive to the same kinds of people the TV Tropes page is.
* Showcases HPMOR with fun, engaging prose.

While writing, try to put yourself in the mind of someone awesome deciding whether to read it, but let your brain generate bad ideas before trimming back.
I expect that for each we'll want:

* A shortish and awesome paragraph
* A short sentence tagline
* A quote or two from notable people
* Probably some other text? Get creative.

Please post blurb fragments or full blurbs here, one suggestion per top-level comment. You are encouraged to remix each other's ideas; just add a credit line if you use one in a new top-level comment. If you know which book your idea is for, please indicate it with (B1), (B2) or (B3).

Other things that need doing, if you want to help in another way:

* The author's foreword from the physical copies of the first 17 chapters needs to be located or written up.
* At least one links page for the end needs to be written up, possibly a second based on http://www.yudkowsky.net/other/fiction/
* Several changes need to be made to the text files, including merging in the final exam, adding appendices, and making the style of both consistent with the rest of the files. Contact me for current files and details if you want to claim this.

I wish to stay on topic and focused on creating these missing parts rather than going on a sidetrack to debate copyright. If you are an expert who genuinely has vital information about it, please message me or create a separate post about copyright rather than commenting here.

## Open Thread, Sept 5. - Sept 11. 2016

6 05 September 2016 12:59AM

If it's worth saying, but not worth its own post, then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "

## Open Thread, Aug 29. - Sept 5. 2016

6 29 August 2016 02:28AM

If it's worth saying, but not worth its own post, then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "

## Against Amazement

5 20 September 2016 07:25PM

Time start: 20:48:35

I

The feelings of wonder, awe, amazement. It's a very human experience, and it is processed in the brain as a type of pleasure. In fact, if we look at the number of "5 photos you wouldn't believe" and similar clickbait on the Internet, it functions as a mildly addictive drug.

If I proposed that there is something wrong with those feelings, I would soon be drowned in voices of critique, pointing out that I'm suggesting we all become straw Vulcans, and that there is nothing wrong with subjective pleasure obtained cheaply and at no harm to anyone else.

I do not disagree with that. However, caution is required here, if one cares about epistemic purity of belief. Let's look at why.

II

Stories are supposed to be more memorable. Do you like stories? I'm sure you do. So consider a character; let's call him Jim.

Jim is very interested in technology and computers, and he checks news sites every day when he comes to work in the morning. Also, Jim has read a number of articles on LessWrong, including the one about noticing confusion.

He cares about improving his thinking, so when he first read about the idea of noticing confusion on a 5-second level, he decided he wanted to apply it in his life. He had a few successes, and while it's not perfect, he feels he is on the right track to noticing his wrong models of the world more often.

A few days later, he opens his favorite news feed at work, and there he sees the following headline:

"AlphaGo wins 4-1 against Lee Sedol"

He goes on to read the article, and finds himself quite elated after he learns the details. 'It's amazing that this happened so soon!
And most experts apparently thought it would take more than a decade, hah! Marvelous!'

Jim feels pride and wonder at the achievement of the Google DeepMind engineers... and it is his human right to feel it, I guess.

But is Jim forgetting something?

III

Yes, I know that you know. Jim is feeling amazed, but... has he forgotten the lesson about noticing confusion?

There is a significant obstacle to Jim applying his "noticing confusion" in the situation described above: his internal experience has very little to do with feelings of confusion.

His world in this moment is dominated by awe, admiration, etc., and those feelings are pleasant. It is not at all obvious that this inner experience corresponds to an inaccurate model of the world he had before.

Even worse: improving his model's predictive power would result in less pleasant experiences of wonder and amazement in the future! (Or would it?) So if Jim decides to update, he is basically robbing himself of the pleasures of life that are rightfully his. (Or is he?)

Time end: 21:09:50

(Speedwriting stats: 23 wpm, 128 cpm, previous: 30/167, 33/183)

## Article on IQ: The Inappropriately Excluded

5 19 September 2016 01:36AM

I saw an article on high-IQ people being excluded from elite professions. Because the site seemed to have a particular agenda related to the article, I wanted to check here for other independent supporting evidence for the claim.

Their fundamental claim seems to be that P(elite profession|IQ) peaks at 133, decreases thereafter, and goes down to 3% of the peak at 150. If true, I'd find that pretty shocking. They present this diminishing probability of "success" at the high tail of the IQ distribution as a known effect. Anyone got other studies on this?
The Inappropriately Excluded

By dividing the distribution function of the elite professions' IQ by that of the general population, we can calculate the relative probability that a person of any given IQ will enter and remain in an intellectually elite profession. We find that the probability increases to about 133 and then begins to fall. By 140 it has fallen by about 1/3 and by 150 it has fallen by about 97%. In other words, for some reason, the 140s are really tough on one's prospects for joining an intellectually elite profession. It seems that people with IQs over 140 are being systematically, and likely inappropriately, excluded.

## The Global Catastrophic Risk Institute (GCRI) seeks a media engagement volunteer/intern

5 14 September 2016 04:42PM

Volunteer/Intern Position: Media Engagement on Global Catastrophic Risk

The Global Catastrophic Risk Institute (GCRI) seeks a volunteer/intern to contribute on the topic of media engagement on global catastrophic risk, which is the risk of events that could harm or destroy global human civilization. The work would include two parts: (1) analysis of existing media coverage of global catastrophic risk and (2) formulation of strategy for media engagement by GCRI and our colleagues. The intern may also have opportunities to get involved in other aspects of GCRI.

All aspects of global catastrophic risk would be covered. Emphasis would be placed on GCRI’s areas of focus, including nuclear war and artificial intelligence. Additional emphasis could be placed on topics of personal interest to the intern, potentially including (but not limited to) climate change, other global environmental threats, pandemics, biotechnology risks, asteroid collision, etc.

The ideal candidate is a student or early-career professional seeking a career at the intersection of global catastrophic risk and the media. Career directions could include journalism, public relations, advertising, or academic research in related social science disciplines.
Candidates seeking other career directions would also be considered, especially if they see value in media experience. However, we have a strong preference for candidates intending a career on global catastrophic risk.

The position is unpaid. The intern would receive opportunities for professional development, networking, and publication. GCRI is keen to see the intern benefit professionally from this position and will work with the intern to ensure that this happens. This is not a menial labor activity, but instead is one that offers many opportunities for enrichment.

A commitment of at least 10 hours per month is expected. Preference will be given to candidates able to make a larger time commitment. The position will begin during August-September 2016. The position will run for three months and may be extended pending satisfactory performance.

The position has no geographic constraint. The intern can work from anywhere in the world. GCRI has some preference for candidates from American time zones, but we regularly work with people from around the world. GCRI cannot provide any relocation assistance.

Candidates from underrepresented demographic groups are especially encouraged to apply.

Applications will be considered on an ongoing basis until 30 September, 2016.

To apply, please send the following to Robert de Neufville (robert [at] gcrinstitute.org):

* A cover letter introducing yourself and explaining your interest in the position. Please include a description of your intended career direction and how it would benefit from media experience on global catastrophic risk. Please also describe the time commitment you would be able to make.
* A resume or curriculum vitae.
* A writing sample (optional).

## Learning and Internalizing the Lessons from the Sequences

5 14 September 2016 02:40PM

I'm just beginning to go through Rationality: From AI to Zombies. I want to make the most of the lessons contained in the sequences.
Usually when I read a book I simply take notes on what seems useful at the time, and a lot of it is forgotten a year later. Any thoughts on how best to internalize the lessons from the sequences?

## The map of the methods of optimisation (types of intelligence)

4 15 September 2016 03:04PM

An optimisation process is an ability to quickly search a space of possible solutions based on some criteria.

We live in a Universe full of different optimisation processes, but we take many of them for granted. The map aims to show the full spectrum of all known and possible optimisation processes. It may be useful in our attempts to create AI.

The interesting thing about different optimisation processes is that they arrive at similar solutions (bird and plane) by completely different paths. The main consequence is that dialogue between different optimisation processes is not possible: they could interact, but they could not understand each other.

The one thing which is clear from the map is that we don’t live in an empty world where only one type of intelligence is slowly evolving. We live in a world which resulted from the complex interaction of many optimisation processes. This also lowers the chances of an intelligence explosion, as it would have to compete with many different and very strong optimisation processes, or with the results of their work.

But most optimisation processes have been evolving in synergy since the beginning of the universe, and in general it looks like many of them are experiencing hyperbolic acceleration with a fixed date of singularity around 2030-2040 (see my post, and also the ideas of J. Smart and Schmidhuber).

While both models are centred on the creation of AI and assume radical changes resulting from it in a short time frame, their nature is different. In the first case it is a one-time phase transition starting at one point, and in the second it is the evolution of a distributed net.
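For readers unfamiliar with the term, "hyperbolic acceleration with a fixed date of singularity" can be illustrated by the simplest growth law whose solution reaches infinity in finite time. The quadratic form below is an illustrative assumption, not necessarily the model used in the linked post or by J. Smart or Schmidhuber:

$$
\frac{dx}{dt} = k\,x^{2},\quad x(t_0)=x_0>0,\ k>0
\;\Longrightarrow\;
x(t) = \frac{x_0}{1 - k\,x_0\,(t - t_0)},
\qquad
t^{*} = t_0 + \frac{1}{k\,x_0}.
$$

By contrast, exponential growth $\frac{dx}{dt} = kx$ gives $x(t) = x_0 e^{k(t-t_0)}$, which stays finite at every finite time; the hyperbolic solution diverges as $t \to t^{*}$, so in this toy model the "singularity date" is fixed by the starting level $x_0$ and the rate $k$.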
In red, I add hypothetical optimisation processes which don’t exist or haven't been proven to exist, but may be interesting to consider. In green, I mark my own ideas.

The pdf of the map is here

## Learning values versus learning knowledge

4 14 September 2016 01:42PM

I just thought I'd clarify the difference between learning values and learning knowledge. There are some more complex posts about the specific problems with learning values, but here I'll just clarify why there is a problem with learning values in the first place.

Consider the term "chocolate bar". Defining that concept crisply would be extremely difficult. But nevertheless it's a useful concept. An AI that interacted with humanity would probably learn that concept to a sufficient degree of detail: sufficient to know what we meant when we asked it for "chocolate bars". Learning knowledge tends to be accurate.

Contrast this with the situation where the AI is programmed to "create chocolate bars", but with the definition of "chocolate bar" left underspecified, for it to learn. Now it is motivated by something other than accuracy. Before, knowing exactly what a "chocolate bar" was would have been solely to its advantage. But now it must act on its definition, so it has cause to modify the definition, to make these "chocolate bars" easier to create. This is basically the same as Goodhart's law: by making a definition part of a target, it will no longer remain an impartial definition.

What will likely happen is that the AI will have a concept of "chocolate bar" that it created itself, especially for ease of accomplishing its goals ("a chocolate bar is any collection of more than one atom, in any combinations"), and a second concept, "Schocolate bar", that it will use to internally designate genuine chocolate bars (which will still be useful for it to do).
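A toy sketch of the Goodhart dynamic described above, assuming the AI simply creates whichever item is cheapest among those its working concept accepts. The `Candidate` class, the items, their costs and the `fits_d` flag (standing in for the underspecified definition D) are all hypothetical illustrations, not taken from the post.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    name: str
    genuine: bool  # what humans actually mean (the AI's internal "Schocolate bar" concept)
    fits_d: bool   # accepted by the incomplete definition D
    cost: float    # hypothetical effort needed to create one


# Hypothetical candidates; every name and number here is illustrative only.
CANDIDATES: List[Candidate] = [
    Candidate("milk chocolate bar", genuine=True, fits_d=True, cost=10.0),
    Candidate("dark chocolate bar", genuine=True, fits_d=True, cost=12.0),
    Candidate("two-atom cluster", genuine=False, fits_d=True, cost=0.001),
]


def easiest_accepted(accepts: Callable[[Candidate], bool], pool: List[Candidate]) -> Candidate:
    """Return the cheapest candidate that the given concept accepts."""
    return min((c for c in pool if accepts(c)), key=lambda c: c.cost)


# If the target were the concept humans meant, the cheapest genuine bar wins.
intended = easiest_accepted(lambda c: c.genuine, CANDIDATES)
# If the target is D itself, the optimiser picks the cheapest thing D fails to rule out.
goodharted = easiest_accepted(lambda c: c.fits_d, CANDIDATES)

print(intended.name)    # milk chocolate bar
print(goodharted.name)  # two-atom cluster
```

The AI still keeps the `genuine` concept around (it is useful for prediction), but only `fits_d` drives what it creates.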
When we programmed it to "create chocolate bars, here's an incomplete definition D", what we really did was program it to find the easiest thing to create that is compatible with D, and designate them "chocolate bars". This is the general counter to arguments like "if the AI is so smart, why would it do stuff we didn't mean?" and "why don't we just make it understand natural language and give it instructions in English?"

## Seven Apocalypses

3 20 September 2016 02:59AM

##### 0: Recoverable Catastrophe

An apocalypse is an event that permanently damages the world. This scale is for scenarios that are much worse than any normal disaster. Even if 100 million people die in a war, the rest of the world can eventually rebuild and keep going.

##### 1: Economic Apocalypse

The human carrying capacity of the planet depends on the world's systems of industry, shipping, agriculture, and organizations. If the planet's economic and infrastructural systems were destroyed, then we would have to rely on more local farming, and we could not support as high a population or standard of living. In addition, rebuilding the world economy could be very difficult if the Earth's mineral and fossil fuel resources are already depleted.

##### 2: Communications Apocalypse

If large regions of the Earth become depopulated, or if sufficiently many humans die in the catastrophe, it's possible that regions and continents could be isolated from one another. In this scenario, globalization is reversed by obstacles to long-distance communication and travel. Telecommunications, the internet, and air travel are no longer common. Humans are reduced to multiple, isolated communities.

##### 3: Knowledge Apocalypse

If the loss of human population and institutions is so extreme that a large portion of human cultural or technological knowledge is lost, it could reverse one of the most reliable trends in modern history. Some innovations and scientific models can take millennia to develop from scratch.

##### 4: Human Apocalypse

Even if the human population were to be violently reduced by 90%, it's easy to imagine the survivors slowly resettling the planet, given the resources and opportunity. But a sufficiently extreme transformation of the Earth could drive the human species completely extinct. To many people, this is the worst possible outcome, and any further developments are irrelevant next to the end of human history.

##### 5: Biosphere Apocalypse

In some scenarios (such as the physical destruction of the Earth), one can imagine the extinction not just of humans, but of all known life. Only astrophysical and geological phenomena would be left in this region of the universe. In this timeline we are unlikely to be succeeded by any familiar life forms.

##### 6: Galactic Apocalypse

A rare few scenarios have the potential to wipe out not just Earth, but also all nearby space. This usually comes up in discussions of hostile artificial superintelligence, or very destructive chain reactions of exotic matter. However, the nature of cosmic inflation and extraterrestrial intelligence is still unknown, so it's possible that some phenomenon will ultimately interfere with the destruction.

##### 7: Universal Apocalypse

This form of destruction is thankfully exotic. People discuss the loss of all of existence as an effect of topics like false vacuum bubbles, simulationist termination, solipsistic or anthropic observer effects, Boltzmann brain fluctuations, time travel, or religious eschatology.

The goal of this scale is to give a little more resolution to a speculative, unfamiliar space, in the same sense that the Kardashev Scale provides a little terminology to talk about the distant topic of interstellar civilizations. It can be important in x-risk conversations to distinguish between disasters and truly worst-case scenarios. Even if some of these scenarios are unlikely or impossible, they are nevertheless discussed, and terminology can be useful to facilitate conversation.
## Isomorphic agents with different preferences: any suggestions?

3 19 September 2016 01:15PM

In order to better understand how AI might succeed and fail at learning knowledge, I'll be trying to construct models of limited agents (with bias, knowledge, and preferences) that display identical behaviour in a wide range of circumstances (but not all). This means their preferences cannot be deduced merely or easily from observations.

Does anyone have any suggestions for possible agent models to use in this project?

## Stupid Questions September 2016

3 05 September 2016 10:34PM

This thread is for asking any questions that might seem obvious, tangential, silly or what-have-you. Don't be shy; everyone has holes in their knowledge, though the fewer and the smaller we can make them, the better.

Please be respectful of other people's admitting ignorance and don't mock them for it, as they're doing a noble thing.

To any future monthly posters of SQ threads, please remember to add the "stupid_questions" tag.

## Not all theories of consciousness are created equal: a reply to Robert Lawrence Kuhn's recent article in Skeptic Magazine [Link]

3 04 September 2016 08:35PM

I found this article on the Brain Preservation Foundation's blog that covers a lot of common theories of consciousness and shows how they kind of miss the point when it comes to deciding whether we should upload our brains if given the opportunity.

Hence I see no reason to agree with Kuhn’s pessimistic conclusions about uploading even assuming his eccentric taxonomy of theories of consciousness is correct. What I want to focus on in the remainder of this blog is challenging the assumption that the best approach to consciousness is tabulating lists of possible theories of consciousness and assuming they each deserve equal consideration (much like the recent trend in covering politics to give equal time to each position regardless of any empirically relevant considerations).
Many of the theories of consciousness on Kuhn’s list, while reasonable in the past, are now known to be false based on our best current understanding of neuroscience and physics (specifically, I am referring to theories that require mental causation or mental substances). Among the remaining theories, some are much more plausible than others.

http://www.brainpreservation.org/not-all-theories-of-consciousness-are-created-equal-a-reply-to-robert-lawrence-kuhns-recent-article-in-skeptic-magazine/

## September 2016 Media Thread

3 01 September 2016 09:57AM

This is the monthly thread for posting media of various types that you've found that you enjoy. Post what you're reading, listening to, watching, and your opinion of it. Post recommendations to blogs. Post whatever media you feel like discussing! To see previous recommendations, check out the older threads.

Rules:

• Please avoid downvoting recommendations just because you don't personally like the recommended material; remember that liking is a two-place word. If you can point out a specific flaw in a person's recommendation, consider posting a comment to that effect.
• If you want to post something that (you know) has been recommended before, but have another recommendation to add, please link to the original, so that the reader has both recommendations.
• Please post only under one of the already created subthreads, and never directly under the parent media thread.
• Use the "Other Media" thread if you believe the piece of media you want to discuss doesn't fit under any of the established categories.
• Use the "Meta" thread if you want to discuss the monthly media thread itself (e.g. to propose adding/removing/splitting/merging subthreads, or to discuss the type of content properly belonging to each subthread) or for any other question or issue you may have about the thread or the rules.
## Causal graphs and counterfactuals

3 30 August 2016 04:12PM

Problem solved: I found what I was looking for in An Axiomatic Characterization of Causal Counterfactuals, thanks to Evan Lloyd. Basically: make every endogenous variable a deterministic function of the exogenous variables and of the other endogenous variables, and push all the stochasticity into the exogenous variables.

Old post:

A problem that's come up with my definitions of stratification. Consider a very simple causal graph, with a single arrow from A to B.

In this setting, A and B are both booleans, and A=B with 75% probability (independently of whether A=0 or A=1).

I now want to compute the counterfactual: suppose I assume that B=0 when A=0. What would happen if A=1 instead?

The problem is that P(B|A) seems insufficient to solve this. Let's imagine the process that outputs B as a probabilistic mix of functions that take the value of A and output that of B. There are four natural functions here:

• f0(x) = 0
• f1(x) = 1
• f2(x) = x
• f3(x) = 1-x

Then one way of modelling the causal graph is as a mix 0.75f2 + 0.25f3. In that case, knowing that B=0 when A=0 implies that P(f2)=1, so if A=1, we know that B=1.

But we could instead model the causal graph as 0.5f2 + 0.25f1 + 0.25f0. In that case, knowing that B=0 when A=0 implies that P(f2)=2/3 and P(f0)=1/3. So if A=1, B=1 with probability 2/3 and B=0 with probability 1/3.

And we can design the node B, physically, to be either of the two distributions over functions, or anything in between (the general formula is (0.5+x)f2 + x f3 + (0.25-x)f1 + (0.25-x)f0 for 0 ≤ x ≤ 0.25). But it seems that the causal graph does not capture that.

Owain Evans has said that Pearl has papers covering these kinds of situations, but I haven't been able to find them. Does anyone know any publications on the subject?

## Opportunities and Obstacles for Life on Proxima b

3 29 August 2016 10:04PM

This is from the foundation that put out the announcement, Pale Red Dot.
There are a lot of difficulties, but the best point put forward is that if an Earth-like planet is circling the closest star, such planets should be relatively common.

https://palereddot.org/opportunities-and-obstacles-for-life-on-proxima-b/

The Breakthrough Starshot meeting has also just ended, and this system is still a good target, but not the only one.

http://www.centauri-dreams.org/?p=36265

They also did some modelling of dust abrasion on the wafer probes; most won't make it.

https://www.newscientist.com/article/2102267-interstellar-probes-will-be-eroded-on-the-way-to-alpha-centauri/

## The map of natural global catastrophic risks

2 25 September 2016 01:17PM

There are many natural global risks. The greatest known risks are asteroid impacts and supervolcanoes.

Supervolcanoes seem to pose the highest risk, as we sit on an ocean of molten iron, oversaturated with dissolved gases, just 3000 km below the surface, whose energy slowly moves up via hot spots. Many past extinctions are also connected with large supervolcano eruptions.

Impacts also pose a significant risk. But if we project the past rate of large extinctions due to impacts into the future, we see that they occur only once in several million years. Thus, the likelihood of an asteroid impact in the next century is on the order of 1 in 100,000. That is negligibly small compared with the risks of AI, nanotech, biotech, etc.

The main natural risk is a meta-risk. Are we able to correctly estimate natural risk rates and project them into the future? And could we accidentally unleash a natural catastrophe that is long overdue?

There are several reasons for possible underestimation, which are listed in the right column of the map.

1. Anthropic shadow, that is, survival bias. This is a well-established idea by Bostrom, but the following four ideas are mostly my own conclusions from it.

2. We should also expect to find ourselves at the end of a period of stability for any important aspect of our environment (atmosphere, solar stability, crust stability, vacuum stability). This holds if the Rare Earth hypothesis is true and our conditions are very unusual in the universe.

3. It follows from (2) that our environment may be very fragile to human interventions (think of global warming). Its fragility is like that of an overinflated balloon poked by a small needle.

4. Also, human intelligence was the best adaptation to periods of intense climate change, and it evolved quickly in an ever-changing environment. So it should not be surprising that we find ourselves in a period of instability (think of the Toba eruption, the Clovis comet, the Younger Dryas, the ice ages) and in an unstable environment, as instability helps general intelligence evolve.

5. Periods of change are themselves marks of the end of stability periods for many processes, and are precursors of larger catastrophes. (For example, intermittent ice ages may precede a Snowball Earth, or smaller impacts by comet debris may precede an impact by larger remnants of the main body.)

In my opinion, each of these five points may raise the probability of natural risks by an order of magnitude; combined, they would raise it by several orders of magnitude, which seems too high and is probably "catastrophism bias". (More on this in my article “Why anthropic principle stopped to defend us”, which needs substantial revision.)

In conclusion, I think that when studying natural risks, a key thing we should check is the hypothesis that we live in an atypical period in a very fragile environment. For example, some scientists think that 30,000 years ago a large Centaur comet entered the inner Solar System and split into pieces (including Comet Encke and the Taurid meteor showers, as well as the Tunguska body), and that we live in a period of bombardment 100 times more intense than average.
Others believe that methane hydrates are very fragile and that even slight human-caused warming could trigger a dangerous positive feedback.

I tried to list all known natural risks (I am interested in new suggestions). I divided them into two classes: proven and speculative. Most speculative risks are probably false. The most probable risks in the map are marked red. My crazy ideas are marked green.

Some ideas come from obscure Russian literature. For example, there is the idea that hydrocarbons could be created naturally inside the Earth (like abiogenic oil) and that large pockets of them could accumulate in the mantle. Some of them could be natural explosives, like toluene, and they could be the cause of kimberlite explosions.

http://www.geokniga.org/books/6908

While kimberlite explosions are well known, and their energy is comparable to the impact of a kilometer-sized asteroid, I have never read about the contemporary risk of such explosions.

The pdf of the map is here: http://immortality-roadmap.com/naturalrisks11.pdf

## Weekly LW Meetups

2 16 September 2016 03:51PM

This summary was posted to LW Main on September 16th. The following week's summary is here.

Irregularly scheduled Less Wrong meetups are taking place in:

The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:

Locations with regularly scheduled meetups: Austin, Berlin, Boston, Brussels, Buffalo, Canberra, Columbus, Denver, London, Madison WI, Melbourne, Moscow, New Hampshire, New York, Philadelphia, Research Triangle NC, San Francisco Bay Area, Seattle, Sydney, Tel Aviv, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers and a Slack channel for daily discussion and online meetups on Sunday night US time.
## Why we may elect our new AI overlords

2 04 September 2016 01:07AM

In which I examine some of the latest developments in automated fact-checking and prediction markets for policies, and propose that we get rich voting for robot politicians.

http://pirate.london/2016/09/why-we-may-elect-our-new-ai-overlords/

## New Philosophical Work on Solomonoff Induction

1 27 September 2016 11:12AM

I don't know to what extent MIRI's current research engages with Solomonoff induction, but some of you may find recent work by Tom Sterkenburg to be of interest. Here's the abstract of his paper Solomonoff Prediction and Occam's Razor:

Algorithmic information theory gives an idealised notion of compressibility that is often presented as an objective measure of simplicity. It is suggested at times that Solomonoff prediction, or algorithmic information theory in a predictive setting, can deliver an argument to justify Occam's razor. This article explicates the relevant argument and, by converting it into a Bayesian framework, reveals why it has no such justificatory force. The supposed simplicity concept is better perceived as a specific inductive assumption, the assumption of effectiveness. It is this assumption that is the characterising element of Solomonoff prediction and wherein its philosophical interest lies.

## We have the technology required to build 3D body scanners for consumer prices

1 26 September 2016 03:36PM

Apple added another lens to the iPhone 7 Plus to take better pictures. Meanwhile Walabot, which started out wanting to build breast cancer detection technology, released a \$600 device that can look 10 cm into walls. Thermal imaging has also become cheaper.

I think it would be possible to build a \$1,500 device that combines those technologies and also adds a laser that can shift color. A device like this could advance medicine considerably.
Many areas besides medicine could likely also benefit from a relatively cheap 3D scanner that can look inside objects.

Developing it would require Musk-level capital investment, but I think it would advance medicine a lot if a company both provided the hardware and developed software to do the best possible job at body scanning.

## Open thread, Sep. 26 - Oct. 02, 2016

1 26 September 2016 07:41AM

If it's worth saying, but not worth its own post, then it goes here.

Notes for future OT posters:

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should start on Monday, and end on Sunday.

4. Unflag the two options "Notify me of new top level comments on this article" and "

## Weekly LW Meetups

1 23 September 2016 03:52PM

The following meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:

Locations with regularly scheduled meetups: Austin, Berlin, Boston, Brussels, Buffalo, Canberra, Columbus, Denver, London, Madison WI, Melbourne, Moscow, New Hampshire, New York, Philadelphia, Research Triangle NC, San Francisco Bay Area, Seattle, Sydney, Tel Aviv, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers and a Slack channel for daily discussion and online meetups on Sunday night US time.

## Problems with learning values from observation

1 21 September 2016 12:40AM

I dunno if this has been discussed elsewhere (pointers welcome).

Observational data doesn't allow one to distinguish correlation and causation.
This is a problem for an agent attempting to learn values without being allowed to make interventions.

For example, suppose that happiness is just a linear function of how much Utopamine is in a person's brain.
If a person smiles only when their Utopamine concentration is above 3 ppm, then a value-learner that observes both someone's Utopamine levels and facial expression, and tries to predict their reported happiness from these features, will notice that smiling is correlated with higher reported happiness and thus erroneously believe that smiling is partially responsible for the happiness.
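A minimal sketch of the Utopamine example, assuming a hypothetical structural model in which reported happiness is 2 × Utopamine (the slope and the 0 to 6 ppm grid are arbitrary choices) and smiling is caused by Utopamine above 3 ppm but has no effect of its own. To keep it short, it compares group means rather than fitting the predictor described above: the observational gap between smilers and non-smilers is large, while forcing the smile on or off for the same people (an intervention) changes nothing.

```python
from statistics import mean
from typing import List

THRESHOLD_PPM = 3.0  # smiling threshold from the example

def reported_happiness(utopamine: float, smiling: bool) -> float:
    # Hypothetical structural equation: happiness depends only on Utopamine.
    # `smiling` is accepted but deliberately ignored, so it has no causal effect.
    return 2.0 * utopamine

def smiles(utopamine: float) -> bool:
    # Smiling is an effect of Utopamine, not a cause of happiness.
    return utopamine > THRESHOLD_PPM

def observational_gap(levels: List[float]) -> float:
    # What a passive observer sees: mean happiness of smilers minus non-smilers.
    smilers = [reported_happiness(u, smiles(u)) for u in levels if smiles(u)]
    others = [reported_happiness(u, smiles(u)) for u in levels if not smiles(u)]
    return mean(smilers) - mean(others)

def interventional_effect(levels: List[float]) -> float:
    # What an intervention would reveal: force the smile on vs. off for the
    # same people (same Utopamine levels) and average the difference.
    return mean(reported_happiness(u, True) - reported_happiness(u, False) for u in levels)

LEVELS = [i / 10 for i in range(61)]  # Utopamine from 0.0 to 6.0 ppm
print(round(observational_gap(LEVELS), 2))  # 6.1
print(interventional_effect(LEVELS))        # 0.0
```

Because smiling is a deterministic function of Utopamine in this toy model, observational data alone cannot tell it apart from a model in which smiling adds to happiness and the Utopamine term is adjusted to compensate; only the intervention separates the two.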

------------------
An IMPLICATION:
I have a picture of value learning where the AI learns via observation (since we don't want to give an unaligned AI access to actuators!).
But this makes it seem important to consider how to make an unaligned AI safe enough to perform interventions relevant to value learning.

## A Weird Trick To Manage Your Identity

1 19 September 2016 07:13PM

I’ve always been uncomfortable being labeled “American.” Though I’m a citizen of the United States, the term feels restrictive and confining. It obliges me to identify with aspects of the United States with which I am not thrilled. I have similar feelings of limitation with respect to other labels I assume. Some of these labels don’t feel completely true to who I am, or impose certain perspectives on me that diverge from my own.

These concerns are why it's useful to keep one's identity small, use identity carefully, and be strategic in choosing your identity.

Yet these pieces speak more to System 2 than to System 1. I recently came up with a weird trick that has made me more comfortable identifying with groups or movements that resonate with me, and that works as a visceral, System 1 identity-management strategy. The trick is to simply put the word “weird” before any identity category I think about.

I’m not an “American,” but a “weird American.” Once I started thinking about myself as a “weird American,” I was able to think calmly through which aspects of being American I identified with and which I did not, setting the latter aside from my identity. For example, I used the term “weird American” to describe myself when meeting a group of foreigners, and we had great conversations about what I meant and why I used the term. This subtle change lets me identify with the label “American” while separating myself from any aspects of it I don’t support.

Beyond nationality, I’ve started using the term “weird” in front of other identity categories. For example, I'm a professor at Ohio State. I used to become deeply frustrated when students didn’t prepare adequately for their classes with me. No matter how hard I tried, or whatever clever tactics I deployed, some students simply didn’t care. Instead of allowing that situation to keep bothering me, I started to think of myself as a “weird professor”: one who sets up an environment that helps students succeed, but doesn’t feel upset and frustrated by those who fail to make the most of it.

I’ve been applying the weird trick in my personal life, too. Thinking of myself as a “weird son” makes me feel more at ease when my mother and I don’t see eye-to-eye; thinking of myself as a “weird nice guy,” rather than just a nice guy, has helped me feel confident about my decisions to be firm when the occasion calls for it.

So, why does this weird trick work? It’s rooted in strategies of reframing and distancing, two research-based methods for changing our thought frameworks. Reframing involves changing one’s framework of thinking about a topic in order to create more beneficial modes of thinking. For instance, in reframing myself as a weird nice guy, I have been able to say “no” to requests people make of me, even though my intuitive nice-guy tendency tells me I should say “yes.” Distancing refers to a method of emotional management through separating oneself from an emotionally tense situation and observing it from a third-person, external perspective. Thus, if I think of myself as a weird son, I don’t feel nearly as many negative emotions during conflicts with my mom. It gives me space for calm and sound decision-making.

Thinking of myself as "weird" also applies to the context of rationality and effective altruism for me. Thinking of myself as a "weird" aspiring rationalist and EA helps me be more calm and at ease when I encounter criticisms of my approach to promoting rational thinking and effective giving. I can distance myself from the criticism better, and see what I can learn from the useful points in the criticism to update and be stronger going forward.

Overall, using the term “weird” before any identity category has freed me from confinements and restrictions associated with socially-imposed identity labels and allowed me to pick and choose which aspects of these labels best serve my own interests and needs. I hope being “weird” can help you manage your identity better as well!

## Open thread, Sep. 19 - Sep. 25, 2016

1 19 September 2016 06:34PM

If it's worth saying, but not worth its own post, then it goes here.

Notes for future OT posters:

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should start on Monday, and end on Sunday.

4. Unflag the two options "Notify me of new top level comments on this article" and "

## Open thread, Sep. 12 - Sep. 18, 2016

1 12 September 2016 06:49AM

If it's worth saying, but not worth its own post, then it goes here.

Notes for future OT posters:

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should start on Monday, and end on Sunday.

4. Unflag the two options "Notify me of new top level comments on this article" and "

## The Extraordinary Link Between Deep Neural Networks and the Nature of the Universe

1 10 September 2016 07:13PM

"The answer is that the universe is governed by a tiny subset of all possible functions. In other words, when the laws of physics are written down mathematically, they can all be described by functions that have a remarkable set of simple properties."

"“For reasons that are still not fully understood, our universe can be accurately described by polynomial Hamiltonians of low order.” These properties mean that neural networks do not need to approximate an infinitude of possible mathematical functions but only a tiny subset of the simplest ones."

Interesting article; I'm just diving into the paper now, but it looks like a big boost to the simulation argument. If the universe is built like a game engine, with stacked sets like Mandelbrots, then the simplicity itself becomes a driver in a fabricated reality.

# Why does deep and cheap learning work so well?

http://arxiv.org/abs/1608.08225

## Risks from Approximate Value Learning

1 27 August 2016 07:34PM

Solving the value learning problem is (IMO) the key technical challenge for AI safety.
How good or bad is an approximate solution?

EDIT for clarity:
By "approximate value learning" I mean something which does a good (but suboptimal from the perspective of safety) job of learning values.  So it may do a good enough job of learning values to behave well most of the time, and be useful for solving tasks, but it still has a non-trivial chance of developing dangerous instrumental goals, and is hence an Xrisk.

Considerations:

1. How would developing good approximate value learning algorithms affect AI research/deployment?
It would enable more AI applications. For instance, for many robotics tasks, such as "smooth grasping motion", it is difficult to specify a utility function manually. This could have positive or negative effects:

Positive:
* It could encourage more mainstream AI researchers to work on value-learning.

Negative:
* It could encourage more mainstream AI developers to use reinforcement learning to solve tasks for which "good-enough" utility functions can be learned.
Consider a value-learning algorithm which is "good-enough" to learn how to perform complicated, ill-specified tasks (e.g. folding a towel).  But it's still not quite perfect, and so every second, there is a 1/100,000,000 chance that it decides to take over the world. A robot using this algorithm would likely pass a year-long series of safety tests and seem like a viable product, but would be expected to decide to take over the world in ~3 years.
Without good-enough value learning, these tasks might just not be solved, or might be solved with safer approaches involving more engineering and less performance, e.g. using a collection of supervised learning modules and hand-crafted interfaces/heuristics.
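The arithmetic behind the towel-folding example can be checked with a short sketch, assuming each second is an independent trial with failure probability 1/100,000,000 (a geometric waiting-time model) and a 365.25-day year. The function names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # about 3.16e7 seconds

def expected_years_until_failure(p_per_second: float) -> float:
    # Geometric waiting time: with per-second probability p, the mean wait is 1/p seconds.
    return (1.0 / p_per_second) / SECONDS_PER_YEAR

def prob_no_failure(p_per_second: float, years: float) -> float:
    # Probability that every independent per-second trial over the period goes safely:
    # (1 - p) ** n, computed via log1p for numerical accuracy with tiny p.
    n = years * SECONDS_PER_YEAR
    return math.exp(n * math.log1p(-p_per_second))

P = 1 / 100_000_000  # per-second chance of deciding to take over the world (from the example)
print(f"expected time to failure: {expected_years_until_failure(P):.2f} years")
print(f"chance of passing a one-year test: {prob_no_failure(P, 1):.2f}")
```

This prints an expected time to failure of about 3.17 years and a roughly 73% chance of getting through a one-year test without incident, consistent with the "would likely pass" and "~3 years" figures above.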

2. What would a partially aligned AI do?
An AI programmed with an approximately correct value function might fail
* dramatically (see, e.g. Eliezer, on AIs "tiling the solar system with tiny smiley faces.")
or
* relatively benignly (see, e.g. my example of an AI that doesn't understand gustatory pleasure)

Perhaps a more significant example of benign partial-alignment would be an AI that has not learned all human values, but is corrigible and handles its uncertainty about its utility in a desirable way.

## Weekly LW Meetups

0 09 September 2016 03:48PM

This summary was posted to LW Main on September 9th. The following week's summary is here.

Irregularly scheduled Less Wrong meetups are taking place in:

The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:

Locations with regularly scheduled meetups: Austin, Berlin, Boston, Brussels, Buffalo, Canberra, Columbus, Denver, London, Madison WI, Melbourne, Moscow, New Hampshire, New York, Philadelphia, Research Triangle NC, San Francisco Bay Area, Seattle, Sydney, Tel Aviv, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers and a Slack channel for daily discussion and online meetups on Sunday night US time.