[Link] Suffering-focused AI safety: Why “fail-safe” measures might be particularly promising
The Foundational Research Institute just published a new paper: "Suffering-focused AI safety: Why “fail-safe” measures might be our top intervention".
It is important to consider that [AI outcomes] can go wrong to very different degrees. For value systems that place primary importance on the prevention of suffering, this aspect is crucial: the best way to avoid bad-case scenarios specifically may not be to try and get everything right. Instead, it makes sense to focus on the worst outcomes (in terms of the suffering they would contain) and on tractable methods to avert them. As others are trying to shoot for a best-case outcome (and hopefully they will succeed!), it is important that some people also take care of addressing the biggest risks. This perspective to AI safety is especially promising both because it is currently neglected and because it is easier to avoid a subset of outcomes rather than to shoot for one highly specific outcome. Finally, it is something that people with many different value systems could get behind.
2016 LessWrong Diaspora Survey Analysis: Part Two (LessWrong Use, Successorship, Diaspora)
2016 LessWrong Diaspora Survey Analysis
Overview
- Results and Dataset
- Meta
- Demographics
- LessWrong Usage and Experience
- LessWrong Criticism and Successorship
- Diaspora Community Analysis (You are here)
- Mental Health Section
- Basilisk Section/Analysis
- Blogs and Media analysis
- Politics
- Calibration Question And Probability Question Analysis
- Charity And Effective Altruism Analysis
Introduction
Before it was the LessWrong survey, the 2016 survey was a small project I was working on as market research for a website I'm creating called FortForecast. As I was discussing the idea with others, Eliot in particular suggested that, since he's doing LW 2.0 and I'm doing a site that targets the LessWrong demographic, I should go ahead and do the LessWrong Survey. Because of that, this year's survey had a lot of questions oriented around what you would want to see in a successor to LessWrong and what you think is wrong with the site.
LessWrong Usage and Experience
How Did You Find LessWrong?
Been here since it was started in the Overcoming Bias days: 171 8.3%
Referred by a link: 275 13.4%
HPMOR: 542 26.4%
Overcoming Bias: 80 3.9%
Referred by a friend: 265 12.9%
Referred by a search engine: 131 6.4%
Referred by other fiction: 14 0.7%
Slate Star Codex: 241 11.7%
Reddit: 55 2.7%
Common Sense Atheism: 19 0.9%
Hacker News: 47 2.3%
Gwern: 22 1.1%
Other: 191 9.3%
How do you use Less Wrong?
I lurk, but never registered an account: 1120 54.4%
I've registered an account, but never posted: 270 13.1%
I've posted a comment, but never a top-level post: 417 20.3%
I've posted in Discussion, but not Main: 179 8.7%
I've posted in Main: 72 3.5%
[54.4% lurkers.]
How often do you comment on LessWrong?
I have commented more than once a week for the past year.: 24 1.2%
I have commented more than once a month for the past year but less than once a week.: 63 3.1%
I have commented but less than once a month for the past year.: 225 11.1%
I have not commented this year.: 1718 84.6%
[You could probably snarkily title this one "LW usage in one statistic". It's a pretty damning portrait of the site's vitality. A whopping 84.6% of people have not commented a single time this year.]
How Long Since You Last Posted On LessWrong?
I wrote one today.: 12 0.637%
Within the last three days.: 13 0.69%
Within the last week.: 22 1.168%
Within the last month.: 58 3.079%
Within the last three months.: 75 3.981%
Within the last six months.: 68 3.609%
Within the last year.: 84 4.459%
Within the last five years.: 295 15.658%
Longer than five years.: 15 0.796%
I've never posted on LW.: 1242 65.924%
[Supermajority of people have never commented on LW, 5.574% have within the last month.]
About how much of the Sequences have you read?
Never knew they existed until this moment: 215 10.3%
Knew they existed, but never looked at them: 101 4.8%
Some, but less than 25% : 442 21.2%
About 25%: 260 12.5%
About 50%: 283 13.6%
About 75%: 298 14.3%
All or almost all: 487 23.3%
[10.3% of people taking the survey have never heard of the sequences. 36.3% have not read a quarter of them.]
Do you attend Less Wrong meetups?
Yes, regularly: 157 7.5%
Yes, once or a few times: 406 19.5%
No: 1518 72.9%
[However the in-person community seems to be non-dead.]
Is physical interaction with the Less Wrong community otherwise a part of your everyday life, for example do you live with other Less Wrongers, or you are close friends and frequently go out with them?
Yes, all the time: 158 7.6%
Yes, sometimes: 258 12.5%
No: 1652 79.9%
About the same number say they hang out with LWers 'all the time' as say they go to meetups. I wonder if people just double counted themselves here. Or they may go to meetups and have other interactions with LWers outside of that. Or it could be a coincidence and these are different demographics. Let's find out.
P(Community part of daily life | Meetups) = 40%
Significant overlap, but definitely not exclusive overlap. I'll go ahead and chalk this one up to coincidence.
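For readers who want to reproduce this kind of check on the released dataset, here is a minimal sketch of how a conditional probability like the one above can be computed from row-level answers. The rows below are toy data made up for illustration, not survey responses, and the function name is hypothetical.

```python
# Toy records, NOT survey data. Each tuple is
# (attends_meetups, community_is_part_of_daily_life).
toy_rows = [
    (True, True), (True, True), (True, False), (True, False), (True, False),
    (False, True), (False, False), (False, False), (False, False), (False, False),
]


def conditional_probability(rows):
    """Return P(daily life | meetups): the share of meetup attendees
    who also report that the community is part of their daily life."""
    attendees = [daily for meetups, daily in rows if meetups]
    if not attendees:
        raise ValueError("no meetup attendees in the data")
    return sum(attendees) / len(attendees)


p = conditional_probability(toy_rows)
print(f"P(Community part of daily life | Meetups) = {p:.0%}")
```

The only design point is that the denominator is restricted to meetup attendees rather than all respondents, which is what distinguishes the conditional figure from the raw percentages listed above.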
Have you ever been in a romantic relationship with someone you met through the Less Wrong community?
Yes: 129 6.2%
I didn't meet them through the community but they're part of the community now: 102 4.9%
No: 1851 88.9%
LessWrong Usage Differences Between 2016 and 2014 Surveys
How do you use Less Wrong?
I lurk, but never registered an account: +19.300% 1125 54.400%
I've registered an account, but never posted: -1.600% 271 13.100%
I've posted a comment, but never a top-level post: -7.600% 419 20.300%
I've posted in Discussion, but not Main: -5.100% 179 8.700%
I've posted in Main: -3.300% 73 3.500%
About how much of the sequences have you read?
Never knew they existed until this moment: +3.300% 217 10.400%
Knew they existed, but never looked at them: +2.100% 103 4.900%
Some, but less than 25%: +3.100% 442 21.100%
About 25%: +0.400% 260 12.400%
About 50%: -0.400% 284 13.500%
About 75%: -1.800% 299 14.300%
All or almost all: -5.000% 491 23.400%
Do you attend Less Wrong meetups?
Yes, regularly: -2.500% 160 7.700%
Yes, once or a few times: -2.100% 407 19.500%
No: +7.100% 1524 72.900%
Is physical interaction with the Less Wrong community otherwise a part of your everyday life, for example do you live with other Less Wrongers, or you are close friends and frequently go out with them?
Yes, all the time: +0.200% 161 7.700%
Yes, sometimes: -0.300% 258 12.400%
No: +2.400% 1659 79.800%
Have you ever been in a romantic relationship with someone you met through the Less Wrong community?
Yes: +0.800% 132 6.300%
I didn't meet them through the community but they're part of the community now: -0.400% 102 4.900%
No: +1.600% 1858 88.800%
Write Ins
In a bit of a silly oversight I forgot to ask survey participants what was good about the community, so the following is going to be a pretty one-sided picture. Below are the complete write-ins respondents submitted.
Issues With LessWrong At Its Peak
Philosophical Issues With LessWrong At Its Peak [Part One]
Philosophical Issues With LessWrong At Its Peak [Part Two]
Community Issues With LessWrong At Its Peak [Part One]
Community Issues With LessWrong At Its Peak [Part Two]
Issues With LessWrong Now
Philosophical Issues With LessWrong Now [Part One]
Philosophical Issues With LessWrong Now [Part Two]
Community Issues With LessWrong Now [Part One]
Community Issues With LessWrong Now [Part Two]
Peak Philosophy Issue Tallies
| Label | Code | Tally |
|---|---|---|
| Arrogance | A | 16 |
| Bad Aesthetics | BA | 3 |
| Bad Norms | BN | 3 |
| Bad Politics | BP | 5 |
| Bad Tech Platform | BTP | 1 |
| Cultish | C | 5 |
| Cargo Cult | CC | 3 |
| Doesn't Accept Criticism | DAC | 3 |
| Don't Know Where to Start | DKWS | 5 |
| Damaged Me Mentally | DMM | 1 |
| Esoteric | E | 3 |
| Eliezer Yudkowsky | EY | 6 |
| Improperly Indexed | II | 7 |
| Impossible Mission | IM | 4 |
| Insufficient Social Support | ISS | 1 |
| Jargon | | |
| Literal Cult | LC | 1 |
| Lack of Rigor | LR | 14 |
| Misfocused | M | 13 |
| Mixed Bag | MB | 3 |
| Nothing | N | 13 |
| Not Enough Jargon | NEJ | 1 |
| Not Enough Roko's Basilisk | NERB | 1 |
| Not Enough Theory | NET | 1 |
| No Intuition | NI | 6 |
| Not Progressive Enough | NPE | 7 |
| Narrow Scholarship | NS | 20 |
| Other | O | 3 |
| Personality Cult | PC | 10 |
| None of the Above | | |
| Quantum Mechanics Sequence | QMS | 2 |
| Reinvention | R | 10 |
| Rejects Expertise | RE | 5 |
| Spoiled | S | 7 |
| Small Competent Authorship | SCA | 6 |
| Suggestion For Improvement | SFI | 1 |
| Socially Incompetent | SI | 9 |
| Stupid Philosophy | SP | 4 |
| Too Contrarian | TC | 2 |
| Typical Mind | TM | 1 |
| Too Much Roko's Basilisk | TMRB | 1 |
| Too Much Theory | TMT | 14 |
| Too Progressive | TP | 2 |
| Too Serious | TS | 2 |
| Unwelcoming | U | 8 |
Well, those are certainly some results. Top answers are:
Narrow Scholarship: 20
Arrogance: 16
Too Much Theory: 14
Lack of Rigor: 14
Misfocused: 13
Nothing: 13
Reinvention (reinvents the wheel too much): 10
Personality Cult: 10
So condensing a bit: Pay more attention to mainstream scholarship and ideas, try to do better about intellectual rigor, be more practical and focus on results, be more humble. (Labeled Dataset)
Peak Community Issue Tallies
| Label | Code | Tally |
|---|---|---|
| Arrogance | A | 7 |
| Assumes Reader Is Male | ARIM | 1 |
| Bad Aesthetics | BA | 1 |
| Bad At PR | BAP | 5 |
| Bad Norms | BN | 5 |
| Bad Politics | BP | 2 |
| Cultish | C | 9 |
| Cliqueish Tendencies | CT | 1 |
| Diaspora | D | 1 |
| Defensive Attitude | DA | 1 |
| Doesn't Accept Criticism | DAC | 3 |
| Dunning Kruger | DK | 1 |
| Elitism | E | 3 |
| Eliezer Yudkowsky | EY | 2 |
| Groupthink | G | 11 |
| Insufficiently Indexed | II | 9 |
| Impossible Mission | IM | 1 |
| Imposter Syndrome | IS | 1 |
| Jargon | J | 2 |
| Lack of Rigor | LR | 1 |
| Mixed Bag | MB | 1 |
| Nothing | N | 5 |
| ??? | NA | 1 |
| Not Big Enough | NBE | 3 |
| Not Enough of A Cult | NEAC | 1 |
| Not Enough Content | NEC | 7 |
| Not Enough Community Infrastructure | NECI | 10 |
| Not Enough Meetups | NEM | 5 |
| No Goals | NG | 2 |
| Not Nerdy Enough | NNE | 3 |
| None Of the Above | NOA | 1 |
| Not Progressive Enough | NPE | 3 |
| Not Rational | NR | 3 |
| NRx (Neoreaction) | NRx | 1 |
| Narrow Scholarship | NS | 4 |
| Not Stringent Enough | NSE | 3 |
| Parochialism | P | 1 |
| Pickup Artistry | PA | 2 |
| Personality Cult | PC | 7 |
| Reinvention | R | 1 |
| Recurring Arguments | RA | 3 |
| Rejects Expertise | RE | 2 |
| Sequences | S | 2 |
| Small Competent Authorship | SCA | 5 |
| Suggestion For Improvement | SFI | 1 |
| Spoiled Issue | SI | 9 |
| Socially INCOMpetent | SINCOM | 2 |
| Too Boring | TB | 1 |
| Too Contrarian | TC | 10 |
| Too COMbative | TCOM | 4 |
| Too Cis/Straight/Male | TCSM | 5 |
| Too Intolerant of Cranks | TIC | 1 |
| Too Intolerant of Politics | TIP | 2 |
| Too Long Winded | TLW | 2 |
| Too Many Idiots | TMI | 3 |
| Too Much Math | TMM | 1 |
| Too Much Theory | TMT | 12 |
| Too Nerdy | TN | 6 |
| Too Rigorous | TR | 1 |
| Too Serious | TS | 1 |
| Too Tolerant of Cranks | TTC | 1 |
| Too Tolerant of Politics | TTP | 3 |
| Too Tolerant of POSers | TTPOS | 2 |
| Too Tolerant of PROGressivism | TTPROG | 2 |
| Too Weird | TW | 2 |
| Unwelcoming | U | 12 |
| UTILitarianism | UTIL | 1 |
Top Answers:
Unwelcoming: 12
Too Much Theory: 12
Groupthink: 11
Not Enough Community Infrastructure: 10
Too Contrarian: 10
Insufficiently Indexed: 9
Cultish: 9
Again condensing a bit: Work on being less intimidating/aggressive/etc to newcomers, spend less time on navel gazing and more time on actually doing things and collecting data, work on getting the structures in place that will onboard people into the community, stop being so nitpicky and argumentative, spend more time on getting content indexed in a form where people can actually find it, be more accepting of outside viewpoints and remember that you're probably more likely to be wrong than you think. (Labeled Dataset)
One last note before we finish up: these tallies are a very rough executive summary. The tagging process basically involves trying to fit points into clusters and is prone to inaccuracy through laziness, reluctance to add another category, square-peg-into-round-hole fitting, and my personal political biases. So take these with a grain of salt. If you really want to know what people wrote in, my advice would be to read through the write-in sets I have above in HTML format. If you want to evaluate for yourself how well I tagged things, you can see the labeled datasets above.
I won't bother tallying the "issues now" sections. All you really need to know is that they're basically the same as the first sections, except with lots more "It's dead." comments and, from eyeballing it, a higher proportion of people arguing that LessWrong has been taken over by the left/social justice, plus complaints about effective altruism. (I infer that the complaints about being taken over by the left are mostly referring to effective altruism.)
Traits Respondents Would Like To See In A Successor Community
Philosophically
Attention Paid To Outside Sources
More: 1042 70.933%
Same: 414 28.182%
Less: 13 0.885%
Self Improvement Focus
More: 754 50.706%
Same: 598 40.215%
Less: 135 9.079%
AI Focus
More: 184 12.611%
Same: 821 56.271%
Less: 454 31.117%
Political
More: 330 22.837%
Same: 770 53.287%
Less: 345 23.875%
Academic/Formal
More: 455 31.885%
Same: 803 56.272%
Less: 169 11.843%
In summary, people want a site that will engage with outside ideas, acknowledge where it borrows from, focus on practical self improvement, less on AI and AI risk, and tighten its academic rigor. They could go either way on politics but the epistemic direction is clear.
Community
Intense Environment
More: 254 19.644%
Same: 830 64.192%
Less: 209 16.164%
Focused On 'Real World' Action
More: 739 53.824%
Same: 563 41.005%
Less: 71 5.171%
Experts
More: 749 55.605%
Same: 575 42.687%
Less: 23 1.707%
Data Driven/Testing Of Ideas
More: 1107 78.344%
Same: 291 20.594%
Less: 15 1.062%
Social
More: 583 43.507%
Same: 682 50.896%
Less: 75 5.597%
This largely backs up what I said about the previous results. People want a more practical, more active, more social and more empirical LessWrong with outside expertise and ideas brought into the fold. They could go either way on it being more intense but the epistemic trend is still clear.
Write Ins
Diaspora Communities
So where did the party go? We got twice as many respondents this year as last when we opened up the survey to the diaspora, which means that the LW community is alive and kicking; it's just not on LessWrong.
LessWrong
Yes: 353 11.498%
No: 1597 52.02%
LessWrong Meetups
Yes: 215 7.003%
No: 1735 56.515%
LessWrong Facebook Group
Yes: 171 5.57%
No: 1779 57.948%
LessWrong Slack
Yes: 55 1.792%
No: 1895 61.726%
SlateStarCodex
Yes: 832 27.101%
No: 1118 36.417%
[SlateStarCodex by far has the highest proportion of active LessWrong users, over twice that of LessWrong itself, and more than LessWrong and Tumblr combined.]
Rationalist Tumblr
Yes: 350 11.401%
No: 1600 52.117%
[I'm actually surprised that Tumblr doesn't just beat LessWrong itself outright. They're only a tenth of a percentage point behind though, and if current trends continue I suspect that by 2017 Tumblr will have a large lead over the main LW site.]
Rationalist Facebook
Yes: 150 4.886%
No: 1800 58.632%
[Eliezer Yudkowsky currently resides here.]
Rationalist Twitter
Yes: 59 1.922%
No: 1891 61.596%
Effective Altruism Hub
Yes: 98 3.192%
No: 1852 60.326%
FortForecast
Yes: 4 0.13%
No: 1946 63.388%
[I included this as a 'troll' option to catch people who just check every box. Relatively few people seem to have done that, but having the option here lets me know one way or the other.]
Good Judgement(TM) Open
Yes: 29 0.945%
No: 1921 62.573%
PredictionBook
Yes: 59 1.922%
No: 1891 61.596%
Omnilibrium
Yes: 8 0.261%
No: 1942 63.257%
Hacker News
Yes: 252 8.208%
No: 1698 55.309%
#lesswrong on freenode
Yes: 76 2.476%
No: 1874 61.042%
#slatestarcodex on freenode
Yes: 36 1.173%
No: 1914 62.345%
#hplusroadmap on freenode
Yes: 4 0.13%
No: 1946 63.388%
#chapelperilous on freenode
Yes: 10 0.326%
No: 1940 63.192%
[Since people keep asking me, this is a postrational channel.]
/r/rational
Yes: 274 8.925%
No: 1676 54.593%
/r/HPMOR
Yes: 230 7.492%
No: 1720 56.026%
[Given that the story is long over, this is pretty impressive. I'd have expected it to be dead by now.]
/r/SlateStarCodex
Yes: 244 7.948%
No: 1706 55.57%
One or more private 'rationalist' groups
Yes: 192 6.254%
No: 1758 57.264%
[I almost wish I hadn't included this option, it'd have been fascinating to learn more about these through write ins.]
Of all the parties who seem like plausible candidates at the moment, Scott Alexander seems most capable of undiaspora-ing the community. In practice he's very busy, so he would need a dedicated team of relatively autonomous people to help him. Scott could court guest posts and start to scale up under the SSC brand, and I think he would fairly easily end up with the lion's share of the free-floating LWers that way.
Before I call a hearse for LessWrong, there is a glimmer of hope left:
Would you consider rejoining LessWrong?
I never left: 668 40.6%
Yes: 557 33.8%
Yes, but only under certain conditions: 205 12.5%
No: 216 13.1%
A significant fraction of people say they'd be interested in an improved version of the site. And of course there were write-ins for conditions to rejoin. What did people say they'd need to rejoin the site?
Rejoin Condition Write Ins [Part One]
Rejoin Condition Write Ins [Part Two]
Rejoin Condition Write Ins [Part Three]
Rejoin Condition Write Ins [Part Four]
Rejoin Condition Write Ins [Part Five]
Feel free to read these yourselves (they're not long), but I'll go ahead and summarize: It's all about the content. Content, content, content. No amount of usability improvements, A/B testing or clever trickery will let you get around content. People are overwhelmingly clear about this; they need a reason to come to the site and right now they don't feel like they have one. That means priority number one for somebody trying to revitalize LessWrong is how you deal with this.
Let's recap.
Future Improvement Wishlist Based On Survey Results
Philosophical
- Pay more attention to mainstream scholarship and ideas.
- Improved intellectual rigor.
- Acknowledge sources borrowed from.
- Be more practical and focus on results.
- Be more humble.
Community
- Less intimidating/aggressive/etc. to newcomers.
- Structures that will onboard people into the community.
- Stop being so nitpicky and argumentative.
- Spend more time on getting content indexed in a form where people can actually find it.
- More accepting of outside viewpoints.
While that list seems reasonable, it's quite hard to put into practice. Rigor, as the name implies, requires high effort from participants. Frankly, it's not fun. And getting people to do un-fun things without paying them is difficult. If LessWrong is serious about its goal of 'advancing the art of human rationality' then it needs to figure out a way to do real investigation into the subject. Not just have people 'discuss', as though the potential for Rationality is within all of us just waiting to be brought out by the right conversation.
I personally haven't been a LW regular in a long time. Assuming the points about pedantry, sniping, "well actually"-ism and the like are true, then they need to stop for the site to move forward. Personally, I'm a huge fan of Scott Alexander's comment policy: All comments must be at least two of true, kind, or necessary.
- True and kind - Probably won't drown out the discussion signal, will help significantly decrease the hostility of the atmosphere.
- True and necessary - Sometimes what you have to say isn't nice, but it needs to be said. This is the common core of free speech arguments for saying mean things and they're not wrong. However, something being true isn't necessarily enough to make it something you should say. In fact, in some situations saying mean things to people entirely unrelated to their arguments is known as the ad hominem fallacy.
- Kind and necessary - The infamous 'hugbox' is essentially a place where people go to hear things which are kind but not necessarily true. I don't think anybody wants a hugbox, but occasionally it can be important to say things that might not be true but are needed for the sake of tact, reconciliation, or to prevent greater harm.
If people took that seriously and really gave it some thought before they used their keyboard, I think the on-site LessWrong community would be a significant part of the way to not driving people off as soon as they arrive.
More importantly, in places like the LessWrong Slack I see this sort of happy go lucky attitude about site improvement. "Oh that sounds nice, we should do that." without the accompanying mountain of work to actually make 'that' happen. I'm not sure people really understand the dynamics of what it means to 'revive' a website in severe decay. When you decide to 'revive' a dying site, what you're really doing once you're past a certain point is refounding the site. So the question you should be asking yourself isn't "Can I fix the site up a bit so it isn't quite so stale?". It's "Could I have founded this site?" and if the answer is no you should seriously question whether to make the time investment.
Whether or not LessWrong lives to see another day basically depends on the level of ground game its last users and administrators can muster up. And if it's not enough, it won't.
Virtus junxit mors non separabit!
Google Deepmind and FHI collaborate to present research at UAI 2016

Oxford academics are teaming up with Google DeepMind to make artificial intelligence safer. Laurent Orseau, of Google DeepMind, and Stuart Armstrong, the Alexander Tamas Fellow in Artificial Intelligence and Machine Learning at the Future of Humanity Institute at the University of Oxford, will be presenting their research on reinforcement learning agent interruptibility at UAI 2016. The conference, one of the most prestigious in the field of machine learning, will be held in New York City from June 25-29. The paper which resulted from this collaborative research will be published in the Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI).
Orseau and Armstrong’s research explores a method to ensure that reinforcement learning agents can be repeatedly safely interrupted by human or automatic overseers. This ensures that the agents do not “learn” about these interruptions, and do not take steps to avoid or manipulate the interruptions. When there are control procedures during the training of the agent, we do not want the agent to learn about these procedures, as they will not exist once the agent is on its own. This is useful for agents that have a substantially different training and testing environment (for instance, when training a Martian rover on Earth, shutting it down, replacing it at its initial location and turning it on again when it goes out of bounds—something that may be impossible once alone unsupervised on Mars), for agents not known to be fully trustworthy (such as an automated delivery vehicle, that we do not want to learn to behave differently when watched), or simply for agents that need continual adjustments to their learnt behaviour. In all cases where it makes sense to include an emergency “off” mechanism, it also makes sense to ensure the agent doesn’t learn to plan around that mechanism.
Interruptibility has several advantages as an approach over previous methods of control. As Dr. Armstrong explains, “Interruptibility has applications for many current agents, especially when we need the agent to not learn from specific experiences during training. Many of the naive ideas for accomplishing this—such as deleting certain histories from the training set—change the behaviour of the agent in unfortunate ways.”
In the paper, the researchers provide a formal definition of safe interruptibility, show that some types of agents already have this property, and show that others can be easily modified to gain it. They also demonstrate that even an ideal agent that tends to the optimal behaviour in any computable environment can be made safely interruptible.
These results will have implications in future research directions in AI safety. As the paper says, “Safe interruptibility can be useful to take control of a robot that is misbehaving… take it out of a delicate situation, or even to temporarily use it to achieve a task it did not learn to perform….” As Armstrong explains, “Machine learning is one of the most powerful tools for building AI that has ever existed. But applying it to questions of AI motivations is problematic: just as we humans would not willingly change to an alien system of values, any agent has a natural tendency to avoid changing its current values, even if we want to change or tune them. Interruptibility and the related general idea of corrigibility, allow such changes to happen without the agent trying to resist them or force them. The newness of the field of AI safety means that there is relatively little awareness of these problems in the wider machine learning community. As with other areas of AI research, DeepMind remains at the cutting edge of this important subfield.”
On the prospect of continuing collaboration in this field with DeepMind, Stuart said, “I personally had a really illuminating time writing this paper—Laurent is a brilliant researcher… I sincerely look forward to productive collaboration with him and other researchers at DeepMind into the future.” The same sentiment is echoed by Laurent, who said, “It was a real pleasure to work with Stuart on this. His creativity and critical thinking as well as his technical skills were essential components to the success of this work. This collaboration is one of the first steps toward AI Safety research, and there’s no doubt FHI and Google DeepMind will work again together to make AI safer.”
For more information, or to schedule an interview, please contact Kyle Scott at fhipa@philosophy.ox.ac.uk
A Second Year of Spaced Repetition Software in the Classroom
This is a follow-up to last year's report. Here, I will talk about my successes and failures using Spaced Repetition Software (SRS) in the classroom for a second year. The year's not over yet, but I have reasons for reporting early that should become clear in a subsequent post. A third post will then follow, and together these will constitute a small sequence exploring classroom SRS and the adjacent ideas that bubble up when I think deeply about teaching.
Summary
I experienced net negative progress this year in my efforts to improve classroom instruction via spaced repetition software. While this is mostly attributable to shifts in my personal priorities, I have also identified a number of additional failure modes for classroom SRS, as well as additional shortcomings of Anki for this use case. My experiences also showcase some fundamental challenges to teaching-in-general that SRS depressingly spotlights without being any less susceptible to. Regardless, I am more bullish than ever about the potential for classroom SRS, and will lay out a detailed vision for what it can be in the next post.
[LINK] Updating Drake's Equation with values from modern astronomy
A paper published in Astrobiology: A New Empirical Constraint on the Prevalence of Technological Species in the Universe (PDF), A. Frank and W.T. Sullivan.
From the abstract:
Recent advances in exoplanet studies provide strong constraints on all astrophysical terms in the Drake equation. [...] We find that as long as the probability that a habitable zone planet develops a technological species is larger than ~10^-24, humanity is not the only time technological intelligence has evolved.
They say we now know with reasonable certainty the total number of stars ever to exist (in the observable universe), and the average number of planets in the habitable zone. But we still don't know the probabilities of life, intelligence, and technology arising. They call this cumulative unknown factor fbt.
Their result: for technological civilization to arise no more than once, with probability 0.01, in the lifetime of the observable universe, fbt should be no greater than ~2.5 x 10^-24.
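To make the arithmetic behind that bound concrete, here is a minimal sketch. It assumes the simplest reading of the result: the expected number of technological species ever to arise is A = N_ast x fbt, where N_ast is the number of habitable zone planets, and the 0.01 figure is the value of A at the bound. Both that relation and the Poisson approximation used below are assumptions of this sketch, not quotes from the paper. The only inputs are the two numbers given above.

```python
import math

# The two figures quoted in the post.
A_THRESHOLD = 0.01      # "with probability 0.01"
F_BT_BOUND = 2.5e-24    # upper bound on fbt

# Assumption of this sketch: A = N_ast * fbt, so the number of
# habitable zone planets implied by the quoted figures is:
implied_n_ast = A_THRESHOLD / F_BT_BOUND


def expected_species(f_bt, n_ast=implied_n_ast):
    """Expected number of technological species ever to arise."""
    return n_ast * f_bt


def prob_at_least_one(f_bt, n_ast=implied_n_ast):
    """P(at least one species ever arises), treating each planet as an
    independent rare event (Poisson approximation, an assumption)."""
    return -math.expm1(-expected_species(f_bt, n_ast))


print(f"implied habitable zone planets: {implied_n_ast:.1e}")
print(f"expected species at the bound:  {expected_species(F_BT_BOUND):.3f}")
print(f"P(at least one) at the bound:   {prob_at_least_one(F_BT_BOUND):.5f}")
```

Under these assumptions the quoted figures imply roughly 4 x 10^21 habitable zone planets, and at the bound the chance that any technological species ever arises is just under 1%.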
Discussion
It's convenient that they calculate the chance technological civilization ever arose, rather than the chance one exists now. This is just the number we need to estimate the likelihood of a Great Filter.
They state their result as "[if we set fbt ≤ 2.5 x 10^-24, then] in a statistical sense were we to rerun the history of the Universe 100 times, only once would a lone technological species occur". But I don't know what rerunning the Universe means. I also can't formulate this as saying "if we hadn't already observed the Universe to be apparently empty of life, we would expect it to contain or to have once contained life with a probability of 10^24", because that would ignore the chance that another civilization (if it counterfactually existed) would have affected or prevented the rise of life on Earth. Can someone help reformulate this?
I don't know if their modern values for star and planet formation have been used in previous discussions of the Fermi paradox or the Great Filter. (The papers they cite for their values date from 2012, 2013 and 2015.) I also don't know if these values should be trusted, or what concrete values had been used previously. People on top of the Great Filter discussion probably already updated when the astronomical data came in.
JFK was not assassinated: prior probability zero events
A lot of my work involves tweaking the utility or probability of an agent to make it believe - or act as if it believed - impossible or almost impossible events. But we have to be careful about this; an agent that believes the impossible may not be so different from one that doesn't.
Consider for instance an agent that assigns a prior probability of zero to JFK ever having been assassinated. No matter what evidence you present to it, it will go on disbelieving the "non-zero gunmen theory".
Initially, the agent will behave very unusually. If it was in charge of JFK's security in Dallas before the shooting, it would have sent all secret service agents home, because no assassination could happen. Immediately after the assassination, it would have disbelieved everything. The films would have been faked or misinterpreted; the witnesses, deluded; the dead body of the president, that of a twin or an actor. It would have had huge problems with the aftermath, trying to reject all the evidence of death, seeing a vast conspiracy to hide the truth of JFK's non-death, including the many other conspiracy theories that must be false flags, because they all agree with the wrong statement that the president was actually assassinated.
But as time went on, the agent's behaviour would start to become more and more normal. It would realise the conspiracy was incredibly thorough in its faking of the evidence. All avenues it pursued to expose them would come to naught. It would stop expecting people to come forward and confess the joke, and it would stop expecting to find radical new evidence overturning the accepted narrative. After a while, it would start to expect the next new piece of evidence to be in favour of the assassination idea, because if a conspiracy has been faking things this well so far, then they should continue to do so in the future. Though it cannot change its view of the assassination, its expectations for observations converge towards the norm.
If it does a really thorough investigation, it might stop believing in a conspiracy at all. At some point, a miracle will start to become more likely than a perfect but undetectable conspiracy. It is very unlikely that Lee Harvey Oswald shot at JFK, missed, and the president's head exploded simultaneously from unrelated natural causes. But after a while, such a miraculous explanation will start to become more likely than anything else the agent can consider. This explanation opens the possibility of miracles; but again, if the agent is very thorough, it will fail to find evidence of other miracles, and will probably settle on "an unrepeatable miracle caused JFK's death in a way that is physically undetectable".
But then note that such an agent will have a probability distribution over future events that is almost indistinguishable from a normal agent that just believes the standard story of JFK being assassinated. The zero-prior has been negated, not in theory but in practice.
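To make the convergence concrete, here is a minimal sketch of a Bayesian agent that assigns prior zero to the true hypothesis. The four hypotheses, their priors, and their likelihoods are all made-up numbers chosen for illustration; nothing here comes from a real model of the case. Every piece of evidence the agent sees points to assassination.

```python
# Toy hypotheses (all numbers are made up for illustration).
# Each maps to (prior, P(next piece of evidence points to assassination)).
HYPOTHESES = {
    "assassinated":             (0.0,    0.99),  # the forbidden truth: prior zero
    "alive, no cover-up":       (0.9,    0.01),
    "alive, sloppy conspiracy": (0.0999, 0.70),
    "alive, perfect fake":      (0.0001, 0.99),  # flawless conspiracy or miracle
}


def predictive(beliefs):
    """Probability that the next piece of evidence points to assassination."""
    return sum(beliefs[h] * HYPOTHESES[h][1] for h in beliefs)


def update(beliefs):
    """Bayes update on one more piece of evidence pointing to assassination."""
    unnormalised = {h: p * HYPOTHESES[h][1] for h, p in beliefs.items()}
    total = sum(unnormalised.values())
    return {h: w / total for h, w in unnormalised.items()}


def investigate(n_observations):
    """Beliefs after n pieces of evidence, all pointing to assassination."""
    beliefs = {h: prior for h, (prior, _) in HYPOTHESES.items()}
    for _ in range(n_observations):
        beliefs = update(beliefs)
    return beliefs


if __name__ == "__main__":
    for n in (0, 5, 20, 50):
        beliefs = investigate(n)
        # Columns: evidence seen, predictive probability, belief in the truth.
        print(n, round(predictive(beliefs), 4), beliefs["assassinated"])
```

With these numbers the agent's belief in "assassinated" stays at exactly zero, while its prediction for the next piece of evidence climbs from about 0.08 to about 0.99, the same value a normal agent would assign. The hypothesis that absorbs the weight is the one that makes the same predictions as the truth.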
How to do proper probability manipulation
This section is still somewhat a work in progress.
So the agent believes one false fact about the world, but its expectation is otherwise normal. This can be both desirable and undesirable. The negative shows up if we try to control the agent forever by giving it a false fact: the false fact eventually stops affecting what the agent expects.
To see the positive, ask why we would want an agent to believe impossible things in the first place. Well, one example was an Oracle design where the Oracle didn't believe its output message would ever be read. Here we wanted the Oracle to believe the message wouldn't be read, but not believe anything else too weird about the world.
In terms of causality, if X designates the message being read at time t, and B and A are events before and after t, respectively, we want P(B|X)≈P(B) (probabilities about current facts in the world shouldn't change much) while P(A|X)≠P(A) is fine and often expected (the future should be different if the message is read or not).
In the JFK example, the agent eventually concluded "a miracle happened". I'll call this miracle a scrambling point. It's kind of a breakdown in causality: two futures are merged into one, given two different pasts. The two pasts are "JFK was assassinated" and "JFK wasn't assassinated", and their common scrambled future is "everything appears as if JFK was assassinated". The non-assassination belief has shifted the past but not the future.
For the Oracle, we want to do the reverse: we want the non-reading belief to shift the future but not the past. However, unlike the JFK assassination, we can try and build the scrambling point. That's why I always talk about messages going down noisy wires, or specific quantum events, or chaotic processes. If the past goes through a truly stochastic event (it doesn't matter whether there is true randomness or just that the agent can't figure out the consequences), we can get what we want.
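A small numerical sketch of the built scrambling point, under assumptions of my own: B, X and A are binary, and all probabilities are invented for illustration. In the "scrambled" design, whether the message is read depends only on an independent noisy event; in the "leaky" design it also depends on the past fact B. The code conditions on the message not being read, which is the belief the Oracle holds.

```python
from itertools import product


def joint(p_b, p_read_given_b, p_a_given_bx):
    """Enumerate P(B, X, A) for binary variables as a dict keyed by (b, x, a)."""
    table = {}
    for b, x, a in product((0, 1), repeat=3):
        pb = p_b if b else 1 - p_b
        px = p_read_given_b[b] if x else 1 - p_read_given_b[b]
        pa = p_a_given_bx[(b, x)] if a else 1 - p_a_given_bx[(b, x)]
        table[(b, x, a)] = pb * px * pa
    return table


def prob(table, event, given=lambda b, x, a: True):
    """P(event | given), with both arguments as predicates over (b, x, a)."""
    numerator = sum(p for k, p in table.items() if event(*k) and given(*k))
    denominator = sum(p for k, p in table.items() if given(*k))
    return numerator / denominator


# Hypothetical numbers: P(B=1), and P(A=1 | B, X) for the future event.
P_B = 0.3
FUTURE = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.2, (1, 1): 0.9}

# Scrambled: reading depends only on the noisy wire, not on the past.
scrambled = joint(P_B, {0: 0.9, 1: 0.9}, FUTURE)
# Leaky: the past fact B influences whether the message gets read.
leaky = joint(P_B, {0: 0.99, 1: 0.5}, FUTURE)


def past(b, x, a):
    return b == 1


def future(b, x, a):
    return a == 1


def unread(b, x, a):
    return x == 0


if __name__ == "__main__":
    for name, table in (("scrambled", scrambled), ("leaky", leaky)):
        print(name,
              round(prob(table, past), 3), round(prob(table, past, unread), 3),
              round(prob(table, future), 3), round(prob(table, future, unread), 3))
```

In the scrambled design, conditioning on non-reading leaves the probability of the past fact at 0.3 while the future probability moves a lot, which is the pattern we want. In the leaky design the same conditioning pushes the probability of the past fact to roughly 0.96, so the Oracle would conclude something is different about the past.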
The Oracle idea will go wrong if the Oracle concludes that non-reading must imply something is different about the past (maybe it can see through chaos in ways we thought it couldn't), just as the JFK assassination denier will continue to be crazy if it can't find a route to reach "everything appears as if JFK was assassinated".
But there is a break in the symmetry: the JFK assassination denier will eventually reach that point as long as the world is complex and stochastic enough, while the Oracle requires that the future probabilities be the same in all (realistic) past universes.
Now, once the Oracle's message has been read, the Oracle will find itself in the same situation as the other agent: believing an impossible thing. For Oracles, we can simply reset them. Other agents might have to behave more like the JFK assassination disbeliever. Though if we're careful, we can quantify things more precisely, as I attempted to do here.
What is up with carbon dioxide and cognition? An offer
One or two research groups have published work on carbon dioxide and cognition. The state of the published literature is confusing.
Here is one paper on the topic. The authors investigate a proprietary cognitive benchmark, and experimentally manipulate carbon dioxide levels (without affecting other measures of air quality). They find implausibly large effects from increased carbon dioxide concentrations.
If the reported effects are real and the suggested interpretation is correct, I think it would be a big deal. To put this in perspective, carbon dioxide concentrations in my room vary between 500 and 1500 ppm depending on whether I open the windows. The experiment reports on cognitive effects for moving from 600 to 1000 ppm, and finds significant effects compared to interindividual differences.
I haven't spent much time looking into this (maybe 30 minutes, and another 30 minutes to write this post). I expect that if we spent some time looking into indoor CO2 we could have a much better sense of what was going on, by some combination of better literature review, discussion with experts, looking into the benchmark they used, and just generally thinking about it.
So, here's a proposal:
- If someone looks into this and writes a post that improves our collective understanding of the issue, I will be willing to buy part of an associated certificate of impact, at a price of around $100*N, where N is my own totally made up estimate of how many hours of my own time it would take to produce a similarly useful writeup. I'd buy up to 50% of the certificate at that price.
- Whether or not they want to sell me some of the certificate, on May 1 I'll give a $500 prize to the author of the best publicly-available analysis of the issue. If the best analysis draws heavily on someone else's work, I'll use my discretion: I may split the prize arbitrarily, and may give it to the earlier post even if it is not quite as excellent.
Some clarifications:
- The metric for quality is "how useful it is to Paul." I hope that's a useful proxy for how useful it is in general, but no guarantees. I am generally a pretty skeptical person. I would care a lot about even a modest but well-established effect on performance.
- These don't need to be new analyses, either for the prize or the purchase.
- I reserve the right to resolve all ambiguities arbitrarily, and in the end to do whatever I feel like. But I promise I am generally a nice guy.
- I posted this 2 weeks ago on the EA forum and haven't had serious takers yet.
AlphaGo versus Lee Sedol
There have been a couple of brief discussions of this in the Open Thread, but it seems likely to generate more, so here's a place for it.
The original paper in Nature about AlphaGo.
Google Asia Pacific blog, where results will be posted. DeepMind's YouTube channel, where the games are being live-streamed.
Discussion on Hacker News after AlphaGo's win of the first game.
Lesswrong Survey - invitation for suggestions
Given that it's been a while since the last survey (http://lesswrong.com/lw/lhg/2014_survey_results/), it's now time to open the floor to suggestions for improvements to the last survey. If you have a question you think should be on the survey, please suggest it (perhaps with reasons why, predictions as to the result, or other useful commentary about a survey question).
Alternatively, suggest questions that should not be included in the next survey, with similar reasons as to why.
The survey is now up (2016-03-26): http://lesswrong.com/lw/nfk/lesswrong_2016_survey/
Require contributions in advance
If you are a person who finds it difficult to say "no" to their friends, this one weird trick may save you a lot of time!
Scenario 1
Alice: "Hi Bob! You are a programmer, right?"
Bob: "Hi Alice! Yes, I am."
Alice: "I have this cool idea, but I need someone to help me. I am not good with computers, and I need someone smart whom I could trust, so they wouldn't steal my idea. Would you have a moment to listen to me?"
Alice explains to Bob her idea that would completely change the world. Well, at least the world of bicycle shopping.
Instead of having many shops for bicycles, there could be one huge e-shop that would collect all the information about bicycles from all the existing shops. The customers would specify what kind of a bike they want (and where they live), and the system would find all bikes that fit the specification, and display them ordered by lowest price, including the price of delivery; then it would redirect them to the specific page of the specific vendor. Customers would love to use this one website, instead of having to visit multiple shops and compare. And the vendors would have to use this shop, because that's where the customers would be. Taking a fraction of a percent from the sales could make Alice (and also Bob, if he helps her) incredibly rich.
Bob is skeptical about it. The project suffers from the obvious chicken-and-egg problem: without vendors already there, the customers will not come (and if they come by accident, they will quickly leave, never to return again); and without customers already there, there is no reason for the vendors to cooperate. There are a few ways to approach this problem, but the fact that Alice didn't even think about it is a red flag. She also has no idea who the big players in the world of bicycle selling are; and generally she didn't do her homework. But after Bob points out all these objections, Alice still remains super enthusiastic about the project. She promises she will take care of everything -- she just cannot write code, and she needs Bob's help for this part.
Bob believes strongly in the division of labor, and that friends should help each other. He considers Alice his friend, and he will likely need some help from her in the future. Fact is, with a perfect specification, he could make the webpage in a week or two. But he considers bicycles to be an extremely boring topic, so he wants to spend as little time as possible on this project. Finally, he has an idea:
"Okay, Alice, I will make the website for you. But first I need to know exactly what the page will look like, so that I don't have to keep changing it over and over again. So here is the homework for you -- take a pen and paper, and make a sketch of exactly what the website will look like. All the dialogs, all the buttons. Don't forget logging in and logging out, editing the customer profile, and everything else that is necessary for the website to work as intended. Just look at the papers and imagine that you are the customer: where exactly would you click to register, and to find the bicycle you want? Same for the vendor. And possibly a site administrator. Also give me the list of criteria people will use to find the bike they want. Size, weight, color, radius of wheels, what else? And when you have it all ready, I will make the first version of the website. But until then, I am not writing any code."
Alice leaves, satisfied with the outcome.
This happened a year ago.
No, Alice doesn't have the design ready, yet. Once in a while, when she meets Bob, she smiles at him and apologizes that she didn't have the time to start working on the design. Bob smiles back and says it's okay, he'll wait. Then they change the topic.
Scenario 2
Cyril: "Hi Diana! You speak Spanish, right?"
Diana: "Hi Cyril! Yes, I do."
Cyril: "You know, I think Spanish is the coolest language ever, and I would really love to learn it! Could you please give me some Spanish lessons, once in a while? I totally want to become fluent in Spanish, so I could travel to Spanish-speaking countries and experience their culture and food. Would you please help me?"
Diana is happy that someone takes interest in her favorite hobby. It would be nice to have someone around she could practice Spanish conversation with. The first instinct is to say yes.
But then she remembers (she has known Cyril for some time; they have a lot of friends in common, so they meet quite regularly) that Cyril is always super enthusiastic about something he is totally going to do... but when she meets him next time, he is super enthusiastic about something completely different; and she has never heard of him doing anything serious about his previous dreams.
Also, Cyril seems to seriously underestimate how much time it takes to learn a foreign language fluently. Some lessons once in a while will not do it. He also needs to study on his own. Preferably every day, but twice a week is probably a minimum, if he hopes to speak the language fluently within a year. Diana would be happy to teach someone Spanish, but not if her effort will most likely be wasted.
Diana: "Cyril, there is this great website called Duolingo, where you can learn Spanish online completely free. If you give it about ten minutes every day, maybe after a few months you will be able to speak fluently. And anytime we meet, we can practice the vocabulary you have already learned."
This would be the best option for Diana. No work, and another opportunity to practice. But Cyril insists:
"It's not the same without the live teacher. When I read something from the textbook, I cannot ask additional questions. The words that are taught are often unrelated to the topics I am interested in. I am afraid I will just get stuck with the... whatever was the website that you mentioned."
For Diana this feels like a red flag. Sure, textbooks are not optimal. They contain many words that the student will not use frequently, and will soon forget. On the other hand, the grammar is always useful; and Diana doesn't want to waste her time explaining the basic grammar that any textbook could explain instead. If Cyril learns the grammar and some basic vocabulary, then she can teach him all the specialized vocabulary he is interested in. But now it feels like Cyril wants to avoid all work. She has to draw a line:
"Cyril, this is the address of the website." She takes his notebook and writes 'www.duolingo.com'. "You register there, choose Spanish, and click on the first lesson. It is interactive, and it will not take you more than ten minutes. If you get stuck there, write here what exactly it was that you didn't understand; I will explain it when we meet. If there is no problem, continue with the second lesson, and so on. When we meet next time, tell me which lessons you have completed, and we will talk about them. Okay?"
Cyril nods reluctantly.
This happened a year ago.
Cyril and Diana have met repeatedly during the year, but Cyril never brought up the topic of Spanish language again.
Scenario 3
Erika: "Filip, would you give me a massage?"
Filip: "Yeah, sure. The lotion is in the next room; bring it to me!"
Erika brings the massage lotion and lies on the bed. Filip massages her back. Then they make out and have sex.
This happened a year ago. Erika and Filip are still a happy couple.
Filip's previous relationships didn't work well in the long term. In retrospect, they all followed a similar scenario. At the beginning, everything seemed great. Then at some moment the girl started acting... unreasonably?... asking Filip to do various things for her, and then acting annoyed when Filip did exactly what he was asked to do. This happened more and more frequently, and at some moment she broke up with him. Sometimes she provided an explanation for breaking up that Filip was unable to decipher.
Filip has a friend who is a successful salesman. Successful both professionally and with women. When Filip admitted to himself that he is unable to solve the problem on his own, he asked his friend for advice.
"It's because you're a f***ing doormat," said the friend. "The moment a woman asks you to do anything, you immediately jump and do it, like a well-trained puppy. Puppies are cute, but not attractive. Have you read any of those books I sent you, like, ten years ago? I bet you didn't. Well, it's all there."
Filip sighed: "Look, I'm not trying to become a pick-up artist. Or a salesman. Or anything. No offense, but I'm not like you, personality-wise, I never have been, and I don't want to become your - or anyone else's - copy. Even if it would mean greater success in anything. I prefer to treat other people just like I would want them to treat me. Most people reciprocate nice behavior; and those who don't, well, I avoid them as much as possible. This works well with my friends. It also works with the girls... at the beginning... but then somehow... uhm... Anyway, all your books are about manipulating people, which is ethically unacceptable for me. Isn't there some other way?"
"All human interaction is manipulation; the choice is between doing it right or wrong, acting consciously or driven by your old habits..." started the friend, but then he gave up. "Okay, I see you're not interested. Just let me show you the most obvious mistake you make. You believe that when you are nice to people, they will perceive you as nice, and most of them will reciprocate. And when you act like an asshole, it's the other way round. That's correct, on some level; and in a perfect world this would be the whole truth. But on a different level, people also perceive nice behavior as weakness; especially if you do it habitually, as if you don't have any other option. And being an asshole obviously signals strength: you are not afraid to make other people angry. Also, in the long term, people become used to your behavior, good or bad. The nice people don't seem so nice anymore, but they still seem weak. Then, ironically, if the person well-known to be nice refuses to do something once, people become really angry, because their expectations were violated. And if the asshole decides to do something nice once, they will praise him, because he surprised them pleasantly. You should be an asshole once in a while, to make people see that you have a choice, so they won't take your niceness for granted. Or if your girlfriend wants something from you, sometimes just say no, even if you could have done it. She will respect you more, and then she will enjoy the things you do for her more."
Filip: "Well, I... probably couldn't do that. I mean, what you say seems to make sense, however much I hate to admit it. But I can't imagine doing it myself, especially to a person I love. It's just... uhm... wrong."
"Then, I guess, the very least you could do is to ask her to do something for you first. Even if it's symbolic, that doesn't matter; human relationships are mostly about role-playing anyway. Don't jump immediately when you are told to; always make her jump first, if only a little. That will demonstrate strength without hurting anyone. Could you do that?"
Filip wasn't sure, but at the next opportunity he tried it, and it worked. And it kept working. Maybe it was all just a coincidence, maybe it was a placebo effect, but Filip doesn't mind. At first it felt kinda artificial, but then it became natural. And later, to his surprise, Filip realized that practicing these symbolic demands actually made it easier to ask when he really needed something. (In which case sometimes he was asked to do something first, because his girlfriend -- knowingly or not? he never had the courage to ask -- copied the pattern; or maybe she had already known it long before. But he didn't mind that either.)
The lesson is: If you find yourself repeatedly in situations where people ask you to do something for them, but at the end they don't seem to appreciate what you did for them, or don't even care about the thing they asked you to do... and yet you find it difficult to say "no"... ask them to contribute to the project first.
This will help you get rid of the projects they don't care about (including the ones they think they care about in far mode, but do not care about enough to actually work on them in near mode) without being the one who refuses cooperation. Also, the act of asking the other person to contribute, after being asked to do something for them, mitigates the status loss inherent in working for them.