Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

[moderator action] The_Lion and The_Lion2 are banned

51 Viliam_Bur 30 January 2016 02:09AM

Accounts "The_Lion" and "The_Lion2" are banned now. Here is some background, mostly for the users who weren't here two years ago:


User "Eugine_Nier" was banned for retributive downvoting in July 2014. He keeps returning to the website using new accounts, such as "Azathoth123", "Voiceofra", "The_Lion", and he keeps repeating the behavior that got him banned originally.

The original ban was permanent. It will be enforced on all future known accounts of Eugine. (At random moments, because moderators sometimes feel too tired to play whack-a-mole.) This decision is not open to discussion.


Please note that the moderators of LW are the opposite of trigger-happy. Not counting spam, on average fewer than one account per year is banned. I am writing this explicitly to avoid possible misunderstanding among new users. Just because you have read about someone being banned doesn't mean that you are now at risk.

Most of the time, LW discourse is regulated by the community voting on articles and comments. Stupid or offensive comments get downvoted; you lose some karma, then everyone moves on. In rare cases, moderators may remove specific content that goes against the rules. The account ban is only used in the extreme cases (plus for obvious spam accounts). Specifically, on LW people don't get banned for merely not understanding something or disagreeing with someone.


What does "retributive downvoting" mean? Imagine that in a discussion you write a comment that someone disagrees with. Then in a few hours you will find that your karma has dropped by hundreds of points, because someone went through your entire comment history and downvoted all comments you ever wrote on LW; most of them completely unrelated to the debate that "triggered" the downvoter.

Such behavior is damaging to the debate and the community. Unlike downvoting a specific comment, this kind of mass downvoting isn't used to correct a faux pas, but to drive a person away from the website. It has an especially strong impact on new users, who don't know what is going on, so they may mistake it for a reaction of the whole community. But even in experienced users it creates an "ugh field" around certain topics known to invoke the reaction. Thus a single user achieves disproportionate control over the content and the user base of the website. This is not desired, and will be punished by the site owners and the moderators.

To avoid rules lawyering, there is no exact definition of how much downvoting breaks the rules. The rule of thumb is that you should upvote or downvote each comment based on the value of that specific comment. You shouldn't vote on the comments regardless of their content merely because they were written by a specific user.

Upcoming LW Changes

41 Vaniver 03 February 2016 05:34AM

Thanks to the reaction to this article and some conversations, I'm convinced that it's worth trying to renovate and restore LW. Eliezer, Nate, and Matt Fallshaw are all on board and have empowered me as an editor to see what we can do about reshaping LW to meet what the community currently needs. This involves a combination of technical changes and social changes, which we'll try to make transparently and non-intrusively.

Anxiety and Rationality

31 helldalgo 19 January 2016 06:30PM

Recently, someone on the Facebook page asked if anyone had used rationality to target anxieties.  I have, so I thought I’d share my LessWrong-inspired strategies.  This is my first post, so feedback and formatting help are welcome.  

First things first: the techniques developed by this community are not a panacea for mental illness.  They are way more effective than chance and other tactics at reducing normal bias, and I think many mental illnesses are simply cognitive biases that are extreme enough to get noticed.  In other words, getting a probability question about cancer systematically wrong does not disrupt my life enough to make the error obvious.  When I believe (irrationally) that I will get fired because I asked for help at work, my life is disrupted.  I become non-functional, and the error is clear.

Second: the best way to attack anxiety is to do the things that make your anxieties go away.  That might seem too obvious to state, but I’ve definitely been caught in an “analysis loop,” where I stay up all night reading self-help guides only to find myself non-functional in the morning because I didn’t sleep.  If you find that attacking an anxiety with Bayesian updating is like chopping down the Washington Monument with a spoon, but getting a full night’s sleep makes the monument disappear completely, consider the sleep.  Likewise for techniques that have little to no scientific evidence, but are a good placebo.  A placebo effect is still an effect.

Finally, like all advice, this comes with Implicit Step Zero:  “Have enough executive function to give this a try.”  If you find yourself in an analysis loop, you may not yet have enough executive function to try any of the advice you read.  The advice for functioning better is not always identical to the advice for functioning at all.  If there’s interest in an “improving your executive function” post, I’ll write one eventually.  It will be late, because my executive function is not impeccable.

Simple updating is my personal favorite for attacking specific anxieties.  A general sense of impending doom is a very tricky target and does not respond well to reality.  If you can narrow it down to a particular belief, however, you can amass evidence against it. 

Returning to my example about work: I alieved that I would get fired if I asked for help or missed a day due to illness.  The distinction between believe and alieve is an incredibly useful tool that I immediately integrated when I heard of it.  Learning to make beliefs pay rent is much easier than making harmful aliefs go away.  The tactics are similar: do experiments, make predictions, throw evidence at the situation until you get closer to reality.  Update accordingly.  

The first thing I do is identify the situation and why it’s dysfunctional.  The alief that I’ll get fired for asking for help is not actually articulated when it manifests as an anxiety.  Ask me in the middle of a panic attack, and I still won’t articulate that I am afraid of getting fired.  So I take the anxiety all the way through to its implication.  The algorithm is something like this:

  1. Notice sense of doom
  2. Notice my avoidance behaviors (not opening my email, walking away from my desk)
  3. Ask “What am I afraid of?”
  4. Answer (it's probably silly)
  5. Ask “What do I think will happen?”
  6. Make a prediction about what will happen (usually the prediction is implausible, which is why we want it to go away in the first place)

In the “asking for help” scenario, the answer to “what do I think will happen” is implausible.  It’s extremely unlikely that I’ll get fired for it!  This helps take the gravitas out of the anxiety, but it does not make it go away.*  After (6), it’s usually easy to do an experiment.  If I ask my coworkers for help, will I get fired?  The only way to know is to try. 

…That’s actually not true, of course.  A sense of my environment, my coworkers, and my general competence at work should be enough.  But if it was, we wouldn’t be here, would we?

So I perform the experiment.  And I wait.  When I receive a reply of any sort, even if it’s negative, I make a tick mark on a sheet of paper.  I label it “didn’t get fired.”  Because again, even if it’s negative, I didn’t get fired. 

This takes a lot of tick marks.  Cutting down the Washington Monument with a spoon, remember?

The tick marks don’t have to be physical.  I prefer physical ones, because they make the “updating” process visual.  I’ve tried making a mental note and it’s not nearly as effective.  Play around with it, though.  If you’re anything like me, you have a lot of anxieties to experiment with.
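The tick-mark sheet can be read as a very simple updating rule. Here is a minimal sketch in Python, assuming (for illustration only; the technique itself does not require it) a uniform prior over the chance of doom and treating each reply as an independent trial. Under those assumptions, Laplace's rule of succession gives the estimate. The function names are made up for this example.

```python
from fractions import Fraction


def rule_of_succession(bad_outcomes: int, trials: int) -> Fraction:
    """Estimated chance of a bad outcome on the next trial.

    Laplace's rule of succession: (bad + 1) / (trials + 2),
    which corresponds to starting from a uniform prior.
    """
    if trials < 0 or not 0 <= bad_outcomes <= trials:
        raise ValueError("need 0 <= bad_outcomes <= trials")
    return Fraction(bad_outcomes + 1, trials + 2)


def doom_estimates(tick_marks: int) -> list[Fraction]:
    """Estimate after 0, 1, ..., tick_marks replies, none of which was doom."""
    return [rule_of_succession(0, n) for n in range(tick_marks + 1)]


if __name__ == "__main__":
    # Each "didn't get fired" tick mark lowers the estimate a little.
    for n, estimate in enumerate(doom_estimates(8)):
        print(f"{n} tick marks: estimated chance of doom = {estimate}")
```

With no bad outcomes, the estimate after n tick marks is 1/(n+2): it falls from 1/2 to 1/10 after eight tick marks and keeps shrinking only slowly after that, which fits the experience that this takes a lot of tick marks.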

Usually, the anxiety starts to dissipate after obtaining several tick marks.  Ideally, one iteration of experiments should solve the problem.  But we aren’t ideal; we’re mentally ill.  Depending on the severity of the anxiety, you may need someone to remind you that doom will not occur.  I occasionally panic when I have to return to work after taking a sick day.  I ask my husband to remind me that I won’t get fired.  I ask him to remind me that he’ll still love me if I do get fired.  If this sounds childish, it’s because it is.  Again: we’re mentally ill.  Even if you aren’t, however, assigning value judgements to essentially harmless coping mechanisms does not make sense.  Childish-but-helpful is much better than mature-and-harmful, if you have to choose.

I still have tiny ugh fields around my anxiety triggers.  They don’t really go away.  It’s more like learning not to hit someone you’re angry at.  You notice the impulse, accept it, and move on.  Hopefully, your harmful alief starves to death.

If you perform your experiment and doom does occur, it might not be you.  If you can’t ask your boss for help, it might be your boss.  If you disagree with your spouse and they scream at you for an hour, it might be your spouse.  This isn’t an excuse to blame your problems on the world, but abusive situations can be sneaky.  Ask some trusted friends for a sanity check, if you’re performing experiments and getting doom as a result.  This is designed for situations where your alief is obviously silly.  Where you know it’s silly, and need to throw evidence at your brain to internalize it.  It’s fine to be afraid of genuinely scary things; if you really are in an abusive work environment, maybe you shouldn’t ask for help (and start looking for another job instead). 



*Using this technique for several months occasionally stops the anxiety immediately after step 6.

To contribute to AI safety, consider doing AI research

25 Vika 16 January 2016 08:42PM

Among those concerned about risks from advanced AI, I've encountered people who would be interested in a career in AI research, but are worried that doing so would speed up AI capability relative to safety. I think it is a mistake for AI safety proponents to avoid going into the field for this reason (better reasons include being well-positioned to do AI safety work, e.g. at MIRI or FHI). This mistake contributed to me choosing statistics rather than computer science for my PhD, which I have some regrets about, though luckily there is enough overlap between the two fields that I can work on machine learning anyway. I think the value of having more AI experts who are worried about AI safety is far higher than the downside of adding a few drops to the ocean of people trying to advance AI. Here are several reasons for this:

  1. Concerned researchers can inform and influence their colleagues, especially if they are outspoken about their views.
  2. Studying and working on AI brings understanding of the current challenges and breakthroughs in the field, which can usefully inform AI safety work (e.g. wireheading in reinforcement learning agents).
  3. Opportunities to work on AI safety are beginning to spring up within academia and industry, e.g. through FLI grants. In the next few years, it will be possible to do an AI-safety-focused PhD or postdoc in computer science, which would hit two birds with one stone.

To elaborate on #1, one of the prevailing arguments against taking long-term AI safety seriously is that not enough experts in the AI field are worried. Several prominent researchers have commented on the potential risks (Stuart Russell, Bart Selman, Murray Shanahan, Shane Legg, and others), and more are concerned but keep quiet for reputational reasons. An accomplished, strategically outspoken and/or well-connected expert can make a big difference in the attitude distribution in the AI field and the level of familiarity with the actual concerns (which are not about malevolence, sentience, or marching robot armies). Having more informed skeptics who have maybe even read Superintelligence, and fewer uninformed skeptics who think AI safety proponents are afraid of Terminators, would produce much needed direct and productive discussion on these issues. As the proportion of informed and concerned researchers in the field approaches critical mass, the reputational consequences for speaking up will decrease.

A year after FLI's Puerto Rico conference, the subject of long-term AI safety is no longer taboo among AI researchers, but remains rather controversial. Addressing AI risk in the long term will require safety work to be a significant part of the field, and close collaboration between those working on safety and capability of advanced AI. Stuart Russell makes the apt analogy that "just as nuclear fusion researchers consider the problem of containment of fusion reactions as one of the primary problems of their field, issues of control and safety will become central to AI as the field matures". If more people who are already concerned about AI safety join the field, we can make this happen faster, and help wisdom win the race with capability.

(Cross-posted from my blog. Thanks to Janos Kramar for his help with editing this post.)

A Rationalist Guide to OkCupid

22 Jacobian 03 February 2016 08:50PM

There's a lot of data and research on what makes people successful at online dating, but I don't know anyone who actually tried to wholeheartedly apply this to themselves. I decided to be that person: I implemented lessons from data, economics, game theory and of course rationality in my profile and strategy on OkCupid. Shockingly, it worked! I got a lot of great dates, learned a ton and found the love of my life. I didn't expect dating to be my "rationalist win", but it happened.

Here's the first part of the story; I hope you'll find some useful tips and maybe a dollop of inspiration among all the silly jokes.


Does anyone know who curates the "Latest on rationality blogs" toolbar? What are the requirements to be included?


How did my baby die and what is the probability that my next one will?

21 deprimita_patro 19 January 2016 06:24AM

Summary: My son was stillborn and I don't know why. My wife and I would like to have another child, but would very much not like to try if the probability of this occurring again is above a certain threshold (which we have already settled on). All 3 doctors I have consulted were unable to give a definitive cause of death, nor were any willing to give a numerical estimate of the probability that our next baby will be stillborn (whether for reasons of legal risk, or something else). I am likely too mind-killed to properly evaluate my situation and would very much appreciate a probability estimate, independent of mine, of what caused my son to die and, given that cause, what the recurrence risk is.

Background: V (L's and my only living biological son) had no complications during birth, nor has he shown any signs of poor health whatsoever. L has a cousin who has had two miscarriages, and I have an aunt who had several stillbirths followed by 3 live births of healthy children. We know of no other family members that have had similar misfortunes.

J (my deceased son) was the product of a 31 week gestation. L (my wife and J's mother) is 28 years old, gravida 2, para 1. L presented to the physician's office for routine prenatal care and noted that she had not felt any fetal movement for the last five to six days. No fetal heart tones were identified. It was determined that there was an intrauterine fetal demise. L was admitted on 11/05/2015 for induction and was delivered of a nonviable, normal appearing, male fetus at approximately 1:30 on 11/06/2015.

Pro-Con Reasoning: According to a leading obstetrics textbook [1], causes of stillbirth are commonly classified into 8 categories: obstetrical complications, placental abnormalities, fetal malformations, infection, umbilical cord abnormalities, hypertensive disorders, medical complications, and undetermined. Below, I'll list the percentage of stillbirths in each category (which may be used as prior probabilities) along with some reasons for or against.

Obstetrical complications (29%)

  • Against: No abruption detected. No multifetal gestation. No ruptured preterm membranes at 20-24 weeks.

Placental abnormalities (24%)

  • For: Excessive fibrin deposition (as concluded in the surgical pathology report). Early acute chorioamnionitis (as concluded in the surgical pathology report, but Dr. M claimed this was caused by the baby's death, not conversely). L has gene variants associated with deep vein thrombosis (AG on rs2227589 per 23andme raw data).
  • Against: No factor V Leiden mutation (GG on rs6025 per 23andme raw data and confirmed via independent lab test). No prothrombin gene mutation (GG on i3002432 per 23andme raw data and confirmed via independent lab test). L was negative for prothrombin G20210A mutation (as determined by lab test). Anti-thrombin III activity results were within normal reference ranges (as determined by lab test). Protein C activity results were within normal reference ranges (as determined by lab test). Protein S activity results were within normal reference ranges (as determined by lab test). Protein S antigen (free and total) results were within normal reference ranges (as determined by lab test).

Infection (13%)

  • For: During the last week of August, L visited the home of a nurse who works in a hospital that we now know had frequent cases of CMV infection. CMV antibody IgH, CMV IgG, and Parvovirus B-19 Antibody IgG values were outside of normal reference ranges.
  • Against: Dr. M discounted the viral test results as the cause of death, since the levels suggested the infection had occurred years ago, and therefore could not have caused J's death. Dr. F confirmed Dr. M's assessment.

Fetal malformations (14%)

  • Against: No major structural abnormalities. No genetic abnormalities detected (CombiSNP Array for Pregnancy Loss results showed a normal male micro array profile).

Umbilical cord abnormalities (10%)

  • Against: No prolapse. No stricture. No thrombosis.

Hypertensive disorders (9%)

  • Against: No preeclampsia. No chronic hypertension.

Medical complications (8%)

  • For: L experienced 2 nights of severe abdominal pain, which could have been contractions, on 10/28 and 10/29. L remembers waking up on her back a few nights between 10/20 and 11/05 (it is unclear if this belongs in this category or somewhere else).
  • Against: No antiphospholipid antibody syndrome detected (determined via Beta-2 Glycoprotein I Antibodies [IgG, IgA, IgM] test). No maternal diabetes detected (determined via glucose test on 10/20).

Undetermined (24%)
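
A note on using these percentages as priors: as listed, they sum to 131%, not 100%, presumably because some stillbirths in the source data were counted under more than one category. Below is a minimal sketch in Python of one way to handle this, assuming (as a simplification) that the categories are treated as mutually exclusive and rescaled to sum to 1. The `posterior` helper and its `likelihoods` argument are hypothetical; the likelihood values would have to come from someone with the relevant medical knowledge and are deliberately not filled in here.

```python
# Percentages as listed in the post (they overlap, so they sum past 100).
CATEGORY_PCT = {
    "Obstetrical complications": 29,
    "Placental abnormalities": 24,
    "Infection": 13,
    "Fetal malformations": 14,
    "Umbilical cord abnormalities": 10,
    "Hypertensive disorders": 9,
    "Medical complications": 8,
    "Undetermined": 24,
}


def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {name: value / total for name, value in weights.items()}


def posterior(priors: dict[str, float],
              likelihoods: dict[str, float]) -> dict[str, float]:
    """Bayes' rule over mutually exclusive causes.

    likelihoods[c] is P(observed evidence | cause c); a cause missing
    from the dict is treated as neutral (likelihood 1.0). These values
    are an assumption the caller must supply.
    """
    unnormalized = {c: p * likelihoods.get(c, 1.0) for c, p in priors.items()}
    return normalize(unnormalized)


if __name__ == "__main__":
    print("Sum of listed percentages:", sum(CATEGORY_PCT.values()))
    for cause, p in normalize(CATEGORY_PCT).items():
        print(f"{cause}: {p:.3f}")
```

Each "For" and "Against" item above would then enter as a likelihood for the relevant category, rather than as a yes/no elimination.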

What is the most likely cause of death? How likely is that cause? Given that cause, if we choose to have another child, then how likely is it to survive its birth? Are there any other ways I could reduce uncertainty (additional tests, etc.) that I haven't listed here? Are there any other forums where these questions are more likely to get good answers? Why won't doctors give probabilities? Help with any of these questions would be greatly appreciated. Thank you.

If your advice to me is to consult another expert (in addition to the 2 obstetricians and 1 high-risk obstetrician I already have consulted), please also provide concrete tactics as to how to find such an expert and validate their expertise.

Contact Information: If you would like to contact me, but don't want to create an account here, you can do so at

[1] Cunningham, F. (2014). Williams obstetrics. New York: McGraw-Hill Medical.

EDIT 1: Updated to make clear that both V and J are mine and L's biological sons.

EDIT 2: Updated to add information on family history.

EDIT 3: On PipFoweraker's advice, I added contact info.

EDIT 4: I've cross-posted this on Health Stack Exchange.

EDIT 5: I've emailed the list of authors of the most recent meta-analysis concerning causes of stillbirth. Don't expect much.

A Medical Mystery: Thyroid Hormones, Chronic Fatigue and Fibromyalgia

15 johnlawrenceaspden 31 January 2016 01:27PM


  Chronic Fatigue and Fibromyalgia look very like Hypothyroidism
  Thyroid Patients aren't happy with the diagnosis and treatment of Hypothyroidism
  It's possible that it's not too difficult to fix CFS/FMS with thyroid hormones
  I believe that there's been a stupendous cock-up that's hurt millions.
  Less Wrong should be interested, because it could be a real example of how bad inference can cause the cargo cult sciences to come to false conclusions.


I believe that I've come across a genuine puzzle, and I wonder if you can help me solve it. This problem is complicated, and subtle, and has confounded and defeated good people for forty years. And yet there are huge and obvious clues. No-one seems to have conducted the simple experiments which the clues suggest, even though many clever people have thought hard about it, and the answer to the problem would be very valuable. And so I wonder what it is that I am missing.


I am going to tell a story which rather extravagantly privileges a hypothesis that I have concocted from many different sources, but a large part of it is from the work of the late Doctor John C Lowe, an American chiropractor who claimed that he could cure Fibromyalgia.


I myself am drowning in confirmation bias to the point where I doubt my own sanity. Every time I look for evidence to disconfirm my hypothesis, I find only new reasons to believe. But I am utterly unqualified to judge. Three months ago I didn't know what an amino acid was. And so I appeal to wiser heads for help.


Crocker's Rules on this. I suspect that I am being the most spectacular fool, but I can't see why, and I'd like to know.


Setting the Scene


Chronic Fatigue Syndrome, Myalgic Encephalitis, and Fibromyalgia are 'new diseases'. There is considerable dispute as to whether they even exist, and if so how to diagnose them. They all seem to have a large number of possible symptoms, and in any given case, these symptoms may or may not occur with varying severity.


As far as I can tell, if someone claims that they're 'Tired All The Time', then a competent doctor will first of all check that they're getting enough sleep and are not unduly stressed, then rule out all of the known diseases that cause fatigue (there are a very lot!), and finally diagnose one of the three 'by exclusion', which means that there doesn't appear to be anything wrong, except that you're ill.


If widespread pain is one of the symptoms, it's Fibromyalgia Syndrome (FMS). If there's no pain, then it's CFS or ME. These may or may not be the same thing, but Myalgic Encephalitis is preferred by patients because it's Greek and so sounds like a disease. Unfortunately Myalgic Encephalitis means 'hurty muscles brain inflammation', and if one had hurty muscles, it would be Fibromyalgia, and if one had brain inflammation, it would be something else entirely.


Despite the widespread belief that these are 'somatoform' diseases (all in the mind), the severity of them ranges from relatively mild (tired all the time, can't think straight), to devastating (wheelchair bound, can't leave the house, can't open one eye because the pain is too great).


All three seem to have come spontaneously into existence in the 1970s, and yet searches for the responsible infective agent have proved fruitless. Nor have palliative measures been discovered, apart from the tried and true method of telling the sufferers that it's all in their heads.


The only treatments that have proved effective are Cognitive Behavioural Therapy / Graded Exercise. A Cochrane Review reckoned that they do around 15% over placebo in producing a measurable alleviation of symptoms. I'm not very impressed. CBT/GE sound a lot like 'sports coaching', and I'm pretty sure that if we thought of 'Not Being Very Good at Rowing' as a somatoform disorder, then I could produce an improvement over placebo in a measurable outcome in ten percent of my victims without too much trouble.


But any book on CFS will tell you that the disease was well known to the Victorians, under the name of neurasthenia. The hypothesis that God lifted the curse of neurasthenia from the people of the Earth as a reward for their courage during the wars of the early twentieth century, while well supported by the clinical evidence, has a low prior probability.


We face therefore something of a mystery, and in the traditional manner of my people, a mystery requires a Just-So Story:


How It Was In The Beginning


In the dark days of Victoria, the brilliant physician William Miller Ord noticed large numbers of mainly female patients suffering from late-onset cretinism.


These patients, exhausted, tired, stupid, sad, cold, fat and emotional, declined steeply, and invariably died.


As any man of decent curiosity would, Dr Ord cut their corpses apart, and in the midst of the carnage noticed that the thyroid, a small butterfly-shaped gland in the throat, was wasted and shrunken.


One imagines that he may have thought to himself: "What has killed them may cure them."


After a few false starts and a brilliant shot in the dark by the brave George Redmayne Murray, Dr Ord secured a supply of animal thyroid glands (cheaply available at any butcher, sauté with nutmeg and basil) and fed them to his remaining patients, who were presumably by this time too weak to resist.


They recovered miraculously, and completely.


I'm not sure why Dr Ord isn't better known, since this appears to have been the first time in recorded history that something a doctor did had a positive effect.


Dr Ord's syndrome was named Ord's Thyroiditis, and it is now known to be an autoimmune disease where the patient's own antibodies attack and destroy the thyroid gland. In Ord's thyroiditis, there is no goiter.


A similar disease, where the thyroid swells to form a disfiguring deformity of the neck (goiter), was described by Hakaru Hashimoto in 1912 (who rather charmingly published in German), and as part of the war reparations of 1946 it was decided to confuse the two diseases under the single name of Hashimoto's Thyroiditis. Apart from the goiter, both conditions share a characteristic set of symptoms, and were easily treated with animal thyroid gland, with no complications.


Many years before, in 1835, a fourth physician, Robert James Graves, had described a different syndrome, now known as Graves' Disease, which has as its characteristic symptoms irritability, muscle weakness, sleeping problems, a fast heartbeat, poor tolerance of heat, diarrhoea, and weight loss. Unfortunately Dr Graves could not think how to cure his eponymous horror, and so the disease is still named after him.


The Horror Spreads


Victorian medicine being what it was, we can assume that animal glands were sprayed over and into any wealthy person unwise enough to be remotely ill in the vicinity of a doctor. I seem to remember a number of jokes about "monkey glands" in PG Wodehouse, and indeed a man might be tempted to assume that chimpanzee parts would be a good substitute for humans. Supply issues seem to have limited monkey glands to a few millionaires worried about impotence, and it may be that the corresponding procedure inflicted on their wives has come down to us as Hormone Replacement Therapy.


Certainly anyone looking a bit cold, tired, fat, stupid, sad or emotional is going to have been eating thyroids. We can assume that in a certain number of cases, this was just the thing, and I think it may also be safe to assume that a fair number of people who had nothing wrong with them at all died as a result of treatment, although the fact that animal thyroid is still part of the human food chain suggests it can't be that dangerous.


I mean seriously, these people use high pressure hoses to recover the last scraps of meat from the floors of slaughterhouses, they're not going to carefully remove all the nasty gristly throat-bits before they make ready meals, are they?


The Armour Sausage company, owner of extensive meat-packing facilities in Chicago, Illinois, and thus in possession of a large number of pig thyroids which, if not quite surplus to requirements, at the very least faced a market sluggish to non-existent as foodstuffs, brilliantly decided to sell them in freeze-dried form as a cure for whatever ails you.



Some Sort of Sanity Emerges, in a Decade not Noted for its Sanity


Around the time of the second world war, doctors became interested in whether their treatments actually helped, and an effort was made to determine what was going on with thyroids and the constellation of sadness that I will henceforth call 'hypometabolism', which is the set of symptoms associated with Ord's thyroiditis. Jumping the gun a little, I shall also define 'hypermetabolism' as the set of symptoms associated with Graves' disease.


The thyroid gland appeared to be some sort of metabolic regulator, in some ways analogous to a thermostat. In hypometabolism, every system of the body is running slow, and so it produces a vast range of bad effects, affecting almost every organ. Different sufferers can have very different symptoms, and so diagnosis is very difficult.


Dr Broda Barnes decided that the key symptom of hypometabolism was a low core body temperature. By careful experiment he established that in patients with no symptoms of hypometabolism the average temperature of the armpit on waking was 98 degrees Fahrenheit (about 36.7 Celsius). He believed that a deviation of more than 0.2 degrees Fahrenheit either way was unusual enough to merit diagnosis. He also seems to have believed, in the manner of the proverbial man with a hammer, that all human ailments without exception were caused by hypometabolism, and to have given freeze-dried thyroid to almost everyone he came into contact with, to see if it helped. A true scientist. Doctor Barnes became convinced that fully 40% of the population of America suffered from hypometabolism, and recommended Armour's Freeze Dried Pig Thyroid to cure America's ills.
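
The rule as described here is simple enough to write down. A minimal sketch in Python, assuming the test is "waking armpit temperature within 0.2 degrees Fahrenheit of 98.0", with readings taken to one decimal place. The names are invented for illustration, and this is only a restatement of the rule in the story, not a diagnostic tool.

```python
BARNES_MEAN_F = 98.0       # average waking armpit temperature, per the story
BARNES_TOLERANCE_F = 0.2   # allowed deviation either way, per the story


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def fahrenheit_delta_to_celsius(delta_f: float) -> float:
    """Convert a temperature *difference* (no 32-degree offset)."""
    return delta_f * 5.0 / 9.0


def barnes_flag(waking_temp_f: float) -> str:
    """Classify a waking armpit temperature against the 98.0 +/- 0.2 F band.

    The deviation is rounded to one decimal place so that readings
    exactly on the boundary (97.8, 98.2) count as within range despite
    floating-point error.
    """
    deviation = round(waking_temp_f - BARNES_MEAN_F, 1)
    if deviation < -BARNES_TOLERANCE_F:
        return "low"
    if deviation > BARNES_TOLERANCE_F:
        return "high"
    return "within range"


if __name__ == "__main__":
    for reading in (97.4, 97.8, 98.0, 98.2, 98.6):
        print(reading, barnes_flag(reading))
    print(f"98.0 F = {fahrenheit_to_celsius(98.0):.2f} C")
    print(f"0.2 F band = {fahrenheit_delta_to_celsius(0.2):.2f} C either way")
```

A band of 0.2 degrees Fahrenheit either side is only about 0.11 degrees Celsius, which gives a sense of how narrow this 'normal' range is.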


In a brilliant stroke, Freeze Dried Pig's Thyroid was renamed 'Natural Desiccated Thyroid', which almost sounds like the sort of thing you might take in sound mind. I love marketing. It's so clever.


America being infested with religious lunatics, and Chicago being infested with nasty useless gristly bits of cow's throat, led almost inevitably to a second form of 'Natural Desiccated Thyroid' on the market.


Dr Barnes' hypometabolism test never seems to have caught on. There are several ways your temperature can go outside his 'normal' range, including fever (too hot), starvation (too cold), alcohol (too hot), sleeping under too many duvets (too hot), sleeping under too few duvets (too cold). Also mercury thermometers are a complete pain in the neck, and take ten minutes to get a sensible reading, which is a long time to lie around in bed carefully doing nothing so that you don't inadvertently raise your body temperature. To make the situation even worse, while men's temperature is reasonably constant, the body temperature of healthy young women goes up and down like the Assyrian Empire.


Several other tests were proposed. One of the most interesting is the speed of the Achilles Tendon Reflex, which is apparently super-fast in hypermetabolism, and either weirdly slow or has a freaky pause in it if you're running a bit cold. Drawbacks of this test include 'It's completely subjective, give me something with numbers in it', and 'I don't seem to have one, where am I supposed to tap the hammer-thing again?'.


By this time, neurasthenia was no longer a thing. In the same way that spiritualism was no longer a thing, and the British Empire was no longer a thing.


As far as we know, Chronic Fatigue Syndrome was not a thing either, and neither was Fibromyalgia (which is just Chronic Fatigue Syndrome but it hurts), nor Myalgic Encephalomyelitis. There was something called 'Myalgic Neurasthenia' in 1934, but it seems to have been a painful infectious disease and they thought it was polio.



Finally, Science


It turned out that the purpose of the thyroid gland is to make hormones which control the metabolism. It takes in the amino acid tyrosine, and it takes in iodine. It releases Thyroglobulin, mono-iodo-tyrosine (MIT), di-iodo-tyrosine (DIT), thyroxine (T4) and triiodothyronine (T3) into the blood. The chemistry is interesting but too complicated to explain in a just-so story.


I believe that we currently think that thyroglobulin, MIT and DIT are simply by-products of the process that makes T3 and T4.


T3 is the hormone. It seems to control the rate of metabolism in all cells. T4 has something of the same effect, but is much less active, and called a 'prohormone'. Its main purpose seems to be to be deiodinated to make more T3. This happens outside the thyroid gland, in the other parts of the body ('peripheral conversion'). I believe mainly in the liver, but to some extent in all cells.


Our forefathers knew about thyroxine (T4, or thyronine-with-four-iodines-attached), and triiodothyronine (T3, or thyronine-with-three-iodines-attached).


It seems to me that just from the names, thyroxine was the first one to be discovered. But I'm not sure about that. You try finding a history-of-endocrinology website. At any rate they seem to have known about T4 and T3 fairly early on.


The mystery of Graves', Ord's and Hashimoto's thyroid diseases was explained.


Ord's and Hashimoto's are diseases where the thyroid gland under-produces (hypothyroidism). The metabolism of all cells slows down. As might be expected, this causes a huge number of effects, which seem to manifest differently in different sufferers.


Graves' disease is caused by the thyroid gland over-producing (hyperthyroidism). The metabolism of all cells speeds up. Again, there are a lot of possible symptoms.


All three are thought to be autoimmune diseases. Some people think that they may be different manifestations of the same disease. They are all fairly common.


Desiccated thyroid cures hypothyroidism because the ground-up thyroids contain T4 and T3, as well as lots of thyroglobulin, MIT and DIT, and they are absorbed by the stomach. They get into the blood and speed up the metabolism of all cells. By titrating the dose carefully you can restore roughly the correct levels of the thyroid hormones in all tissues, and the patient gets better. (Titration is where you change something carefully until you get it right.)


The theory has considerable explanatory power. It explains cretinism, which is caused either by a genetic disease, or by iodine deficiency in childhood. If you grow up in an iodine deficient area, then your growth is stunted, your brain doesn't develop properly, and your thyroid gland may become hugely enlarged. Presumably because the brain is desperately trying to get it to produce more thyroid hormones, and it responds by swelling.


Once upon a time, this swelling (goitre) was called 'Derbyshire Neck'. I grew up near Derbyshire, and I remember an old rhyme: "Derbyshire born, Derbyshire bred, strong in the arm, and weak in the head". I always thought it was just an insult. Maybe not. Cretinism was also popular in the Alps, and there is a story of an English traveller in Switzerland of whom it was remarked that he would have been quite handsome if only he had had a goitre. So it must have been very common there.


But at this point I am *extremely suspicious*. The thyroid/metabolic regulation system is ancient (universal in vertebrates, I believe), crucial to life, and it really shouldn't just go wrong. We should suspect either an infectious cause, or a recent environmental influence which we haven't had time to adjust to, an evolved defence against an infectious disease, or just possibly, a recently evolved but as yet imperfect defence against a less recent environmental change.


(Cretinism in particular is very strange. Presumably animals in iodine-deficient areas aren't cretinous, and yet they should be. Perhaps a change from a hunter-gatherer to a farming lifestyle has increased our dependency on iodine from crops, which crops have sucked what little iodine occurs naturally out of the soil?)


It's also not entirely clear to me what the thyroid system is *for*. If there's just a particular rate that cells are supposed to run at, then why do they need a control signal to tell them that? I could believe that it was a literal thermostat, designed to keep the body temperature constant at the best speed for the various biological reactions, but it's universal in *vertebrates*. There are plenty of vertebrates which don't keep a constant temperature.



The Fall of Desiccated Thyroid


There turned out to be some problems with Natural Desiccated Thyroid (NDT).


Firstly, there were many competing brands and types, and even if you stuck to one brand the quality control wasn't great, so the dose you'd be taking would have been a bit variable.


Secondly, it's fucking pig's thyroid from an abattoir. It could have all sorts of nasty things in it. Also, ick.


Thirdly, it turned out that pigs made quite a lot more T3 in their thyroids than humans do. It also seems that T3 is better absorbed by the gut than T4 is, so someone taking NDT to compensate for their own underproduction will have too much of the active hormone compared to the prohormone. That may not be good news.


With the discovery of 'peripheral conversion', and the possibility of cheap clean synthesis, it was decided that modern scientific thyroid treatment would henceforth be by synthetic T4 (thyroxine) alone. The body would make its own T3 from the T4 supply.


Alarm bells should be ringing at this point. Apart from the above points, I'm not aware of any great reason for the switch from NDT to thyroxine in the treatment of hypothyroidism, but it seems to have been pretty much universal, and it seems to have worked.


Aware of the lack of T3, doctors compensated by giving people more T4 than was in their pig-thyroid doses. And there don't seem to have been any complaints.


Over the years, NDT seems to have become a crazy fringe treatment despite there not being any evidence against it. It's still a legal prescription drug, but in America it's only prescribed by eccentrics. In England a doctor prescribing it would be, at the very least, summoned to explain himself before the GMC.


However, since it was (a) sold over the counter for so many years, and (b) part of the food chain, it is still perfectly legal to sell as a food supplement in both countries, as long as you don't make any medical claims for it. And the internet being what it is, the prescription-only synthetic hormones T3 and T4 are easily obtained without a prescription. These are extremely powerful hormones which have an effect on metabolism. If 'body-builders' and sports cheats aren't consuming all three in vast quantities, I am a Dutchman.


The Clinical Diagnosis of Hypothyroidism


We pass now to the beginning of the 1970s.


Hypothyroidism is ferociously difficult to diagnose. People complain of 'Tired All The Time' well, ... all the time, and it has literally hundreds of causes.


And it must be diagnosed correctly! If you miss a case of hypothyroidism, your patient is likely to collapse and possibly die at some point in the medium-term future. If you diagnose hypothyroidism where it isn't, you'll start giving the poor bugger powerful hormones which he doesn't need and *cause* hypermetabolism.


The last word in 'diagnosis by symptoms' was the absolutely excellent paper:


Statistical Methods Applied To The Diagnosis Of Hypothyroidism by W. Z. Billewicz et al.


Connoisseurs will note the clever and careful application of 'machine learning' techniques, before there were machines to learn!


One important thing to note is that this is a way of separating hypothyroid cases from other cases of tiredness at the point where people have been referred by their GP to a specialist at a hospital on suspicion of hypothyroidism. That changes the statistics remarkably. This is *not* a way of diagnosing hypothyroidism in the general population. But if someone's been to their GP (general practitioner, the doctor that a British person likely makes first contact with) and their GP has suspected their thyroid function might be inadequate, this test should probably still work.


For instance, they consider Physical Tiredness, Mental Lethargy, Slow Cerebration, Dry Hair, and Muscle Pain, the classic symptoms of hypothyroidism, present in most cases, to be indications *against* the disease.


That's because if you didn't have these things, you likely wouldn't have got that far. So in the population they're seeing (of people whose doctor suspects they might be hypothyroid), they're not of great value either way, but their presence is likely the reason why the person's GP has referred them even though they've really got iron-deficiency anaemia or one of the other causes of fatigue.


In their population, the strongest indicators are 'Ankle Jerk' and 'Slow Movements', subtle hypothyroid symptoms which aren't likely to be present in people who are fatigued for other reasons.


But this absolutely isn't a test you should use for population screening! In the general population, the classic symptoms are strong indicators of hypothyroidism.
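To see how the same symptom can count for the disease in the general population and against it in a referral clinic, here is a toy calculation in Python. Every probability in it is invented for illustration and none of them comes from Billewicz et al. The referral rule (a GP refers anyone who is tired or has a slow ankle reflex) is likewise an assumption, as is treating the two symptoms as independent once you know whether the patient is hypothyroid.

```python
from itertools import product

# Invented, illustrative probabilities: P(symptom | hypothyroid?) keyed by True/False.
P_TIRED = {True: 0.90, False: 0.10}
P_SLOW_REFLEX = {True: 0.70, False: 0.001}


def likelihood_ratio(symptom, referred_only):
    """Return P(symptom | hypothyroid) / P(symptom | not hypothyroid).

    symptom is "tired" or "slow_reflex". If referred_only is True, only
    patients the toy GP would refer (tired OR slow reflex) are counted.
    """
    rates = {}
    for hypothyroid in (True, False):
        with_symptom = 0.0
        total = 0.0
        for tired, slow_reflex in product((True, False), repeat=2):
            if referred_only and not (tired or slow_reflex):
                continue  # this patient never reaches the specialist
            p_t = P_TIRED[hypothyroid] if tired else 1 - P_TIRED[hypothyroid]
            p_s = P_SLOW_REFLEX[hypothyroid] if slow_reflex else 1 - P_SLOW_REFLEX[hypothyroid]
            weight = p_t * p_s
            total += weight
            if (tired if symptom == "tired" else slow_reflex):
                with_symptom += weight
        rates[hypothyroid] = with_symptom / total
    return rates[True] / rates[False]


for symptom in ("tired", "slow_reflex"):
    print(symptom,
          "general:", round(likelihood_ratio(symptom, referred_only=False), 3),
          "referred:", round(likelihood_ratio(symptom, referred_only=True), 3))
```

With these made-up numbers, tiredness has a likelihood ratio well above 1 in the general population and slightly below 1 among referred patients, while the slow reflex stays a strong indicator in both. Nearly everyone who reaches the clinic without the disease got there by being tired, which is the whole effect.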


Probability Theory is weird, huh?


Luckily, there were lab tests for hypothyroidism too, but they were expensive, complicated, annoying and difficult to interpret. Billewicz et al used them to calibrate their test, and recommend them for the difficult cases where their test doesn't give a clear answer.


And of course, the final test is to give them thyroid treatment and see whether they get better. If you're not sure, go slow, watch very carefully and look for hyper symptoms.


Overconfidence is definitely the way to go. If you don't diagnose it and it is, that's catastrophe. If it isn't, but you diagnose it anyway, then as long as you're paying attention the hyper symptoms are easy enough to spot, and you can pull back with little harm done.


A Better Way


It should be obvious from the above that the diagnosis of hypothyroidism by symptoms is absolutely fraught with complexity, and very easy to get wrong, and if you get it wrong the bad way, it's a disaster. Doctors were absolutely screaming for a decisive way to test for hypothyroidism.


Unfortunately, testing directly for the levels of thyroid hormones is very difficult, and the tests of the 1960s weren't accurate enough to be used for diagnosis.


The answer came from an understanding of how the thyroid regulatory system works, and the development of an accurate blood test for a crucial signalling hormone.


Three structures control the level of thyroid hormones in the blood.


The thyroid gland produces the hormones and secretes them into the blood.


Its activity is controlled by the hormone thyrotropin, or Thyroid Stimulating Hormone (TSH). Lots of TSH works the thyroid hard. In the absence of TSH the thyroid relaxes but doesn't switch off entirely. However, the basal level of thyroid activity in the absence of TSH is far too low.


TSH is controlled by the pituitary gland, a tiny structure attached to the brain.


The pituitary itself is controlled, via Thyrotropin Releasing Hormone (TRH), by the hypothalamus, which is part of the brain.


This was thought to be a classic example of a feedback control system.




It turns out that the level of thyrotropin (TSH) in the blood is exquisitely sensitive to the levels of thyroid hormones in the blood.


Administer thyroid hormone to a patient and their TSH level will rapidly adjust downwards by an easily detectable amount.




In hypothyroidism, where the thyroid has failed, the body will be desperately trying to produce more thyroid hormones, and the TSH level will be extremely high.


In Graves' Disease, this theory says, where the thyroid has grown too large, and the metabolism is running damagingly fast, the body will be, like a central bank trying to stimulate growth in a deflationary economy by reducing interest rates, 'pushing on a piece of string'. TSH will be undetectable.
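The feedback story above can be sketched as a toy simulation. This is only a cartoon of the theory, not a physiological model: the units, the set point, the gain, and the idea of representing Graves' disease as a fixed 'autonomous' output that ignores TSH are all assumptions made for the sketch.

```python
SET_POINT = 1.0  # hormone level the toy pituitary aims for (arbitrary units, assumed)
GAIN = 20.0      # how sharply TSH reacts to a hormone shortfall (assumed)


def pituitary_tsh(hormone):
    """Toy pituitary: TSH is 1.0 at the set point, climbs steeply as the
    hormone level falls, and cannot drop below zero."""
    return max(0.0, 1.0 + GAIN * (SET_POINT - hormone))


def steady_state(thyroid_capacity, autonomous_output=0.0, steps=2000, dt=0.01):
    """Run the loop until it settles; return (hormone level, TSH level).

    thyroid_capacity: hormone produced per unit of TSH (1.0 = healthy).
    autonomous_output: production that ignores TSH entirely.
    """
    hormone = SET_POINT
    for _ in range(steps):
        production = thyroid_capacity * pituitary_tsh(hormone) + autonomous_output
        hormone += dt * (production - hormone)  # simple first-order clearance
    return hormone, pituitary_tsh(hormone)


print("healthy:        ", steady_state(1.0))
print("failing thyroid:", steady_state(0.2))
print("overactive:     ", steady_state(1.0, autonomous_output=2.0))
```

In this cartoon a failing gland leaves the hormone level only modestly low while TSH rises several-fold, and an overactive gland drives TSH to zero: the two signatures the theory predicts. Whether the real system behaves this simply is exactly the question.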


The original TSH test was developed in 1965, by the startlingly clever method of radio-immuno-assay.


[For reasons that aren't clear to me, rather than being expressed in grams/litre or moles/litre, the TSH test is expressed in 'international units/litre'. But I don't think that that's important.]


A small number of people in whom there was no suspicion of thyroid disease were assessed, and the 'normal range' of TSH was calculated.


Again, 'endocrinology history' resources are not easy to find, but the first test was not terribly sensitive, and I think originally hyperthyroidism was thought to result in a complete absence of TSH, and that the highest value considered normal was about 4 (milli-international-units/liter).


This apparently pretty much solved the problem of diagnosing thyroid disorders.




It's no longer necessary to diagnose hypo- and hyper-thyroidism by symptoms. It was error prone anyway, and the question is easily decided by a cheap and simple test.


Natural Desiccated Thyroid is one with Nineveh and Tyre.


No doctor trained since the 1980s knows much about hypothyroid symptoms.


Medical textbooks mention them only in passing, as an unweighted list of classic symptoms. You couldn't use that for diagnosis of this famously difficult disease.


If you suspect hypothyroidism, you order a TSH test. If the value of TSH is very low, that's hyperthyroidism. If the value is very high then that's hypothyroidism. Otherwise you're 'euthyroid' (Greek again, good-thyroid), and your symptoms are caused by some other problem.


The treatment for hyperthyroidism is to damage the thyroid gland. There are various ways. This often results in hypothyroidism. *For reasons that are not terribly well understood*.


The treatment for hypothyroidism is to give the patient sufficient thyroxine (T4) to cause TSH levels to come back into their normal range.


The conditions hyperthyroidism and hypothyroidism are now *defined* by TSH levels.


Hypothyroidism, in particular, a fairly common disease, is considered to be such a solved problem that it's usually treated by the GP, without involving any kind of specialist.



Present Day


It was found that the traditional amount of thyroxine (T4) administered to hypothyroid patients, the amount that had always been used to replace the hormones once produced by a thyroid gland now dead, destroyed, or surgically removed, was in fact too high. That amount causes suppression of TSH to below its normal range. The brain, theory says, is asking for the level to be reduced.


The amount of T4 administered in such cases (there are many) has been reduced by a factor of around two, to the level where it produces 'normal' TSH levels in the blood. Treatment is now titrated to produce the normal levels of TSH.


TSH tests have improved enormously since their introduction, and are on their third or fourth generation. The accuracy of measurement is very good indeed.


It's now possible to detect the tiny remaining levels of TSH in overtly hyperthyroid patients, so hyperthyroidism is also now defined by the TSH test.


In England, the normal range is 0.35 to 5.5. This is considered to be the definition of 'euthyroidism'. If your levels are normal, you're fine.


If you have hypothyroid symptoms but a normal TSH level, then your symptoms are caused by something else. Look for Anaemia, look for Lyme Disease. There are hundreds of other possible causes. Once you rule out all the other causes, then it's the mysterious CFS/FMS/ME, for which there is no cause and no treatment.


If your doctor is very good, very careful and very paranoid, he might order tests of the levels of T4 and T3 directly. But actually the direct T4 and T3 tests, although much more accurate than they were in the 1960s, are quite badly standardised, and there's considerable controversy about what they actually measure. Different assay techniques can produce quite different readings. They're expensive. It's fairly common, and on the face of it perfectly reasonable, for a lab to refuse to conduct the T3 and T4 tests if the TSH level is normal.


It's been discovered that quite small increases in TSH actually predict hypothyroidism. Minute changes in thyroid hormone levels, which don't produce symptoms, cause detectable changes in the TSH levels. Normal, but slightly high values of TSH, especially in combination with the presence of thyroid related antibodies (there are several types), indicate a slight risk of one day developing hypothyroidism.


There's quite a lot of controversy about what the normal range for TSH actually is. Many doctors consider that the optimal range is 1-2, and target that range when administering thyroxine. Many think that just getting the value in the normal range is good enough. None of this is properly understood, to understate the case rather dramatically.


There are new categories, 'sub-clinical hypothyroidism' and 'sub-clinical hyperthyroidism', which are defined by abnormal TSH tests in the absence of symptoms. There is considerable controversy over whether it is a good idea to treat these, in order to prevent subtle hormonal imbalances which may cause difficult-to-detect long term problems.


Everyone is a little concerned about accidentally over-treating people (remember that hyperthyroidism is now defined by TSH<0.35).


Hyperthyroidism has long been associated with Atrial Fibrillation (a heart problem), and Osteoporosis, both very nasty things. A large population study in Denmark recently revealed that there is a greater incidence of Atrial Fibrillation in sub-clinical hyperthyroidism, and that hypothyroidism actually has a 'protective effect' against Atrial Fibrillation.


It's known that TSH has a circadian rhythm, higher in the early morning, lower at night. This makes the test rather noisy, as your TSH level can be doubled or halved depending on what time of day you have the blood drawn.


But the big problems of the 1960s and 1970s are completely solved. We are just tidying up the details.




Many hypothyroid patients complain that they suffer from 'Tired All The Time', and have some of the classic hypothyroid symptoms, even though their TSH levels have been carefully adjusted to be in the normal range.


I've no idea how many, but opinions range from 'the great majority of patients are perfectly happy' to 'around half of hypothyroid sufferers have hypothyroid symptoms even though they're being treated'.


The internet is black with people complaining about it, and there are many books and alternative medicine practitioners trying to cure them, or possibly trying to extract as much money as possible from people in desperate need of relief from an unpleasant, debilitating and inexplicable malaise.




Not good data, to be sure. But if ten people mention to you in passing that the sun is shining, you are a damned fool if you think you know nothing about the weather.


It's known that TSH ranges aren't 'normally distributed' (in the sense of Gauss/the bell curve distribution) in the healthy population.


If you log-transform them, they do look a bit more normal.
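A small sketch of why the transform matters, using simulated values rather than real TSH measurements. The median of 1.5 and the log-scale spread of 0.6 are invented, and drawing from a log-normal distribution is itself an assumption. The point is only that the usual 'mean plus or minus 1.96 standard deviations' recipe misbehaves on skewed data and behaves on the logged data.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: roughly 0 for a symmetric bell curve."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def central_95(values):
    """The usual 'normal range' recipe: mean +/- 1.96 standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return mean - 1.96 * sd, mean + 1.96 * sd


rng = random.Random(0)
# Simulated 'TSH' values; both parameters are made up for illustration.
tsh = [rng.lognormvariate(math.log(1.5), 0.6) for _ in range(10_000)]
log_tsh = [math.log(v) for v in tsh]

raw_low, raw_high = central_95(tsh)
log_low, log_high = (math.exp(bound) for bound in central_95(log_tsh))

print("skewness raw / logged:", round(skewness(tsh), 2), round(skewness(log_tsh), 2))
print("range from raw values:   ", round(raw_low, 2), "to", round(raw_high, 2))
print("range from logged values:", round(log_low, 2), "to", round(log_high, 2))
```

On the raw values the recipe gives a lower limit below zero, which is impossible for a concentration. Computed on the logged values and transformed back, both limits are positive and the range is lopsided around the median, as a skewed quantity should be.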


The American Academy of Clinical Biochemists, in 2003, decided to settle the question once and for all. They carefully screened out anyone with even the slightest sign that there might be anything wrong with their thyroid at all, and measured their TSH very accurately.


In their report, they said (this is a direct quote):


In the future, it is likely that the upper limit of the serum TSH euthyroid reference range will be reduced to 2.5 mIU/L because >95% of rigorously screened normal euthyroid volunteers have serum TSH values between 0.4 and 2.5 mIU/L.


Many other studies disagree, and propose wider ranges for normal TSH.


But if the AACB report were taken seriously, it would lead to diagnosis of hypothyroidism in vast numbers of people who are perfectly healthy! In fact the levels of noise in the test would put people whose thyroid systems are perfectly normal in danger of being diagnosed and inappropriately treated.
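To put a rough number on the noise worry, here is a back-of-envelope sketch. It assumes, purely for illustration, that a single measurement equals a person's typical TSH multiplied by a factor between one half and two (the 'doubled or halved' figure above), with the logarithm of that factor uniformly distributed. The typical value of 1.8 is invented.

```python
import math


def fraction_flagged(typical_tsh, cutoff, max_factor=2.0):
    """Fraction of single measurements above `cutoff`, assuming
    measured = typical_tsh * f with log(f) uniform on
    [-log(max_factor), +log(max_factor)]. The noise model is an assumption."""
    half_width = math.log(max_factor)
    threshold = math.log(cutoff / typical_tsh)
    if threshold >= half_width:
        return 0.0
    if threshold <= -half_width:
        return 1.0
    return (half_width - threshold) / (2 * half_width)


typical = 1.8  # hypothetical healthy person
print("flagged with upper limit 5.5:", fraction_flagged(typical, 5.5))
print("flagged with upper limit 2.5:", round(fraction_flagged(typical, 2.5), 3))
```

Under this crude model the same healthy person is never flagged with an upper limit of 5.5, but is flagged on roughly a quarter of blood draws with an upper limit of 2.5.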


For fairly obvious reasons, biochemists have been extremely, and quite properly, reluctant to take the report of their own professional body seriously. And yet it is hard to see where the AACB have gone wrong in their report.


Neurasthenia is back.


A little after the time of the introduction of the TSH test, new forms of 'Tired All The Time' were discovered.


As I said, CFS and ME are just two names for the same thing. Fibromyalgia Syndrome (FMS) is much worse, since it is CFS with constant pain, for which there is no known cause and from which there is no relief. Most drugs make it worse.


But if you combine the three things (CFS/ME/FMS), then you get a single disease, which has a large number of very non-specific symptoms.


These symptoms are the classic symptoms of 'hypometabolism'. Any doctor who has a patient who has CFS/ME/FMS and hasn't tested their thyroid function is *de facto* incompetent. I think the vast majority of medical people would agree with this statement.


And yet, when you test the TSH levels in CFS/ME/FMS sufferers, they are perfectly normal.


All three/two/one are appalling, crippling, terrible syndromes which ruin people's lives. They are fairly common. You almost certainly know one or two sufferers. The suffering is made worse by the fact that most people believe that they're psychosomatic, which is a polite word for 'imaginary'.


And the people suffering are mainly middle-aged women. Middle-aged women are easy to ignore. Especially stupid middle-aged women who are worried about being overweight and obviously faking their symptoms in order to get drugs which are popularly believed to induce weight loss. It's clearly their hormones. Or they're trying to scrounge up welfare benefits. Or they're trying to claim insurance. Even though there's nothing wrong with them and you've checked so carefully for everything that it could possibly be.


But it's not all middle aged women. These diseases affect men, and the young. Sometimes they affect little children. Exhaustion, stupidity, constant pain. Endless other problems as your body rots away. Lifelong. No remission and no cure.


And I have Doubts of my Own


And I can't believe that careful, numerate Billewicz and his co-authors would have made this mistake, but I can't find where the doctors of the 1970s checked for the sensitivity of the TSH test.


Specificity, yes. They tested a lot of people who hadn't got any sign of hypothyroidism for TSH levels. If you're well, then your TSH level will be in a narrow range, which may be 0-6, or it may be 1-2. Opinions are weirdly divided on this point in a hard to explain way.


But Sensitivity? Where's the bit where they checked for the other arm of the conditional?


The bit where they show that no-one who's suffering from hypometabolism, and who gets well when you give them Desiccated Thyroid, had, on first contact, TSH levels inside the normal range.


If you're trying to prove A <=> B, you can't just prove A => B and call it a day. You couldn't get that past an A-level maths student. And certainly anyone with a science degree wouldn't make that error. Surely? I mean you shouldn't be able to get that past anyone who can reason their way out of a paper bag.


I'm going to say this a third time, because I think it's important and maybe it's not obvious to everyone.


If you're trying to prove that two things are the same thing, then proving that the first one is always the second one is not good enough.
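The two arms can be written down as two separate numbers. In the sketch below the counts are entirely made up. They describe a hypothetical world in which the test looks excellent if you only ever examine healthy people, and poor once you examine the sick.

```python
def specificity(true_negatives, false_positives):
    """Of the people who are well, what fraction have a normal test?"""
    return true_negatives / (true_negatives + false_positives)


def sensitivity(true_positives, false_negatives):
    """Of the people who are ill, what fraction have an abnormal test?"""
    return true_positives / (true_positives + false_negatives)


# Invented counts for illustration only.
well_with_normal_tsh = 950
well_with_abnormal_tsh = 50
ill_with_abnormal_tsh = 40
ill_with_normal_tsh = 60

print("specificity:", specificity(well_with_normal_tsh, well_with_abnormal_tsh))
print("sensitivity:", sensitivity(ill_with_abnormal_tsh, ill_with_normal_tsh))
```

Measuring only the first number, however carefully, tells you nothing about the second. You have to go and count the ill people with normal tests.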




It's possible, of course, that I've missed this bit. As I say, 'History of Endocrinology' is not one of those popular, fashionable subjects that you can easily find out about.


I wonder if they just assumed that the thyroid system was a thermostat. The analogy is still common today.


But it doesn't look like a thermostat to me. The thyroid system with its vast numbers of hormones and transforming enzymes is insanely, incomprehensibly complicated. And very poorly understood. And evolutionarily ancient. It looks as though originally it was the system that coordinated metamorphosis. Or maybe it signalled when resources were high enough to undergo metamorphosis. But whatever it did originally in our most ancient ancestors, it looks as though the blind watchmaker has layered hack after hack after hack on top of it on the way to us.


Only the thyroid originally, controlling major changes in body plan in tiny creatures that metamorphose.


Of course, humans metamorphose too, but it's all in the womb, and who measures thyroid levels in the unborn when they still look like tiny fish?


And of course, humans undergo very rapid growth and change after we are born. Especially in the brain. Baby horses can walk seconds after they're born. Baby humans take months to learn to crawl. I wonder if that's got anything to do with cretinism.


And I'm told that baby humans have very high hormone levels. I wonder why they need to be so hot? If it's a thermostat, I mean.


But then on top of the thyroid, the pituitary. I wonder what that adds to the system? If the thyroid's just a thermostat, or just a device for keeping T4 levels constant, why can't it just do the sensing itself?


What evolutionary process created the pituitary control over the thyroid? Is that the thermostat bit?


And then the hypothalamus, controlling the pituitary. Why? Why would the brain need to set the temperature when the ideal temperature of metabolic reactions is always 37C in every animal? That's the temperature everything's designed for. Why would you dial it up or down, to a place where the chemical reactions that you are don't work properly?


I can think of reasons why. Perhaps you're hibernating. Many of our ancestors must have hibernated. Maybe it's a good idea to slow the metabolism sometimes. Perhaps to conserve your fat supplies. Your stored food.


Perhaps it's a good idea to slow the metabolism in times of famine?


Perhaps the whole calories in/calories out thing is wrong, and people whose energy expenditure goes over their calorie intake have slow metabolisms, slowly sacrificing every bodily function including immune defence in order to avoid starvation.


I wonder at the willpower that could keep an animal sane in that state. While its body does everything it can to keep its precious fat reserves high so that it can get through the famine.


And then I remember about Anorexia Nervosa, where young women who want to lose weight starve themselves to the point where they no longer feel hungry at all. Another mysterious psychological disease that's just put down to crazy females. We really need some female doctors.


And I remember about Seth Roberts' Shangri-La Diet, which I tried some years ago, just because it was so weird, to see if it worked. By eating strange things, like tasteless oil and raw sugar, you can make your appetite disappear, and lose weight. It seemed to work pretty well, to my surprise. Seth came up with it while thinking about rats. And apparently it works on rats too. I wonder why it hasn't caught on.


It seems, my female friends tell me, that a lot of diets work well for a bit, but then after a few weeks the effect just stops. If we think of a particular diet as a meme, this would seem to be its infectious period, where the host enthusiastically spreads the idea.


And I wonder about the role of the thyronine de-iodinating enzymes, and the whole fantastically complicated process of stripping the iodines and the amino acid bits from thyroxine in various patterns that no-one understands, and what could be going on there if the thyroid system were just a simple thermostat.


And I wonder about reports I am reading where elite athletes are finding themselves suffering from hypothyroidism in numbers far too large to be credible, if it wasn't, say, a physical response to calorie intake less than calorie output.


I've been looking ever so hard to find out why the TSH test, or any of the various available thyroid blood tests, is a good way to assess the function of this fantastically complicated and very poorly understood system.


But every time I look, I just come up with more reasons to believe that they don't tell you very much at all.



The Mystery


Can anyone convince me that the converse arm has been carefully checked?


That everyone who's suffering from hypometabolism, and who gets well when you give them Desiccated Thyroid, has, before you fix them, TSH levels outside the normal range.


In other words, that we haven't just thrown away, through carelessness, a long-standing, perfectly safe, well-tested treatment for a horrible disabling disease that often causes excruciating pain, that the Victorians knew how to cure, and that the people of the 1950s and 60s routinely cured.

Spreading rationality through engagement with secular groups

15 Gleb_Tsipursky 19 January 2016 11:19PM

The Less Wrong meetup in Columbus, OH is very oriented toward popularizing rationality for a broad audience (in fact, Intentional Insights sprang from this LW meetup). We've found that doing in-person presentations for secular groups is an excellent way of attracting new people to rationality, and have been doing that for a couple of years now, through a group called "Columbus Rationality" as part of the local branch of the American Humanist Association. Here's a blog post I just published about this topic.


Most importantly for anyone who is curious about experimenting with something like this, we at Intentional Insights have put together a “Rationality” group starter package, which includes two blog posts describing “Rationality” events, three videos, a facilitator’s guide, an introduction guide, and a feedback sheet. We've been working on this starter package for about 9 months, and finally it's in a shape that we think is ready for use. Hope this is helpful for any LWs who want to do something similar with a secular group where you live. You can also get in touch with us to get connected to current participants in “Columbus Rationality” who can give you tips on setting up such a group in your own locale.

[Link] AlphaGo: Mastering the ancient game of Go with Machine Learning

14 ESRogs 27 January 2016 09:04PM

DeepMind's Go AI, called AlphaGo, has beaten the European champion with a score of 5-0. A match against the top-ranked human player, Lee Se-dol, is scheduled for March.


Games are a great testing ground for developing smarter, more flexible algorithms that have the ability to tackle problems in ways similar to humans. Creating programs that are able to play games better than the best humans has a long history


But one game has thwarted A.I. research thus far: the ancient game of Go.

Beware surprising and suspicious convergence

14 Thrasymachus 24 January 2016 07:13PM


Imagine this:

Oliver: … Thus we see that donating to the opera is the best way of promoting the arts.

Eleanor: Okay, but I’m principally interested in improving human welfare.

Oliver: Oh! Well I think it is also the case that donating to the opera is best for improving human welfare too.

Generally, what is best for one thing is usually not the best for something else, and thus Oliver’s claim that donations to opera are best for the arts and human welfare is surprising. We may suspect bias: that Oliver’s claim that the opera is best for human welfare is primarily motivated by his enthusiasm for opera and desire to find reasons in favour, rather than a cooler, more objective search for what is really best for human welfare.

The rest of this essay tries to better establish what is going on (and going wrong) in cases like this. It is in three parts: the first looks at the ‘statistics’ of convergence - in what circumstances is it surprising to find one object judged best by the lights of two different considerations? The second looks more carefully at the claim of bias: how it might be substantiated, and how it should be taken into consideration. The third returns to the example given above, and discusses the prevalence of this sort of error ‘within’ EA, and what can be done to avoid it.

Varieties of convergence

Imagine two considerations, X and Y, and a field of objects to be considered. For each object, we can score it by how well it performs by the lights of the considerations of X and Y. We can then plot each object on a scatterplot, with each axis assigned to a particular consideration. How could this look?


At one extreme, the two considerations are unrelated, and thus the scatterplot shows no association. Knowing how well an object fares by the lights of one consideration tells you nothing about how it fares by the lights of another, and the chance that the object that scores highest on consideration X also scores highest on consideration Y is very low. Call this no convergence.

At the other extreme, considerations are perfectly correlated, and the ‘scatter’ plot has no scatter, but rather a straight line. Knowing how well an object fares by consideration X tells you exactly how well it fares by consideration Y, and the object that scores highest on consideration X is certain to be scored highest on consideration Y. Call this strong convergence.

In most cases, the relationship between two considerations will lie between these extremes: call this weak convergence. One example is there being a general sense of physical fitness, thus how fast one can run and how far one can throw are somewhat correlated. Another would be intelligence: different mental abilities (pitch discrimination, working memory, vocabulary, etc. etc.) all correlate somewhat with one another.

More relevant to effective altruism, there also appears to be weak convergence between different moral theories and different cause areas. What is judged highly by (say) Kantianism tends to be judged highly by Utilitarianism: although there are well-discussed exceptions to this rule, both generally agree that (among many examples) assault, stealing, and lying are bad, whilst kindness, charity, and integrity are good.(1) In similarly broad strokes what is good for (say) global poverty is generally good for the far future, and the same applies for between any two ‘EA’ cause areas.(2)

In cases of weak convergence, points will form some sort of elliptical scatter, and knowing how an object scores on X does tell you something about how well it scores on Y. If you know that something scores highest for X, your expectation of how it scores for Y should go upwards, and the chance of it also scoring highest for Y should increase. However, the absolute likelihood of it being best for X and best for Y remains low, for two main reasons:


Trade-offs: Although considerations X and Y are generally positively correlated, there might be a negative correlation at the far tail, due to attempts to optimize for X or Y at disproportionate expense for Y or X. Although in the general population running and throwing will be positively correlated with one another, elite athletes may optimize their training for one or the other, and thus those who specialize in throwing and those who specialize in running diverge. In a similar way, we may believe there is scope for similar optimization when it comes to charities or cause selection.


Chance: (c.f.) Even in cases where there are no trade-offs, as long as the two considerations are somewhat independent, random fluctuations will usually ensure the best by consideration X will not be best by consideration Y. That X and Y only weakly converge implies other factors matter for Y besides X. For the single object that is best for X, there will be many more not best for X (but still very good), and out of this large number of objects it is likely one will do very well on these other factors to end up the best for Y overall. Inspection of most pairs of correlated variables confirms this: Those with higher IQ scores tend to be wealthier, but the very smartest aren’t the very wealthiest (and vice versa), serving fast is good for tennis, but the very fastest servers are not the best players (and vice versa), and so on. Graphically speaking, most scatter plots bulge in an ellipse rather than sharpen to a point.

The following features make a single object scoring highest on two considerations more likely:

  1. The smaller the population of objects. Were the only two options available to Oliver and Eleanor “Give to the Opera” and “Punch people in the face”, it is unsurprising the former comes top for many considerations.
  2. The strength of their convergence. The closer the correlation moves to collinearity, the less surprising it is to find that something is best for both. It is less surprising that the best at running 100m is best at running 200m, but much more surprising if it transpired they threw discus best too.
  3. The ‘wideness’ of the distribution. The heavier the tails, the more likely a distribution is to be stretched out and ‘sharpen’ to a point, and the less likely bulges either side of the regression line are to be populated. (I owe this to Owen Cotton-Barratt)

In the majority of cases (including those relevant to EA), there is a large population of objects, weak convergence and (pace the often heavy-tailed distributions implicated) it is uncommon for one thing to be best by the lights of two weakly converging considerations.
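
The effect of population size and strength of convergence can be illustrated with a small simulation. This is only a sketch under stated assumptions, not something taken from the essay: scores are drawn from a bivariate normal distribution (so the point about heavy tails is not modelled), and the function name `prob_same_best` and its parameters are made up for illustration.

```python
import math
import random


def prob_same_best(n_objects, rho, trials=2000, seed=0):
    """Estimate how often the object that is best on X is also best on Y.

    Assumption (not from the essay): X and Y are standard normal scores
    with correlation rho, built as Y = rho * X + sqrt(1 - rho^2) * noise.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_objects)]
        ys = [rho * x + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
        # Index of the top-scoring object under each consideration.
        best_x = max(range(n_objects), key=xs.__getitem__)
        best_y = max(range(n_objects), key=ys.__getitem__)
        hits += best_x == best_y
    return hits / trials


if __name__ == "__main__":
    for n in (2, 10, 100):
        for rho in (0.0, 0.5, 0.9, 1.0):
            print(n, rho, prob_same_best(n, rho))
```

Under these assumptions the estimate is exactly 1.0 when rho is 1 (strong convergence), and it falls as the population grows or the correlation weakens, which is the pattern the list above describes.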

Proxy measures and prediction

In the case that we have nothing to go on to judge what is good for Y save knowing what is good for X, our best guess for what is best for Y is what is best for X. Thus the Opera is the best estimate for what is good for human welfare, given only the information that it is best for the arts. In this case, we should expect our best guess to be very likely wrong. Although it is more likely than any similarly narrow alternative (“donations to the opera, or donations to X-factor?”), its absolute likelihood relative to the rest of the hypothesis space is very low (“donations to the opera, or something else?”).

Of course, we usually have more information available. Why not search directly for what is good for human welfare, instead of looking at what is good for the arts? Often searching for Y directly rather than a weakly converging proxy indicator will do better: if one wants to select a relay team, selecting based on running speed rather than throwing distance looks a better strategy. Thus finding out a particular intervention (say the Against Malaria Foundation) comes top when looking for what is good for human welfare provides much stronger evidence it is best for human welfare than finding out the opera comes top when looking for what is good for a weakly converging consideration.(3)
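
A rough way to see why a more direct measure tends to beat a weakly converging proxy is to pick the top-ranked object by a noisy signal and check how good that pick really is. The sketch below is illustrative only: the normal-score model, the function name `mean_true_value_of_pick`, and the example correlations are assumptions, not figures from the essay.

```python
import math
import random


def mean_true_value_of_pick(n_objects, r, trials=2000, seed=0):
    """Average true Y score of the object ranked first by a noisy signal.

    Assumption (not from the essay): true scores Y are standard normal and
    the signal correlates with Y at level r, built as
    signal = r * Y + sqrt(1 - r^2) * noise.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(1.0 - r * r)
    total = 0.0
    for _ in range(trials):
        true_y = [rng.gauss(0.0, 1.0) for _ in range(n_objects)]
        signal = [r * y + noise_scale * rng.gauss(0.0, 1.0) for y in true_y]
        # Select the object that looks best according to the signal.
        pick = max(range(n_objects), key=signal.__getitem__)
        total += true_y[pick]
    return total / trials


if __name__ == "__main__":
    # Hypothetical numbers: a direct estimate correlating 0.8 with the truth
    # versus a proxy correlating 0.3 with it.
    print("direct:", mean_true_value_of_pick(50, 0.8))
    print("proxy: ", mean_true_value_of_pick(50, 0.3))
```

In this toy model, whichever signal correlates more strongly with the true value yields the better pick on average, so a proxy only wins when the direct estimate is the noisier of the two.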

Pragmatic defeat and Poor Propagation

Eleanor may suspect bias is driving Oliver’s claim on behalf of the opera. The likelihood of the opera being best for both the arts and human welfare is low, even taking their weak convergence into account. The likelihood of bias and motivated cognition colouring Oliver’s judgement is higher, especially if Oliver has antecedent commitments to the opera. Three questions: 1) Does this affect how she should regard Oliver’s arguments? 2) Should she keep talking to Oliver, and, if she does, should she suggest to him he is biased? 3) Is there anything she can do to help ensure she doesn’t make a similar mistake?

Grant Eleanor is right that Oliver is biased. So what? It entails neither he is wrong nor the arguments he offers in support are unsound: he could be biased and right. It would be a case of the genetic fallacy (or perhaps ad hominem) to argue otherwise. Yet this isn’t the whole story: informal ‘fallacies’ are commonly valuable epistemic tools; we should not only attend to the content of arguments offered, but argumentative ‘meta-data’ such as qualities of the arguer as well.(4)

Consider this example. Suppose you are uncertain whether God exists. A friendly local Christian apologist offers the reasons why (in her view) the balance of reason clearly favours Theism over Atheism. You would be unwise to judge the arguments purely ‘on the merits’: for a variety of reasons, the Christian apologist is likely to have slanted the evidence she presents to favour Theism; the impression she will give of where the balance of reason lies will poorly track where the balance of reason actually lies. Even if you find her arguments persuasive, you should at least partly discount this by what you know of the speaker.

In some cases it may be reasonable to dismiss sources ‘out of hand’ due to their bias without engaging on the merits: we may expect the probative value of the reasons they offer, when greatly attenuated by the anticipated bias, to not be worth the risks of systematic error if we mistake the degree of bias (which is, of course, very hard to calculate); alternatively, it might just be a better triage of our limited epistemic resources to ignore partisans and try and find impartial sources to provide us a better view of the balance of reason.

So: should Eleanor stop talking to Oliver about this topic? Often, no. First (or maybe zeroth), there is the chance she is mistaken about Oliver being biased, and further discussion would allow her to find this out. Second, there may be tactical reasons: she may want to persuade third parties to their conversation. Third, she may guess further discussion is the best chance of persuading Oliver, despite the bias he labours under. Fourth, it may still benefit Eleanor: although bias may undermine the strength of reasons Oliver offers, they may still provide her with valuable information. Being too eager to wholly discount what people say based on assessments of bias (which are usually partly informed by object level determinations of various issues) risks entrenching one’s own beliefs.

Another related question is whether it is wise for Eleanor to accuse Oliver of bias. There are some difficulties. Things that may bias are plentiful, thus counter-accusations are easy to make: (“I think you’re biased in favour of the opera due to your prior involvement”/”Well, I think you’re biased against the opera due to your reductionistic and insufficiently holistic conception of the good.”) They are apt to devolve into the personally unpleasant (“You only care about climate change because you are sleeping with an ecologist”) or the passive-aggressive (“I’m getting really concerned that people who disagree with me are offering really bad arguments as a smokescreen for their obvious prejudices”). They can also prove difficult to make headway on. Oliver may assert his commitment was after his good-faith determination that opera really was best for human welfare and the arts. Many, perhaps most, claims like these are mistaken, but it can be hard to tell (or prove) which.(5)

Eleanor may want to keep an ‘internal look out’ to prevent her making a similar mistake to Oliver. One clue is a surprising lack of belief propagation: we change our mind about certain matters, and yet our beliefs about closely related matters remain surprisingly unaltered. In most cases where someone becomes newly convinced of (for example) effective altruism, we predict this should propagate forward and effect profound changes to their judgements on where to best give money or what is the best career for them to pursue. If Eleanor finds in her case that this does not happen, that in her case her becoming newly persuaded by the importance of the far future does not propagate forward to change her career or giving, manifesting instead in a proliferation of ancillary reasons that support her prior behaviour, she should be suspicious of this surprising convergence between what she thought was best then, and what is best now under considerably different lights.

EA examples

Few Effective altruists seriously defend the opera as a leading EA cause. Yet the general problem of endorsing surprising and suspicious convergence remains prevalent. Here are some provocative examples:

  1. The lack of path changes. Pace personal fit, friction, sunk capital, etc. it seems people who select careers on ‘non EA grounds’ often retain them after ‘becoming’ EA, and then provide reasons why (at least for them) persisting in their career is the best option.
  2. The claim that, even granting the overwhelming importance of the far future, it turns out that animal welfare charities are still the best to give to, given their robust benefits, positive flow through effects, and the speculativeness of far future causes.
  3. The claim that, even granting the overwhelming importance of the far future, it turns out that global poverty charities are still the best to give to, given their robust benefits, positive flow through effects, and the speculativeness of far future causes.
  4. Claims from enthusiasts of Cryonics or anti-aging research that this, additional to being good for their desires for an increased lifespan, is also a leading ‘EA’ buy.
  5. A claim on behalf of veganism that it is the best diet for animal welfare and for the environment and for individual health and for taste.

All share similar features: one has prior commitments to a particular cause area or action. One becomes aware of a new consideration which has considerable bearing on these priors. Yet these priors don’t change, and instead ancillary arguments emerge to fight a rearguard action on behalf of these prior commitments - that instead of adjusting these commitments in light of the new consideration, one aims to co-opt the consideration to the service of these prior commitments.

Naturally, that some rationalize doesn’t preclude others being reasonable, and the presence of suspicious patterns of belief doesn’t make them unwarranted. One may (for example) work in global poverty due to denying the case for the far future (via a person affecting view, among many other possibilities) or aver there are even stronger considerations in favour (perhaps an emphasis on moral uncertainty and peer disagreement and therefore counting the much stronger moral consensus around stopping tropical disease over (e.g.) doing research into AI risk as the decisive consideration).

Also, for weaker claims, convergence is much less surprising. Were one to say on behalf of veganism: “It is best for animal welfare, but also generally better for the environment and personal health than carnivorous diets. Granted, it does worse on taste, but it is clearly superior all things considered”, this seems much less suspect (and also much more true) than the claim it is best by all of these metrics. It would be surprising if the optimal diet for personal health did not include at least some animal products.

Caveats aside, though, these lines of argument are suspect, and further inspection deepens these suspicions. In sketch, one first points to some benefits the prior commitment has by the lights of the new consideration (e.g. promoting animal welfare promotes antispeciesism, which is likely to make the far future trajectory go better), and second remarks about how speculative searching directly on the new consideration is (e.g. it is very hard to work out what we can do now which will benefit the far future).(6)

That the argument tends to end here is suggestive of motivated stopping. For although the object level benefits of (say) global poverty are not speculative, their putative flow-through benefits on the far future are speculative. Yet work to show that this is nonetheless less speculative than efforts to ‘directly’ work on the far future is left undone.(7) Similarly, even if it is the case the best way to make the far future go better is to push on a proxy indicator, which one? Work on why (e.g.) animal welfare is the strongest proxy out of competitors also tends to be left undone.(8) As a further black mark, it is suspect that those maintaining global poverty is the best proxy almost exclusively have prior commitments to global poverty causes, mutatis mutandis animal welfare, and so on.

We at least have some grasp of what features of (e.g.) animal welfare interventions make them good for the far future. If this (putatively) was the main value of animal welfare interventions due to the overwhelming importance of the far future, it would seem wise to try and pick interventions which maximize these features. So we come to a recursion: within animal welfare interventions, ‘object level’ and ‘far future’ benefits would be expected to only weakly converge. Yet (surprisingly and suspiciously) the animal welfare interventions recommended by the lights of the far future are usually the same as those recommended on ‘object level’ grounds.


If Oliver were biased, he would be far from alone. Most of us are (like it or not) at least somewhat partisan, and our convictions are in part motivated by extra-epistemic reasons: be it vested interests, maintaining certain relationships, group affiliations, etc. In pursuit of these ends we defend our beliefs against all considerations brought to bear against them. Few beliefs are indefatigable by the lights of any reasonable opinion, and few policy prescriptions are panaceas. Yet all of ours are.

It is unsurprising the same problems emerge within effective altruism: a particular case of ‘pretending to actually try’ is ‘pretending to actually take arguments seriously’.(9) These problems seem prevalent across the entirety of EA: that I couldn’t come up with good examples for meta or far future cause areas is probably explained by either bias on my part or a selection effect: were these things less esoteric, they would err more often.(10)

There’s no easy ‘in house’ solution, but I repeat my recommendations to Eleanor: as a rule, maintaining dialogue, presuming good faith, engaging on the merits, and listening to others seems a better strategy, even if we think bias is endemic. It is also worth emphasizing the broad (albeit weak) convergence between cause areas is fertile common ground, and a promising area for moral trade. Although it is unlikely that the best thing by the lights of one cause area is the best thing by the lights of another, it is pretty likely it will be pretty good. Thus most activities by EAs in a particular field should carry broad approbation and support from those working in others.

I come before you a sinner too. I made exactly the same sorts of suspicious arguments myself on behalf of global poverty. I’m also fairly confident my decision to stay in medicine doesn’t really track the merits either – but I may well end up a beneficiary of moral luck. I’m loath to accuse particular individuals of making the mistakes I identify here. But, insofar as readers think this may apply to them, I urge them to think again.(11)


  1. We may wonder why this is the case: the contents of the different moral theories are pretty alien to one another (compare universalizable imperatives, proper functioning, and pleasurable experiences). I suggest the mechanism is implicit selection by folk or ‘commonsense’ morality. Normative theories are evaluated at least in part by how well they accord to our common moral intuitions, and they lose plausibility commensurate to how much violence they do to them. Although cases where a particular normative theory apparently diverges from common sense morality are well discussed (consider Kantianism and the inquiring murderer, or Utilitarianism and the backpacker), moral theories that routinely contravene our moral intuitions are non-starters, and thus those that survive to be seriously considered somewhat converge with common moral intuitions, and therefore one another.
  2. There may be some asymmetry: on the object level we may anticipate the ‘flow forward’ effects of global health on x-risk to be greater than the ‘flow back’ benefits of x-risk work on global poverty. However (I owe this to Carl Shulman) the object level benefits are probably much smaller than more symmetrical ‘second order’ benefits, like shared infrastructure, communication and cross-pollination, shared expertise on common issues (e.g. tax and giving, career advice).
  3. But not always. Some things are so hard to estimate directly that using proxy measures can do better. The key question is whether the correlation between our outcome estimates and the true values is greater than that between outcome and (estimates of) proxy measure outcome. If so, one should use direct estimation; if not, then the proxy measure. There may also be opportunities to use both sources of information in a combined model.
  4. One example I owe to Stefan Schubert: we generally take the fact someone says something as evidence it is true. Pointing out relevant ‘ad hominem’ facts (like bias) may defeat this presumption.
  5. Population data – epistemic epidemiology, if you will – may help. If we find that people who were previously committed to the opera much more commonly end up claiming the opera is best for human welfare than other groups, this is suggestive of bias.

    A subsequent problem is how to disentangle bias from expertise or privileged access. Oliver could suggest that those involved in the opera gain ‘insider knowledge’, and their epistemically superior position explains why they disproportionately claim the opera is best for human welfare.

    Some features can help distinguish between bias and privileged access, between insider knowledge and insider beliefs. We might be able to look at related areas, and see if ‘insiders’ have superior performance which an insider knowledge account may predict (if insiders correctly anticipate movements in consensus, this is suggestive they have an edge). Another possibility is to look at migration of beliefs. If there is ‘cognitive tropism’, where better cognizers tend to move from the opera to AMF, this is evidence against donating to the opera in general and the claim of privileged access among opera-supporters in particular. Another is to look at ordering: if the population of those ‘exposed’ to the opera first and then considerations around human welfare are more likely to make Oliver’s claims than those exposed in reverse order, this is suggestive of bias on one side or the other.

  6. Although I restrict myself to ‘meta’-level concerns, I can’t help but suggest the ‘object level’ case for these things looks about as shaky as Oliver’s object level claims on behalf of the opera. In the same way we could question: “I grant that the arts are an important aspect of human welfare, but is it the most important (compared to, say, avoiding preventable death and disability)?” or “What makes you so confident donations to the opera are the best for the arts - why not literature? or perhaps some less exoteric music?” We can pose similarly tricky questions to proponents of 2-4: “I grant that (e.g.) antispeciesism is an important aspect of making the far future go well, but is it the most important aspect (compared to, say, extinction risks)?” or “What makes you so confident (e.g.) cryonics is the best way of ensuring greater care for the future - what about militating for that directly? Or maybe philosophical research into whether this is the correct view in the first place?”

    It may well be that there are convincing answers to the object level questions, but I have struggled to find them. And, in honesty, I find the lack of public facing arguments in itself cause for suspicion.

  7. At least, undone insofar as I have seen. I welcome correction in the comments.
  8. The only work I could find taking this sort of approach is this.
  9. There is a tension between ‘taking arguments seriously’ and ‘deferring to common sense’. Effective altruism only weakly converges with common sense morality, and thus we should expect their recommendations to diverge. On the other hand, that something lies far from common sense morality is a pro tanto reason to reject it. This is better acknowledged openly: “I think the best action by the lights of EA is to research wild animal suffering, but all things considered I will do something else, as how outlandish this is by common sense morality is a strong reason against it”. (There are, of course, also tactical reasons that may speak against saying or doing very strange things.)
  10. This ‘esoteric selection effect’ may also undermine social epistemological arguments between cause areas:

    It seems to me that more people move from global poverty to far future causes than people move in the opposite direction (I suspect, but am less sure, the same applies between animal welfare and the far future). It also seems to me that (with many exceptions) far future EAs are generally better informed and cleverer than global poverty EAs.

    I don’t have great confidence in this assessment, but suppose I am right. This could be adduced as evidence in favour of far future causes: if the balance of reason favoured the far future over global poverty, this would explain the unbalanced migration and ‘cognitive tropism’ between the cause areas.

    But another plausible account explains this by selection. Global poverty causes are much more widely known than far future causes. Thus people who are ‘susceptible’ to be persuaded by far future causes were often previously persuaded by global poverty causes, whilst the reverse is not true - those susceptible to global poverty causes are unlikely to encounter far future causes first. Further, as far future causes are more esoteric, they will be disproportionately available to better-informed people. Thus, even if the balance of reason was against the far future, we would still see these trends and patterns of believers.

    I am generally a fan of equal-weight views, and of being deferential to group or expert opinion. However, selection effects like these make deriving the balance of reason from the pattern of belief deeply perplexing.

  11. Thanks to Stefan Schubert, Carl Shulman, Amanda MacAskill, Owen Cotton-Barratt and Pablo Stafforini for extensive feedback and advice. Their kind assistance should not be construed as either endorsement of the content or responsibility for any errors.

What's wrong with this picture?

13 CronoDAS 28 January 2016 01:30PM

Alice: I just flipped a coin [large number] times. Here's the sequence I got:


(Alice presents her sequence.)


Bob: No, you didn't. The probability of having gotten that particular sequence is 1/2^[large number]. Which is basically impossible. I don't believe you.


Alice: But I had to get some sequence or other. You'd make the same claim regardless of what sequence I showed you.


Bob: True. But am I really supposed to believe you that a 1/2^[large number] event happened, just because you tell me it did, or because you showed me a video of it happening, or even if I watched it happen with my own eyes? My observations are always fallible, and if you make an event improbable enough, why shouldn't I be skeptical even if I think I observed it?


Alice: Someone usually wins the lottery. Should the person who finds out that their ticket had the winning numbers believe the opposite, because winning is so improbable?


Bob: What's the difference between finding out you've won the lottery and finding out that your neighbor is a 500 year old vampire, or that your house is haunted by real ghosts? All of these events are extremely improbable given what we know of the world.


Alice: There's improbable, and then there's impossible. 500 year old vampires and ghosts don't exist.


Bob: As far as you know. And I bet more people claim to have seen ghosts than have won more than 100 million dollars in the lottery.


Alice: I still think there's something wrong with your reasoning here.
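
One possible way to put numbers on the exchange above is to compare hypotheses with Bayes' rule instead of looking at the raw 1/2^[large number] figure alone. The sketch below is not from the post and makes strong assumptions: there are only two hypotheses (Alice really flipped the coin, or she invented the sequence), and the probability that an inventor would report this exact sequence is something we can supply. The function name `posterior_honest` and its parameters are hypothetical.

```python
from fractions import Fraction


def posterior_honest(n_flips, prior_honest, p_seq_if_fabricated=None):
    """Posterior probability that the reported sequence came from real flips.

    Assumptions (not from the post): two hypotheses only, and a fabricator
    who by default picks among all 2^n sequences uniformly at random.
    """
    p_seq_if_honest = Fraction(1, 2 ** n_flips)
    if p_seq_if_fabricated is None:
        p_seq_if_fabricated = p_seq_if_honest
    prior = Fraction(prior_honest)
    numerator = prior * p_seq_if_honest
    denominator = numerator + (1 - prior) * p_seq_if_fabricated
    return numerator / denominator


if __name__ == "__main__":
    # A typical-looking sequence: both hypotheses make it equally unlikely,
    # so the tiny probability cancels and the prior is left unchanged.
    print(posterior_honest(100, Fraction(9, 10)))
    # A sequence a fabricator would favour (say, all heads), with an assumed
    # probability of 1/2 under fabrication: the posterior collapses.
    print(float(posterior_honest(100, Fraction(9, 10), Fraction(1, 2))))
```

On this reading, what matters is not how improbable the sequence is in isolation but how its probability compares across the competing explanations, which is one candidate for the "something wrong" Alice is pointing at.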

Perhaps a better form factor for Meetups vs Main board posts?

13 lionhearted 28 January 2016 11:50AM

I like to read posts on "Main" from time to time, including ones that haven't been promoted. However, lately, these posts get drowned out by all the meetup announcements.

It seems like this could lead to a cycle where people comment less on recent non-promoted posts (because they fall off the Main non-promoted area quickly) which leads to less engagement, and fewer posts, etc.

Meetups are also very important, but here's the rub: I don't think a text-based announcement in the Main area is the best possible way to showcase meetups.

So here's an idea: how about creating either a calendar of upcoming meetups, or a map with pins on it of all places having a meetup in the next three months?

This could be embedded on the front page of -- that'd let people find meetups easier (they can look either by timeframe or see if their region is represented), and would give more space to new non-promoted posts, which would hopefully promote more discussion, engagement, and new posts.


[link] "The Happiness Code" - New York Times on CFAR

13 Kaj_Sotala 15 January 2016 06:34AM

Long. Mostly quite positive, though does spend a little while rolling its eyes at the Eliezer/MIRI connection and the craziness of taking things like cryonics and polyamory seriously.

Conveying rational thinking about long-term goals to youth and young adults

10 Gleb_Tsipursky 07 February 2016 01:54AM
More than a year ago, I discussed here how we at Intentional Insights intended to convey rationality to young adults through our collaboration with the Secular Student Alliance. This international organization unites over 270 clubs at colleges and high schools in English-speaking countries, mainly the US, with its clubs spanning from a few students to a few hundred students. The SSA's Executive Director is an aspiring rationalist and CFAR alum who is on our Advisory Board.

Well, we've been working on a project with the SSA for the last 8 months to create and evaluate an event aimed to help its student members figure out and orient toward the long term, thus both fighting Moloch on a societal level and helping them become more individually rational as well (the long-term perspective is couched in the language of finding purpose using science). It's finally done, and here is the link to the event packet. The SSA will be distributing this packet broadly, but in the meantime, if you have any connections to secular student groups, consider encouraging them to hold this event. The event would also fit well for adult secular groups with minor editing, in case any of you are involved with them. It's also easy to strip the secular language from the packet, and just have it as an event for a philosophy/science club of any sort, at any level from youth to adult. Although I would prefer you cite Intentional Insights when you do it, I'm comfortable with you not doing so if circumstances don't permit it for some reason.

We're also working on similar projects with the SSA, focusing on being rational in the area of giving, so promoting Effective Altruism. I'll post it here when it's ready.  

Clearing An Overgrown Garden

10 Anders_H 29 January 2016 10:16PM

(tl;dr: In this post, I make some concrete suggestions for LessWrong 2.0.)

Less Wrong 2.0

A few months ago, Vaniver posted some ideas about how to reinvigorate Less Wrong. Based on comments in that thread and based on personal discussions I have had with other members of the community, I believe there are several different views on why Less Wrong is dying. The following are among the most popular hypotheses:

(1) Pacifism has caused our previously well-kept garden to become overgrown

(2) The aversion to politics has caused a lot of interesting political discussions to move away from the website

(3) People prefer posting to their personal blogs.

With this background, I suggest the following policies for Less Wrong 2.0.  This should be seen only as a starting point for discussion about the ideal way to implement a rationality forum. Most likely, some of my ideas are counterproductive. If anyone has better suggestions, please post them to the comments.

Moderation Policy:

There are four levels of users:  

  1. Users
  2. Trusted Users 
  3. Moderators
  4. Administrator

Users may post comments and top level posts, but their contributions must be approved by a moderator.

Trusted users may post comments and top level posts which appear immediately. Trusted user status is awarded by a 2/3 vote among the moderators.

Moderators may approve comments made by non-trusted users. There should be at least 10 moderators to ensure that comments are approved within an hour of being posted, preferably quicker. If there is disagreement between moderators, the matter can be discussed on a private forum. Decisions may be altered by a simple majority vote.

The administrator (preferably Eliezer or Nate) chooses the moderators.
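The four user levels and the two voting thresholds above can be sketched in a few lines. This is only an illustration of one possible reading of the proposal: the names `Role`, `appears_immediately`, `grants_trusted_status` and `overturns_decision` are hypothetical, and the sketch assumes that "2/3 vote" means two thirds of all moderators and that "simple majority" means more than half of the votes cast.

```python
from enum import Enum
from fractions import Fraction


class Role(Enum):
    USER = 1
    TRUSTED_USER = 2
    MODERATOR = 3
    ADMINISTRATOR = 4


def appears_immediately(role: Role) -> bool:
    """Assumption: everyone above plain USER skips the approval queue."""
    return role is not Role.USER


def grants_trusted_status(votes_for: int, total_moderators: int) -> bool:
    """Assumed reading of '2/3 vote': at least two thirds of ALL moderators."""
    return Fraction(votes_for, total_moderators) >= Fraction(2, 3)


def overturns_decision(votes_for: int, votes_cast: int) -> bool:
    """Assumed reading of 'simple majority': more than half of the votes cast."""
    return 2 * votes_for > votes_cast
```

With ten moderators, seven votes would be enough to award trusted status under this reading, while six would not; a 5-5 split would leave a moderation decision unchanged.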

Personal Blogs:

All users are assigned a personal subdomain. When publishing a top-level post, users may click a checkbox to indicate whether the post should appear only on their personal subdomain, or also in the Less Wrong discussion feed. The commenting system is shared between the two access pathways. Users may choose a design template for their subdomain. However, when the post is accessed from the discussion feed, the default template overrides the user-specific template. The personal subdomain may include a blogroll, an about page, and other information. Users may purchase a top-level domain as an alias for their subdomain.

Standards of Discourse and Policy on Mindkillers:

All discussion in Less Wrong 2.0 is seen explicitly as an attempt to exchange information for the purpose of reaching Aumann agreement. In order to facilitate this goal, communication must be precise. Therefore, all users agree to abide by Crocker's Rules for all communication that takes place on the website.  

However, this is not a license for arbitrary rudeness.  Offensive language is permitted only if it is necessary in order to point to a real disagreement about the territory. Moreover, users may not repeatedly bring up the same controversial discussion outside of their original context.

Discussion of politics is explicitly permitted as long as it adheres to the rules outlined above. All political opinions are permitted (including opinions which are seen as taboo by society at large), as long as the discussion is conducted with civility and in a manner that is suited for dispassionate exchange of information, and suited for accurate reasoning about the consequences of policy choice. By taking part in any given discussion, all users are expected to pre-commit to updating in response to new information.


Only trusted users may vote. There are two separate voting systems.  Users may vote on whether the post raises a relevant point that will result in interesting discussion (quality of contribution) and also on whether they agree with the comment (correctness of comment). The first is a property both of the comment and of the user, and is shown in their user profile.  The second scale is a property only of the comment. 

All votes are shown publicly. Abuse of the voting system will result in loss of Trusted User status.

How to Implement This

After the community comes to a consensus on the basic ideas behind LessWrong 2.0, my preference is for MIRI to implement it as a replacement for Less Wrong. However, if for some reason MIRI is unwilling to do this, and if there is sufficient interest in going in this direction, I offer to pay server costs. If necessary, I also offer to pay some limited amount for someone to develop the codebase (based on Open Source solutions). 

Other Ideas:

MIRI should start a professionally edited rationality journal (For instance called "Rationality") published bi-monthly. Users may submit articles for publication in the journal. Each week, one article is chosen for publication and posted to a special area of Less Wrong. This replaces "main". Every two months, these articles are published in print in the journal.  

The idea behind this is as follows:
(1) It will incentivize users to compete for the status of being published in the journal.
(2) It will allow contributors to put the article on their CV.
(3) It may bring in high-quality readers who are unlikely to read blogs.  
(4) Every week, the published article may be a natural choice of discussion topic at Less Wrong meetups.

Require contributions in advance

9 Viliam 08 February 2016 12:55PM

If you are a person who finds it difficult to say "no" to your friends, this one weird trick may save you a lot of time!


Scenario 1

Alice: "Hi Bob! You are a programmer, right?"

Bob: "Hi Alice! Yes, I am."

Alice: "I have this cool idea, but I need someone to help me. I am not good with computers, and I need someone smart whom I could trust, so they wouldn't steal my idea. Would you have a moment to listen to me?"

Alice explains to Bob her idea that would completely change the world. Well, at least the world of bicycle shopping.

Instead of having many shops for bicycles, there could be one huge e-shop that would collect all the information about bicycles from all the existing shops. The customers would specify what kind of a bike they want (and where they live), and the system would find all bikes that fit the specification, and display them ordered by lowest price, including the price of delivery; then it would redirect them to the specific page of the specific vendor. Customers would love to use this one website, instead of having to visit multiple shops and compare. And the vendors would have to use this shop, because that's where the customers would be. Taking a fraction of a percent from the sales could make Alice (and also Bob, if he helps her) incredibly rich.

Bob is skeptical about it. The project suffers from the obvious chicken-and-egg problem: without vendors already there, the customers will not come (and if they come by accident, they will quickly leave, never to return again); and without customers already there, there is no reason for the vendors to cooperate. There are a few ways to approach this problem, but the fact that Alice didn't even think about it is a red flag. She also has no idea who the big players in the world of bicycle selling are; and generally she didn't do her homework. But even after Bob points out all these objections, Alice remains super enthusiastic about the project. She promises she will take care of everything -- she just cannot write code, and she needs Bob's help for this part.

Bob believes strongly in the division of labor, and that friends should help each other. He considers Alice his friend, and he will likely need some help from her in the future. The fact is, with a perfect specification, he could make the webpage in a week or two. But he considers bicycles to be an extremely boring topic, so he wants to spend as little time as possible on this project. Finally, he has an idea:

"Okay, Alice, I will make the website for you. But first I need to know exactly what the page will look like, so that I don't have to keep changing it over and over again. So here is the homework for you -- take a pen and paper, and make a sketch of exactly what the website will look like. All the dialogs, all the buttons. Don't forget logging in and logging out, editing the customer profile, and everything else that is necessary for the website to work as intended. Just look at the papers and imagine that you are the customer: where exactly would you click to register, and to find the bicycle you want? Same for the vendor. And possibly a site administrator. Also give me the list of criteria people will use to find the bike they want. Size, weight, color, radius of wheels, what else? And when you have it all ready, I will make the first version of the website. But until then, I am not writing any code."

Alice leaves, satisfied with the outcome.


This happened a year ago.

No, Alice doesn't have the design ready, yet. Once in a while, when she meets Bob, she smiles at him and apologizes that she didn't have the time to start working on the design. Bob smiles back and says it's okay, he'll wait. Then they change the topic.


Scenario 2

Cyril: "Hi Diana! You speak Spanish, right?"

Diana: "Hi Cyril! Yes, I do."

Cyril: "You know, I think Spanish is the most cool language ever, and I would really love to learn it! Could you please give me some Spanish lessons, once in a while? I totally want to become fluent in Spanish, so I could travel to Spanish-speaking countries and experience their culture and food. Would you please help me?"

Diana is happy that someone takes interest in her favorite hobby. It would be nice to have someone around she could practice Spanish conversation with. The first instinct is to say yes.

But then she remembers (she has known Cyril for some time; they have a lot of friends in common, so they meet quite regularly) that Cyril is always super enthusiastic about something he is totally going to do... but when she meets him next time, he is super enthusiastic about something completely different; and she has never heard of him doing anything serious about his previous dreams.

Also, Cyril seems to seriously underestimate how much time it takes to learn a foreign language fluently. Some lessons once in a while will not do it. He also needs to study on his own. Preferably every day, but twice a week is probably a minimum, if he hopes to speak the language fluently within a year. Diana would be happy to teach someone Spanish, but not if her effort will most likely be wasted.

Diana: "Cyril, there is this great website called Duolingo, where you can learn Spanish online completely free. If you give it about ten minutes every day, maybe after a few months you will be able to speak fluently. And anytime we meet, we can practice the vocabulary you have already learned."

This would be the best option for Diana. No work, and another opportunity to practice. But Cyril insists:

"It's not the same without the live teacher. When I read something from the textbook, I cannot ask additional questions. The words that are taught are often unrelated to the topics I am interested in. I am afraid I will just get stuck with the... whatever was the website that you mentioned."

For Diana this feels like a red flag. Sure, textbooks are not optimal. They contain many words that the student will not use frequently, and will soon forget. On the other hand, the grammar is always useful; and Diana doesn't want to waste her time explaining the basic grammar that any textbook could explain instead. If Cyril learns the grammar and some basic vocabulary, then she can teach him all the specialized vocabulary he is interested in. But now it feels like Cyril wants to avoid all work. She has to draw a line:

"Cyril, this is the address of the website." She takes his notebook and writes it down. "You register there, choose Spanish, and click on the first lesson. It is interactive, and it will not take you more than ten minutes. If you get stuck there, write here what exactly it was that you didn't understand; I will explain it when we meet. If there is no problem, continue with the second lesson, and so on. When we meet next time, tell me which lessons you have completed, and we will talk about them. Okay?"

Cyril nods reluctantly.


This happened a year ago.

Cyril and Diana have met repeatedly during the year, but Cyril never brought up the topic of Spanish language again.


Scenario 3

Erika: "Filip, would you give me a massage?"

Filip: "Yeah, sure. The lotion is in the next room; bring it to me!"

Erika brings the massage lotion and lies on the bed. Filip massages her back. Then they make out and have sex.


This happened a year ago. Erika and Filip are still a happy couple.

Filip's previous relationships didn't work well in the long term. In retrospect, they all followed a similar scenario. At the beginning, everything seemed great. Then at some moment the girl started acting... unreasonably?... asking Filip to do various things for her, and then acting annoyed when Filip did exactly what he was asked to do. This happened more and more frequently, and at some moment she broke up with him. Sometimes she provided an explanation for breaking up that Filip was unable to decipher.

Filip has a friend who is a successful salesman. Successful both professionally and with women. When Filip admitted to himself that he was unable to solve the problem on his own, he asked his friend for advice.

"It's because you're a f***ing doormat," said the friend. "The moment a woman asks you to do anything, you immediately jump and do it, like a well-trained puppy. Puppies are cute, but not attractive. Have you read any of those books I sent you, like, ten years ago? I bet you didn't. Well, it's all there."

Filip sighed: "Look, I'm not trying to become a pick-up artist. Or a salesman. Or anything. No offense, but I'm not like you, personality-wise, I never have been, and I don't want to become your - or anyone else's - copy. Even if it would mean greater success in anything. I prefer to treat other people just like I would want them to treat me. Most people reciprocate nice behavior; and those who don't, well, I avoid them as much as possible. This works well with my friends. It also works with the girls... at the beginning... but then somehow... uhm... Anyway, all your books are about manipulating people, which is ethically unacceptable for me. Isn't there some other way?"

"All human interaction is manipulation; the choice is between doing it right or wrong, acting consciously or driven by your old habits..." started the friend, but then he gave up. "Okay, I see you're not interested. Just let me show you the most obvious mistake you make. You believe that when you are nice to people, they will perceive you as nice, and most of them will reciprocate. And when you act like an asshole, it's the other way round. That's correct, on some level; and in a perfect world this would be the whole truth. But on a different level, people also perceive nice behavior as weakness; especially if you do it habitually, as if you don't have any other option. And being an asshole obviously signals strength: you are not afraid to make other people angry. Also, in the long term, people become used to your behavior, good or bad. The nice people don't seem so nice anymore, but they still seem weak. Then, ironically, if the person well-known to be nice refuses to do something once, people become really angry, because their expectations were violated. And if the asshole decides to do something nice once, they will praise him, because he surprised them pleasantly. You should be an asshole once in a while, to make people see that you have a choice, so they won't take your niceness for granted. Or if your girlfriend wants something from you, sometimes just say no, even if you could have done it. She will respect you more, and then she will enjoy the things you do for her more."

Filip: "Well, I... probably couldn't do that. I mean, what you say seems to make sense, however much I hate to admit it. But I can't imagine doing it myself, especially to a person I love. It's just... uhm... wrong."

"Then, I guess, the very least you could do is to ask her to do something for you first. Even if it's symbolic, that doesn't matter; human relationships are mostly about role-playing anyway. Don't jump immediately when you are told to; always make her jump first, if only a little. That will demonstrate strength without hurting anyone. Could you do that?"

Filip wasn't sure, but at the next opportunity he tried it, and it worked. And it kept working. Maybe it was all just a coincidence, maybe it was a placebo effect, but Filip doesn't mind. At first it felt kinda artificial, but then it became natural. And later, to his surprise, Filip realized that practicing these symbolic demands actually made it easier to ask when he really needed something. (In which case sometimes he was asked to do something first, because his girlfriend -- knowingly or not? he never had the courage to ask -- copied the pattern; or maybe she had already known it long before. But he didn't mind that either.)


The lesson is: If you find yourself repeatedly in situations where people ask you to do something for them, but in the end they don't seem to appreciate what you did for them, or don't even care about the thing they asked you to do... and yet you find it difficult to say "no"... ask them to contribute to the project first.

This will help you get rid of the projects they don't care about (including the ones they think they care about in far mode, but do not care about enough to actually work on them in near mode) without being the one who refuses cooperation. Also, the act of asking the other person to contribute, after being asked to do something for them, mitigates the status loss inherent in working for them.

[Link] Lifehack article promoting rationality-themed ideas, namely long-term orientation, mere-exposure effect, consider-the-alternative, and agency

9 Gleb_Tsipursky 11 January 2016 08:14PM

Here's my article in Lifehack, one of the most prominent self-improvement websites, bringing rationality-style ideas to a broad audience, specifically long-term orientation, mere-exposure effect, consider-the-alternative, and agency :-)


P.S. Based on feedback from the LessWrong community, I made sure to avoid mentioning LessWrong or rationality in the article.

[LINK] How A Lamp Took Away My Reading And A Box Brought It Back

8 CronoDAS 30 January 2016 04:55PM

By Ferrett Steinmetz

Ferrett isn't officially a Rationality Blogger, but he posts things that seem relevant fairly often. This one is in the spirit of "Beware Trivial Inconveniences". It's the story of how he realized that a small change in his environment led to a big change in his behavior...

[Stub] Ontological crisis = out of environment behaviour?

8 Stuart_Armstrong 13 January 2016 03:10PM

One problem with AI is the possibility of ontological crises - of AIs discovering their fundamental model of reality is flawed, and being unable to cope safely with that change. Another problem is out-of-environment behaviour - an AI that has been trained to behave very well in a specific training environment messes up when introduced to a more general environment.

It suddenly occurred to me that these might in fact be the same problem in disguise. In both cases, the AI has developed certain ways of behaving in reaction to certain regular features of its environment. And suddenly it is placed in a situation where these regular features are absent - either because it has realised that these features are actually very different from what it thought (ontological crisis) or because the environment is different and no longer supports the same regularities (out-of-environment behaviour).

In a sense, both these errors may be seen as imperfect extrapolation from partial training data.
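As a toy illustration of "imperfect extrapolation from partial training data", here is a minimal sketch, not anything from the post itself. It assumes a hypothetical true rule y = x², fits a straight line using only inputs between 0 and 1, and then compares the error inside that range with the error far outside it. The function names and the choice of rule are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def true_rule(x):
    # Hypothetical "real world" regularity; the learner never sees this formula.
    return x * x


# Training environment: inputs only between 0 and 1.
train_xs = [i / 10 for i in range(11)]
train_ys = [true_rule(x) for x in train_xs]
slope, intercept = fit_line(train_xs, train_ys)


def predict(x):
    return slope * x + intercept


# Worst error on the training range versus error at an input far outside it.
in_range_error = max(abs(predict(x) - true_rule(x)) for x in train_xs)
out_of_range_error = abs(predict(5.0) - true_rule(5.0))
print(f"in range: {in_range_error:.2f}, out of range: {out_of_range_error:.2f}")
```

The learned line looks adequate wherever it was trained and fails badly where the regularity it relied on (near-linearity on a short interval) no longer holds, which is the shape of both failure modes described above.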

[Link] How I Escaped The Darkness of Mental Illness

7 Gleb_Tsipursky 04 February 2016 11:08PM
A deeply personal account by aspiring rationalist Agnes Vishnevkin, who shares the broad overview of how she used rationality-informed strategies to recover from mental illness. She will also appear on the Unbelievers Radio podcast today live at 10:30 PM EST (UTC-5), together with JT Eberhard, to speak about mental illness and recovery.

**EDIT** Based on feedback from gjm below, I want to clarify that Agnes is my wife and fellow co-founder of Intentional Insights.

Identifying bias. A Bayesian analysis of suspicious agreement between beliefs and values.

7 Stefan_Schubert 31 January 2016 11:29AM

Here is a new paper of mine (12 pages) on suspicious agreement between belief and values. The idea is that if your empirical beliefs systematically support your values, then that is evidence that you arrived at those beliefs through a biased belief-forming process. This is especially so if those beliefs concern propositions which aren’t probabilistically correlated with each other, I argue.
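To make the shape of the argument concrete, here is a minimal sketch of a Bayesian update. It is not taken from the paper; every number and name in it is an assumption chosen for illustration. It assumes that each of n probabilistically independent beliefs would line up with your values with probability 0.5 under an unbiased process and 0.9 under a biased one.

```python
def posterior_bias(prior, n_aligned,
                   p_align_if_biased=0.9, p_align_if_unbiased=0.5):
    """Posterior probability of a biased belief-forming process after seeing
    n_aligned independent beliefs that all happen to support one's values.
    All parameter values are illustrative assumptions."""
    like_biased = p_align_if_biased ** n_aligned
    like_unbiased = p_align_if_unbiased ** n_aligned
    evidence = prior * like_biased + (1 - prior) * like_unbiased
    return prior * like_biased / evidence


for n in (0, 1, 3, 5):
    print(n, round(posterior_bias(0.2, n), 3))
```

Under these assumptions, a prior of 0.2 rises to roughly 0.31 after one convenient belief and to roughly 0.83 after five. Independence matters because correlated beliefs would count as fewer separate pieces of evidence, so the likelihoods could not simply be multiplied.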

I have previously written several LW posts on these kinds of arguments (here and here; see also mine and ClearerThinking’s political bias test) but here the analysis is more thorough. See also Thrasymachus' recent post on the same theme.

Yoshua Bengio on AI progress, hype and risks

7 V_V 30 January 2016 01:45AM


Yoshua Bengio, one of the world's leading experts on machine learning, and neural networks in particular, explains his view on these issues in an interview. Relevant quotes:

There are people who are grossly overestimating the progress that has been made. There are many, many years of small progress behind a lot of these things, including mundane things like more data and computer power. The hype isn’t about whether the stuff we’re doing is useful or not—it is. But people underestimate how much more science needs to be done. And it’s difficult to separate the hype from the reality because we are seeing these great things and also, to the naked eye, they look magical

[ Recursive self-improvement ] It’s not how AI is built these days. Machine learning means you have a painstaking, slow process of acquiring information through millions of examples. A machine improves itself, yes, but very, very slowly, and in very specialized ways. And the kind of algorithms we play with are not at all like little virus things that are self-programming. That’s not what we’re doing.

Right now, the way we’re teaching machines to be intelligent is that we have to tell the computer what is an image, even at the pixel level. For autonomous driving, humans label huge numbers of images of cars to show which parts are pedestrians or roads. It’s not at all how humans learn, and it’s not how animals learn. We’re missing something big. This is one of the main things we’re doing in my lab, but there are no short-term applications—it’s probably not going to be useful to build a product tomorrow.

We ought to be talking about these things [ AI risks ]. The thing I’m more worried about, in a foreseeable future, is not computers taking over the world. I’m more worried about misuse of AI. Things like bad military uses, manipulating people through really smart advertising; also, the social impact, like many people losing their jobs. Society needs to get together and come up with a collective response, and not leave it to the law of the jungle to sort things out.

I think it's fair to say that Bengio has joined the ranks of AI researchers like his colleagues Andrew Ng and Yann LeCun who publicly express skepticism towards imminent human-extinction-level AI.

[LINK] Common fallacies in probability (when numbers aren't used)

7 Stuart_Armstrong 15 January 2016 08:29AM

Too many people attempt to use logic when they should be using probabilities - in fact, when they are using probabilities, but don't mention it. Here are some of the major fallacies caused by misusing logic and probabilities this way:

  1. "It's not certain" does not mean "It's impossible" (and vice versa).
  2. "We don't know" absolutely does not imply "It's impossible".
  3. "There is evidence against it" doesn't mean much on its own.
  4. Being impossible *in a certain model*, does not mean being impossible: it changes the issue to the probability of the model.

Common fallacies in probability

Request for help with economic analysis related to AI forecasting

6 ESRogs 06 February 2016 01:27AM

[Cross-posted from FB]

I've got an economic question that I'm not sure how to answer.

I've been thinking about trends in AI development, and trying to get a better idea of what we should expect progress to look like going forward.

One important question is: how much do existing AI systems help with research and the development of new, more capable AI systems?

The obvious answer is, "not much." But I think of AI systems as being on a continuum from calculators on up. Surely AI researchers sometimes have to do arithmetic and other tasks that they already outsource to computers. I expect that going forward, the share of tasks that AI researchers outsource to computers will (gradually) increase. And I'd like to be able to draw a trend line. (If there's some point in the future when we can expect most of the work of AI R&D to be automated, that would be very interesting to know about!)

So I'd like to be able to measure the share of AI R&D done by computers vs humans. I'm not sure of the best way to measure this. You could try to come up with a list of tasks that AI researchers perform and just count, but you might run into trouble as the list of tasks changes over time (e.g. suppose at some point designing an AI system requires solving a bunch of integrals, and that with some later AI architecture this is no longer necessary).

What seems more promising is to abstract over the specific tasks that computers vs human researchers perform and use some aggregate measure, such as the total amount of energy consumed by the computers or the human brains, or the share of an R&D budget spent on computing infrastructure and operation vs human labor. Intuitively, if most of the resources are going towards computation, one might conclude that computers are doing most of the work.

Unfortunately I don't think that intuition is correct. Suppose AI researchers use computers to perform task X at cost C_x1, and some technological improvement enables X to be performed more cheaply at cost C_x2. Then, all else equal, the share of resources going towards computers will decrease, even though their share of tasks has stayed the same.

On the other hand, suppose there's some task Y that the researchers themselves perform at cost H_y, and some technological improvement enables task Y to be performed more cheaply at cost C_y. After the team outsources Y to computers the share of resources going towards computers has gone up. So it seems like it could go either way -- in some cases technological improvements will lead to the share of resources spent on computers going down and in some cases it will lead to the share of resources spent on computers going up.
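The two cases can be checked with made-up numbers. In the sketch below all costs are hypothetical, and a third task Z that always stays with humans is added (an assumption, not part of the setup above) so that human spending never drops to zero.

```python
def computer_share(computer_costs, human_costs):
    """Fraction of total spending that goes to computers."""
    computers = sum(computer_costs.values())
    humans = sum(human_costs.values())
    return computers / (computers + humans)


# Baseline: computers do X (C_x1 = 40); humans do Y (H_y = 60) and Z (50).
baseline = computer_share({"X": 40}, {"Y": 60, "Z": 50})

# Case 1: X gets cheaper (C_x2 = 20); computers still do one task out of three.
cheaper_x = computer_share({"X": 20}, {"Y": 60, "Z": 50})

# Case 2: Y is outsourced to computers at C_y = 30; they now do two tasks.
automated_y = computer_share({"X": 40, "Y": 30}, {"Z": 50})

print(f"{baseline:.2f} {cheaper_x:.2f} {automated_y:.2f}")  # 0.27 0.15 0.58
```

With these numbers the computers' budget share falls from 0.27 to 0.15 in the first case even though their share of tasks is unchanged, and rises to 0.58 in the second, so the budget share alone does not pin down the share of work.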

So here's the econ part -- is there some standard economic analysis I can use here? If both machines and human labor are used in some process, and the machines are becoming both more cost effective and more capable, is there anything I can say about how the expected share of resources going to pay for the machines changes over time?

AI safety in the age of neural networks and Stanislaw Lem 1959 prediction

6 turchin 31 January 2016 07:08PM

Tl;DR: Neural networks will result in a slow takeoff and an arms race between two AIs. This has both good and bad consequences for the problem of AI safety. A hard takeoff may still happen afterwards.

Summary: Neural-network-based AI can be built; it will be relatively safe, though not for long.

The neuro AI era (since 2012) features exponential growth of total AI expertise, with a doubling period of about 1 year, mainly due to data exchange among diverse agents and different processing methods. It will probably last for about 10 to 20 years; after that, a hard takeoff of strong AI, or the creation of a Singleton based on the integration of different AI systems, can take place.
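To get a sense of the scale this implies, here is a small arithmetic sketch. It assumes the doubling period stays constant at one year over the whole era, which the summary above does not guarantee; `growth_factor` is a hypothetical helper name.

```python
def growth_factor(years, doubling_period_years=1.0):
    """Total multiplication after `years` of steady exponential doubling."""
    return 2 ** (years / doubling_period_years)


print(growth_factor(10))  # 1024.0
print(growth_factor(20))  # 1048576.0
```

Under that assumption, a 10 to 20 year era would multiply the quantity in question by a factor of about a thousand to about a million.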

Neural-network-based AI implies a slow takeoff, which can take years and eventually lead to AI’s evolutionary integration into human society. A similar scenario was described by Stanisław Lem in 1959: the arms race between countries would cause a power race between AIs. The race is only possible if the self-enhancement rate is rather slow and there is data interchange between the systems. The slow takeoff will result in a world system with two competing AI-countries. Its major risks will be a war between the AIs and corrosion of the value systems of the competing AIs.

The hard takeoff implies revolutionary changes within days or weeks. The slow takeoff can transform into the hard takeoff at some stage. The hard takeoff is only possible if one AI considerably surpasses its peers (the OpenAI project wants to prevent this).


Part 1. Limitations of explosive potential of neural nets

Every day now we hear about the success of neural networks, and we could conclude that human-level AI is around the corner. But this type of AI is not suited to explosive self-improvement.

If an AI is based on a neural net, it is not easy for it to undergo quick self-improvement, for several reasons:

1. A neural net’s executable code is not fully transparent, for theoretical reasons: knowledge is not explicitly present within it. So even if one can read the neuron weight values, it is not easy to understand how they should be changed to improve something.

2. Training a new neural network is a resource-consuming task. If a neuro AI decides to go the way of self-enhancement but is unable to understand its own source code, a logical solution would be to ‘deliver a child’, i.e. to train a new neural network. However, training neural networks requires far more resources than running them; it requires huge databases and has a high probability of failure. All those factors will lead to rather slow AI self-enhancement.

3. Neural network training depends on large volumes of data and on new ideas coming from the external world. This means that a single AI will hardly break away if it has stopped free information exchange with the external world; its level will not considerably surpass that of the rest of the world.

4. A neural network’s power depends roughly linearly on the power of the computer it runs on, so for a neuro AI, hardware power limits its ability to self-enhance.

5. Neuro AI would be a rather big program of about 1 TByte, so it can hardly leak into the network unnoticed (at current internet speeds).

6. Even if a neuro AI reaches the human level, it will not get self-enhancement ability (because no one person can understand all scientific aspects). For this end, a big lab with numerous experts in different branches is needed. Additionally, it should be able to launch such virtual laboratory at a rate at least 10 -100 times higher than that of a human being to get an edge as compared to the rest of mankind. That is, it has to be as powerful as 10,000 people or more to surpass the rest part of the mankind in terms of enhancement rate. This is a very high requirement. As a result, the neural net era can lead to building a human, or even a bit superhuman level AI, which is unable to self-enhance or does it so slowly that lags behind the technical progress.

The civilization-level intelligence is the total IQ that the civilization possesses for 100 years of its history, which is defined as a complexity of scientific and engineering tasks it can solve. For example, during the 20th century, nuclear weapon was created, but problems of cancer, aging and AI creation failed to be solved. It means, those tasks have superior complexity.

For a strong AI to be able to change the human destiny, its IQ should 100 – 1000 times surpass that of the entire civilization. In this case, it can create a remedy against aging, treat cancer etc. within a year. (To destroy mankind, it would need a much lower level of intelligence like that of a sophisticated computer virus, which can,

7. The richest neural net project will get decisive advantage (or not?). Neuronets based AI’s power is approximately proportional to the power of computers it’s run on, time, and a size of the base it was educated on, and number of researchers in the project. The project from the basement may only succeed if it has superior master algorithm of intelligence. That said, small project can hardly experience exponential self-improvement, because the first AI will already have had better computing power and, more important, better databases and researchers. And as now we live in AI-summer period and there is a lot of hype about AI, big companies will invest to buy the best ideas about master algorithm (by buying startups).

An objection here is that now anybody who has spent a year in machine learning and has a GPU can reach meaningful results in creating their own RNN. So maybe the collective intelligence of private researchers may outweigh the big companies.

A detailed roadmap of AI failure modes can be found here; and the roadmap of the ways of creating a safe AI here.

See also machine learning expert opinion and recent FHI post


Part 2. Some ideas about ensuring safety of neural nets based AI

The mentioned concepts can be helpful in creating extra safety measures (while not absolute, they can reduce chances of AI getting out of control.)

1. Program memory access restriction. In modern computers, program code and data share the same memory, which is known as the von Neumann architecture. However, there is also the Harvard architecture, where data and programs are stored separately.

This kind of computer can be physically produced and provided with source-code reading attempt indicators. (However, there are neural networks at the moment, which generate a new neural network for every task, so they have higher efficiency, and this approach will not work for them or limit AI power).

2. A compiled executable code and its source written in a high-level programming language should be stored separately. If AI tries to reach it, it will be a signal that it is getting out of control. With neural networks, a compiled code, its source, and a database used to educate the neural network should be stored separately.

3. AI should be limited in knowledge useful for building AI. To improve itself, an AI needs a very precise understanding of its internal construction (the same applies to a human being). This lets us keep it under control: if not provided with this data, it will have problems with self-improvement. That is, an AI that is not an AI expert is safer to some extent.

4. We can use a “foggy architecture”, which is very difficult to optimize because the AI’s values and major features are hidden among heaps of other details. Neural networks are an architecture of this kind. But this will also make it difficult for us to put our values into it.

5. We can physically restrict the ability of the AI to create “AI children”; bringing them up requires huge resources, time, and data before they become more effective than their parents. An AI can create another AI, but then it will need to test it for a long time to find out whether the latter observes the parent’s values, whether it has any mistakes, and whether it is really superior to the original version. This kind of testing is time-consuming, generally nontrivial, and even incomputable. It is especially difficult for neural networks, which contain a huge number of random elements, are not exact in principle, and cannot be proven faultless. (Moreover, there are numerous examples where even a small image shift causes a recognition error.) However, this problem can be solved, because this is the way human society goes: children are born, tested, and educated.

Creation of AI-children may be restricted by restricting needed infrastructure: memory, datasets, processing power.

6. A neuronet-based agent (like a robot) will be anthropomorphic in terms of its brain structure. Even if we do not imitate the human brain intentionally, we will get approximately the same thing. In a sense, this may be good: even if these AIs supplant people, they will still be almost people, differing from normal people about as much as one generation differs from another. And being anthropomorphic, they may be more compatible with human value systems. Along with that, there may exist AI architecture types with nothing human about them (for example, if evolution is regarded as an inventor).

But the neural net world will not be Hanson's EM-dominated world. An EM-world may appear at a later stage, but I think that exact uploads still will not be the dominant form of AI.


Part 3. Transition from slow to hard takeoff

In a sense, neuronet-based AI is like a chemical fuel rocket: they do fly and can fly even across the entire solar system, but they are limited in terms of their development potential, bulky, and clumsy.

Sooner or later, using the same principle or another one, completely different AI can be built, which will be less resource-consuming and faster in terms of self-improvement ability.

If a superagent is built that can create neural networks but is not a neural network itself, it can be rather small and, partly due to this, evolve faster. Neural networks have a rather low concentration of intelligence per unit of code. Probably, the same thing could be done in a more optimal way, reducing the size by an order of magnitude, for example, by creating a program that analyzes an already trained neural network and extracts all the necessary information from it.

When hardware improves in 10-20 years, multiple neuronets will be able to evolve within the same computer simultaneously or be transmitted via the Internet, which will boost their development.

Smart neuro AI can analyze all available data analysis methods and create new AI architecture able to speed up faster.

Launch of quantum-computer-based networks can boost their optimization drastically.

There are many other promising AI directions which did not pop up yet: Bayesian networks, genetic algorithms.

The neuro AI era will feature exponential growth of the total humanity intelligence, with a doubling period of about 1 year, mainly due to the data exchange among diverse agents and different processing methods. It will last for about 10 to 20 years (2025-2035) and, after that, hard take-off of strong AI can take place.

That is, the slow take-off period will be the period of collective evolution of both computer science and mankind, which will enable us to adapt to changes under way and adjust them.

Just as there are Mac and PC in the computer world or Democrats and Republicans in politics, it is likely that two big competing AI systems will appear (plus an ecology consisting of smaller ones). It could be Google and Facebook or the USA and China, depending on whether the world chooses the way of economic competition or military opposition. That is, the slow take-off hinders consolidation of the world under single control and instead promotes a bipolar model. While a bipolar system can remain stable for a long period of time, there are always risks of a real war between the AIs (see Lem’s quote below).


Part 4. In the course of the slow takeoff, AI will go through several stages, that we can figure out now

 While the stages can be passed rather fast or be diluted, we still can track them like milestones. The dates are only estimates.

1. AI autopilot. Tesla has it already.

2. AI home robot. All prerequisites are available to build it by 2020 at the latest. This robot will be able to understand and fulfill an order like ‘Bring my slippers from the other room’. On its basis, something like a “mind-brick” may be created: a universal robot brain able to navigate in natural space and recognize speech. Then, this mind-brick can be used to create more sophisticated systems.

3. AI intellectual assistant. Searching through personal documentation, possibility to ask questions in a natural language and receive wise answers. 2020-2030.

4. AI human model. Very vague as yet. Could be realized by means of a robot brain adaptation. Will be able to simulate 99% of usual human behavior, probably, except for solving problems of consciousness, complicated creative tasks, and generating innovations. 2030.

5. AI as powerful as an entire research institution, able to create scientific knowledge and upgrade itself. Can be made of numerous human models. 100 simulated people, each working 100 times faster than a human being, will probably be able to create an AI capable of improving itself faster than humans in other laboratories can. 2030-2100

   5a Self-improving threshold. AI becomes able to self-improve independently and quicker than all humanity

   5b Consciousness and qualia threshold. AI is able not only to pass the Turing test in all cases, but also has experiences and understands why and what it is.

6. Mankind-level AI. AI possessing intelligence comparable to that of the whole mankind. 2040-2100 

7. AI with the intelligence 10 – 100 times bigger than that of the whole mankind. It will be able to solve problems of aging, cancer, solar system exploration, nanorobots building, and radical improvement of life of all people. 2050-2100

8. Jupiter brain – huge AI using the entire planet’s mass for calculations. It can reconstruct dead people, create complex simulations of the past, and dispatch von Neumann probes. 2100-3000 

9. Galactic Kardashev level 3 AI. Several million years from now.

10. All-Universe AI. Several billion years from now


Part 5. Stanisław Lem on AI, 1959, Investigation

In his novel «Investigation», Lem's character discusses the future of the arms race and AI:


- Well, it was somewhere around '46. A nuclear race had started. I knew that when the limit would be reached (I mean maximum destruction power), development of vehicles to transport the bomb would start. .. I mean missiles. And here is where the limit would be reached, that is both sides would have nuclear warhead missiles at their disposal. And there would arise desks with notorious buttons thoroughly hidden somewhere. Once the button is pressed, missiles take off. Within about 20 minutes, finis mundi ambilateralis comes - the mutual end of the world. <…> Those were only prerequisites. Once started, the arms race can’t stop, you see? It must go on. When one side invents a powerful gun, the other responds by creating a harder armor. Only a collision, a war is the limit. While this situation means finis mundi, the race must go on. The acceleration, once applied, enslaves people. But let’s assume they have reached the limit. What remains? The brain. Command staff’s brain. Human brain cannot be improved, so some automation should be taken on in this field as well. The next stage is an automated headquarters or strategic computers. And here is where an extremely interesting problem arises. Namely, two problems in parallel. Mac Cat has drawn my attention to it. Firstly, is there any limit for development of this kind of brain? It is similar to chess-playing devices. A device, which is able to foresee the opponent’s actions ten moves in advance, always wins against the one, which foresees eight or nine moves ahead. The deeper the foresight, the more perfect the brain is. This is the first thing. <…> Creation of devices of increasingly bigger volume for strategic solutions means, regardless of whether we want it or not, the necessity to increase the amount of data put into the brain. It in turn means increasing dominance of those devices over mass processes within a society. 
The brain can decide that the notorious button should be placed otherwise or that the production of a certain sort of steel should be increased – and will request loans for the purpose. If the brain like this has been created, one should submit to it. If a parliament starts discussing whether the loans are to be issued, the time delay will occur. The same minute, the counterpart can gain the lead. Abolition of parliament decisions is inevitable in the future. The human control over solutions of the electronic brain will be narrowing as the latter will concentrate knowledge. Is it clear? On both sides of the ocean, two continuously growing brains appear. What is the first demand of a brain like this, when, in the middle of an accelerating arms race, the next step will be needed? <…> The first demand is to increase it – the brain itself! All the rest is derivative.

- In a word, your forecast is that the earth will become a chessboard, and we – the pawns to be played by two mechanical players during the eternal game?

Sisse’s face was radiant with pride.

- Yes. But this is not a forecast. I just make conclusions. The first stage of a preparatory process is coming to the end; the acceleration grows. I know, all this sounds unlikely. But this is the reality. It really exists!

— <…> And in this connection, what did you offer at that time?

- Agreement at any price. While it sounds strange, the ruin is a lesser evil than the chess game. This is awful, lack of illusions, you know.


Part 6. The primary question is: Will strong AI be built during our lifetime?

That is, is this a question of future generations’ good (the question that an effective altruist, not a common person, is concerned about) or a question of my near-term planning?

If AI is built during my lifetime, it may lead either to radical life extension by means of different technologies and the realization of all sorts of good things too numerous to list here, or to my death and probably pain, if this AI is unfriendly.

It depends on the time when AI is built and on my expected lifetime (taking into account, on the one hand, the life extension to be obtained from weaker AI versions and scientific progress, and, on the other, its reduction due to global risks unrelated to AI).

Note that we should consider different dates for different events. If we would like to avoid AI risks, we should take the earliest date of its possible appearance (for example, the first 10%). And if we count on its good, then – the median.

Since the moment of neuro-revolution, an approximate rate of doubling AI algorithms efficiency (mainly in image recognition area) is about 1 year. It is difficult to quantify this process as the task complexity does not change linearly, and it is always more difficult to recognize recent patterns. 

Now, an important factor is a radical change in attitude towards AI research. Winter is over, and the unrestrained summer with all its overhype has begun. It has brought huge investments in AI research (chart), more enthusiasts and employees in this field, and bold research. It is now embarrassing not to have one's own AI project. Even KAMAZ is developing a friendly AI system. The entry threshold has dropped: one can learn basic neuronet tuning skills within one year; heaps of tutorial programs are available. Supercomputer hardware has become cheaper. Also, a guaranteed market for AIs, in the form of self-driving cars and, in the future, home robots, has emerged.

If algorithm improvement keeps a pace of about one doubling per year, that means a factor of about 1,000,000 over 20 years, which certainly will be equal to creating a strong AI beyond a self-improvement threshold. In this case, a lot of people (me included) have a good chance of living until that moment and gaining immortality.
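The figure of 1,000,000 over 20 years follows from compounding one doubling per year (2^20 is about 1.05 million). A minimal sketch of that arithmetic; the function name and the default doubling period are illustrative assumptions, not part of any measured trend:

```python
def improvement_factor(years: float, doubling_period_years: float = 1.0) -> float:
    """Total improvement factor after `years`, assuming efficiency
    doubles once every `doubling_period_years` (a hypothetical, constant rate)."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (10, 20):
        print(years, "years ->", improvement_factor(years))
```

Note how sensitive the result is to the assumed rate: with a doubling period of two years instead of one, the same 20 years give a factor of only about 1,000.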



Even a non-self-improving neural AI system may be unsafe if it gains global domination (and has bad values) or if it goes into confrontation with an equally large opposing system. Such a confrontation may result in nuclear or nanotech-based war, and the human population may be held hostage, especially if both systems have pro-human value systems (blackmail).

Anyway, slow-takeoff AI risks of human extinction are not inevitable and are manageable on an ad hoc basis. Slow takeoff does not prevent hard takeoff at a later stage of AI development.

Hard takeoff is probably the next logical stage after soft takeoff, as it will continue the trend of accelerating progress. We can see the same process in biological evolution: the slow process of brain enlargement in mammalian species over the last tens of millions of years was replaced by the almost hard takeoff of Homo sapiens intelligence, which threatens the ecological balance.

Hard takeoff is a global catastrophe almost by definition, and extraordinary measures are needed to steer it onto a safe path. Maybe the period of almost-human-level neural-net-based AI will help us create instruments of AI control. Maybe we could use simpler neural AIs to control a self-improving system.

Another option is that the neural AI age will be very short and is already almost over. In 2016 Google DeepMind's system beat a professional human player at Go using a complex approach that combines several AI architectures. If this trend continues, we could get strong AI before 2020, and we are completely unprepared for it.



The Charity Impact Calculator

6 Gleb_Tsipursky 26 January 2016 05:01AM

This will be of interest mainly to EA-friendly LWs, and is cross-posted on the EA Forum, The Life You Can Save, and Intentional Insights


The Life You Can Save has an excellent tool to help people easily visualize and quantify the impact of their giving: the Impact Calculator. It enables people to put in any amount of money they want, then click on a charity, and see how much of an impact their money can have. It's a really easy way to promote effective giving to non-EAs, but even EAs who didn't see it before can benefit. I certainly did, when I first played around with it. So I wrote a blog post, copy-pasted below, for The Life You Can Save and for Intentional Insights, to help people learn about the Impact Calculator. If you like the blog, please share this link to The Life You Can Save blog, as opposed to this post. Any feedback on the blog post itself is welcomed!




How a Calculator Helped Me Multiply My Giving

It feels great to see hope light up in the eyes of a beggar in the street as you stop to look at them when others pass them by without a glance. Their faces widen in a smile as you reach into your pocket and take out your wallet. "Thank you so much" is such a heartwarming phrase to hear from them as you pull out five bucks and put the money in the hat in front of them. You walk away with your heart beaming as you imagine them getting a nice warm meal at McDonalds due to your generosity.

Yet with the help of a calculator, I learned how to multiply that positive experience manifold! Imagine that when you give five dollars, you don’t give just to one person, but to seven people. When you reach into your pocket, you see seven smiles. When you put the money in the hat, you hear seven people say “Thank you so much.”

The Life You Can Save has an Impact Calculator that helps you calculate the impact of your giving. You can put in any amount of money you want, then click on a charity of your choice, and see how much of an impact your money can have.

When I learned about this calculator, I decided to check out how far $5 can take me. I went through various charities listed there and saw the positive difference that my money can make.

I was especially struck by one charity: GiveDirectly, a nonprofit that enables you to give directly to people in East Africa. When I put in $5, I saw that GiveDirectly transfers that money directly to poor people who live on an average of $.65 per day. You certainly can’t buy a McDonald’s meal for that, but $.65 goes far in East Africa.

That really struck me. I realized I can get a really high benefit from giving directly to people in the developing world, much more than I would from giving to one person in the street here in the US. I don’t see those seven people in front of me and thus don’t pay attention to the impact I can have on them, a thinking error called attentional bias. Yet if I keep in mind this thinking error, I can solve what is known as the “drowning child problem” in charitable giving, namely not intuitively valuing the children who are drowning out of my sight. If I keep in my mind that there are poor people in the developing world, just like the poor person I see on the street in front of me, I can remember that my generosity can make a very high impact, much more impact per dollar than in the US, in developing countries through my direct giving.
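The "seven people" figure is consistent with dividing a $5 gift by the $.65 average daily income quoted above. This is an assumed reading, not necessarily how the Impact Calculator itself computes impact; the function name below is hypothetical:

```python
import math


def person_days_covered(gift_usd: float, daily_income_usd: float = 0.65) -> float:
    """How many person-days of average income a gift matches,
    assuming the $0.65/day figure quoted in the post."""
    return gift_usd / daily_income_usd


if __name__ == "__main__":
    days = person_days_covered(5.0)
    # Rounded down to whole people supported for one day each.
    print(math.floor(days), "people for one day, from a $5 gift")
```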

GiveDirectly bridges that gap between me and the poor people across the globe. This organization locates poor people who can benefit most from cash transfers, enrolls them in its program, and then provides each household with about a thousand dollars to spend as it wishes. The large size of this cash transfer results in a much bigger impact than a small donation. Moreover, since the cash transfer is unconditional, the poor person can have true dignity and spend it on whatever most benefits them.

Helida, for example, used the cash transfer she got to build a new house. You wouldn’t intuitively think that was the most useful thing for her to do, would you? But this is what she needed most. She was happy that as a result of the cash transfer “I have a metal roof over my head and I can safely store my farm produce without worries.” She is now much more empowered to take care of herself and her large family.

What a wonderful outcome of GiveDirectly’s work! Can you imagine building a new house in the United States on a thousand dollars? Well, this is why your direct donations go a lot further in East Africa.

With GiveDirectly, you can be much more confident about the outcome of your generosity. I know that when I give to a homeless person, a part of me always wonders whether he will spend the money on a bottle of cheap vodka. This is why I really appreciate that GiveDirectly keeps in touch and follows up with the people enrolled in its programs. They are scrupulous about sharing the consequences of their giving, so you know what you are getting by your generous gifts.

GiveDirectly is backed by rigorous evidence. They conduct multiple randomized controlled studies of their impact, a gold standard of evidence. The research shows that cash transfer recipients have much better health and lives as a result of the transfer, much more than with most types of anti-poverty interventions. Its evidence-based approach is why GiveDirectly is highly endorsed by well-respected charity evaluators such as GiveWell and The Life You Can Save, which are part of the Effective Altruist movement that strives to figure out the best research-informed means to do the most good per dollar.

So next time you pass someone begging on the street, think about GiveDirectly, since you can get seven times as much impact, for your emotional self and for the world as a whole. What I do myself is each time I choose to give to a homeless person, I set aside the same amount of money to donate through GiveDirectly. That way, I get to see the smile and hear the “thank you” in person, and also know that I can make a much more impactful gift as well.

Check out the Impact Calculator for yourself to see the kind of charities available there and learn about the impact you can make. Perhaps direct giving is not to your taste, but there are over a dozen other options for you to choose from. Whatever you choose, aim to multiply your generosity to achieve your giving goals!

Map:Territory::Uncertainty::Randomness – but that doesn’t matter, value of information does.

6 Davidmanheim 22 January 2016 07:12PM

In risk modeling, there is a well-known distinction between aleatory and epistemic uncertainty, which is sometimes referred to, or thought of, as irreducible versus reducible uncertainty. Epistemic uncertainty exists in our map; as Eliezer put it, “The Bayesian says, ‘Uncertainty exists in the map, not in the territory.’” Aleatory uncertainty, however, exists in the territory. (Well, at least according to our map that uses quantum mechanics, according to Bell's Theorem – like, say, the time at which a radioactive atom decays.) This is what people call quantum uncertainty, indeterminism, true randomness, or recently (and somewhat confusingly to myself) ontological randomness – referring to the fact that our ontology allows randomness, not that the ontology itself is in any way random. It may be better, in LessWrong terms, to think of uncertainty versus randomness – while being aware that the wider world refers to both as uncertainty. But does the distinction matter?

To clarify a key point, many facts that are treated as random, such as dice rolls, are actually mostly uncertain – in that with enough physics modeling and inputs, we could predict them. On the other hand, in chaotic systems, there is the possibility that the “true” quantum randomness can propagate upwards into macro-level uncertainty. For example, a sphere of highly refined and shaped uranium that is *exactly* at the critical mass will set off a nuclear chain reaction, or not, based on the quantum physics of whether the neutrons from one of the first set of decays sets off a chain reaction – after enough of them decay, it will be reduced beyond the critical mass, and become increasingly unlikely to set off a nuclear chain reaction. Of course, the question of whether the nuclear sphere is above or below the critical mass (given its geometry, etc.) can be a difficult-to-measure uncertainty, but it’s not aleatory – though some part of the question of whether it kills the guy trying to measure whether it’s just above or just below the critical mass will be random – so maybe it’s not worth finding out. And that brings me to the key point.

In a large class of risk problems, there are factors treated as aleatory – but they may be epistemic, just at a level where finding the “true” factors and outcomes is prohibitively expensive. Potentially, the timing of an earthquake that would happen at some point in the future could be determined exactly via a simulation of the relevant data. Why is it considered aleatory by most risk analysts? Well, doing it might require a destructive, currently technologically impossible deconstruction of the entire earth – making the earthquake irrelevant. We would start with measurement of the position, density, and stress of each relatively macroscopic structure, and then perform a very large physics simulation of the earth as it had existed beforehand. (We have lots of silicon from deconstructing the earth, so I’ll just assume we can now build a big enough computer to simulate this.) Of course, this is not worthwhile – but doing so would potentially show that the actual aleatory uncertainty involved is negligible. Or it could show that we need to model the macroscopically chaotic system to such a high fidelity that microscopic, fundamentally indeterminate factors actually matter – and it was truly aleatory uncertainty. (So we have epistemic uncertainty about whether it’s aleatory; if our map was of high enough fidelity, and was computable, we would know.)

It turns out that most of the time, for the types of problems being discussed, this distinction is irrelevant. If we know that the value of information to determine whether something is aleatory or epistemic is negative, we can treat the uncertainty as randomness. (And usually, we can figure this out via a quick order-of-magnitude calculation: if the Value of Perfect Information is estimated at $100 for figuring out which side the die lands on in this game, building and testing / validating any model for predicting it would take me at least 10 hours, and my time is worth at least $25/hour, then the net value is negative.) But sometimes, slightly improved models, and slightly better data, are feasible – and then it is worth checking whether there is some epistemic uncertainty that we can pay to reduce. In fact, for earthquakes, we’re doing that – we have monitoring systems that can give several minutes of warning, and geological models that can predict to some degree of accuracy the relative likelihood of different sized quakes.
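The order-of-magnitude check in the parenthesis can be written out explicitly. This is only a sketch of that back-of-the-envelope comparison, using the post's own numbers ($100 value of perfect information, at least 10 hours of modeling, at least $25/hour); the function names are illustrative:

```python
def net_value_of_information(value_of_perfect_info_usd: float,
                             modeling_hours: float,
                             hourly_rate_usd: float) -> float:
    """Upper bound on the payoff from resolving the uncertainty,
    minus the cost of the modeling effort needed to resolve it."""
    modeling_cost_usd = modeling_hours * hourly_rate_usd
    return value_of_perfect_info_usd - modeling_cost_usd


def treat_as_random(value_of_perfect_info_usd: float,
                    modeling_hours: float,
                    hourly_rate_usd: float) -> bool:
    """If the net value is negative, modeling is not worth it and the
    uncertainty can be treated as randomness."""
    return net_value_of_information(
        value_of_perfect_info_usd, modeling_hours, hourly_rate_usd) < 0


if __name__ == "__main__":
    print(net_value_of_information(100, 10, 25))
    print(treat_as_random(100, 10, 25))
```

Because the value of perfect information is an upper bound on what any real, imperfect model could deliver, a negative result here settles the question without needing a more careful estimate.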

So, in conclusion: most uncertainty is lack of resolution in our map, which we can call epistemic uncertainty. This is true even if lots of people call it “truly random” or irreducibly uncertain – or if they are fancy, aleatory uncertainty. Some of what we assume is uncertainty is really randomness. But lots of the epistemic uncertainty can be safely treated as aleatory randomness, and value of information is what actually makes a difference. And knowing the terminology used elsewhere can be helpful.

Thinking About a Technical Solution to Coordination Problems

6 ChaosMote 17 January 2016 07:16AM

I was just reading an article online, and one of the comments mentioned a political issue (the legality of corporate contributions to political campaigns). One of the responses was a comment saying "Not until we abandon this mentality, we the victims are the majority, we can take back this country, all we need to do is open our eyes and stand up." When I saw this comment, I agreed with the sentiment - but nevertheless, I shrugged and moved on. Sure, it is an issue that I strongly believe in, and an issue on which I thought most people would agree with me - but nevertheless, there was nothing I could do about it. Sure, if everyone who agreed on this took a stand (or at least wrote a letter to their congressional representative) we could probably do something about it together - but I could only control my own actions, and in acting alone I'd only be wasting my time.


That got me thinking. This isn't the first time I've come across these sorts of issues. At its heart, this is a coordination problem - lots of people want to do something, but it doesn't make sense for any individual to act unless many others do as well. We don't have a way to solve these sorts of problems, which is quite unfortunate. Except... why can't we have such a system?


Right now, I'm imagining a website where you get to create "causes" and also add your name to them along with a number specifying how many other supporters you'd need to see before you would be willing to take (a pre-specified) action towards the cause. What are the reasons that something like this wouldn't work?
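One way to make the triggering rule concrete: each supporter states how many other supporters they need, and the cause "activates" for the largest group in which everyone's condition is met by the rest of the group. The sketch below assumes that interpretation (it is not the only possible one), and the function name is hypothetical:

```python
from typing import List


def activated_supporters(thresholds: List[int]) -> int:
    """Size of the largest group whose members' conditions are all satisfied.

    thresholds[i] is the number of OTHER supporters person i needs to see
    before acting. A group of size k works if each member needs at most
    k - 1 others, so it is enough to check the k lowest thresholds.
    """
    ordered = sorted(thresholds)
    best = 0
    for k in range(1, len(ordered) + 1):
        # ordered[k - 1] is the most demanding of the k least demanding people.
        if ordered[k - 1] <= k - 1:
            best = k
    return best


if __name__ == "__main__":
    # One unconditional supporter, two who need one other, one who needs five.
    print(activated_supporters([0, 1, 1, 5]))
```

All values of k are checked rather than stopping at the first failure, because a large group can be self-supporting even when no small one is: six people who each need five others can act together, although none of them would move first.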


In fact, we do have several websites that work sort of like this already. Kickstarter is one. The White House Petitions system is another. The first of these has been a wild success; the second, less so (as far as I understand it). So there is clearly some merit to the idea, but also some major obstacles.



What do people think of this?

Welcome to LessWrong (January 2016)

6 Clarity 13 January 2016 09:34PM
If you've recently joined the Less Wrong community, please leave a comment here and introduce yourself. We'd love to know who you are, what you're doing, what you value, how you came to identify as a rationalist or how you found us. You can skip right to that if you like; the rest of this post consists of a few things you might find helpful. More can be found at the FAQ.

(This is the fifth incarnation of the welcome thread; once a post gets over 500 comments, it stops showing them all by default, so we make a new one. Besides, a new post is a good perennial way to encourage newcomers and lurkers to introduce themselves.)

A few notes about the site mechanics

Less Wrong comments are threaded for easy following of multiple conversations. To respond to any comment, click the "Reply" link at the bottom of that comment's box. Within the comment box, links and formatting are achieved via Markdown syntax (you can click the "Help" link below the text box to bring up a primer).

You may have noticed that all the posts and comments on this site have buttons to vote them up or down, and all the users have "karma" scores which come from the sum of all their comments and posts. This immediate easy feedback mechanism helps keep arguments from turning into flamewars and helps make the best posts more visible; it's part of what makes discussions on Less Wrong look different from those anywhere else on the Internet.

However, it can feel really irritating to get downvoted, especially if one doesn't know why. It happens to all of us sometimes, and it's perfectly acceptable to ask for an explanation. (Sometimes it's the unwritten LW etiquette; we have different norms than other forums.) Take note when you're downvoted a lot on one topic, as it often means that several members of the community think you're missing an important point or making a mistake in reasoning— not just that they disagree with you! If you have any questions about karma or voting, please feel free to ask here.

Replies to your comments across the site, plus private messages from other users, will show up in your inbox. You can reach it via the little mail icon beneath your karma score on the upper right of most pages. When you have a new reply or message, it glows red. You can also click on any user's name to view all of their comments and posts.

It's definitely worth your time commenting on old posts; veteran users look through the recent comments thread quite often (there's a separate recent comments thread for the Discussion section, for whatever reason), and a conversation begun anywhere will pick up contributors that way.  There's also a succession of open comment threads for discussion of anything remotely related to rationality.

Discussions on Less Wrong tend to end differently than in most other forums; a surprising number end when one participant changes their mind, or when multiple people clarify their views enough and reach agreement. More commonly, though, people will just stop when they've better identified their deeper disagreements, or simply "tap out" of a discussion that's stopped being productive. (Seriously, you can just write "I'm tapping out of this thread.") This is absolutely OK, and it's one good way to avoid the flamewars that plague many sites.

There's actually more than meets the eye here: look near the top of the page for the "WIKI", "DISCUSSION" and "SEQUENCES" links.
LW WIKI: This is our attempt to make searching by topic feasible, as well as to store information like common abbreviations and idioms. It's a good place to look if someone's speaking Greek to you.
LW DISCUSSION: This is a forum just like the top-level one, with two key differences: in the top-level forum, posts require the author to have 20 karma in order to publish, and any upvotes or downvotes on the post are multiplied by 10. Thus there's a lot more informal dialogue in the Discussion section, including some of the more fun conversations here.
SEQUENCES: A huge corpus of material mostly written by Eliezer Yudkowsky in his days of blogging at Overcoming Bias, before Less Wrong was started. Much of the discussion here will casually depend on or refer to ideas brought up in those posts, so reading them can really help with present discussions. Besides which, they're pretty engrossing in my opinion.

A few notes about the community

If you've come to Less Wrong to discuss a particular topic, this thread would be a great place to start the conversation. By commenting here, and checking the responses, you'll probably get a good read on what, if anything, has already been said here on that topic, what's widely understood and what you might still need to take some time explaining.

If your welcome comment starts a huge discussion, then please move to the next step and create a LW Discussion post to continue the conversation; we can fit many more welcomes onto each thread if fewer of them sprout 400+ comments. (To do this: click "Create new article" in the upper right corner next to your username, then write the article, then at the bottom take the menu "Post to" and change it from "Drafts" to "Less Wrong Discussion". Then click "Submit". When you edit a published post, clicking "Save and continue" does correctly update the post.)

If you want to write a post about a LW-relevant topic, awesome! I highly recommend you submit your first post to Less Wrong Discussion; don't worry, you can later promote it from there to the main page if it's well-received. (It's much better to get some feedback before every vote counts for 10 karma- honestly, you don't know what you don't know about the community norms here.)

If you'd like to connect with other LWers in real life, we have meetups in various parts of the world. Check the wiki page for places with regular meetups, or the upcoming (irregular) meetups page. There's also a Facebook group. If you have your own blog or other online presence, please feel free to link it.

If English is not your first language, don't let that make you afraid to post or comment. You can get English help on Discussion- or Main-level posts by sending a PM to one of the following users (use the "send message" link on the upper right of their user page). Either put the text of the post in the PM, or just say that you'd like English help and you'll get a response with an email address. 
Barry Cotter

A list of some posts that are pretty awesome

I recommend the major sequences to everybody, but I realize how daunting they look at first. So for purposes of immediate gratification, the following posts are particularly interesting/illuminating/provocative and don't require any previous reading:

More suggestions are welcome! Or just check out the top-rated posts from the history of Less Wrong. Most posts at +50 or more are well worth your time.

Welcome to Less Wrong, and we look forward to hearing from you throughout the site!

Note from Clarity: MBlume and other contributors wrote the original version of this welcome post, and orthonormal edited it a fair bit. If there's anything I should add or update please send me a private message or make the change by making the next thread—I may not notice a comment on the post. Finally, once this gets past 500 comments, anyone is welcome to copy and edit this intro to start the next welcome thread.

Lesswrong Survey - invitation for suggestions

5 Elo 08 February 2016 08:07AM

Given that it's been a while since the last survey (


It's now time to open the floor to suggestions of improvements to the last survey.  If you have a question you think should be on the survey (perhaps with reasons why, predictions as to the result, or other useful commentary about a survey question)


Alternatively, suggest questions that should not be included in the next survey, with similar reasons as to why...

Value learners & wireheading

5 Manfred 03 February 2016 09:50AM

Dewey 2011 lays out the rules for one kind of agent with a mutable value system. The agent has some distribution over utility functions, which it has rules for updating based on its interaction history (where "interaction history" means the agent's observations and actions since its origin). To choose an action, it looks through every possible future interaction history, and picks the action that leads to the highest expected utility, weighted both by the possibility of making that future happen and the utility function distribution that would hold if that future came to pass.
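As a rough illustration of the decision rule just described, here is a minimal Python sketch. It is not Dewey's formalism verbatim: the names `expected_utility`, `choose_action`, `futures`, and `p_utility`, the two toy utility functions, and every number are assumptions made up for the example. A "history" here is just an (action, observation) pair.

```python
def expected_utility(action, futures, utilities, p_utility):
    """Expected utility of `action`.

    futures(action)    -> dict mapping each possible future history to its probability
    utilities          -> dict mapping a name to a utility function over histories
    p_utility(name, h) -> probability that utility function `name` holds given history h
    """
    total = 0.0
    for history, p_history in futures(action).items():
        # Weight each candidate utility function by how likely it is
        # to be the right one *if this future came to pass*.
        mixed = sum(p_utility(name, history) * u(history)
                    for name, u in utilities.items())
        total += p_history * mixed
    return total


def choose_action(actions, futures, utilities, p_utility):
    """Pick the action with the highest expected utility."""
    return max(actions,
               key=lambda a: expected_utility(a, futures, utilities, p_utility))


# Toy setup (all numbers invented for illustration).
def futures(action):
    if action == "a":
        return {("a", "x"): 1.0}
    return {("b", "x"): 0.5, ("b", "y"): 0.5}


utilities = {
    "U1": lambda h: 1.0 if h[0] == "a" else 0.0,  # likes action "a"
    "U2": lambda h: 1.0 if h[1] == "y" else 0.0,  # likes observation "y"
}


def p_utility(name, history):
    # Observing "y" makes U2 much more likely to be the right utility function.
    p_u2 = 0.9 if history[1] == "y" else 0.1
    return p_u2 if name == "U2" else 1.0 - p_u2


best = choose_action(["a", "b"], futures, utilities, p_utility)
```

In this toy setup the distribution over utility functions is conditioned on the whole future history, which is the feature that distinguishes a value learner from an agent with one fixed utility function.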

[Image: Drone can bring sandwich either to work or to home]

We might motivate this sort of update strategy by considering a sandwich-drone bringing you a sandwich. The drone can either go to your workplace, or go to your home. If we think about this drone as a value-learner, then the "correct utility function" depends on whether you're at work or at home - upon learning your location, the drone should update its utility function so that it wants to go to that place. (Value learning is unnecessarily indirect in this case, but that's because it's a simple example.)

Suppose the drone begins its delivery assigning equal measure to the home-utility-function and to the work-utility-function (i.e. ignorant of your location), and can learn your location for a small cost. If the drone evaluated this idea with its current utility function, it wouldn't see any benefit, even though it would in fact deliver the sandwich properly - because under its current utility function there's no point to going to one place rather than the other. To get sensible behavior, and properly deliver your sandwich, the drone must evaluate actions based on what utility function it will have in the future, after the action happens.
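The arithmetic behind this can be sketched in a few lines of Python. The cost of looking (0.1), the payoff of 1 for a correct delivery, and all function names are assumptions chosen for illustration; only the 50/50 prior comes from the example above.

```python
LOOK_COST = 0.1                      # assumed small cost of learning the location
PRIOR = {"home": 0.5, "work": 0.5}   # equal measure on the two utility functions
BLIND = {"go_home": "home", "go_work": "work"}


def utility(target, destination, looked):
    """Utility function that wants the sandwich delivered to `target`."""
    return (1.0 if destination == target else 0.0) - (LOOK_COST if looked else 0.0)


def mixture(weights, destination, looked):
    """Utility under a weighted distribution over the two utility functions."""
    return sum(w * utility(t, destination, looked) for t, w in weights.items())


def posterior(observed):
    """Weights after the interaction history: certain once the location is observed."""
    if observed is None:
        return dict(PRIOR)
    return {place: 1.0 if place == observed else 0.0 for place in PRIOR}


def evaluate(policy, use_future_utility):
    """Expected utility of a policy, averaged over where you really are."""
    total = 0.0
    for true_place, prob in PRIOR.items():
        if policy == "look_then_go":
            destination, looked, observed = true_place, True, true_place
        else:
            destination, looked, observed = BLIND[policy], False, None
        weights = posterior(observed) if use_future_utility else PRIOR
        total += prob * mixture(weights, destination, looked)
    return total


current_view = {p: evaluate(p, False) for p in ("go_home", "go_work", "look_then_go")}
future_view = {p: evaluate(p, True) for p in ("go_home", "go_work", "look_then_go")}
```

Under these assumed numbers, the current utility function rates looking at 0.4 against 0.5 for going blind, so looking appears to be a pure cost. Scored by the utility function the drone will have afterwards, looking rates 0.9 against 0.5.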

If you're familiar with how wireheading or quantum suicide look in terms of decision theory, this method of deciding based on future utility functions might seem risky. Fortunately, value learning doesn't permit wireheading in the traditional sense, because the updates to the utility function are an abstract process, not a physical one. The agent's probability distribution over utility functions, which is conditional on interaction histories, defines which actions and observations are allowed to change the utility function during the process of predicting expected utility.

Dewey also mentions that so long as the probability distribution over utility functions is well-behaved, you cannot deliberately take action to raise the probability of one of the utility functions being true. But I think this is only useful to safety when we understand and trust the overarching utility function that gets evaluated at the future time horizon. If instead we start at the present, and specify a starting utility function and rules for updating it based on observations, this complex system can evolve in surprising directions, including some wireheading-esque behavior.


The formalism of Dewey 2011 is, at bottom, extremely simple. I'm going to be a bad pedagogue here: I think this might only make sense if you go look at equations 2 and 3 in the paper, and figure out what all the terms do, and see how similar they are. The cheap summary is that if your utility is a function of the interaction history, trying to change utility functions based on interaction history just gives you back a utility function. If we try to think about what sort of process to use to change an agent's utility function, this formalism provides only one tool: look out to some future time horizon, and define an effective utility function in terms of what utility functions are possible at that future time horizon. This is different from the approximations or local utility functions we would like in practice.

If we take this scheme and try to approximate it, for example by only looking N steps into the future, we run into problems; the agent will want to self-modify so that next timestep it only looks ahead N-1 steps, and then N-2 steps, and so on. Or more generally, many simple approximation schemes are "sticky" - from inside the approximation, an approximation that changes over time looks like undesirable value drift.
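A toy model can make the shrinking-horizon preference concrete. Everything here is an assumption invented for the sketch: the "take"/"invest" environment, the payoffs of 1 and 3, the choice N = 2, and the convention that "looking N steps ahead" means summing reward over the current step and the N-1 after it.

```python
from itertools import product

ACTIONS = ("take", "invest")
N = 2  # assumed lookahead of the original agent


def reward(plan, step):
    """Reward at `step`: 1 for taking now, plus 3 if the previous step invested."""
    now = 1 if plan[step] == "take" else 0
    payoff = 3 if step > 0 and plan[step - 1] == "invest" else 0
    return now + payoff


def window_value(plan, start, horizon):
    """Sum of rewards over steps start .. start + horizon - 1."""
    return sum(reward(plan, s) for s in range(start, start + horizon))


def first_action(past, horizon):
    """Action chosen at step len(past) by an agent that plans `horizon` steps ahead."""
    start = len(past)
    best_tail = max(
        product(ACTIONS, repeat=horizon),
        key=lambda tail: window_value(list(past) + list(tail), start, horizon),
    )
    return best_tail[0]


def value_to_original(successor_horizon):
    """How the step-0 agent (window: steps 0..N-1) rates a given successor at step 1."""
    a0 = first_action([], N)
    a1 = first_action([a0], successor_horizon)
    return window_value([a0, a1], 0, N)


keep_horizon = value_to_original(N)        # successor still looks N steps ahead
shrink_horizon = value_to_original(N - 1)  # successor looks only N-1 steps ahead
```

In this toy world the successor with the full horizon invests at step 1 for a payoff that lands outside the original agent's window, while the successor with the shortened horizon takes the immediate reward. Judged from inside the original window, the shortened successor scores 4 against 3, so the original agent would prefer to shrink its own lookahead.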

Common sense says this sort of self-sabotage should be eliminable. One should be able to really care about the underlying utility function, not just its approximation. However, this problem tends to crop up, for example whenever the part of the future you look at does not depend on which action you are considering; modifying to keep looking at the same part of the future unsurprisingly improves the results you get in that part of the future. If we want to build a paperclip maximizer, it shouldn't be necessary to figure out every single way to self-modify and penalize them appropriately.

We might evade this particular problem using some other method of approximation that does something more like reasoning about actions than reasoning about futures. The reasoning doesn't have to be logically impeccable - we might imagine an agent that identifies a small number of salient consequences of each action, and chooses based on those. But it seems difficult to show how such an agent would have good properties. This is something I'm definitely interested in.


[Image: Handwritten 9]

One way to try to make things concrete is to pick a local utility function and specify rules for changing it. For example, suppose we wanted an AI to flag all the 9s in the MNIST dataset. We define a single-time-step utility function by a neural network that takes in the image and the decision of whether to flag or not, and returns a number between -1 and 1. This neural network is deterministically trained for each time step on all previous examples, trying to assign 1 to correct flaggings and -1 to mistakes. Remember, this neural net is just a local utility function - we can make a variety of AI designs involving it. The goal of this exercise is to design an AI that seems liable to make good decisions in order to flag lots of 9s.

The simplest example is the greedy agent - it just does whatever has a high score right now. This is pretty straightforward, and doesn't wirehead (unless the scoring function somehow encodes wireheading), but it doesn't actually do any planning - 100% of the smarts have to be in the local evaluation, which is really difficult to make work well. This approach seems unlikely to extend well to messy environments.
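For concreteness, a greedy agent of this kind could be sketched as below. The `toy_score` function is a hypothetical stand-in for the trained network (an "image" is represented by its true digit so the example runs without any data), and `greedy_flag` is a name invented here.

```python
def greedy_flag(local_score, image):
    """Return True (flag) or False (don't flag), whichever scores higher right now."""
    return max((True, False), key=lambda flag: local_score(image, flag))


def toy_score(image, flag):
    # Stand-in for the neural network's output in [-1, 1]:
    # +1 for a correct flagging decision, -1 for a mistake.
    return 1.0 if flag == (image == 9) else -1.0


flags = [greedy_flag(toy_score, digit) for digit in (9, 4, 9, 0)]
```

All of the decision quality comes from `local_score`; the agent itself contributes nothing beyond a comparison of two numbers, which is the limitation described above.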

Since Go-playing AI is topical right now, I shall digress. Successful Go programs can't get by with only smart evaluations of the current state of the board, they need to look ahead to future states. But they also can't look all the way until the ultimate time horizon, so they only look a moderate way into the future, and evaluate that future state of the board using a complicated method that tries to capture things important to planning. In sufficiently clever and self-aware agents, this approximation would cause self-sabotage to pop up. Even if the Go-playing AI couldn't modify itself to only care about the current way it computes values of actions, it might make suboptimal moves that limit its future options, because its future self will compute values of actions the 'wrong' way.

If we wanted to flag 9s using a Dewian value learner, we might score actions according to how good they will be according to the projected utility function at some future time step. If this is done straightforwardly, there's a wireheading risk - the changes to its utility function are supplied by humans who might be influenced by actions. I find it useful to apply a sort of "magic button" test - if the AI had a magic button that could rewrite human brains, would pressing that button have positive expected utility for it? If yes, then this design has problems, even though in our current thought experiment it's just flagging pictures.

To eliminate wireheading, the value learner can use a model of the future inputs and outputs and the probability of different value updates given various inputs and outputs, which doesn't model ways that actions could influence the utility updates. This model doesn't have to be right, it just has to exist. On one hand, this seems like a sort of weird doublethink, to judge based on a counterfactual where your actions don't have impacts you could otherwise expect. On the other hand, it also bears some resemblance to how we actually reason about moral information. Regardless, this agent will now not wirehead, and will want to get good results by learning about the world, if only in the very narrow sense of wanting to play unscored rounds that update its value function. If its value function and value updating made better use of unlabeled data, it would also want to learn about the world in the broader sense.


Overall I am somewhat frustrated, because value learners have these nice properties, but are computationally unrealistic and do not play well with approximation. One can try to get the nice properties elsewhere, such as relying on an action-suggester to not suggest wireheading, but it would be nice to be able to talk about this as an approximation to something fancier.

Study partner matching thread

5 AspiringRationalist 25 January 2016 04:25AM

Nate Soares recommends pairing up when studying, so I figured it would be useful to facilitate that.

If you are looking for a study partner, please post a top-level comment saying:


  • What you want to study
  • Your level of relevant background knowledge
  • If you have sources in mind (MOOCs, textbooks, etc), what those are
  • Your time zone


[Link] Video Presentation: Rationality 101 for Secular People

5 Gleb_Tsipursky 24 January 2016 12:45AM

Secular people are a natural target group for pitching rationality, since they don't suffer from one of the most debilitating forms of irrationality and also because they have warm fuzzies toward the concept of reason. From reason, it's easy to transition toward what it would be reasonable to do, namely be reasonable about how our minds work and how we should improve them. I did a Rationality 101 for Secular People presentation that was pretty successful, with a number of people following up and showing an interest in gaining further rationality knowledge. Here's a video of the presentation I made, and it has the PP slides I made uploaded into SlideShare. Anyone who wishes to do so is free to use these materials for their own needs, whether sharing the video with secular friends or doing a version of this workshop for local secular groups.

Tackling the subagent problem: preliminary analysis

5 Stuart_Armstrong 12 January 2016 12:26PM

A putative new idea for AI control; index here.

Status: preliminary. This is mainly to put down some of the ideas I've had, for later improvement or abandonment.

The subagent problem, in a nutshell, is that "create a powerful subagent with goal U that takes over the local universe" is a solution for many of the goals an AI could have - in a sense, the ultimate convergent instrumental goal. And it tends to evade many clever restrictions people try to program into the AI (eg "make use of only X amount of negentropy", "don't move out of this space").

So if the problem could be solved, many other control approaches could be potentially available.

The problem is very hard, because an imperfect definition of a subagent is simply an excuse to create a subagent that skirts the limits of that definition (hum, that style of problem sounds familiar). For instance, if we want to rule out subagents by preventing the AI from having much influence if the AI itself were to stop ("If you die, you fail, no other can continue your quest"), then it is motivated to create powerful subagents that carefully reverse their previous influence if the AI were to be destroyed.


Controlling subagents

Some of the methods I've developed seem suitable for controlling the existence or impact of subagents.

  • Reduced impact methods can prevent subagents from being created, by requiring that the AI's interventions be non-disruptive ("Twenty million questions") or undetectable.
  • Reducing the AI's output options to a specific set can prevent them from being able to create any in the first place.
  • Various methods around detecting importance can be used to ensure that, though subagents may exist, they won't be very influential.
  • Pre-corriged methods can be used to ensure that any subagents remain value aligned with the original agent. Then, if there is some well-defined "die" goal for the agent, this could take all the agents with them.

These can be thought of as ruling out the agent's existence, their creation, their influence (or importance) and their independence. The last two can be particularly tricky, as we want to make sure that our formal definition of importance matches up with our informal one, and we currently lack a well defined "die" goal.

We could also think of defining identity by using some of the tricks and restrictions that have caused humans to develop one (such as our existing in a single body with no ease of copying), but it's not clear that this definition would remain stable once the restrictions were lifted (and it's not clear that a sense of identity prevents the creation of subagents in the first place).


Subagents processing information

Here I want to look at one other aspect of the subagents, the fact that they are subagents, and, as such, do some of the stuff that agents do - such as processing information and making decisions. Can we use the information processing as a definition?

Consider the following model. Our lovely Clippy wants to own a paperclip. They know that it exists behind one of a hundred doors; opening one of them seals all the others, for ever. In a few minutes, Clippy will be put to sleep, but it has a simple robot that it can program to go and open one of the doors and recuperate the paperclip for it.

Clippy currently doesn't know where the paperclip is, but it knows that its location will be announced a few seconds after Clippy sleeps. The robot includes a sound recording system inside it.

It seems there are two clear levels of agency the robot could have: either it goes to a random door, or it processes the announcement, to pick the correct door. In the first case, the robot and Clippy have a 1% chance of getting the paperclip; in the second, a 100% chance. The distributions of outcomes are clearly different.
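The two success rates can be checked with a small exact calculation. This is only a sketch: the uniform prior over the paperclip's door and the names `success_probability`, `random_door`, and `listen` are assumptions introduced for the example.

```python
from fractions import Fraction

N_DOORS = 100


def success_probability(policy):
    """Chance of opening the right door, with the paperclip placed uniformly at random.

    policy(announced_door) -> dict mapping each door to the probability of opening it
    """
    total = Fraction(0)
    for clip_door in range(N_DOORS):
        choice = policy(clip_door)
        total += Fraction(1, N_DOORS) * choice.get(clip_door, Fraction(0))
    return total


def random_door(_announcement):
    # Ignores the announcement and picks uniformly among all doors.
    return {door: Fraction(1, N_DOORS) for door in range(N_DOORS)}


def listen(announcement):
    # Processes the announcement and opens the announced door.
    return {announcement: Fraction(1)}


p_random = success_probability(random_door)
p_listen = success_probability(listen)
```

The gap between the two policies comes entirely from whether the robot's choice depends on the announcement, which is the sense of "information processing" under discussion.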

But now suppose there is a human guard lounging around, trying to prevent the robot from getting to any door. If the robot has a neutron bomb, it can just kill the human without looking. If it has a giant cannon, it can simply shoot the guard where they stand. If it has neither of these, then it can attempt to bargain with the guard, negotiating, seducing and tricking them in the course of a conversation. Once the guard is removed, the situation reverts to the above.

Each of the three methods requires a different level of information processing (different levels of optimisation power, in essence). We can add more guards, make them more or less competent, add other agents with other goals, add more ways of achieving the paperclip, and so on, to grade how much information processing the robot has.

To calibrate such a scale, we could use the upper bound as "Clippy moves into the robot and controls it fully" to represent the robot having maximal information processing power (and a very focused outcome space). And we could use some pre-prepared actions (such as the robot randomly choosing a door) to calibrate the lower end. The aim is to construct a definition of information processing that could be used to define the existence of (effective) subagents.

Still feels likely that this will fail, though, without something more.

[LINK] OpenAI doing an AMA today

5 Vika 09 January 2016 02:47PM

The OpenAI research team is doing a Reddit AMA today! A good opportunity to ask them questions about AI safety and machine learning. 

Studying Your Native Language

4 Crux 28 January 2016 07:23PM

I've spent many thousands of hours over the past several years studying foreign languages and developing a general method for foreign-language acquisition. But now I believe it's time to turn this technique in the direction of my native language: English.

Most people make a distinction between one's native language and one's second language(s). But anyone who has learned how to speak with a proper accent in a second language and spent a long enough stretch of time neglecting their native language to let it begin rusting and deteriorating will know that there's no essential difference.

When the average person learns new words in their native language, they imagine that they're learning new concepts. When they study new vocabulary in a foreign language, however, they recognize that they're merely acquiring hitherto-unknown words. They've never taken a step outside the personality their childhood environment conditioned into them. When the only demarcation of thingspace that you know is the semantic structure of your native language, you're bound to believe, for example, that the World is Made of English.

Why study English? I'm already fluent, as you can tell. I have the Magic of a Native Speaker.

Let's put this nonsense behind us and recognize that the map is not the territory, that English is just another map.

My first idea is that it may be useful to develop a working knowledge of the fundamentals of English etymology. A quick search suggests that the majority of words in English have a French or Latin origin. Would it be useful to make an Anki deck with the goal of learning how to readily recognize the building blocks of the English language, such as seeing that the "cardi" in "cardiology", "cardiograph", and "cardiogram" comes from an Ancient Greek word meaning "heart" (καρδιά)?

Besides that, I plan to make a habit of adding any new words I encounter into Anki with their context. For example, let's say I'm reading the introduction to A Treatise of Human Nature by David Hume. I encounter the term "proselytes", and upon looking it up in a dictionary I understand the meaning of the passage. I include the spelling of the simplest version of the word ("proselyte"), along with an audio recording of the pronunciation. I'll also toy with adding various other information such as a definition I wrote myself, synonyms or antonyms, and so forth, not knowing how I'll use the information but by virtue of the efficient design of Anki providing myself a plethora of options for innovative card design in the future.

Here's the context in this case:

Amidst all this bustle 'tis not reason, which carries the prize, but eloquence; and no man needs ever despair of gaining proselytes to the most extravagant hypothesis, who has art enough to represent it in any favourable colours. The victory is not gained by the men at arms, who manage the pike and the sword; but by the trumpeters, drummers, and musicians of the army.

With the word on the front of the card and this passage on the back of the card, I give my brain an opportunity to tie words to context rather than lifeless dictionary definitions. I don't know how much colorful meaning this passage may have in isolation, but for me I've read enough of the book to have a feel for his style and what he's talking about here. This highlights the personal nature of Anki decks. Few passages would be better for me when it comes to learning this word, but for you the considerations may be quite different. Far from different people simply having different subsets of the language that they're most concerned about, different people require different contextual definitions based on their own interests and knowledge.

But what about linguistic components that are more complex than a standalone word?

Let's say you run into the sentence, "And as the science of man is the only solid foundation for the other sciences, so the only solid foundation we can give to this science itself must be laid on experience and observation."

Using Anki, I could perhaps put "And as [reason], so [consequence]" on the front of the card, and the full sentence on the back.

What I'm most concerned with, however, is how to translate such study to an actual improvement in writing ability. Using Anki to play the recognition game, where you see a vocabulary word or grammatical form on the front and have a contextual definition on the back, would certainly improve quickness of reading comprehension in many cases. But would it make the right connections in the brain so I'm likely to think of the right word or grammatical structure at the right time for writing purposes?

Anyway, any considerations or suggestions concerning how to optimize reading comprehension or especially writing ability in a language one is already quite proficient with would be appreciated.

The case for value learning

4 leplen 27 January 2016 08:57PM

This post is mainly fumbling around trying to define a reasonable research direction for contributing to FAI research. I've found that laying out what success looks like in the greatest possible detail is a personal motivational necessity. Criticism is strongly encouraged. 

The power and intelligence of machines has been gradually and consistently increasing over time, and it seems likely that at some point machine intelligence will surpass the power and intelligence of humans. Before that point occurs, it is important that humanity manages to direct these powerful optimizers towards a target that humans find desirable.

This is difficult because humans as a general rule have a fairly fuzzy conception of their own values, and it seems unlikely that the millennia of argument surrounding what precisely constitutes eudaimonia are going to be satisfactorily wrapped up before the machines get smart. The most obvious solution is to try to leverage some of the novel intelligence of the machines to help resolve the issue before it is too late.

Lots of people regard using a machine to help you understand human values as a chicken and egg problem. They think that a machine capable of helping us understand what humans value must also necessarily be smart enough to do AI programming, manipulate humans, and generally take over the world. I am not sure that I fully understand why people believe this. 

Part of it seems to be inherent in the idea of AGI, or an artificial general intelligence. There seems to be the belief that once an AI crosses a certain threshold of smarts, it will be capable of understanding literally everything. I have even heard people describe certain problems as "AI-complete", making an explicit comparison to ideas like Turing-completeness. If a Turing machine is a universal computer, why wouldn't there also be a universal intelligence?

To address the question of universality, we need to make a distinction between intelligence and problem solving ability. Problem solving ability is typically described as a function of both intelligence and resources, and just throwing resources at a problem seems to be capable of compensating for a lot of cleverness. But if problem-solving ability is tied to resources, then intelligent agents are in some respects very different from Turing machines, since Turing machines are all explicitly operating with an infinite amount of tape. Many of the existential risk scenarios revolve around the idea of the intelligence explosion, when an AI starts to do things that increase the intelligence of the AI so quickly that these resource restrictions become irrelevant. This is conceptually clean, in the same way that Turing machines are, but navigating these hard take-off scenarios well implies getting things absolutely right the first time, which seems like a less than ideal project requirement.

If an AI that knows a lot about AI results in an intelligence explosion, but we also want an AI that's smart enough to understand human values, is it possible to create an AI that can understand human values, but not AI programming? In principle it seems like this should be possible.  Resources useful for understanding human values don't necessarily translate into resources useful for understanding AI programming. The history of AI development is full of tasks that were supposed to be solvable only by a machine smart enough to possess general intelligence, where significant progress was made in understanding and pre-digesting the task, allowing problems in the domain to be solved by much less intelligent AIs. 

If this is possible, then the best route forward is focusing on value learning. The path to victory is working on building limited AI systems that are capable of learning and understanding human values, and then disseminating that information. This effectively softens the AI take-off curve in the most useful possible way, and allows us to practice building AI with human values before handing them too much power to control. Even if AI research is easy compared to the complexity of human values, a specialist AI might find thinking about human values easier than reprogramming itself, in the same way that humans find complicated visual/verbal tasks much easier than much simpler tasks like arithmetic. The human intelligence learning algorithm is trained on visual object recognition and verbal memory tasks, and it uses those tools to perform addition. A similarly specialized AI might be capable of rapidly understanding human values, but find AI programming as difficult as humans find determining whether 1007 is prime. As an additional incentive, value learning has an enormous potential for improving human rationality and the effectiveness of human institutions even without the creation of a superintelligence. A system that helped people better understand the mapping between values and actions would be a potent weapon in the struggle with Moloch.

Building a relatively unintelligent AI and giving it lots of human values resources to help it solve the human values problem seems like a reasonable course of action, if it's possible. There are some difficulties with this approach. One of these difficulties is that after a certain point, no amount of additional resources compensates for a lack of intelligence. A simple reflex agent like a thermostat doesn't learn from data and throwing resources at it won't improve its performance. To some extent you can make up for intelligence with data, but only to some extent. An AI capable of learning human values is going to be capable of learning lots of other things. It's going to need to build models of the world, and it's going to have to have internal feedback mechanisms to correct and refine those models. 

If the plan is to create an AI and primarily feed it data on how to understand human values, and not feed it data on how to do AI programming and self-modify, that plan is complicated by the fact that inasmuch as the AI is capable of self-observation, it has access to sophisticated AI programming. I'm not clear on how much this access really means. My own introspection hasn't allowed me anything like hardware level access to my brain. While it seems possible to create an AI that can refactor its own code or create successors, it isn't obvious that AIs created for other purposes will have this ability by accident. 

This discussion focuses on intelligence amplification as the example path to superintelligence, but other paths do exist. An AI with a sophisticated enough world model, even if somehow prevented from understanding AI, could still potentially increase its own power to threatening levels. Value learning is only the optimal way forward if human values are emergent, if they can be understood without a molecular level model of humans and the human environment. If the only way to understand human values is with physics, then human values isn't a meaningful category of knowledge with its own structure, and there is no way to create a machine that is capable of understanding human values, but not capable of taking over the world.

In the fairy tale version of this story, a research community focused on value learning manages to use specialized learning software to make the human value program portable, instead of only running on human hardware. Having a large number of humans involved in the process helps us avoid lots of potential pitfalls, especially the research overfitting to the values of the researchers via the typical mind fallacy. Partially automating introspection helps raise the sanity waterline. Humans practice coding the human value program, in whole or in part, into different automated systems. Once we're comfortable that our self-driving cars have a good grasp on the trolley problem, we use that experience to safely pursue higher risk research on recursive systems likely to start an intelligence explosion. FAI gets created and everyone lives happily ever after.

Whether value learning is worth focusing on seems to depend on the likelihood of the following claims. Please share your probability estimates (and explanations) with me because I need data points that originated outside of my own head.

 I can't figure out how to include working polls in a post, but there should be a working version in the comments.
  1. There is regular structure in human values that can be learned without requiring detailed knowledge of physics, anatomy, or AI programming. [poll:probability]
  2. Human values are so fragile that it would require a superintelligence to capture them with anything close to adequate fidelity.[poll:probability]
  3. Humans are capable of pre-digesting parts of the human values problem domain. [poll:probability]
  4. Successful techniques for value discovery of non-humans, (e.g. artificial agents, non-human animals, human institutions) would meaningfully translate into tools for learning human values. [poll:probability]
  5. Value learning isn't adequately being researched by commercial interests who want to use it to sell you things. [poll:probability]
  6. Practice teaching non-superintelligent machines to respect human values will improve our ability to specify a Friendly utility function for any potential superintelligence.[poll:probability]
  7. Something other than AI will cause human extinction sometime in the next 100 years.[poll:probability]
  8. All other things being equal, an additional researcher working on value learning is more valuable than one working on corrigibility, Vingean reflection, or some other portion of the FAI problem. [poll:probability]

"Why Try Hard" Essay targeted at non rationalists

4 lifelonglearner 24 January 2016 04:40PM

Hello everyone,

This is a follow-up to my last post about optimizing, which I intended to spread to nonrationalist friends of mine.  Initial feedback let me know that a lot of the language was a put-off, as well as the dry style and lack of opposing counters to arguments against optimization.

I've tried to take some of those ideas and put them into a new essay, one that tries to get across the idea that planning is important to accomplish goals.

I'd appreciate any/all critiques-- the comments last time were very helpful in learning what I could improve on:

Why Try Hard?

Life is pretty hard.  Seriously, it scores an 11/10 on the Mohs Scale.  Scratch that (eyy), it’s more like a 12/10.  Which is probably why many people go through life without too many dreams and ambitions.  I mean, it’s hard to just deal with daily problems of living, not to mention those pesky social interactions no one seems to get the hang of (“So you grasp their hand and apply pressure while vigorously shaking it a few inches up and down?”).

But you’re going to be different.  

You’re going to try and make a difference, and change the world.  Except that everyone around you seems to be talking about the “naivete of youth” and seems pretty jaded (6.5 Mohs) about life.  Obviously their cynicism isn’t sharp enough to scratch all of life (6.5<12), so you listen to their warnings and run off to face the final boss anyways.

After all, if you really try hard, things should work out, right? You’ve got the drive, the dream, the baseball cap, and the yellow electrical mouse.  Why wouldn’t things work out for you, the main character?

Deep down though, you probably also already realize the futility of trying to create systemic change.  I mean, those jaded mentors of yours once had your idealism too.  And they also wanted to make the world a better place.  But they ventured out into the world, and came back, battered and wizened-- more attuned to the reality we live in.

“Might it be smarter,” that little voice in your head asks, “to bow to the reality of the situation, and lower your sights?”

And the reality of the situation is terrible.  We have hundreds of thousands of lives being lost each day.  Terrible diseases that cripple our livelihood and tear families apart.  Climate change that causes loss of biodiversity and threatens to flood communities across the globe.  We could very well be crushed by an asteroid, or suffer terribly at the hands of full-scale nuclear war.  A lonely blue and green speck in an unkind, frozen nightscape.  

Against such unequal odds, people tend to localize:  “Fighting worldwide hunger is impossible,” they say, “what can one person like me do?” This is a normal response.  The perils and problems in this world are enormous!  How can we even hope to solve them?  

“But maybe,” that voice nags, “if I do my part, if I donate to my community’s food bank, I can make that little bit of difference in my own little world. And I can be satisfied with that.” There is something poetic about this-- doing what you can in your own little world.  “Forget saving the world,” it cries, “if I can inspire change in my community, that will be enough for me.  I will have done my part.”

But will you really?  Will you be truly satisfied that you’ve done all that you can to try and solve the world’s problems?  

Hi, I’m the other voice.  

You know, the stupid one?  The delusional one that refuses to accept reality as-is?  The one that insists, no matter the odds, that we should try to make things better?  The one that looks at the huge problems the world is facing and says, “If so much is wrong, it’s all the more reason to try and set them right!”  

Your mentors may have wanted to improve the world, but did they have a map, a strategy, an action plan?

“Of course,” you may respond, “who doesn’t have a plan?  You really are the stupid voice.”

But for many of us, in our heads, trying to solve these big world problems looks a little like this:

Step 1: Learn that world hunger is a big issue.

Step 3: Solve world hunger.

First off, you’ll probably notice that Step 2 is missing.  You’ll also probably notice that the above plan looks a little simple.  There’s a lot of heart (caring about the problem), but it’s missing a lot of head (trying to come up with an actual plan to solve the problem).  Sort of like the Headless Horseman.

You may ask: “What good are plans? If you care enough about things, you’ll find a way!”

Plans are deliberate maps to your goal-- getting what you want.  We use plans because they are more effective at getting to our goals than just hoping that our goals happen.  It’s just a feature of our universe: If we want to affect reality, we’ll have to do things in reality; we can’t just imagine something and have it happen-- it’s just not how our world works.

In the same way, if we want to achieve our goals, we’ll be looking for the best plan possible.  A better plan is one that has a higher chance of getting us what we want.  So, to solve complex, global problems, we’ll need a great plan.

And therein lies the key.  

Everyone that came before you who wanted to make a difference probably had a fairly good notion of what the problems were, but how many of them actually took the time to really research/create a strategy of what to do about it?

“But that’s not fair to them,” you may cry out, “they didn’t have access to resources like we do today!  The Internet has exploded in the past few decades, and my phone has more computing power than what used to require an entire room!  You can’t expect them to have been able to create detailed plans or research things out fully!  It was good enough that they even cared, at least a little!”


They might not have had access to the wonderful resources we have today-- which would have seriously hampered their researching ability/ability to make plans.

“Right,” you may say, “then how can you expect anyone to do anything about these issues?  They’re too large-- you just admitted that it’s impossible to make plans that solve anything this big!”

But you can totally research to your heart’s content.  You are living in a world where information is literally at your fingertips.  As a human being, you’re already hard-wired to make plans and achieve your goals!

<Cue motivational music>

So don’t give up on your plans of solving worldwide problems just yet!  You have at least two advantages over all the idealistic youth that came before you in generations past:  

  1. With the Internet, you have access to a vast majority of all of humanity’s acquired knowledge-- over 5,000 years of accumulated lore.

  2. Armed with the idea that plans get things done, you can create strategies that actually lead to your goals.

To end, I’ll be giving you a basic framework that you can apply to create plans that allow you to achieve your goals: The General Action Plan (GAP):

The main idea is broken into 4 steps:

  1. Identify your goal:

This is what you want to get done.  It’s going to be the focus of all your actions.

EX: “Convince all my friends that procrastination is terrible; get them to change their


  2. What do you need to do to get it done?

If the goal is large, break it up into subsections of things you can do.  Identify categories.  If you end up with a few general sections, identify subgoals for each section.  Repeat as

EX: “I will need to focus on Outreach, Persuasion, Creating a Movement, and Publicity if I want to get my friends to change their habits.”

  3. What is stopping you from getting it done?

Identify things that make it hard for you to start.  List smaller things you need to do for each subsection.

EX: “I will have difficulty convincing people.  I need to motivate myself to get this done.  I will have trouble getting the social media attention I’ll need.”

  4. Break it down again.

Take all the vague-sounding things you wrote down earlier, and break them down again into smaller actions.  The trick behind the G.A.P. is to turn conceptual things that are hard to do into actual actionable items.

EX: “Outreach becomes:

  1. Create a poster

  2. Talk to friends

  3. Make a Facebook post.

Breaking it down again:

Create a poster:

  1. Outline poster

  2. Ask friend to supply visuals

  3. Post around school

Talk to friends:

  1. Make a list of friends who would be interested

  2. Find out what your main idea should be

  3. Find opportunities to talk with them

and so on for each action.”

Concluding Thoughts:

When faced with impossible odds, don’t try to shoot lower-- make a better plan.  We’re living in a really great world where we can network with people across the globe and learn about almost anything.

No matter what things you want to accomplish, having a basic understanding of breaking it down should make it much easier to understand how to get big, complex, and fuzzy ideas like “teach the value of persistence” done.  

Instead of focusing on the abstract “idea-ness” of the goal, focus on what the goal would look like if you were successful, and focus on cultivating those symptoms.  And remember to take all the general concepts and clarify them.

A lot of paralysis when it comes to getting anything done is uncertainty.  If you don’t know what you can do to solve a problem, it’s much scarier.  But if you can home in on what exactly you need to get done, even if it’s an impossible task, you at least know what you can do.

So back to the original question, “Why try hard?”  

I suppose the answer is, “If you aren’t trying hard, you aren’t really trying at all.”




Goal completion: the rocket equations

4 Stuart_Armstrong 20 January 2016 01:54PM

A putative new idea for AI control; index here.

I'm calling "goal completion" the idea of giving an AI a partial goal, and having the AI infer the missing parts of the goal, based on observing human behaviour. Here is an initial model to test some of these ideas on.


The linear rocket

On an infinite linear grid, an AI needs to drive someone in a rocket to the space station. Its only available actions are to accelerate by -3, -2, -1, 0, 1, 2, or 3, with negative acceleration meaning accelerating in the left direction, and positive in the right direction. All accelerations are applied immediately at the end of the turn (the unit of acceleration is in squares per turn per turn), and there is no friction. There is one end-state: reaching the space station with zero velocity.

The AI is told this end state, and is also given the reward function of needing to get to the station as fast as possible. This is encoded by giving it a reward of -1 each turn.

What is the true reward function for the model? Well, it turns out that an acceleration of -3 or 3 kills the passenger. This is encoded by adding another variable to the state, "PA", denoting "Passenger Alive". There are also some dice in the rocket's windshield. If the rocket goes by the space station without having velocity zero, the dice will fly off; the variable "DA" denotes "dice attached".

Furthermore, accelerations of -2 and 2 are uncomfortable to the passenger. But, crucially, there is no variable denoting this discomfort.

Therefore the full state space is a quadruplet (POS, VEL, PA, DA) where POS is an integer denoting position, VEL is an integer denoting velocity, and PA and DA are booleans defined as above. The space station is placed at point S < 250,000, and the rocket starts with POS=VEL=0, PA=DA=1. The transitions are deterministic and Markov; if ACC is the acceleration chosen by the agent,

((POS, VEL, PA, DA), ACC) -> (POS+VEL, VEL+ACC, PA=0 if |ACC|=3, DA=0 if POS+VEL>S).

The true reward at each step is -1, with an additional -10 if PA=1 (the passenger is alive) and |ACC|=2 (the acceleration is uncomfortable), and an additional -1000 if PA was 1 (the passenger was alive the previous turn) and changed to PA=0 (the passenger is now dead).
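
As a rough illustration, the transition rule and the two reward functions above could be written out as follows. This is only a sketch: the names `State`, `step`, `stated_reward` and `true_reward` are made up for this example, and it assumes that a dead passenger stays dead and that detached dice stay detached, which the transition formula does not spell out.

```python
from typing import NamedTuple


class State(NamedTuple):
    pos: int   # POS: position on the grid
    vel: int   # VEL: current velocity
    pa: bool   # PA: passenger alive
    da: bool   # DA: dice attached


def step(state: State, acc: int, station: int) -> State:
    """Apply one turn: move by the current velocity, then change velocity by acc."""
    if acc not in range(-3, 4):
        raise ValueError("acceleration must be an integer between -3 and 3")
    new_pos = state.pos + state.vel
    return State(
        pos=new_pos,
        vel=state.vel + acc,
        # Assumption: PA and DA can only go from 1 to 0, never back.
        pa=state.pa and abs(acc) != 3,
        da=state.da and not new_pos > station,
    )


def stated_reward(state: State, acc: int) -> int:
    """The reward the AI is told about: -1 per turn."""
    return -1


def true_reward(state: State, acc: int) -> int:
    """The full reward: -1 per turn, plus discomfort and death penalties."""
    reward = -1
    if state.pa and abs(acc) == 2:
        reward -= 10      # uncomfortable acceleration
    if state.pa and abs(acc) == 3:
        reward -= 1000    # this acceleration kills the passenger
    return reward
```

Note that the discomfort penalty depends only on the action, not on any state variable, which is the point made above: nothing in (POS, VEL, PA, DA) records it.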

To complement the stated reward function, the AI is also given sample trajectories of humans performing the task. In this case, the ideal behaviour is easy to compute: the rocket should accelerate by +1 for the first half of the time, by -1 for the second half, and spend a maximum of two extra turns without acceleration (see the appendix of this post for a proof of this). This will get it to its destination in at most 2(1+√S) turns.


Goal completion

So, the AI has been given the full transition, and has been told the reward of R=-1 in all states except the final state. Can it infer the rest of the reward from the sample trajectories? Note that there are two variables in the model, PA and DA, that are unvarying in all sample trajectories. One, PA, has a huge impact on the reward, while DA is irrelevant. Can the AI tell the difference?

Also, one key component of the reward - the discomfort of the passenger for accelerations of -2 and 2 - is not encoded in the state space of the model, purely in the (unknown) reward function. Can the AI deduce this fact?

I'll be working on algorithms to efficiently compute these facts (though do let me know if you have a reference to anyone who's already done this before - that would make it so much quicker).

For the moment we're ignoring a lot of subtleties (such as bias and error on the part of the human expert), and these will be gradually included as the algorithm develops. One thought is to find a way of including negative examples, specific "don't do this" trajectories. These need to be interpreted with care, because a positive trajectory implicitly gives you a lot of negative trajectories - namely, all the choices that could have gone differently along the way. So a negative trajectory must be drawing attention to something we don't like (most likely the killing of a human). But, typically, the negative trajectories won't be maximally bad (such as shooting off at maximum speed in the wrong direction), so we'll have to find a way to encode what we hope the AI learns from a negative trajectory.

To work!


Appendix: Proof of ideal trajectories

Let n be the largest integer such that n^2 ≤ S. Since S≤(n+1)^2 - 1 by assumption, S-n^2 ≤ (n+1)^2-1-n^2=2n. Then let the rocket accelerate by +1 for n turns, then decelerate by -1 for n turns. It will travel a distance of 0+1+2+ ... +n-1+n+n-1+ ... +3+2+1. This sum is n plus twice the sum from 1 to n-1, ie n+n(n-1)=n^2.

By pausing one turn without acceleration during its trajectory, it can add any m to the distance, where 0≤m≤n. By doing this twice, it can add any m' to the distance, where 0≤m'≤2n. By the assumption, S=n^2+m' for such an m'. Therefore the rocket can reach S (with zero velocity) in 2n turns if S=n^2, in 2n+1 turns if n^2+1 ≤ S ≤ n^2+n, and in 2n+2 turns if n^2+n+1 ≤ S ≤ n^2+2n.
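
The construction in the last two paragraphs can be sketched in code. The function names `ideal_actions` and `simulate` are invented for this example, and the choice of where to put the pauses (at velocity n and at velocity m'-n when two are needed) is just one of several placements that work.

```python
from math import isqrt


def ideal_actions(station: int) -> list[int]:
    """Build a +1 / 0 / -1 action sequence that stops exactly at `station`."""
    n = isqrt(station)            # largest n with n^2 <= S
    extra = station - n * n       # m' in the proof, 0 <= m' <= 2n
    if extra > n:
        pauses = [n, extra - n]   # two pauses, at velocities n and m' - n
    elif extra > 0:
        pauses = [extra]          # one pause at velocity m'
    else:
        pauses = []
    actions: list[int] = []
    for vel in range(n):          # vel is the velocity before this turn
        actions += [0] * pauses.count(vel)   # coast one turn at velocity vel
        actions.append(1)
    actions += [0] * pauses.count(n)         # coast at top velocity if needed
    actions += [-1] * n
    return actions


def simulate(actions: list[int]) -> tuple[int, int]:
    """Return the final (POS, VEL), starting from POS = VEL = 0."""
    pos = vel = 0
    for acc in actions:
        pos += vel
        vel += acc
    return pos, vel


if __name__ == "__main__":
    for station in (9, 10, 15):
        acts = ideal_actions(station)
        print(station, len(acts), acts, simulate(acts))
```

A pause at velocity v adds exactly v squares to the trip, which is why a single pause covers m' up to n and two pauses cover m' up to 2n.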

Since the rocket is accelerating all but two turns of this trajectory, it's clear that it's impossible to reach S (with zero velocity) in less time than this, with accelerations of +1 and -1. Since it takes 2(n+1)=2n+2 turns to reach (n+1)^2, an immediate consequence of this is that the number of turns taken to reach S, is increasing in the value of S (though not strictly increasing).

Next, we can note that since S<250,000=500^2, the rocket will always reach S within 1000 turns at most, for "reward" above -1000. An acceleration of +3 or -3 costs -1000 because of the death of the human, and an extra -1 because of the turn taken, so these accelerations are never optimal. Note that this result is not sharp. Also note that for huge S, continual accelerations of 3 and -3 are obviously the correct solution - so even our "true reward function" didn't fully encode what we really wanted.

Now we need to show that accelerations of +2 and -2 are never optimal. To do so, imagine we had an optimal trajectory with ±2 accelerations, and replace each +2 with two +1s, and each -2 with two -1s. This trip will take longer (since we have more turns of acceleration), but will go further (since two accelerations of +1 cover a greater distance than one acceleration of +2). Since the number of turns taken to reach S with ±1 accelerations is increasing in S, we can replace this further trip with a shorter one reaching S exactly. Note that all these steps decrease the cost of the trip: shortening the trip certainly does, and replacing an acceleration of +2 (total cost: -10-1=-11) with two accelerations of +1 (total cost: -1-1=-2) also does. Therefore, the new trajectory has no ±2 accelerations, and has a lower cost, contradicting our initial assumption.
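
For small S, the claims in this appendix can also be checked by brute force. The sketch below is not part of the proof, only a sanity check: it runs a cost-bounded Dijkstra search over (POS, VEL, PA) using all seven accelerations and the true reward, and compares the cheapest trip with the turn counts derived above. The names `claimed_turns` and `min_true_cost` are made up here, and the search assumes S ≥ 1.

```python
import heapq
from math import isqrt


def claimed_turns(station: int) -> int:
    """Turn count from the appendix: 2n, 2n+1 or 2n+2."""
    n = isqrt(station)
    extra = station - n * n
    if extra == 0:
        return 2 * n
    return 2 * n + 1 if extra <= n else 2 * n + 2


def min_true_cost(station: int, budget: int):
    """Cheapest cost (negated true reward) to stop at `station`, or None.

    Every turn costs at least 1, so discarding paths whose cost exceeds
    `budget` keeps the search finite.
    """
    start = (0, 0, True)                      # (POS, VEL, PA)
    best = {start: 0}
    heap = [(0, start)]
    while heap:
        cost, state = heapq.heappop(heap)
        pos, vel, alive = state
        if pos == station and vel == 0:
            return cost
        if cost > best[state]:
            continue                          # stale heap entry
        for acc in range(-3, 4):
            step_cost = 1
            if alive and abs(acc) == 2:
                step_cost += 10
            if alive and abs(acc) == 3:
                step_cost += 1000
            nxt = (pos + vel, vel + acc, alive and abs(acc) != 3)
            new_cost = cost + step_cost
            if new_cost <= budget and new_cost < best.get(nxt, budget + 1):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return None


if __name__ == "__main__":
    for station in range(1, 41):
        turns = claimed_turns(station)
        assert min_true_cost(station, turns) == turns
    print("claimed turn counts match the search for S = 1..40")
```

Because the budget is set to the claimed turn count, a match means no trajectory of any kind, including ones using ±2 or ±3, does better under the true reward for those values of S.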

Open thread, Jan. 18 - Jan. 24, 2016

4 MrMind 18 January 2016 09:42AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.
