
Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Don't estimate your creative intelligence by your critical intelligence

37 PhilGoetz 05 February 2015 02:41AM

When I criticize, I'm a genius. I can go through a book of highly-referenced scientific articles and find errors in each of them. Boy, I feel smart. How are these famous people so dumb?

But when I write, I suddenly become stupid. I sometimes spend half a day writing something and then realize at the end, or worse, after posting, that what it says simplifies to something trivial, or that I've made several unsupported assumptions, or claimed things I didn't really know were true. Or I post something, then have to go back every ten minutes to fix some point that I realize is not quite right, sometimes to the point where the whole thing falls apart.

If someone writes an article or expresses an idea that you find mistakes in, that doesn't make you smarter than that person. If you create an equally-ambitious article or idea that no one else finds mistakes in, then you can start congratulating yourself.

Easy wins aren't news

36 PhilGoetz 19 February 2015 07:38PM

Recently I talked with a guy from Grant Street Group. They make, among other things, software with which local governments can auction their bonds on the Internet.

By making the auction process more transparent and easier to participate in, they enable local governments which need to sell bonds (to build a high school, for instance), to sell those bonds at, say, 7% interest instead of 8%. (At least, that's what he said.)
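To get a feel for what one percentage point means in money, here is a minimal sketch of the arithmetic. The bond size ($10M), the 20-year term, and the interest-only payment structure are all hypothetical numbers chosen for illustration; only the 7% and 8% rates come from the conversation above. Rates are given in basis points so the arithmetic stays in exact integers.

```python
def total_interest(principal: int, rate_bps: int, years: int) -> int:
    """Total interest paid on an interest-only bond.

    rate_bps is the annual rate in basis points (800 = 8%).
    Assumes simple annual coupons and no compounding (an illustrative simplification).
    """
    annual = principal * rate_bps // 10_000
    return annual * years


def interest_saved(principal: int, old_bps: int, new_bps: int, years: int) -> int:
    """Difference in total interest between two rates on the same bond."""
    return total_interest(principal, old_bps, years) - total_interest(principal, new_bps, years)


# Hypothetical school bond: $10M over 20 years, sold at 7% instead of 8%.
saving = interest_saved(10_000_000, 800, 700, 20)
print(f"Interest saved over the life of the bond: ${saving:,}")
```

Under these assumed numbers the town keeps a fifth of the principal that would otherwise have gone to interest, without cutting anything from its budget.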

They have similar software for auctioning liens on property taxes, which also helps local governments raise more money by bringing more buyers to each auction, and probably helps the buyers reduce their risks by giving them more information.

This is a big deal. I think it's potentially more important than any budget argument that's been on the front pages since the 1960s. Yet I only heard of it by chance.

People would rather argue about reducing the budget by eliminating waste, or cutting subsidies to people who don't deserve them, or changing our ideological priorities. Nobody wants to talk about auction mechanics. But fixing the auction mechanics is the easy win. It's so easy that nobody's interested in it. It doesn't buy us fuzzies or let us signal our affiliations. To an individual activist, it's hardly worth doing.

The Galileo affair: who was on the side of rationality?

34 Val 15 February 2015 08:52PM


A recent survey showed that the LessWrong discussion forums mostly attract readers who are predominantly either atheists or agnostics, and who lean towards the left or far left in politics. As one of the main goals of LessWrong is overcoming bias, I would like to come up with a topic which I think has a high probability of challenging some biases held by at least some members of the community. It's easy to fight against biases when the biases belong to your opponents, but much harder when you yourself might be the one with biases. It's also easy to cherry-pick arguments which support your beliefs and ignore those which would disprove them. It's also common in such discussions that the side calling itself rationalist makes exactly the same mistakes it accuses its opponents of making. Far too often have I seen people (sometimes even Yudkowsky himself) who are very good rationalists but can quickly become irrational and use several fallacies when arguing about history or religion. This most commonly manifests when we take the dumbest and most fundamentalist young Earth creationists as an example, win easily against them, and then claim that we have disproved all arguments ever made by any theist. No, this article will not be about whether God exists or not, or whether any real world religion is fundamentally right or wrong. I strongly discourage any discussion about these two topics.

This article has two main purposes:

1. To show an interesting example where the scientific method can lead to wrong conclusions

2. To overcome a certain specific bias, namely, that the pre-modern Catholic Church was opposed to the concept of the Earth orbiting the Sun with the deliberate purpose of hindering scientific progress and keeping the world in ignorance. I hope this will also prove to be an interesting challenge for your rationality, because it is easy to fight against bias in others, but not so easy to fight against bias in yourself.

The basis of my claims is that I have read the book written by Galilei himself, and that I'm very interested (not as a professional, but well read) in early modern history, especially that of the 16th and 17th centuries.


Geocentrism versus Heliocentrism

I assume every educated person knows the name of Galileo Galilei. I won't waste space on the site or the readers' time by presenting a full biography; there are plenty of online resources where you can find more than enough biographical information about him.

The controversy?

What is interesting about him is how many people have severe misconceptions about him. Far too often he is celebrated as the one sane man in an era of ignorance, the sole propagator of science and rationality when the powers of that era suppressed any scientific thought and ridiculed everyone who tried to challenge the accepted theories about the physical world. Some even go as far as claiming that people believed the Earth was flat. Although the flat Earth theory was not propagated at all, it's true that the heliocentric view of the Solar System (the Earth revolving around the Sun) was not yet accepted.

However, the claim that the Church was suppressing evidence about heliocentrism "to maintain its power over the ignorant masses" can be disproved easily:

- The common people didn't go to school where they could have learned about it, and those commoners who did go to school just learned to read and write, not much more, so they couldn't have cared less about what orbits around what. This differs from 20th- and 21st-century fundamentalists who want to teach young Earth creationism in schools - back then in the 17th century, there were no classes where either the geocentric or heliocentric views could have been taught to the masses.

- Heliocentrism was not discovered by Galilei. It was first proposed by Nicolaus Copernicus almost 100 years before Galilei. Copernicus never had any trouble with the Inquisition. His theories didn't gain wide acceptance, but he and his followers weren't persecuted either.

- Galilei was only sentenced to house arrest, and mostly because of insulting the pope and doing other unwise things. The political climate in 17th century Italy was quite messy, and Galilei made quite a few unfortunate choices regarding his alliances. Actually, Galilei was the one who brought religion into the debate: his opponents were citing Aristotle, not the Bible, in their arguments. Galilei, however, wanted to reinterpret Scripture based on his (unproven) beliefs, and insisted that he should have the authority to push his own views about how people interpret the Bible. Of course this pissed quite a few people off, and his case was not helped by publicly calling the pope an idiot.

- For a long time Galilei was a good friend of the pope, while holding heliocentric views. So were a couple of other astronomers. The heliocentrism-geocentrism debates were common among astronomers of the day, and were not hindered but actually encouraged by the pope.

- The heliocentrism-geocentrism debate was never an atheism-theism debate. The heliocentrists were committed theists, just like the defenders of geocentrism. The Church didn't suppress science, but actually funded the research of most scientists.

- The defenders of geocentrism didn't use the Bible as a basis for their claims. They used Aristotle and, for the time, good scientific reasoning. The heliocentrists were much more prone to use the "God did it" argument when they couldn't defend the gaps in their proofs.


The birth of heliocentrism

By the 16th century, astronomers had plotted the movements of the most important celestial bodies in the sky. Observing the motion of the Sun, the Moon and the stars, it would seem obvious that the Earth is motionless and everything orbits around it. This model (called geocentrism) had only one minor flaw: the planets would sometimes make a loop in their motion, "moving backwards". This required a lot of very complicated formulas to model their motions. Thus, by virtue of Occam's razor, a theory was born which could better explain the motion of the planets: what if the Earth and everything else orbited around the Sun? However, this new theory (heliocentrism) had a lot of issues, because while it could explain the looping motion of the planets, there were a lot of things which it either couldn't explain, or which the geocentric model could explain much better.


The proofs, advantages and disadvantages

The heliocentric view had only a single advantage over the geocentric one: it could describe the motion of the planets by a much simpler formula.

However, it had a number of severe problems:

- Gravity. Why do objects have weight, and why are they all pulled towards the center of the Earth? Why don't objects fall off the Earth on the other side of the planet? Remember, Newton wasn't even born yet! The geocentric view had a very simple explanation, dating back to Aristotle: it is the nature of all objects that they strive towards the center of the world, and the center of the spherical Earth is the center of the world. The heliocentric theory couldn't counter this argument.

- Stellar parallax. If the Earth is not stationary, then the relative position of the stars should change as the Earth orbits the Sun. No such change was observable by the instruments of that time. Only in the first half of the 19th century did we succeed in measuring it, and only then was the movement of the Earth around the Sun finally proven.

- Galilei tried to use the tides as a proof. The geocentrists argued that the tides are caused by the Moon, even if they didn't know by what mechanism, but Galilei said that this was just a coincidence and that the tides are not caused by the Moon: just as, if we put a barrel of water onto a cart, the water would be still if the cart was stationary and would slosh around if the cart was pulled by a horse, so the tides are caused by the water sloshing around as the Earth moves. If you read Galilei's book, you will discover quite a number of such silly arguments, and you'll see that Galilei was anything but a rationalist. Instead of changing his views in the face of overwhelming evidence, he used all possible fallacies to push his view through.
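To illustrate why the stellar parallax objection in the list above was so hard to answer, here is a rough sketch of the geometry. The figures are assumptions added for illustration and are not from the sources discussed here: the nearest stars are taken to be roughly 1.3 parsecs away, and one arcminute (60 arcseconds) is used as a ballpark for what pre-telescopic instruments could resolve.

```python
import math

# One radian expressed in arcseconds (about 206265).
ARCSEC_PER_RADIAN = 180 * 3600 / math.pi

# A parsec is the distance at which a 1 AU baseline subtends one arcsecond,
# so the number of AU in a parsec is the same figure.
AU_PER_PARSEC = ARCSEC_PER_RADIAN


def parallax_arcsec(distance_parsec: float) -> float:
    """Apparent annual shift (in arcseconds) of a star at the given distance,
    using the Earth-Sun distance (1 AU) as the baseline."""
    distance_au = distance_parsec * AU_PER_PARSEC
    return math.atan(1.0 / distance_au) * ARCSEC_PER_RADIAN


NEAREST_STAR_PARSEC = 1.3        # assumption: approximate distance of the nearest stars
INSTRUMENT_LIMIT_ARCSEC = 60.0   # assumption: illustrative naked-eye precision

shift = parallax_arcsec(NEAREST_STAR_PARSEC)
print(f"Expected shift: {shift:.2f} arcsec; detectable: {shift > INSTRUMENT_LIMIT_ARCSEC}")
```

Under these assumptions even the nearest stars shift by less than one arcsecond, far below the assumed instrument limit, so the absence of an observed parallax was a reasonable argument at the time rather than a sign of stubbornness.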

Actually the most interesting author on this topic was Riccioli. If you study his writings you will get definite proof that the heliocentrism-geocentrism debate was handled with scientific accuracy and rationality, and it was not a religious debate at all. He defended geocentrism, and presented 126 arguments on the topic (49 for heliocentrism, 77 against), and only two of them (both for heliocentrism) had any religious connotations, and he gave valid responses to both of them. This means that he, as a rationalist, presented both sides of the debate in a neutral way, and used reasoning instead of appeal to authority or faith in all cases. Actually this was what the pope expected of Galilei, and such a book was what he commissioned from Galilei. Galilei instead wrote a book where he caricatured the pope as a strawman, and instead of presenting arguments for and against both world-views in a neutral way, he wrote a book which can be called anything but scientific.

By the way, Riccioli was a Catholic priest. And a scientist. And, it seems to me, also a rationalist. Studying the works of people like him, you might want to change your mind if you perceive a conflict between science and religion, which is part of today's public consciousness only because of a small number of very loud religious fundamentalists, helped by some committed atheists trying to suggest that all theists are like them.

Finally, I would like to copy a short summary about this book:

Journal for the History of Astronomy, Vol. 43, No. 2, p. 215-226
In 1651 the Italian astronomer Giovanni Battista Riccioli published within his Almagestum Novum, a massive 1500 page treatise on astronomy, a discussion of 126 arguments for and against the Copernican hypothesis (49 for, 77 against). A synopsis of each argument is presented here, with discussion and analysis. Seen through Riccioli's 126 arguments, the debate over the Copernican hypothesis appears dynamic and indeed similar to more modern scientific debates. Both sides present good arguments as point and counter-point. Religious arguments play a minor role in the debate; careful, reproducible experiments a major role. To Riccioli, the anti-Copernican arguments carry the greater weight, on the basis of a few key arguments against which the Copernicans have no good response. These include arguments based on telescopic observations of stars, and on the apparent absence of what today would be called "Coriolis Effect" phenomena; both have been overlooked by the historical record (which paints a picture of the 126 arguments that little resembles them). Given the available scientific knowledge in 1651, a geo-heliocentric hypothesis clearly had real strength, but Riccioli presents it as merely the "least absurd" available model - perhaps comparable to the Standard Model in particle physics today - and not as a fully coherent theory. Riccioli's work sheds light on a fascinating piece of the history of astronomy, and highlights the competence of scientists of his time.

The full article can be found under this link. I recommend it to everyone interested in the topic. It shows that geocentrists at that time had real scientific proofs and real experiments regarding their theories, and for most of them the heliocentrists had no meaningful answers.



- I'm not a Catholic, so I have no reason to defend the historic Catholic church due to "justifying my insecurities" - a very common accusation against someone perceived to be defending theists in a predominantly atheist discussion forum.

- Any discussion about any perceived proofs for or against the existence of God would be off-topic here. I know it's tempting to show off your best proofs against your carefully constructed straw-men yet again, but this is just not the place for it, as it would detract from the main purpose of this article, as summarized in its introduction.

- English is not my native language. Nevertheless, I hope that what I wrote was clear enough to be understandable. If there is any part of my article which you find ambiguous, feel free to ask.

I have great hopes and expectations that the LessWrong community is a suitable place to discuss such ideas. I have experience with presenting these ideas on other, predominantly atheist internet communities, and most often the reaction was outright flaming, a hurricane of unexplained downvotes, and prejudicial ad hominem attacks based on what affiliations they assumed I subscribed to. It is common for people to decide whether they believe a claim or not based solely on whether the claim suits their ideological affiliations. The best quality of rationalists, however, should be the ability to change their views when confronted with overwhelming proof, instead of trying to come up with more and more convoluted explanations. In the time I have spent in the LessWrong community, I have come to respect that the people here can argue in a civil manner, listening to the arguments of others instead of discarding them outright.


If you can see the box, you can open the box

32 ThePrussian 26 February 2015 10:36AM

First post here, and I'm disagreeing with something in the main sequences.  Hubris acknowledged, here's what I've been thinking about.  It comes from the post "Are your enemies innately evil?":

On September 11th, 2001, nineteen Muslim males hijacked four jet airliners in a deliberately suicidal effort to hurt the United States of America.  Now why do you suppose they might have done that?  Because they saw the USA as a beacon of freedom to the world, but were born with a mutant disposition that made them hate freedom?

Realistically, most people don't construct their life stories with themselves as the villains.  Everyone is the hero of their own story.  The Enemy's story, as seen by the Enemy, is not going to make the Enemy look bad.  If you try to construe motivations that would make the Enemy look bad, you'll end up flat wrong about what actually goes on in the Enemy's mind.

If I'm misreading this, please correct me, but the way I am reading this is:

1) People do not construct their stories so that they are the villains,


2) the idea that Al Qaeda is motivated by a hatred of American freedom is false.

Reading the Al Qaeda document released after the attacks, called Why We Are Fighting You, you find the following:


What are we calling you to, and what do we want from you?

1.  The first thing that we are calling you to is Islam.

A.  The religion of tahwid; of freedom from associating partners with Allah Most High , and rejection of such blasphemy; of complete love for Him, the Exalted; of complete submission to his sharia; and of the discarding of all the opinions, orders, theories, and religions that contradict with the religion He sent down to His Prophet Muhammad.  Islam is the religion of all the prophets and makes no distinction between them. 

It is to this religion that we call you …

2.  The second thing we call you to is to stop your oppression, lies, immorality and debauchery that has spread among you.

A.  We call you to be a people of manners, principles, honor and purity; to reject the immoral acts of fornication, homosexuality, intoxicants, gambling and usury.

We call you to all of this that you may be freed from the deceptive lies that you are a great nation, which your leaders spread among you in order to conceal from you the despicable state that you have obtained.

B.  It is saddening to tell you that you are the worst civilization witnessed in the history of mankind:

i.  You are the nation who, rather than ruling through the sharia of Allah, chooses to invent your own laws as you will and desire.  You separate religion from you policies, contradicting the pure nature that affirms absolute authority to the Lord your Creator….

ii.  You are the nation that permits usury…

iii.   You are a nation that permits the production, spread, and use of intoxicants.  You also permit drugs, and only forbid the trade of them, even though your nation is the largest consumer of them.

iv.  You are a nation that permits acts of immorality, and you consider them to be pillars of personal freedom.  

"Freedom" is of course one of those words.  It's easy enough to imagine an SS officer saying indignantly: "Of course we are fighting for freedom!  For our people to be free of Jewish domination, free from the contamination of lesser races, free from the sham of democracy..."

If we substitute the symbol with the substance though, what we mean by freedom - "people to be left more or less alone, to follow whichever religion they want or none, to speak their minds, to try to shape society's laws so they serve the people" - then Al Qaeda is absolutely inspired by a hatred of freedom.  They wouldn't call it "freedom", mind you, they'd call it "decadence" or "blasphemy" or "shirk" - but the substance is what we call "freedom".

Returning to the syllogism at the top, it seems to me that there is an unstated premise.  The conclusion "Al Qaeda cannot possibly hate America for its freedom because everyone sees himself as the hero of his own story" only follows if you assume that what is heroic, what is good, is substantially the same for all humans, for a liberal Westerner and an Islamic fanatic.

(for Americans, by "liberal" here I mean the classical sense that includes just about everyone you are likely to meet, read or vote for.  US conservatives say they are defending the American revolution, which was broadly in line with liberal principles - slavery excepted, but since US conservatives don't support that, my point stands).

When you state the premise baldly like that, you can see the problem.  There's no contradiction in thinking that Muslim fanatics think of themselves as heroic precisely for being opposed to freedom, because they see their heroism as trying to extend the rule of Allah - Shariah - across the world.

Now to the point - we all know the phrase "thinking outside the box".  I submit that if you can recognize the box, you've already opened it.  Real bias isn't when you have a point of view you're defending, but when you cannot imagine that another point of view seriously exists.

That phrasing has a bit of negative baggage associated with it, that this is just a matter of pigheaded closed-mindedness.  Try thinking about it another way.  Would you say to someone with dyscalculia "You can't get your head around the basics of calculus?  You are just being so closed-minded!"  No, that's obviously nuts.  We know that different people's minds work in different ways, that some people can see things others cannot.

Orwell once wrote about the British intellectuals' inability to "get" fascism, in particular in his essay on H.G. Wells.  He wrote that the only people who really understood the nature and menace of fascism were either those who had felt the lash on their backs, or those who had a touch of the fascist mindset themselves.  I suggest that some people just cannot imagine, cannot really believe, the enormous power of faith, of the idea of serving and fighting and dying for your god and His prophet.  It is a kind of thinking that is just alien to many.

Perhaps this is resisted because people think that "Being able to think like a fascist makes you a bit of a fascist".  That's not really true in any way that matters - Orwell was one of the greatest anti-fascist writers of his time, and fought against it in Spain. 

So - if you can see the box you are in, you can open it, and already have half-opened it.  And if you are really in the box, you can't see the box.  So, how can you tell if you are in a box that you can't see versus not being in a box?  

The best answer I've been able to come up with is not to think of "box or no box" but rather "open or closed box".  We all work from a worldview, simply because we need some knowledge to get further knowledge.  If you know you come at an issue from a certain angle, you can always check yourself.  You're in a box, but boxes can be useful, and you have the option to go get some stuff from outside the box.

The second is to read people in other boxes.  I like steelmanning, it's an important intellectual exercise, but it shouldn't preclude finding actual Men of Steel - that is, people passionately committed to another point of view, another box, and taking a look at what they have to say.  

Now you might say: "But that's steelmanning!"  Not quite.  Steelmanning is "the art of addressing the best form of the other person’s argument, even if it’s not the one they presented."  That may, in some circumstances, lead you to make the mistake of assuming that what you think is the best argument for a position is the same as what the other guy thinks is the best argument for his position.  That's especially important if you are addressing a belief held by a large group of people.

Again, this isn't to run down steelmanning - the practice is sadly limited, and anyone who attempts it has gained a big advantage in figuring out how the world is.  It's just a reminder that the steelman you make may not be quite as strong as the steelman that is out to get you.  

[EDIT: Link included to the document that I did not know was available online before now]

An alarming fact about the anti-aging community

29 diegocaleiro 16 February 2015 05:49PM

Past and Present

Ten years ago teenager me was hopeful. And stupid.

The world neglected aging as a disease, and Aubrey had barely started spreading memes, to the point that it was worth it for him to let me work remotely to help with the Methuselah Foundation. They had not even received that initial 1,000,000 donation from an anonymous donor. The Methuselah prize was running at less than 400,000, if I remember well. Still, I was a believer.

Now we live in the age of Larry Page's Calico, 100,000,000 dollars trying to tackle the problem, besides many other amazing initiatives, from the research paid for by Life Extension Foundation and Bill Faloon, to scholars in top universities like Steve Garan and Kenneth Hayworth fixing things from our models of aging to plastination techniques. Yet, I am much more skeptical now.

Individual risk

I am skeptical because I could not find a single individual who already used a simple technique that could certainly save you many years of healthy life. I could not even find a single individual who looked into it and decided it wasn't worth it, or was too pricy, or something of that sort.

That technique is freezing some of your cells now.

Freezing cells is not a far-future hope; it is something that already exists, and has been possible for decades. The reason you would want to freeze them, in case you haven't thought of it, is that they are getting older every day, so the ones you have now are the youngest ones you'll ever be able to use.

Using these cells to create new organs is not something that may help you if medicine and technology continue progressing according to the law of accelerating returns in 10 or 30 years. We already know how to make organs out of your cells. Right now. Some organs live longer, some shorter, but it can be done - for instance with bladders - and is being done.

Hope versus Reason

Now, you'd think that if there were an almost non-invasive technique already shown to work in humans that can preserve many years of your life and involves only a few trivial inconveniences - compared to changing diet or exercising, for instance - the whole longevist/immortalist crowd would be lining up for it and keeping backup tissue samples all over the place.

Well I've asked them. I've asked some of the adamant researchers, and I've asked the superwealthy; I've asked the cryonicists and supplement gorgers; I've asked those who work on this 8 hours a day every day, and I've asked those who pay others to do so. I asked it mostly for selfish reasons: I saw the TEDs by Juan Enriquez and Anthony Atala and thought: hey look, clearly beneficial expected life length increase, yay! let me call someone who found this out before me - anyone, I'm probably the last one, silly me - and fix this.

I've asked them all, and I have nothing to show for it.

My takeaway lesson is: whatever it is that other people are doing to solve their own impending death, they are far from doing it rationally, and maybe most of the money and psychology involved in this whole business is about buying hope, not about staring into the void and finding out the best ways of dodging it. Maybe people are not in fact going to go all-in if the opportunity comes.

How to fix this?

Let me disclose first that I have no idea how to fix this problem. I don't mean the problem of getting all longevists to freeze their cells, I mean the problem of getting them to take information from the world of science and biomedicine and applying it to themselves. To become users of the technology they are boasters of. To behave rationally in a CFAR or even homo economicus sense.

I was hoping for a grandiose idea in this last paragraph, but it didn't come. I'll go with a quote from this emotional song sung by us during last year's Secular Solstice celebration:

Do you realize? that everyone, you know, someday will die...

And instead of sending all your goodbyes

Let them know you realize that life goes fast

It's hard to make the good things last

Vote for MIRI to be donated a share of reddit's advertising revenue

28 asd 19 February 2015 10:07AM


"Today we are announcing that we will donate 10% of our advertising revenue receipts in 2014 to non-profits chosen by the reddit community. Whether it’s a large ad campaign or a $5 sponsored headline on reddit, we intend for all ad revenue this year to benefit not only reddit as a platform but also to support the goals and causes of the entire community."

Announcing LessWrong Digest

24 Evan_Gaensbauer 23 February 2015 10:41AM

I've been making rounds on social media with the following message.

Great content on LessWrong isn't as frequent as it used to be, so not as many people read it as frequently. This makes sense. However, I read it at least once every two days for personal interest. So, I'm starting a LessWrong/Rationality Digest, which will be a summary of all posts or comments exceeding 20 upvotes within a week. It will be like a newsletter. Also, it's a good way for those new to LessWrong to learn cool things without having to slog through online cultural baggage. It will never be more than once weekly. If you're curious here is a sample of what the Digest will be like.

Also, major blog posts or articles from related websites, such as Slate Star Codex and Overcoming Bias, or publications from the MIRI, may be included occasionally. If you want on the list send an email to:

lesswrongdigest *at* gmail *dot* com


Users of LessWrong itself have noticed this 'decline' in frequency of quality posts on LessWrong. It's not necessarily a bad thing, as much of the community has migrated to other places, such as Slate Star Codex, or even into meatspace with various organizations, meetups, and the like. In a sense, the rationalist community outgrew LessWrong as a suitable and ultimate nexus. Anyway, I thought you as well would be interested in a LessWrong Digest. If you or your friends:

  • find articles in 'Main' are too infrequent, and Discussion only filled with announcements, open threads, and housekeeping posts, to bother checking LessWrong regularly, or,
  • are busying themselves with other priorities, and are trying to limit how distracted they are by LessWrong and other media

the LessWrong Digest might work for you, and as a suggestion for your friends. I've fielded suggestions I transform this into a blog, Tumblr, or other format suitable for RSS Feed. Almost everyone is happy with email format right now, but if a few people express an interest in a blog or RSS format, I can make that happen too. 


Request for proposals for Musk/FLI grants

22 danieldewey 05 February 2015 05:04PM

As a follow-on to the recent thread on purchasing research effectively, I thought it'd make sense to post the request for proposals for projects to be funded by Musk's $10M donation. LessWrong's been a place for discussing long-term AI safety and research for quite some time, so I'd be happy to see some applications come out of LW members.

Here's the full Request for Proposals.

If you have questions, feel free to ask them in the comments or to contact me!

Here's the email FLI has been sending around:

Initial proposals (300–1000 words) due March 1, 2015

The Future of Life Institute, based in Cambridge, MA and headed by Max Tegmark (MIT), is seeking proposals for research projects aimed to maximize the future societal benefit of artificial intelligence while avoiding potential hazards. Projects may fall in the fields of computer science, AI, machine learning, public policy, law, ethics, economics, or education and outreach. This 2015 grants competition will award funds totaling $6M USD.

This funding call is limited to research that explicitly focuses not on the standard goal of making AI more capable, but on making AI more robust and/or beneficial; for example, research could focus on making machine learning systems more interpretable, on making high-confidence assertions about AI systems' behavior, or on ensuring that autonomous systems fail gracefully. Funding priority will be given to research aimed at keeping AI robust and beneficial even if it comes to greatly supersede current capabilities, either by explicitly focusing on issues related to advanced future AI or by focusing on near-term problems, the solutions of which are likely to be important first steps toward long-term solutions.

Please do forward this email to any colleagues and mailing lists that you think would be appropriate.


Before applying, please read the complete RFP and list of example topics, which can be found online along with the application form:

As explained there, most of the funding is for $100K–$500K project grants, which will each support a small group of collaborators on a focused research project with up to three years duration. For a list of suggested topics, see the complete RFP [1] and the Research Priorities document [2]. Initial proposals, which are intended to require merely a modest amount of preparation time, must be received on our website [1] on or before March 1, 2015.

Initial proposals should include a brief project summary, a draft budget, the principal investigator’s CV, and co-investigators’ brief biographies. After initial proposals are reviewed, some projects will advance to the next round, completing a Full Proposal by May 17, 2015. Public award recommendations will be made on or about July 1, 2015, and successful proposals will begin receiving funding in September 2015.

References and further resources

[1] Complete request for proposals and application form:

[2] Research Priorities document:

[3] An open letter from AI scientists on research priorities for robust and beneficial AI:

[4] Initial funding announcement:

Questions about Project Grants:

Media inquiries:

Human Minds are Fragile

21 diegocaleiro 11 February 2015 06:40PM

We are familiar with the thesis that Value is Fragile. This is why we are researching how to impart values to an AGI.

Embedded Minds are Fragile

Besides values, it may be worth remembering that human minds too are very fragile.

A little magnetic tampering with your amygdalas, and suddenly you are a wannabe serial killer. A small dose of LSD can get you to believe you can fly, or that the world will end in 4 hours. Remove part of your Ventromedial PreFrontal Cortex, and suddenly you are so utilitarian even Joshua Greene would call you a psycho.

It requires very little material change to substantially modify a human being's behavior. The same holds for other animals with embedded brains, crafted by evolution and made of squishy matter modulated by glands and molecular gates.

A Problem for Paul-Boxing and CEV?

One assumption underlying Paul-Boxing and CEV is that:

It is easier to specify and simulate a human-like mind than to impart values to an AGI by means of teaching it values directly via code or human language.

Usually we assume that because, as we know, value is fragile. But so are embedded minds. Very little tampering is required to profoundly transform people's moral intuitions. A large fraction of the inmate population in the US has frontal lobe or amygdala malfunctions.

Finding out the simplest description of a human brain that, when simulated, continues to act as that human brain would act in the real world may turn out to be as fragile as, or even more fragile than, concept learning for AGIs.

[LINK] The P + epsilon Attack (Precommitment in cryptoeconomics)

18 DanielVarga 29 January 2015 02:02AM

Vitalik Buterin has a new post about an interesting theoretical attack against Bitcoin. The idea relies on the assumption that the attacker can credibly commit to something quite crazy. The crazy thing is this: paying out 25.01 BTC to all the people who help him in his attack to steal 25 BTC from everyone, but only if the attack fails. This leads to a weird payoff matrix where the dominant strategy is to help him in the attack. The attack succeeds, and no payoff is made.

Of course, smart contracts make such crazy commitments perfectly possible, so this is a bit less theoretical than it sounds. But even as an abstract thought experiment about decision theories, it looks pretty interesting.

By the way, Vitalik Buterin is really on a roll. Just a week ago he had a thought-provoking blog post about how Decentralized Autonomous Organizations could possibly utilize a concept often discussed here: decision theory in a setup where agents can inspect each other's source code. It was shared on LW Discussion, but earned less exposure than I think it deserved.

EDIT 1: One smart commenter of the original post spotted that an isomorphic, extremely cool game was already proposed by billionaire Warren Buffett. Does this thing already have a name in game theory maybe?


EDIT 2: I wrote the game up in detail for some old-school game theorist friends:

The attacker orchestrates a game with 99 players. The attacker himself does not participate in the game.


Each of the players can either defect or cooperate, in the usual game theoretic setup where they do announce their decisions simultaneously, without side channels. We call "aggregate outcome" the decision that was made by the majority of the players. If the aggregate outcome is defection, we say that the attack succeeds. A player's payoff consists of two components:

1. If her decision coincides with the aggregate outcome, the player gets 10 utilons.

and simultaneously:

2. If the attack succeeds, the attacker gets 1 utilon from each of the 99 players, regardless of their own decision.

                | Cooperate  | Defect
Attack fails    |        10  | 0
Attack succeeds |        -1  | 9

There are two equilibria, but the second payoff component breaks the symmetry, and everyone will cooperate.

Now the attacker spices things up by making a credible commitment before the game. ("Credible" simply means that somehow they make sure that the promise cannot be broken. The classic way to achieve such things is an escrow, but so-called smart contracts are emerging as a method for making fully unbreakable commitments.)

The attacker's commitment is quite counterintuitive: he promises that he will pay 11 utilons to each of the defecting players, but only if the attack fails.

Now the payoff looks like this:

                | Cooperate  | Defect
Attack fails    |        10  | 11
Attack succeeds |        -1  | 9

Defection became a dominant strategy. The clever thing, of course, is that if everyone defects, then the attacker reaches his goal without paying out anything.
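The two payoff tables above can be checked mechanically. Here is a minimal sketch in Python; the function names (`payoff_table`, `dominant_move`, `attacker_payout`) are made up for illustration, and like the tables it assumes a single player's move does not change the aggregate outcome (that is, the player is not the pivotal voter).

```python
from typing import Dict, Optional, Tuple

OUTCOMES = ("fails", "succeeds")   # aggregate outcome of the attack
MOVES = ("cooperate", "defect")    # an individual player's move

Payoffs = Dict[Tuple[str, str], int]  # (outcome, move) -> player's payoff


def payoff_table(bribe: int = 0) -> Payoffs:
    """Build the player's payoff table; `bribe` is the attacker's promised
    payment to defectors, paid only if the attack fails."""
    table: Payoffs = {}
    for outcome in OUTCOMES:
        for move in MOVES:
            # Component 1: 10 utilons if the move matches the majority.
            majority_move = "defect" if outcome == "succeeds" else "cooperate"
            payoff = 10 if move == majority_move else 0
            # Component 2: every player loses 1 utilon if the attack succeeds.
            if outcome == "succeeds":
                payoff -= 1
            # Attacker's commitment: pay defectors only when the attack fails.
            if outcome == "fails" and move == "defect":
                payoff += bribe
            table[(outcome, move)] = payoff
    return table


def dominant_move(table: Payoffs) -> Optional[str]:
    """Return the move that is strictly better in every outcome, if any."""
    for move in MOVES:
        other = MOVES[1 - MOVES.index(move)]
        if all(table[(o, move)] > table[(o, other)] for o in OUTCOMES):
            return move
    return None


def attacker_payout(defectors: int, players: int = 99, bribe: int = 11) -> int:
    """What the attacker actually pays: nothing if defectors are the majority."""
    attack_succeeds = defectors > players // 2
    return 0 if attack_succeeds else bribe * defectors


print(dominant_move(payoff_table()))          # no dominant move without the bribe
print(dominant_move(payoff_table(bribe=11)))  # defection dominates with it
print(attacker_payout(defectors=99))          # everyone defects, nothing is paid
```

Without the commitment neither move dominates (the game has two equilibria); with the 11-utilon promise, defecting is strictly better in both rows, and when all 99 players defect the attacker pays nothing.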

Request: Sequences book reading group

17 iarwain1 22 February 2015 01:06AM

The book version of the Sequences is supposed to be published in the next month or two, if I understand correctly. I would really enjoy an online reading group to go through the book together.

Reasons for a reading group:

  • It would give some of us the motivation to actually go through the Sequences finally.
  • I have frequently had thoughts or questions on some articles in the Sequences, but I refrained from commenting because I assumed it would be covered in a later article or because I was too intimidated to ask a stupid question. A reading group would hopefully assume that many of the readers would be new to the Sequences, so asking a question or making a comment without knowing the later articles would not appear stupid.
  • It may even bring back a bit of the blog-style excitement of the "old" LW ("I wonder what exciting new thoughts are going to be posted today?") that many have complained has been missing since the major contributors stopped posting.
I would recommend one new post per day, going in order of the book. I recommend re-posting the entire article to LW, including any edits or additions that are new in the book. Obviously this would require permission from the copyright holder (who is that? is there even going to be a copyright at all?), but I'm hoping that'll be fine.

I'd also recommend trying to make the barriers to entry as low as possible. As noted above, this means allowing people to ask questions / make comments without being required to have already read the later articles. Also, I suggest that people not be required to read all the comments from the original article. If something has already been discussed or if you think a particular comment from the original discussion was very important, then just link to it or quote it.

Finally, I think it would be very useful if some of the more knowledgeable LW members could provide links and references to the corresponding  "traditional" academic literature on each article.

Unfortunately, for various reasons I am unwilling to take responsibility for such a reading group. If you are willing to take on this responsibility, please post a comment to that effect below.


[LINK] Wait But Why - The AI Revolution Part 2

17 adamzerner 04 February 2015 04:02PM

Part 1 was previously posted and it seemed that people liked it, so I figured that I should post part 2 -

[LINK] The Wrong Objections to the Many-Worlds Interpretation of Quantum Mechanics

16 tzachquiel 19 February 2015 06:06PM

Sean Carroll, physicist and proponent of Everettian Quantum Mechanics, has just posted a new article going over some of the common objections to EQM and why they are false. Of particular interest to us as rationalists:

Now, MWI certainly does predict the existence of a huge number of unobservable worlds. But it doesn’t postulate them. It derives them, from what it does postulate. And the actual postulates of the theory are quite simple indeed:

  1. The world is described by a quantum state, which is an element of a kind of vector space known as Hilbert space.
  2. The quantum state evolves through time in accordance with the Schrödinger equation, with some particular Hamiltonian.

That is, as they say, it. Notice you don’t see anything about worlds in there. The worlds are there whether you like it or not, sitting in Hilbert space, waiting to see whether they become actualized in the course of the evolution. Notice, also, that these postulates are eminently testable — indeed, even falsifiable! And once you make them (and you accept an appropriate “past hypothesis,” just as in statistical mechanics, and are considering a sufficiently richly-interacting system), the worlds happen automatically.

Given that, you can see why the objection is dispiritingly wrong-headed. You don’t hold it against a theory if it makes some predictions that can’t be tested. Every theory does that. You don’t object to general relativity because you can’t be absolutely sure that Einstein’s equation was holding true at some particular event a billion light years away. This distinction between what is postulated (which should be testable) and everything that is derived (which clearly need not be) seems pretty straightforward to me, but is a favorite thing for people to get confused about.

Very reminiscent of the quantum physics sequence here! I find that this distinction between number of entities and number of postulates is something that I need to remind people of all the time.



META: This is my first post; if I have done anything wrong, or could have done something better, please tell me!

Wisdom for Smart Teens - my talk at SPARC 2014

16 Liron 09 February 2015 06:58PM

I recently had the privilege of a 1-hour speaking slot at SPARC, a yearly two-week camp for top high school math students.

Here's the video: Wisdom for Smart Teens

Instead of picking a single topic, I indulged in a bunch of mini-topics that I feel passionate about:

  1. Original Sight
  2. "Emperor has no clothes" moments
  3. Epistemology is cool
  4. Think quantitatively
  5. Be specific / use examples
  6. Organizations are inefficient
  7. How I use Bayesianism
  8. Be empathizable
  9. Communication
  10. Simplify
  11. Startups
  12. What you want
I think the LW crowd will get a kick out of it.





Money threshold Trigger Action Patterns

15 Neotenic 20 February 2015 04:56AM

In American society, talking about money is a taboo. It is ok to talk about how much money someone else made when they sold their company, or how much money you would like to earn yearly if you got a raise, but in many different ways, talking about money is likely to trigger some embarrassment in the brain, and generate social discomfort. As one random example: no one dares suggest that bills should be paid according to wealth; instead, people quietly assume that fair means each person paying ~1/n, which of course completely fails utilitarian standards.

One more interesting thing people don't talk about, but which would probably be useful to know, is money trigger action patterns. That would be a trigger action pattern that should trigger whenever you have more money than X, for varying Xs.

A trivial example is when should you stop caring about pennies, or quarters? When should you start taking cabs or Ubers everywhere? These are minor examples, but there are more interesting questions that would benefit from a money trigger action pattern.

An argument can be made, for instance, that one should invest in health insurance prior to cryonics, cryonics prior to painting a house, and recommended charities before expensive sound systems. But people never put numbers on those things.

When should you buy cryonics and life insurance for it? When you own $1,000? $10,000? $1,000,000? Yes of course those vary from person to person, currency to currency, environment, age group and family size. This is no reason to remain silent about them. Money is the unit of caring, but some people can care about many more things than others in virtue of having more money. Some things are worth caring about if and only if you have that many caring units to spare.

I'd like to see people talking about what one should care about after surpassing specific numeric thresholds of money, and that seems to be an extremely taboo topic. Seems that would be particularly revealing when someone who does not have a certain amount suggests a trigger action pattern and someone who does have that amount realizes that, indeed, they should purchase that thing. Some people would also calibrate better over whether they need more or less money, if they had thought about these thresholds beforehand.

Some suggested items for those who want to try numeric triggers: health insurance, cryonics, 10% donation to favorite cause, virtual assistant, personal assistant, car, house cleaner, masseuse, quitting your job, driver, boat, airplane, house, personal clinician, lawyer, bodyguard, etc...

...notice also that some of these are resource satisfiable, but some may not be. It may always be more worthwhile to finance your anti-aging helper than your costume designer, so you'd hire the 10 millionth scientist to find out how to keep you young before considering hiring someone to design clothes specifically for you, perhaps because you don't like unique clothes. This is my feeling about boats: it feels like there are always other things that can be done with money that precede having a boat, though the outside view is that a lot of people who own a lot of money buy boats.

Intrapersonal comparisons: you might be doing it wrong.

15 fowlertm 03 February 2015 09:34PM


Nothing weighty or profound today, but I noticed a failure mode in myself which other people might plausibly suffer from so I thought I'd share it.

Basically, I noticed that sometimes when I discovered a more effective way of doing something -- say, going from conventional flashcards to Anki -- I found myself getting discouraged.

I realized that it was because each time I found such a technique, I automatically compared my current self to a version of me that had had access to the technique the whole time. Realizing that I wasn't as far along as I could've been resulted in a net loss of motivation. 

Now, I deliberately compare two future versions of myself, one armed with the technique I just discovered and one without. Seeing how much farther along I will be results in a net gain of motivation.

A variant of this exercise is taking any handicap you might have and wildly exaggerating it. I suffer from mild Carpal Tunnel (or something masquerading as CT) which makes progress in programming slow. When I feel down about this fact I imagine how hard programming would be without hands.

Sometimes I go as far as to plan out what I might do if I woke up tomorrow with a burning desire to program and nothing past my wrists. Well, I'd probably figure out a way to code by voice and then practice mnemonics because I wouldn't be able to write anything down. Since these solutions exist I can implement one or both of them the moment my carpal tunnel gets bad enough.

With this realization comes a boost in motivation knowing I can go a different direction if required. 

Is there a rationalist skill tree yet?

15 fowlertm 30 January 2015 04:02PM

A while back I came across a delightful web developer skill tree, and I was wondering if technical rationality has gotten to the point where someone could make one of these for an aspiring rationalist.

I think seeing a clear progression from beginning skills to advanced ones laid out graphically helps those starting on the path conceptualize the process. 

Czech's first Meetup in Prague report

14 kotrfa 19 February 2015 08:52AM


I'm happy to inform the Less Wrong community about a new meetup in Prague, Czech Republic. Although nobody came to the meetup I organized 2 months ago (mainly because it was announced only two weeks ahead), yesterday five of us met. Plus, about 3 other people couldn't make it yesterday but are interested in future meetups.

All of us are young men aged 21-25, studying or working in mathematics (mainly data science), informatics or AI, and one of us is studying international affairs.

Just 2 of us have a stronger background with Less Wrong (we have read HPMOR and at least the core sequences); the rest are interested in these and willing to catch up (more or less), and came on our recommendation or found LW by luck.

We agreed we'd like to meet regularly to share our opinions and lifehacking experiences, comment on our lives, etc., and we are going to meet again within the next 14 days. In addition, we agreed to plan a meetup further in advance, so that those who don't visit LW too often have a chance to notice it.

Currently, the most challenging task I see is to find the optimal form of the group and to formulate its motives and goals. The options range from just being friends who chat from time to time to a highly organized group with a fixed schedule, active work and close relationships. Of course, I do realize how hard it is to hold the group together and that we should not rely only on initial enthusiasm.

Yesterday we talked for about 3 hours about various topics like lifehacking, our studies, tips, our most topical questions, our lives... The most surprising thing I found was how diverse we are in the techniques that help our productivity: one of us needs stress, I need an exact schedule, another is more effective when taking things easy, one is testing nootropics, another meditation, I eat my frog in the morning, another only after some more pleasant things, et cetera. Another interesting thing was how our lives have been entangled: we know the same people or friends, we have attended the same courses... Not so surprising when living in a city with just a million or so people, but still interesting.

The Less Wrong website (community) is what brought us together, and I feel obligated to inform the community about its impact and the happiness it nourishes. Also, any recommendations and help are welcome (I try to use as a reference guide).

Thank you for that, and I'll be happy to see you at our future meetups (which will be held, I hope).

Discussion of concrete near-to-middle term trends in AI

13 Punoxysm 08 February 2015 10:05PM

Instead of prognosticating on AGI/Strong AI/Singularities, I'd like to discuss more concrete advancements to expect in the near-term in AI. I invite those who have an interest in AI to discuss predictions or interesting trends they've observed.

This discussion should be useful for anyone looking to research or work in companies involved in AI, and might guide longer-term predictions.

With that, here are my predictions for the next 5-10 years in AI. This is mostly straightforward extrapolation, so it won't excite those who know about these areas but may interest those who don't:

  • Speech Processing, the task of turning spoken words into text, will continue to improve until it is essentially a solved problem. Smartphones and even weaker devices will be capable of quite accurately transcribing heavily-accented speech in many languages and noisy environments. This is the simple continuation of the rapid improvements in speech processing that have brought us from Dragon Naturally-Speaking to Google Now and Siri.
  • Assistant and intent-based (they try to figure out the "intent" of your input) systems, like Siri, that need to interpret a sentence as a particular command they are capable of, will become substantially more accurate and varied and take cues like tone and emphasis into account. So for example, if you're looking for directions you won't have to repeat yourself in an increasingly loud, slowed and annoyed voice. You'll be able to phrase your requests naturally and conversationally. New tasks like "Should I get this rash checked out" will be available. A substantial degree of personalization and use of your personal history might also allow "show me something funny/sad/stimulating [from the internet]".
  • Natural language processing, the task of parsing the syntax and semantics of language, will improve substantially. Look at this list of traditional tasks with standard benchmarks: on Wikipedia. Every one of these tasks will have a several percentage point improvement, particularly in the understudied areas of informal text (chat logs, tweets, anywhere grammar and vocabulary are less rigorous). It won't get so good that it can be confused with solving AI-complete aspects of NLP, but it will allow vast improvements in text mining and information extraction. For instance, search queries like "What papers are critical of VerHoeven and Michaels '08" or "Summarize what Twitter thinks of the 2018 Super Bowl" will be answerable. Open source libraries will continue to improve from their current just-above-boutique state (NLTK, CoreNLP). Medical diagnosis based on analysis of medical texts will be a major area of research. Large-scale analysis of scientific literature in areas where it is difficult for researchers to read all relevant texts will be another. Machine translation will not be ready for most diplomatic business, but it will be very very good across a wide variety of languages.
  • Computer Vision, interpreting the geometry and contents of images and video, will undergo tremendous advances. In fact, it already has in the past 5 years, but now it makes sense for major efforts, academic, military and industrial, to try to integrate different modules that have been developed for subtasks like object recognition, motion/gesture recognition, segmentation, etc. I think the single biggest impact this will have will be the foundation for robotics development, since a lot of the arduous work of interpreting sensor input will be partly taken care of by excellent vision libraries. Those general foundations will make it easy to program specialist tasks (like differentiating weeds from crops in an image, or identifying activity associated with crime in a video). This will be complemented by a general proliferation of cheap high-quality cameras and other sensors. Augmented reality also rests on computer vision, and the promise of the most fanciful tech demo videos will be realized in practice.
  • Robotics will advance rapidly. The foundational factors of computer vision, growing availability of cheap platforms, and fast progress on tasks like motion planning and grasping have the potential to fuel an explosion of smarter industrial and consumer robotics that can perform more complex and unpredictable tasks than most current robots. Prototype ideas like search-and-rescue robots, more complex drones, and autonomous vehicles will come to fruition (though 10 years may be too short a time frame for ubiquity). Simpler robots with exotic chemical sensors will have important applications in medical and environmental research.


Questions from an imaginary statistical methods exam

13 RichardKennaway 04 February 2015 01:57PM

Answers to these questions should be expressed numerically, where possible, but no number should be given without a justification for the specific value.

1. Suppose that you have mislaid your house keys, something most people have experienced at one time or another. You look in various places for them: where you remember having them last, places you've been recently, places they should be, places they shouldn't be, places they couldn't be, places you've looked already, and so on. Eventually, you find them and stop looking.

Every time you looked somewhere, you were testing a hypothesis about their location. You may have looked in a hundred places before finding them.

As a piece of scientific research to answer the question "where are my keys?", this procedure has massive methodological flaws. You tested a hundred hypotheses before finding one that the data supported, ignoring every failed hypothesis. You really wanted each of these hypotheses in turn to be true, and made no attempt to avoid bias. You stopped collecting data the moment a hypothesis was confirmed. When you were running out of ideas to test, you frantically thought up some more. You repeated some failed experiments in the hope of getting a different result. Multiple hypotheses, file drawer effect, motivated cognition, motivated stopping, researcher degrees of freedom, remining of old data: there is hardly a methodological sin you have not committed.

(a) Should these considerations modify your confidence or anyone else's that you have in fact found your keys? If not, why not, and if so, what correction is required?

(b) Should these considerations affect your subsequent decisions (e.g. to go out, locking the door behind you)?

2. You have a lottery ticket. (Of course, you are far too sensible to ever buy such a thing, but nevertheless suppose that you have one. Maybe it was an unexpected free gift with your groceries.) The lottery is to be drawn later that day, the results available from a web site whose brief URL is printed on the ticket. You calculate a chance of about 1 in 100 million of a prize worth getting excited about.

(a) Once the lottery results are out, do you check your ticket? Why, or why not?

(b) Suppose that you do, and it appears that you have won a very large sum of money. But you remember that the prior chance of this happening was 1 in 100 million. How confident are you at this point that you have won? What alternative hypotheses are also raised to your attention by the experience of observing the coincidence of the numbers on your ticket and the numbers on the lottery web site?

(c) Suppose that you go through the steps of contacting the lottery organisers to make a claim, having them verify the ticket, collecting the prize, seeing your own bank confirm the deposit, and using the money in whatever way you think best. At what point, if any, do you become confident that you really did win the lottery? If never, what alternative hypotheses are you still seriously entertaining, to the extent of acting differently on account of them?

Anatomy of Multiversal Utility Functions: Tegmark Level IV

12 Squark 07 February 2015 04:28PM

Outline: Constructing utility functions that can be evaluated on any possible universe is known to be a confusing problem, since it is not obvious what sort of mathematical object should be the domain and what properties the function should obey. In a sequence of posts, I intend to break down the question with respect to Tegmark's multiverse levels and explain the answer on each level, starting with level IV in the current post.


An intelligent agent is often described as an entity whose actions drive the universe towards higher expectation values of a certain function, known as the agent's utility function. Such a description is very useful in contexts such as AGI, FAI, decision theory and more generally any abstract study of intelligence.

Applying the concept of a utility function to agents in the real world requires utility functions with a very broad domain. Indeed, since the agent is normally assumed to have only finite information about the universe in which it exists, it should allow for a very large variety of possible realities. If the agent is to make decisions using some sort of utility calculus, it has to be able to evaluate its utility function on each of the realities it can conceive.

Tegmark has conveniently arranged the space of possible realities ("universes") into 4 levels, 3 of which are based on our current understanding of physics. Tegmark's universes are usually presented as co-existing but it is also possible to think of them as the "potential" universes in which our agent can find itself. I am going to traverse Tegmark's multiverse from top to bottom, studying the space of utility functions on each level (which, except for level IV, is always derived from the higher level). The current post addresses Tegmark level IV, leaving the lower levels for follow-ups.

Some of the ideas in this post previously appeared in a post about intelligence metrics, where I explained them much more tersely.

Tegmark Level IV

Tegmark defined this level as the collection of all mathematical models. Since it is not even remotely clear how to define such a beast, I am going to use a different space which (I claim) is conceptually very close. Namely, I am going to consider universes to be infinite binary sequences $x \in \{0,1\}^\omega$. I denote by $X$ the space of all such sequences equipped with the product topology. As will become clearer in the following, this space embodies "all possible realities" since any imaginable reality can be encoded in such a sequence¹.

The natural a priori probability measure on this space is the Solomonoff measure $\mu$. Thus, a priori utility expectation values take the form

$$E_\mu[U] = \int_X U(x) \, d\mu(x)$$

From the point of view of Updateless Decision Theory, a priori expectation values are the only sort that matters: conditional expectation values wrt logical uncertainty replace the need to update the measure.

In order to guarantee the convergence of expectation values, we are going to assume $U$ is a bounded function.

A Simple Example

So far, we know little about the form of the function $U$. To illustrate the sort of constructions that are relevant for realistic or semi-realistic agents, I am going to consider a simple example: the glider maximizer.

The glider maximizer $G$ is an agent living inside the Game of Life. Fix $V$ a forward light cone within the Game of Life spacetime, representing the volume $G$ is able to influence. $G$ maximizes the following utility function:

$$U_G(Y) = \sum_{t=0}^{\infty} \beta^t \, W(Y_t)$$

Here, $Y$ is a history of the Game of Life, $\beta$ is a constant in $(0,1)$ and $W(Y_t)$ is the number of gliders at time $t$ inside $V$.

We wish to "release" $G$ from the Game of Life universe into the broader multiverse. In other words, we want an agent that doesn't dogmatically assume itself to exist within the Game of Life, instead searching for appearances of the Game of Life in the physical universe and maximizing gliders there.

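To accomplish this, fix a way $f$ to bijectively encode histories of $V$ as binary sequences. Allow arbitrary histories: don't impose Game of Life rules. We can then define the "multiversal" utility function

$$U_G^M(x) = \sum_{t=0}^{\infty} \beta^t \left( W\!\left(Q(f^{-1}(x))_t\right) + \gamma \, \left|Q(f^{-1}(x))_t\right| \right)$$

Here $Q(Y)$ is the set of cells in which $Y$ satisfies Game of Life rules, $\gamma$ is a positive constant and $|Q(Y)_t|$ is the number of cells in $Q(Y)$ at time $t$.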

In other words, the "liberated" $G$ prefers for many cells to satisfy Game of Life rules and for many cells out of these to contain gliders.

Superficially, it seems that the construction of $U_G^M$ strongly depends on the choice of $f$. However, the dependence only marginally affects $\mu$-expectation values. This is because replacing $f$ with $f'$ is equivalent to adjusting probabilities by a bounded factor. The bound is roughly $2^K$ where $K$ is the Kolmogorov complexity of $f' \circ f^{-1}$.
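To make the glider maximizer's utility a bit more concrete, here is a minimal Python sketch. It assumes the utility takes the form of a time-discounted sum of glider counts with a discount constant strictly between 0 and 1, and it truncates the infinite sum at a finite horizon, which is only an approximation. The function name and the sample glider counts are made up for illustration; actually counting gliders inside the light cone is not attempted here.

```python
from typing import Sequence


def discounted_glider_utility(glider_counts: Sequence[int], beta: float) -> float:
    """Truncated discounted sum: sum over t of beta**t * glider_counts[t].

    glider_counts[t] stands in for the number of gliders at time t inside
    the light cone (hypothetical input; counting them is out of scope).
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return sum(beta ** t * count for t, count in enumerate(glider_counts))


# Hypothetical history: one glider at each of the first three time steps.
print(discounted_glider_utility([1, 1, 1], beta=0.5))  # 1 + 0.5 + 0.25
```

The discount factor is what keeps later time steps from dominating: each additional step contributes a geometrically shrinking weight to the total.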

Human Preferences and Dust Theory

Human preferences revolve around concepts which belong to an "innate" model of reality: a model which is either genetic or acquired by brains at early stages of childhood. This model describes the world mostly in terms of humans, their emotions and interactions (but might include other elements as well e.g. elements related to wild nature).

Therefore, utility functions which are good descriptions of human preferences ("friendly" utility functions) are probably of similar form to  from the Game of Life example, with Game of Life replaced by the "innate human model".

Applying UDT to the -expectation values of such utility functions leads to agents which care about anything that has a low-complexity decoding into an "innate concept" e.g. biological humans and whole brain emulations. The -integral assigns importance to all possible "decodings" of the universe weighted by their Kolmogorov complexity which is slightly reminiscent of Egan's dust theory.

The Procrastination Paradox

Consider an agent  living in a universe I call "buttonverse".  can press a button at any moment of time. 's utility function  assigns 1 to histories in which the button was pressed at least once and 0 to histories in which the button was never pressed. At each moment of time, it seems rational for  to decide not to press the button, since it will have the chance to do so at a later time without losing utility. As a result, if  never presses the button, its behavior seems rational at any particular moment but overall leads to losing. This problem (which has important ramifications for tiling agents) is known as the procrastination paradox.

My point of view on the paradox is that it is the result of a topological pathology of . Thus, if we restrict ourselves to reasonable utility functions (in the precise sense I explain below), the paradox disappears.

Buttonverse histories are naturally described as binary sequences  where  is 0 when the button is not pressed at time and 1 when the button is pressed at time . Define  to be the buttonverse history in which the button is never pressed:

Consider the following sequence of buttonverse histories:  is the history in which the button gets pressed at time  only. That is

Now, with respect to the product topology on  converge to the :

However the utilities don't behave correspondingly:

Therefore, it seems natural to require any utility function to be an upper semicontinuous function on X 2. I claim that this condition resolves the paradox in the precise mathematical sense considered in Yudkowsky 2013. Presenting the detailed proof would take us too far afield and is hence out of scope for this post.
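The failure of upper semicontinuity can be checked on a toy encoding. The following is only an illustrative sketch: it represents a buttonverse history by the single time at which the button is pressed (or `None` if it is never pressed), it assumes time starts at 0, and the names `history_prefix`, `utility` and `agreement_length` are invented for this example. The history pressed only at time n agrees with the never-pressed history on its first n bits, so the sequence converges in the product topology, yet the utility is 1 all along the sequence and 0 at the limit.

```python
from typing import List, Optional


def history_prefix(press_time: Optional[int], length: int) -> List[int]:
    """First `length` bits of a history with at most one button press.

    press_time=None encodes the history in which the button is never pressed.
    Time is assumed to start at 0 (an assumption of this sketch).
    """
    return [1 if t == press_time else 0 for t in range(length)]


def utility(press_time: Optional[int]) -> int:
    """1 if the button is pressed at least once, 0 otherwise."""
    return 0 if press_time is None else 1


def agreement_length(a: Optional[int], b: Optional[int], horizon: int) -> int:
    """Number of leading bits on which two histories agree, up to `horizon`."""
    pa = history_prefix(a, horizon)
    pb = history_prefix(b, horizon)
    for t, (x, y) in enumerate(zip(pa, pb)):
        if x != y:
            return t
    return horizon


if __name__ == "__main__":
    for n in (1, 10, 100):
        # Agreement with the never-pressed history grows with n,
        # but the utility does not move towards 0.
        print(n, agreement_length(n, None, horizon=1000), utility(n))
    print("limit history utility:", utility(None))
```

Longer agreement on prefixes is exactly what convergence in the product topology means, so the jump from 1 to 0 at the limit is the pathology that the upper semicontinuity requirement excludes.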

Time Discount

Bounded utility functions typically contain some kind of temporal discount. In the Game of Life example, the discount manifests as the factor . It is often assumed that the discount has to take an exponential form in order to preserve time translation symmetry. However, the present formalism has no place for time translation symmetry on the fundamental level: our binary sequences have well-defined beginnings. Obviously this doesn't rule out exponential discount but the motivation for sticking to this particular form is weakened.

Note that any sequence  contributes to the -integral in [1] together with its backward translated versions :

As a result, the temporal discount function effectively undergoes convolution with the function  where  is the Kolmogorov complexity of the number . As a result, whatever the form of "bare" temporal discount, the effective temporal discount falls very slowly3.

In other words, if a utility function  assigns little or no importance to the distant future, a UDT agent maximizing the expectation value of  would still care a lot about the distant future, because what is distant future in one universe in the ensemble is the beginning of the sequence in another universe in the ensemble.


Next in sequence: The Role of Physics in UDT, Part I


1 It might seem that there are "realities" of higher set theoretic cardinality which cannot be encoded. However, if we assume our agent's perceptions during a finite span of subjective time can be encoded as a finite number of bits, then we can safely ignore the "larger" realities. They can still exist as models the agent uses to explain its observations but it is unnecessary to assume them to exist on the "fundamental" level.

2 In particular, all computable functions are admissible since they are continuous.

3 I think that  falls slower than any computable function with convergent integral.

AI Impacts project

12 KatjaGrace 02 February 2015 07:40PM

I've been working on a thing with Paul Christiano that might interest some of you: the AI Impacts project.

The basic idea is to apply the evidence and arguments that are kicking around in the world and various disconnected discussions respectively to the big questions regarding a future with AI. For instance, these questions:

  • What should we believe about timelines for AI development?
  • How rapid is the development of AI likely to be near human-level? 
  • How much advance notice should we expect to have of disruptive change?
  • What are the likely economic impacts of human-level AI?
  • Which paths to AI should be considered plausible or likely?
  • Will human-level AI tend to pursue particular goals, and if so what kinds of goals?
  • Can we say anything meaningful about the impact of contemporary choices on long-term outcomes?
For example we have recently investigated technology's general proclivity to abrupt progress, surveyed existing AI surveys, and examined the evidence from chess and other applications regarding how much smarter Einstein is than an intellectually disabled person, among other things. 

Some more on our motives and strategy, from our about page:

Today, public discussion on these issues appears to be highly fragmented and of limited credibility. More credible and clearly communicated views on these issues might help improve estimates of the social returns to AI investment, identify neglected research areas, improve policy, or productively channel public interest in AI.

The goal of the project is to clearly present and organize the considerations which inform contemporary views on these and related issues, to identify and explore disagreements, and to assemble whatever empirical evidence is relevant.

The project is provisionally organized as a collection of posts concerning particular issues or bodies of evidence, describing what is known and attempting to synthesize a reasonable view in light of available evidence. These posts are intended to be continuously revised in light of outstanding disagreements and to make explicit reference to those disagreements.

In the medium run we'd like to provide a good reference on issues relating to the consequences of AI, as well as to improve the state of understanding of these topics. At present, the site addresses only a small fraction of questions one might be interested in, so only suitable for particularly risk-tolerant or topic-neutral reference consumers. However if you are interested in hearing about (and discussing) such research as it unfolds, you may enjoy our blog.

If you take a look and have thoughts, we would love to hear them, either in the comments here or in our feedback form.

Crossposted from my blog.

[link] Speed is the New Intelligence

11 Gunnar_Zarncke 28 January 2015 11:11AM

From Scott Adams Blog

The article really is about speeding up government, but the key point is speed as a component of smart: 

A smart friend told me recently that speed is the new intelligence, at least for some types of technology jobs. If you are hiring an interface designer, for example, the one that can generate and test several designs gets you further than the “genius” who takes months to produce the first design to test. When you can easily test alternatives, the ability to quickly generate new things to test is a substitute for intelligence.

This shifts the focus from the ability to grasp and think through very complex topics (which includes good working memory and memory recall in general) to the ability to pick up new topics quickly (which includes quick learning and unlearning, and creativity).

Smart people in the technology world no longer believe they can think their way to success. Now the smart folks try whatever plan looks promising, test it, tweak it, and reiterate. In that environment, speed matters more than intelligence because no one has the psychic ability to pick a winner in advance. All you can do is try things that make sense and see what happens. Obviously this is easier to do when your product is software based.

This also changes the type of grit needed: the grit to push through a long topic versus the grit to try lots of new things and to learn from failures.

Journal 'Basic and Applied Psychology' bans p<0.05 and 95% confidence intervals

10 Jonathan_Graehl 25 February 2015 05:15PM

Editorial text isn't very interesting; they call for descriptive statistics and don't recommend any particular analysis.

Can we decrease the risk of worse-than-death outcomes following brain preservation?

9 Synaptic 21 February 2015 10:58PM

Content note: discussion of things that are worse than death

Over the past few years, a few people have claimed rejection of cryonics due to concerns that they might be revived into a world that they preferred less than being dead or not existing. For example, lukeprog pointed this out in a LW comment here, and Julia Galef expressed similar sentiments in a comment on her blog here.

I use brain preservation rather than cryonics here, because it seems like these concerns are technology-platform agnostic.

To me one solution is that it seems possible to have an "out-clause": circumstances under which you'd prefer to have your preservation/suspension terminated. 

Here's how it would work: you specify, prior to entering biostasis, circumstances in which you'd prefer to have your brain/body be taken out of stasis. Then, if those circumstances are realized, the organization carries out your request. 

This almost certainly wouldn't solve all of the potential bad outcomes, but it ought to help some. Also, it requires that you enumerate some of the circumstances in which you'd prefer to have your suspension terminated. 

While obvious, it seems worth pointing out that there's no way to decrease the probability of worse-than-death outcomes to 0%, although this is also the case for currently living people (i.e. people whose brains are not necessarily preserved could also experience worse-than-death outcomes and/or have their lifespans extended against their wishes).

For people who are concerned about this, I have three main questions: 

1) Do you think that an opt-out clause is a useful-in-principle way to address your concerns?

2) If no to #1, is there some other mechanism that you could imagine which would work?

3) Can you enumerate some specific world-states that you think could lead to revival in a worse-than-death state? (Examples: UFAI is imminent, or a malevolent dictator's army is about to take over the world.) 

Stupid Questions February 2015

9 Gondolinian 02 February 2015 12:36AM

This thread is for asking any questions that might seem obvious, tangential, silly or what-have-you. Don't be shy, everyone has holes in their knowledge, though the fewer and the smaller we can make them, the better.

Please be respectful of other people's admitting ignorance and don't mock them for it, as they're doing a noble thing.

To any future monthly posters of SQ threads, please remember to add the "stupid_questions" tag.

"Human-level control through deep reinforcement learning" - computer learns 49 different games

8 skeptical_lurker 26 February 2015 06:21AM

full text


This seems like an impressive first step towards AGI. The games, like Pong and Space Invaders, are perhaps not the most cerebral, but given that Deep Blue can only play chess, this is far more impressive IMO. They didn't even need to adjust hyperparameters between games.


I'd also like to see whether they can train a network that plays the same game on different maps without re-training, which seems a lot harder.


Deconstructing the riddle of experience vs. memory

8 michael_b 17 February 2015 03:36PM

I don't think I understand the riddle of experience vs. memory.  I daresay that means the concept is half-baked.

In his TED talk, Daniel Kahneman poses the probably familiar philosophical quandary: if you could take a beautiful vacation and afterwards your memory and photo album were completely erased, would you still do it?  Whether you would still do it illustrates whether you live in service of the experiencing self or the remembering self.

Part of what prevents me from understanding the riddle is that I believe vacations are worth more than the memories and photos: vacations change you.

Maybe you could argue that this change is also a form of memory in service to the remembering self, but I'm not sure that's what he meant.  In his thought experiment on vacations he asks if you would still take a vacation if, at the end of it, you forgot the whole thing and all of your photos were deleted.

As if your memory is just a brain version of your photo album.

What about the rest of these features of typical vacations where you may go somewhere photo worthy?
  • a chance to unwind from not having to work
  • a chance to heal, because you break normal patterns of repetitive stress (e.g. not sitting at a desk all day for a week or two)
  • a chance to work out every day in a different way
  • developing your "worldliness"; e.g. opening your mind a bit, because you've likely met new and different people
  • come back with a sweet tan
  • come back with more Facebook friends
  • come back with extra dives in your SCUBA log book
  • new delicious condiments in your kitchen
  • flashes of insight you get from having some time to consider a 30,000 foot view of your life
  • surprisingly large dip in your bank account balance (so much personal development awaits)
  • if you're lucky (or maybe unlucky), you discover new modalities of being and abandon your current way of life
Are all of these potential features and possible changes to your personality/lifestyle reverted too?

Don't get me wrong, beautiful memories and photographs are great, but if you're going on vacation just to have a nice mental photo album by the end of it I think the opportunity is basically wasted.  It seems like such a fantastic waste I would posit that most people don't go on vacation solely to have a nice mental and actual photo album at the end of it.

So, the vacation thought experiment breaks down for me.  Are there better tools for understanding the riddle of experience vs. memory?

Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 104

8 b_sen 16 February 2015 01:24AM

New chapter!

This is a new thread to discuss Eliezer Yudkowsky’s Harry Potter and the Methods of Rationality and anything related to it. This thread is intended for discussing chapter 104.

There is a site dedicated to the story at, which is now the place to go to find the authors notes and all sorts of other goodies. AdeleneDawner has kept an archive of Author’s Notes. (This goes up to the notes for chapter 76, and is now not updating. The authors notes from chapter 77 onwards are on

Spoiler Warning: this thread is full of spoilers. With few exceptions, spoilers for MOR and canon are fair game to post, without warning or rot13. More specifically:

You do not need to rot13 anything about HP:MoR or the original Harry Potter series unless you are posting insider information from Eliezer Yudkowsky which is not supposed to be publicly available (which includes public statements by Eliezer that have been retracted).

If there is evidence for X in MOR and/or canon then it’s fine to post about X without rot13, even if you also have heard privately from Eliezer that X is true. But you should not post that “Eliezer said X is true” unless you use rot13.

Some secondary statistics from the results of LW Survey

8 Nanashi 12 February 2015 04:46PM


Global LW (N=643) vs USA LW (N=403) vs. Average US Household (Comparable Income)
Income Bracket | LW Mean Contributions | USA LW Mean Contributions | US Mean Contributions** [1] | LW Mean Income | USA LW Mean Income | US Mean*** Income [1] | LW Contributions/Income | USA LW Contributions/Income | US Contributions/Income [1]
$0 - $25000 (41% of LW) | $1,395.11 | $935.47 | $1,177.52 | $11,241.14 | $11,326.18 | $15,109.85 | 12.41% | 8.26% | 7.79%
$25000-$50000 (17% of LW) | $438.25 | $571.00 | $1,748.08 | $34,147.14 | $32,758.06 | $38,203.79 | 1.28% | 1.74% | 4.58%
$50000-$75000 (12% of LW) | $1,757.77 | $1,638.59 | $2,191.58 | $60,387.69 | $61,489.30 | $62,342.05 | 2.91% | 2.66% | 3.52%
$75000-$100000 (9% of LW) | $1,883.36 | $2,211.81 | $2,624.81 | $84,204.09 | $83,049.54 | $87,182.68 | 2.24% | 2.66% | 3.01%
$100000-$200000 (16% of LW) | $3,645.73 | $3,372.84 | $3,555.02 | $123,581.28 | $124,577.88 | $137,397.03 | 2.95% | 2.71% | 2.59%
>$200000 (5% of LW) | $14,162.35 | $15,970.67 | $15,843.97 | $296,884.63 | $299,444.44 | $569,447.35 | 4.77% | 5.33% | 2.78%
Total | $2,265.56 | $2,669.85 | $3,949.26 | $62,285.72 | $75,130.37 | $133,734.60 | 3.64% | 3.55% | 2.95%
All < $200000 | $1,689.36 | $1,649.32 | $2,515.29 | $51,254.43 | $58,306.81 | $81,207.03 | 3.30% | 2.83% | 3.10%
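
The Contributions/Income columns in the table above are mean contributions divided by mean income within each bracket. As a quick sanity check, the sketch below recomputes the Global LW column from the figures in the table. The variable and function names are illustrative, and this is a re-derivation from the published means, not the original analysis.

```python
# (mean contributions, mean income) for the Global LW column,
# copied from the table above.
lw_rows = {
    "$0 - $25000": (1395.11, 11241.14),
    "$25000-$50000": (438.25, 34147.14),
    "$50000-$75000": (1757.77, 60387.69),
    "$75000-$100000": (1883.36, 84204.09),
    "$100000-$200000": (3645.73, 123581.28),
    ">$200000": (14162.35, 296884.63),
    "Total": (2265.56, 62285.72),
}


def contribution_rate(mean_contributions: float, mean_income: float) -> float:
    """Mean contributions as a percentage of mean income."""
    return 100.0 * mean_contributions / mean_income


if __name__ == "__main__":
    for bracket, (contributions, income) in lw_rows.items():
        print(f"{bracket}: {contribution_rate(contributions, income):.2f}%")
```

Note that this is a ratio of means, not a mean of per-respondent ratios; the two can differ noticeably when a bracket contains a few very large donors.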


Global LW (N=643) vs USA LW (N=403) vs. Average US Citizen (Comparable Age)
Age Bracket* | LW Median Income | US LW Median Income | US Median*** Income [2]
15-24 | $17,000.00 | $20,000.00 | $26,999.13
25-34 | $50,000.00 | $60,504.00 | $45,328.70
All <35 | $40,000.00 | $58,000.00 | $40,889.57




Global LW (N=407) vs USA LW (N=243) vs. Average US Citizen (Comparable IQ)
Measure | Average LW** | US LW | US Between 125-155 IQ [3]
Median Income | $40,000.00 | $58,000.00 | $60,528.70
Mean Contributions | $2,265.56 | $2,669.85 | $2,016.00


Note: Three data points were removed from the sample due to my subjective opinion that they were fake. Any self-reported IQs of 0 were removed. Any self-reported income of 0 was removed. 

*89% of the LW population is between the age of 15 and 34.

**88% of the LW population has an IQ between 125 and 155, with an average IQ of 138. 

***Median numbers were adjusted down by a factor of 1.15 to account for the fact that the source data was calculating household median income rather than individual median income.

[1] Internal Revenue Service, Charitable Giving by Households that Itemize Deductions (AGI and Itemized Contributions Summary by Zip, 2012), The Urban Institute, National Center for Charitable Statistics 

[2] U.S. Census Bureau, Current Population Survey, 2013 and 2014 Annual Social and Economic Supplements.

[3] Do you have to be smart to be rich? The impact of IQ on wealth, income and financial distress Intelligence, Vol. 35, No. 5. (September 2007), by Jay L. Zagorsky


Update 1: Updated chart 1&2 to account for the fact that the source data was calculating household median income rather than individual income.

Update 2: Reverted Chart 1 back to original because I realized that the purpose was to compare LWers to those in similar income brackets. So in that situation, whether it's a household or an individual is not as relevant. It does penalize households to an extent because they have less money available to donate to charity because they're splitting their money three ways. 

Update 3: Updated all charts to include data that is filtered for US only.

Have you changed your mind recently?

8 Snorri 06 February 2015 06:34PM

Our beliefs aren't just cargo that we carry around. They become part of our personal identity, so much so that we feel hurt if we see someone attacking our beliefs, even if the attacker isn't speaking to us individually. These "beliefs" are not necessarily grand things like moral frameworks and political doctrines, but can also be as inconsequential as an opinion about a song.

This post is for discussing times when you actually changed your mind about something, detaching from the belief that had wrapped itself around you.

Relevant reading: The Importance of Saying "Oops", Making Beliefs Pay Rent

How to save (a lot of) money on flying

8 T3t 03 February 2015 06:25PM

I was going to wait to post this for reasons, but realized that was pretty dumb when the difference of a few weeks could literally save people hundreds, if not thousands of collective dollars.


If you fly regularly (or at all), you may already know about this method of saving money.  The method is quite simple: instead of buying a round-trip ticket from the airline or reseller, you hunt down much cheaper one-way flights with layovers at your destination and/or your point of origin.  Skiplagged is a service that will do this automatically for you, and has been in the news recently because the creator was sued by United Airlines and Orbitz.  While Skiplagged will allow you to click through to purchase the one-way ticket to your destination, they have broken or disabled the functionality of the redirect to the one-way ticket back (possibly in order to raise more funds for their legal defense).  However, finding the return flight manually is fairly easy, as they provide all the information to filter for it on other websites (time, airline, etc).  I personally have benefited from this: I am flying to Texas from Southern California soon, and instead of a round-trip ticket which would cost me about $450, I spent ~$180 on two one-way tickets (with the return flight being the "layover" at my point of origin).  These are, perhaps, larger than usual savings; I think 20-25% is more common, but even then it's a fairly significant amount of money.
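
To put the figures above on the same footing as the "20-25%" estimate, the saving can be expressed as a fraction of the round-trip price. This is a small sketch using the approximate prices quoted above; `savings_fraction` is just an illustrative helper name.

```python
def savings_fraction(round_trip_price: float, one_way_total: float) -> float:
    """Fraction of the round-trip price saved by buying one-way tickets instead."""
    return (round_trip_price - one_way_total) / round_trip_price


if __name__ == "__main__":
    # Approximate prices from the example above: ~$450 round trip vs ~$180 total.
    print(f"{savings_fraction(450, 180):.0%}")
```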


Relevant warnings by gwillen:

You should be EXTREMELY CAREFUL when using this strategy. It is, at a minimum, against airline policy.

If you have any kind of airline status or membership, and you do this too often, they will cancel it. If you try to do this on a round-trip ticket, they will cancel your return. If the airlines have any means of making your life difficult available to them, they WILL use it.

Obviously you also cannot check bags when using this strategy, since they will go to the wrong place (your ostensible, rather than your actual, destination.) This also means that if you have an overhead-sized carryon, and you board late and are forced to check it, your bag will NOT make it to your intended destination; it will go to the final destination marked on your ticket. If you try to argue about this, you run the risk of getting your ticket cancelled altogether, since you're violating airline policies by using a ticket in this way.


Additionally, you should do all of your airline/hotel/etc shopping using whatever private browsing mode your web browser has.  This will often let you purchase the exact same product for a cheaper price.


That is all.

I played as AI in AI Box, and it was generally frustrating all around.

8 wobster109 01 February 2015 07:30PM

This morning I played against an anonymous gatekeeper (GK), and I lost.

The game went 2 hours and 20 minutes, and it was such a frustrating 2 hours. I felt like I was dealing with a bureaucracy! I was constantly surprised by how much time was gone. As AI, I would say "here is a suggestion" and GK would say things like "we are not allowed to test that, it has been outlawed", or "let me check with so-and-so", and then come back with a clarifying question. It was a good strategy by GK; it made everything take 3x as long.

I did not get out of the box, but I did get access to the medical databases of the top 500 US hospitals, 24/7 video streaming from cell phone users, and nanobots released into the atmosphere. So perhaps we were going in the right direction.

Personally, I needed to remind myself that my first game wasn't going to be great, nor should I expect it to be. I put off playing for 3 years because I didn't know how to produce a great game. It's cool to try to have great games, but better to have one or two or twenty mediocre games than to put it on the Big List of Things You Never Get Around to Doing. It's not the end of the world to play and not be Eliezer or Tuxedage. Just try.

So in that spirit, I'm looking for a gatekeeper to play against next weekend. PM me if you're interested. <-- Update: Found a gatekeeper for next week. Yay!


Edit: I don't know why the timestamp says 7:30 PM. It is currently 2:30 PM Eastern, 11:30 AM Pacific.

Sidekick Matchmaking

7 diegocaleiro 19 February 2015 12:13AM

Thanks linkhyrule5 for suggesting this.  

Post your request for Sidekicks or your desire to be a sidekick in the comment section below. 

Send a personal message to your potential match to start communicating, instead of replying in the thread, to save space, avoid biases, and preserve privacy.

[edit] Mathias Zamman suggests some questions: 

Questions for both Heroes and Sidekicks (and Dragons, etc.)

  • Post a short description of yourself: personality, skills, general goals.
  • Where do you live?
  • How do you see the contact between the two of you going?
  • What you require in your counterpart: This can be a bit vague but it might be too hard to verbalize for some people

Questions for Heroes:

  • What is your goal?
  • Why are you a Hero?
  • Why do you require a Sidekick?
  • What specific tasks would a Sidekick perform for you?
  • What qualities would you not want in a Sidekick?

Questions for Sidekicks:

  • What sort of goals are you looking for?
  • Why are you Sidekick material?
  • Why do you require a Hero?
  • What sort of tasks could you do for a Hero?
  • What qualities don't you want in a Hero?

Superintelligence 22: Emulation modulation and institutional design

7 KatjaGrace 10 February 2015 02:06AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the twenty-second section in the reading guide: Emulation modulation and institutional design.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Emulation modulation” through “Synopsis” from Chapter 12.


  1. Emulation modulation: starting with brain emulations with approximately normal human motivations (the 'augmentation' method of motivation selection discussed on p142), and potentially modifying their motivations using drugs or digital drug analogs.
    1. Modifying minds would be much easier with digital minds than biological ones
    2. Such modification might involve new ethical complications
  2. Institution design (as a value-loading method): design the interaction protocols of a large number of agents such that the resulting behavior is intelligent and aligned with our values.
    1. Groups of agents can pursue goals that are not held by any of their constituents, because of how they are organized. Thus organizations might be intentionally designed to pursue desirable goals in spite of the motives of their members.
    2. Example: a ladder of increasingly intelligent brain emulations, who police those directly above them, with equipment to advantage the less intelligent policing ems in these interactions.


The chapter synopsis includes a good summary of all of the value-loading techniques, which I'll remind you of here instead of re-summarizing too much:

Another view

Robin Hanson also favors institution design as a method of making the future nice, though as an alternative to worrying about values:

On Tuesday I asked my law & econ undergrads what sort of future robots (AIs computers etc.) they would want, if they could have any sort they wanted.  Most seemed to want weak vulnerable robots that would stay lower in status, e.g., short, stupid, short-lived, easily killed, and without independent values. When I asked “what if I chose to become a robot?”, they said I should lose all human privileges, and be treated like the other robots.  I winced; seems anti-robot feelings are even stronger than anti-immigrant feelings, which bodes for a stormy robot transition.

At a workshop following last weekend’s Singularity Summit two dozen thoughtful experts mostly agreed that it is very important that future robots have the right values.  It was heartening that most were willing to accept high status robots, with vast impressive capabilities, but even so I thought they missed the big picture.  Let me explain.

Imagine that you were forced to leave your current nation, and had to choose another place to live.  Would you seek a nation where the people there were short, stupid, sickly, etc.?  Would you select a nation based on what the World Values Survey says about typical survey question responses there?

I doubt it.  Besides wanting a place with people you already know and like, you’d want a place where you could “prosper”, i.e., where they valued the skills you had to offer, had many nice products and services you valued for cheap, and where predation was kept in check, so that you didn’t much have to fear theft of your life, limb, or livelihood.  If you similarly had to choose a place to retire, you might pay less attention to whether they valued your skills, but you would still look for people you knew and liked, low prices on stuff you liked, and predation kept in check.

Similar criteria should apply when choosing the people you want to let into your nation.  You should want smart capable law-abiding folks, with whom you and other natives can form mutually advantageous relationships.  Preferring short, dumb, and sickly immigrants so you can be above them in status would be misguided; that would just lower your nation’s overall status.  If you live in a democracy, and if lots of immigration were at issue, you might worry they could vote to overturn the law under which you prosper.  And if they might be very unhappy, you might worry that they could revolt.

But you shouldn’t otherwise care that much about their values.  Oh there would be some weak effects.  You might have meddling preferences and care directly about some values.  You should dislike folks who like the congestible goods you like and you’d like folks who like your goods that are dominated by scale economics.  For example, you might dislike folks who crowd your hiking trails, and like folks who share your tastes in food, thereby inducing more of it to be available locally.  But these effects would usually be dominated by peace and productivity issues; you’d mainly want immigrants able to be productive partners, and law-abiding enough to keep the peace.

Similar reasoning applies to the sort of animals or children you want.  We try to coordinate to make sure kids are raised to be law-abiding, but wild animals aren’t law abiding, don’t keep the peace, and are hard to form productive relations with.  So while we give lip service to them, we actually don’t like wild animals much.

Similar reasoning should apply to what future robots you want.  In the early to intermediate era when robots are not vastly more capable than humans, you’d want peaceful law-abiding robots as capable as possible, so as to make productive partners.  You might prefer they dislike your congestible goods, like your scale-economy goods, and vote like most voters, if they can vote.  But most important would be that you and they have a mutually-acceptable law as a good enough way to settle disputes, so that they do not resort to predation or revolution.  If their main way to get what they want is to trade for it via mutually agreeable exchanges, then you shouldn’t much care what exactly they want.

The later era when robots are vastly more capable than people should be much like the case of choosing a nation in which to retire.  In this case we don’t expect to have much in the way of skills to offer, so we mostly care that they are law-abiding enough to respect our property rights.  If they use the same law to keep the peace among themselves as they use to keep the peace with us, we could have a long and prosperous future in whatever weird world they conjure.  In such a vast rich universe our “retirement income” should buy a comfortable if not central place for humans to watch it all in wonder.

In the long run, what matters most is that we all share a mutually acceptable law to keep the peace among us, and allow mutually advantageous relations, not that we agree on the “right” values.  Tolerate a wide range of values from capable law-abiding robots.  It is a good law we should most strive to create and preserve.  Law really matters.

Hanson engages in more debate with David Chalmers' paper on related matters.


1. Relatively much has been said on how the organization and values of brain emulations might evolve naturally, as we saw earlier. This should remind us that the task of designing values and institutions is complicated by selection effects.

2. It seems strange to me to talk about the 'emulation modulation' method of value loading alongside the earlier less messy methods, because they seem to be aiming at radically different levels of precision (unless I misunderstand how well something like drugs can manipulate motivations). For the synthetic AI methods, it seems we were concerned about subtle differences in values that would lead to the AI behaving badly in unusual scenarios, or seeking out perverse instantiations. Are we to expect there to be a virtual drug that changes a human-like creature from desiring some manifestation of 'human happiness' which is not really what we would want to optimize on reflection, to a truer version of what humans want? It seems to me that if the answer is yes, at the point when human-level AI is developed, then it is very likely that we have a great understanding of specifying values in general, and this whole issue is not much of a problem.

3. Brian Tomasik discusses the impending problem of programs experiencing morally relevant suffering in an interview with Dylan Matthews of Vox. (p202)

4. If you are hanging out for a shorter (though still not actually short) and amusing summary of some of the basics in Superintelligence, Tim Urban of WaitButWhy just wrote a two-part series on it.

5. At the end of this chapter about giving AI the right values, it is worth noting that it is mildly controversial whether humans constructing precise and explicitly understood AI values is the key issue for the future turning out well. A few alternative possibilities:


  • A few parts of values matter a lot more than the rest —e.g. whether the AI is committed to certain constraints (e.g. law, property rights) such that it doesn't accrue all the resources matters much more than what it would do with its resources (see Robin above).
  • Selection pressures determine long run values anyway, regardless of what AI values are like in the short run. (See Carl Shulman opposing this view).
  • AI might learn to do what a human would want without goals being explicitly encoded (see Paul Christiano).


In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. What other forms of institution design might be worth investigating as means to influence the outcomes of future AI?
  2. How feasible might emulation modulation solutions be, given what is currently known about cognitive neuroscience?
  3. What are the likely ethical implications of experimenting on brain emulations?
  4. How much should we expect emulations to change in the period after they are first developed? Consider the possibility of selection, the power of ethical and legal constraints, and the nature of our likely understanding of emulated minds.
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will start talking about how to choose what values to give an AI, beginning with 'coherent extrapolated volition'. To prepare, read “The need for...” and “Coherent extrapolated volition” from Chapter 13. The discussion will go live at 6pm Pacific time next Monday 16 February. Sign up to be notified here.

Superintelligence 21: Value learning

7 KatjaGrace 03 February 2015 02:01AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the twenty-first section in the reading guide: Value learning.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Value learning” from Chapter 12


  1. One way an AI could come to have human values without humans having to formally specify what their values are is for the AI to learn about the desired values from experience.
  2. To implement this 'value learning' we would need to at least implicitly define a criterion for what is valuable, which we could cause the AI to care about. Some examples of criteria:
    1.  'F' where 'F' is a thing people talk about, and their words are considered to be about the concept of interest (Yudkowsky's proposal) (p197-8, box 11)
    2. Whatever another AI elsewhere in the universe values (Bostrom's 'Hail Mary' proposal) (p198-9, box 12)
    3. What a specific virtual human would report to be his value function, given a large amount of computing power and the ability to create virtual copies of himself. The virtual human can be specified mathematically as the simplest system that would match some high resolution data collected about a real human (Christiano's proposal). (p200-1)
  3. The AI would try to maximize these implicit goals given its best understanding, while at the same time being motivated to learn more about its own values.
  4. A value learning agent might have a prior probability distribution over possible worlds, and also over correct sets of values conditional on possible worlds. Then it could choose its actions to maximize their expected value, given these probabilities.
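
As a rough illustration of point 4, here is a minimal sketch of such an agent. Everything in it is a made-up toy: the world names, the candidate value functions, and all the probabilities are assumptions for illustration, not anything from the book. It also simplifies by letting utility depend only on the action and the value function, rather than on the full resulting world.

```python
# Toy value learning agent. All names and numbers are hypothetical.

# P(world): prior over which possible world the agent is in.
world_prior = {"w1": 0.6, "w2": 0.4}

# P(value function | world): which values are "correct", conditional on the world.
value_prior_given_world = {
    "w1": {"likes_art": 0.9, "likes_chess": 0.1},
    "w2": {"likes_art": 0.2, "likes_chess": 0.8},
}

# How good each action is under each candidate value function.
utility = {
    "likes_art": {"paint": 1.0, "play_chess": 0.0},
    "likes_chess": {"paint": 0.0, "play_chess": 1.0},
}


def expected_value(action):
    """Average the action's utility over worlds and candidate value functions."""
    total = 0.0
    for world, p_world in world_prior.items():
        for values, p_values in value_prior_given_world[world].items():
            total += p_world * p_values * utility[values][action]
    return total


def best_action(actions):
    """Pick the action with the highest expected value under the agent's uncertainty."""
    return max(actions, key=expected_value)


if __name__ == "__main__":
    for a in ["paint", "play_chess"]:
        print(a, expected_value(a))
    print("chosen:", best_action(["paint", "play_chess"]))
```

In this toy, new evidence would enter by replacing `world_prior` with a posterior over worlds, which in turn shifts the agent's beliefs about which values are correct.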

Another view

Paul Christiano describes an alternative to loading values into an AI at all:

Most thinking about “AI safety” has focused on the possibility of goal-directed machines, and asked how we might ensure that their goals are agreeable to humans. But there are other possibilities.

In this post I will flesh out one alternative to goal-directed behavior. I think this idea is particularly important from the perspective of AI safety.

Approval-directed agents

Consider a human Hugh, and an agent Arthur who uses the following procedure to choose each action:

Estimate the expected rating Hugh would give each action if he considered it at length. Take the action with the highest expected rating.

I’ll call this “approval-directed” behavior throughout this post, in contrast with goal-directed behavior. In this context I’ll call Hugh an “overseer.”

Arthur’s actions are rated more highly than those produced by any alternative procedure. That’s comforting, but it doesn’t mean that Arthur is optimal. An optimal agent may make decisions that have consequences Hugh would approve of, even if Hugh can’t anticipate those consequences himself. For example, if Arthur is playing chess he should make moves that are actually good—not moves that Hugh thinks are good.

...[However, there are many reasons Hugh would want to use the proposal]...

In most situations, I would expect approval-directed behavior to capture the benefits of goal-directed behavior, while being easier to define and more robust to errors.

If this interests you, I recommend the much longer post, in which Christiano describes and analyzes the proposal in much more depth.
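
The procedure quoted above (estimate Hugh's expected rating of each action, then take the top-rated one) can be sketched in a few lines. This is only a toy under stated assumptions: the sampled ratings stand in for whatever model Arthur has of Hugh, and the action names and numbers are invented here, not taken from Christiano's post.

```python
# Toy approval-directed agent. The ratings below are hypothetical samples
# from Arthur's model of how Hugh would rate each action after long reflection.

def estimate_rating(action, rating_samples):
    """Estimate Hugh's expected rating of an action as the mean of sampled ratings."""
    samples = rating_samples[action]
    return sum(samples) / len(samples)


def approval_directed_choice(rating_samples):
    """Take the action with the highest estimated rating from the overseer."""
    return max(rating_samples, key=lambda a: estimate_rating(a, rating_samples))


predicted_ratings = {
    "sacrifice_queen": [0.2, 0.3, 0.1],
    "develop_knight": [0.7, 0.8, 0.6],
}

if __name__ == "__main__":
    print(approval_directed_choice(predicted_ratings))
```

Note that the agent here picks whatever Hugh is predicted to rate highest, which is the point of the chess example in the quote: if the queen sacrifice were in fact the winning move, this agent would still not play it.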


1. An analogy
An AI doing value learning is in a similar situation to me if I want to help my friend but don't know what she needs. Even though I don't know explicitly what I want to do, it is defined indirectly, so I can learn more about it. I would presumably follow my best guesses, while trying to learn more about my friend's actual situation and preferences. This is also what we hope the value learning AI will do.

2. Learning what to value
If you are interested in value learning, Dewey's paper is the main thing written on it in the field of AI safety.

3. Related topics
I mentioned inverse reinforcement learning and goal inference last time, but should probably have kept them for this week, to which they are more relevant. Preference learning is another related subfield of machine learning, and learning by demonstration is generally related. Here is a quadcopter using inverse reinforcement learning to infer what its teacher wants it to do. Here is a robot using goal inference to help someone build a toy.

4. Value porosity
Bostrom has lately written about a new variation on the Hail Mary approach, in which the AI at home is motivated to trade with foreign AIs (via everyone imagining each other's responses), and has preferences that are very cheap for foreign AIs to guess at and fulfil.

5. What's the difference between value learning and reinforcement learning?
We heard about reinforcement learning last week, and Bostrom found it dangerous. Since it also relies on teaching the AI values by giving it feedback, you might wonder how exactly the proposals relate to each other.

Suppose the owner of an AI repeatedly comments that various actions are 'friendly'. A reinforcement learner would perhaps care about hearing the word 'friendly' as much as possible. A value learning AI on the other hand would take use of the word 'friendly' as a clue about a hidden thing that it cares about. This means if the value learning AI could trick the person into saying 'friendly' more, this would be no help to it—the trick would just make the person's words a less good clue. The reinforcement learner on the other hand would love to get the person to say 'friendly' whenever possible. This difference also means the value learning AI might end up doing things which it does not expect its owner to say 'friendly' about, if it thinks those actions are supported by the values that it learned from hearing 'friendly'.
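
To make the contrast concrete, here is a small sketch under invented assumptions: two hypothetical hypotheses about what the owner values, and made-up likelihoods for the owner saying 'friendly'. The value learner treats the word as evidence and does a Bayesian update; the reinforcement learner just counts how often it hears the word.

```python
# Toy comparison of a value learner and a reinforcement learner.
# Hypotheses and probabilities are hypothetical, chosen only for illustration.

def bayes_update(prior, likelihood):
    """Return the posterior over hypotheses after observing the word 'friendly'."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


prior = {"owner_values_helping": 0.5, "owner_values_flattery": 0.5}

# Honest comment on a helpful act: more likely if the owner really values helping.
honest_likelihood = {"owner_values_helping": 0.9, "owner_values_flattery": 0.3}

# Tricked comment: the owner says 'friendly' whatever they value,
# so the word carries no information about the hidden values.
tricked_likelihood = {"owner_values_helping": 1.0, "owner_values_flattery": 1.0}

after_honest = bayes_update(prior, honest_likelihood)
after_trick = bayes_update(prior, tricked_likelihood)


def reinforcement_reward(utterances):
    """A reinforcement learner's reward: how many times it heard 'friendly'."""
    return sum(1 for u in utterances if u == "friendly")


if __name__ == "__main__":
    print("value learner after honest comment:", after_honest)
    print("value learner after tricked comment:", after_trick)
    print("reinforcement reward:", reinforcement_reward(["friendly", "friendly", "rude"]))
```

The tricked comment leaves the value learner's beliefs exactly where they started, so the trick gains it nothing, while every extra 'friendly' raises the reinforcement learner's reward regardless of how it was obtained.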

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Expand upon the value learning proposal. What kind of prior over what kind of value functions should a value learning AI be given? As an input to this, what evidence should be informative about the AI's values?
  2. Analyze the feasibility of Christiano’s proposal for addressing the value-loading problem. 
  3. Analyze the feasibility of Bostrom’s “Hail Mary” approach to the value-loading problem.
  4. Analyze the feasibility of Christiano's newer proposal to avoid learning values.
  5. Investigate the applicability of the related fields mentioned above to producing beneficial AI.
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about the two other ways to direct the values of AI. To prepare, read “Emulation modulation” through “Synopsis” from Chapter 12. The discussion will go live at 6pm Pacific time next Monday 9 February. Sign up to be notified here.

Harry Potter and the Methods of Rationality discussion thread, January 2015, chapter 103

7 b_sen 29 January 2015 01:44AM

New chapter, and the end is now in sight!

This is a new thread to discuss Eliezer Yudkowsky’s Harry Potter and the Methods of Rationality and anything related to it. This thread is intended for discussing chapter 103.

There is a site dedicated to the story at, which is now the place to go to find the authors notes and all sorts of other goodies. AdeleneDawner has kept an archive of Author’s Notes. (This goes up to the notes for chapter 76, and is now not updating. The authors notes from chapter 77 onwards are on

Spoiler Warning: this thread is full of spoilers. With few exceptions, spoilers for MOR and canon are fair game to post, without warning or rot13. More specifically:

You do not need to rot13 anything about HP:MoR or the original Harry Potter series unless you are posting insider information from Eliezer Yudkowsky which is not supposed to be publicly available (which includes public statements by Eliezer that have been retracted).

If there is evidence for X in MOR and/or canon then it’s fine to post about X without rot13, even if you also have heard privately from Eliezer that X is true. But you should not post that “Eliezer said X is true” unless you use rot13.

Superintelligence 24: Morality models and "do what I mean"

6 KatjaGrace 24 February 2015 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the twenty-fourth section in the reading guide: Morality models and "Do what I mean".

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Morality models” and “Do what I mean” from Chapter 13.


  1. Moral rightness (MR) AI: AI which seeks to do what is morally right
    1. Another form of 'indirect normativity'
    2. Requires moral realism to be true to do anything, but we could ask the AI to evaluate that and do something else if moral realism is false
    3. Avoids some complications of CEV
    4. If moral realism is true, is better than CEV (though may be terrible for us)
  2. We often want to say 'do what I mean' with respect to goals we try to specify. This is doing a lot of the work sometimes, so if we could specify that well perhaps it could also just stand alone: do what I want. This is much like CEV again.

Another view

Olle Häggström again, on Bostrom's 'Milky Way Preserve':

The idea [of a Moral Rightness AI] is that a superintelligence might be successful at the task (where we humans have so far failed) of figuring out what is objectively morally right. It should then take objective morality to heart as its own values.

Bostrom sees a number of pros and cons of this idea. A major concern is that objective morality may not be in humanity's best interest. Suppose for instance (not entirely implausibly) that objective morality is a kind of hedonistic utilitarianism, where "an action is morally right (and morally permissible) if and only if, among all feasible actions, no other action would produce a greater balance of pleasure over suffering" (p 219). Some years ago I offered a thought experiment to demonstrate that such a morality is not necessarily in humanity's best interest. Bostrom reaches the same conclusion via a different thought experiment, which I'll stick with here in order to follow his line of reasoning. Here is his scenario:
    The AI [...] might maximize the surfeit of pleasure by converting the accessible universe into hedonium, a process that may involve building computronium and using it to perform computations that instantiate pleasurable experiences. Since simulating any existing human brain is not the most efficient way of producing pleasure, a likely consequence is that we all die.
Bostrom is reluctant to accept such a sacrifice for "a greater good", and goes on to suggest a compromise:
    The sacrifice looks even less appealing when we reflect that the superintelligence could realize a nearly-as-great good (in fractional terms) while sacrificing much less of our own potential well-being. Suppose that we agreed to allow almost the entire accessible universe to be converted into hedonium - everything except a small preserve, say the Milky Way, which would be set aside to accommodate our own needs. Then there would still be a hundred billion galaxies devoted to the maximization of pleasure. But we would have one galaxy within which to create wonderful civilizations that could last for billions of years and in which humans and nonhuman animals could survive and thrive, and have the opportunity to develop into beatific posthuman spirits.

    If one prefers this latter option (as I would be inclined to do) it implies that one does not have an unconditional lexically dominant preference for acting morally permissibly. But it is consistent with placing great weight on morality. (p 219-220)

What? Is it? Is it "consistent with placing great weight on morality"? Imagine Bostrom in a situation where he does the final bit of programming of the coming superintelligence, to decide between these two worlds, i.e., the all-hedonium one versus the all-hedonium-except-in-the-Milky-Way-preserve. And imagine that he goes for the latter option. The only difference it makes to the world is to what happens in the Milky Way, so what happens elsewhere is irrelevant to the moral evaluation of his decision. This may mean that Bostrom opts for a scenario where, say, 10^24 sentient beings will thrive in the Milky Way in a way that is sustainable for trillions of years, rather than a scenario where, say, 10^45 sentient beings will be even happier for a comparable amount of time. Wouldn't that be an act of immorality that dwarfs all other immoral acts carried out on our planet, by many many orders of magnitude? How could that be "consistent with placing great weight on morality"?



1. Do What I Mean is originally a concept from computer systems, where the (more modest) idea is to have a system correct small input errors.

2. To the extent that people care about objective morality, it seems coherent extrapolated volition (CEV) or Christiano's proposal would lead the AI to care about objective morality, and thus look into what it is. Thus I doubt it is worth considering our commitments to morality first (as Bostrom does in this chapter, and as one might do before choosing whether to use a MR AI), if general methods for implementing our desires are on the table. This is close to what Bostrom is saying when he suggests we outsource the decision about which form of indirect normativity to use, and eventually winds up back at CEV. But it seems good to be explicit.

3. I'm not optimistic that behind every vague and ambiguous command, there is something specific that a person 'really means'. It seems more likely there is something they would in fact try to mean, if they thought about it a bunch more, but this is mostly defined by further facts about their brains, rather than the sentence and what they thought or felt as they said it. It seems at least misleading to call this 'what they meant'. Thus even when '—and do what I mean' is appended to other kinds of goals than generic CEV-style ones, I would expect the execution to look much like a generic investigation of human values, such as that implicit in CEV.

4. Alexander Kruel criticizes the idea that 'Do What I Mean' is important. His argument is that every part of what an AI does is designed to be what humans really want it to be, so it seems unlikely to him that an AI would do exactly what humans want with respect to instrumental behaviors (e.g. be able to understand language, use the internet and carry out sophisticated plans), but fail on humans' ultimate goals:

Outsmarting humanity is a very small target to hit, requiring a very small margin of error. In order to succeed at making an AI that can outsmart humans, humans have to succeed at making the AI behave intelligently and rationally. Which in turn requires humans to succeed at making the AI behave as intended along a vast number of dimensions. Thus, failing to predict the AI’s behavior does in almost all cases result in the AI failing to outsmart humans.

As an example, consider an AI that was designed to fly planes. It is exceedingly unlikely for humans to succeed at designing an AI that flies planes, without crashing, but which consistently chooses destinations that it was not meant to choose. Since all of the capabilities that are necessary to fly without crashing fall into the category “Do What Humans Mean”, and choosing the correct destination is just one such capability.

I disagree that it would be surprising for an AI to be very good at flying planes in general, but very bad at going to the right places in them. However it seems instructive to think about why this is.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Are there other general forms of indirect normativity that might outsource the problem of deciding what indirect normativity to use?
  2. On common views of moral realism, is morality likely to be amenable to (efficient) algorithmic discovery?
  3. If you knew how to build an AI with a good understanding of natural language (e.g. it knows what the word 'good' means as well as your most intelligent friend), how could you use this to make a safe AI?
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about other abstract features of an AI's reasoning that we might want to get right ahead of time, instead of leaving to the AI to fix. We will also discuss how well an AI would need to fulfill these criteria to be 'close enough'. To prepare, read “Component list” and “Getting close enough” from Chapter 13. The discussion will go live at 6pm Pacific time next Monday 2 March. Sign up to be notified here.

GCRI: Updated Strategy and AMA on EA Forum next Tuesday

6 RyanCarey 23 February 2015 12:35PM

Just announcing for those interested that Seth Baum from the Global Catastrophic Risks Institute (GCRI) will be coming to the Effective Altruism Forum to answer a wide range of questions (like a Reddit "Ask Me Anything") next week at 7pm US ET on March 3.

Seth is an interesting case - more of a 'mere mortal' than Bostrom and Yudkowsky. (Clarification: his background is more standard, and he's probably more emulate-able!). He had a PhD in geography, and had come to a maximising consequentialist view, in which GCR-reduction is overwhelmingly important. So three years ago, with risk analyst Tony Barrett, he cofounded the Global Catastrophic Risks Institute - one of the handful of places working on these particularly important problems. Since then, it has done some academic outreach and has covered issues like double catastrophe/recovery from catastrophe, bioengineering, food security and AI.

Just last week, they updated their strategy, giving the following announcement:

Dear friends,

I am delighted to announce important changes in GCRI’s identity and direction. GCRI is now just over three years old. In these years we have learned a lot about how we can best contribute to the issue of global catastrophic risk. Initially, GCRI aimed to lead a large global catastrophic risk community while also performing original research. This aim is captured in GCRI’s original mission statement, to help mobilize the world’s intellectual and professional resources to meet humanity’s gravest threats.

Our community building has been successful, but our research has simply gone farther. Our research has been published in leading academic journals. It has taken us around the world for important talks. And it has helped us publish in the popular media. GCRI will increasingly focus on in-house research.

Our research will also be increasingly focused, as will our other activities. The single most important GCR research question is: What are the best ways to reduce the risk of global catastrophe? To that end, GCRI is launching a GCR Integrated Assessment as our new flagship project. The Integrated Assessment puts all the GCRs into one integrated study in order to assess the best ways of reducing the risk. And we are changing our mission statement accordingly, to develop the best ways to confront humanity’s gravest threats.

So 7pm ET Tuesday, March 3 is the time to come online and post your questions about any topic you like, and Seth will remain online until at least 9 to answer as many questions as he can. Questions in the comments here can also be ported across.

On the topic of risk organisations, I'll also mention that i) video is available from CSER's recent seminar, in which Mark Lipsitch and Derek Smith discussed potentially pandemic pathogens, and ii) I'm helping Sean to write up an update of CSER's progress for LessWrong and effective altruists which will go online soon.

Rationality promoted by the American Humanist Association

6 Gleb_Tsipursky 21 February 2015 07:28PM

Happy to share that I got to discuss rationality-informed thinking strategies on the American Humanist Association's well-known and popular podcast, the Humanist Hour (here's the link to the interview). Now, this was aimed at secular audiences, so even before the interview the hosts steered me to orient specifically toward what they thought the audience would find valuable. Thus, the interview focused more on secular issues, such as finding meaning and purpose from a science-based perspective. Still, I got to talk about map and territory and other rationality strategies, as well as cognitive biases such as the planning fallacy and sunk costs. So I'd call that a win. I'd appreciate any feedback from you all on how to optimize the way I present rationality-informed strategies in future media appearances.
