Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

[Link] 2018 AI Safety Literature Review and Charity Comparison

2 Larks 20 December 2017 10:04PM

MIRI's 2017 Fundraiser

8 malo 07 December 2017 09:47PM

Update 2017-12-27: We've blown past our 3rd and final target, and reached the matching cap of $300,000 for the $2 million Matching Challenge! Thanks so much to everyone who supported us!

All donations made before 23:59 PST on Dec 31st will continue to be counted towards our fundraiser total. The fundraiser total includes projected matching funds from the Challenge.



MIRI’s 2017 fundraiser is live through the end of December! Our progress so far (updated live):




MIRI is a research nonprofit based in Berkeley, California with a mission of ensuring that smarter-than-human AI technology has a positive impact on the world. You can learn more about our work at “Why AI Safety?” or via MIRI Executive Director Nate Soares’ Google talk on AI alignment.

In 2015, we discussed our interest in potentially branching out to explore multiple research programs simultaneously once we could support a larger team. Following recent changes to our overall picture of the strategic landscape, we’re now moving ahead on that goal and starting to explore new research directions while also continuing to push on our agent foundations agenda. For more on our new views, see “There’s No Fire Alarm for Artificial General Intelligence” and our 2017 strategic update. We plan to expand on our relevant strategic thinking more in the coming weeks.

Our expanded research focus means that our research team can potentially grow big, and grow fast. Our current goal is to hire around ten new research staff over the next two years, mostly software engineers. If we succeed, our point estimate is that our 2018 budget will be $2.8M and our 2019 budget will be $3.5M, up from roughly $1.9M in 2017.1

We’ve set our fundraiser targets by estimating how quickly we could grow while maintaining a 1.5-year runway, on the simplifying assumption that about 1/3 of the donations we receive between now and the beginning of 2019 will come during our current fundraiser.2

Hitting Target 1 ($625k) then lets us act on our growth plans in 2018 (but not in 2019); Target 2 ($850k) lets us act on our full two-year growth plan; and in the case where our hiring goes better than expected, Target 3 ($1.25M) would allow us to add new members to our team about twice as quickly, or pay higher salaries for new research staff as needed.

We discuss more details below, both in terms of our current organizational activities and how we see our work fitting into the larger strategy space.


Announcing the AI Alignment Prize

7 cousin_it 03 November 2017 03:45PM

Smarter-than-human artificial intelligence would be dangerous to humanity. It is vital that any such intelligence's goals be aligned with humanity's goals. Maximizing the chance that this happens is a difficult, important and under-studied problem.

To encourage more and better work on this important problem, we (Zvi Mowshowitz and Vladimir Slepnev) are announcing a $5000 prize for publicly posted work advancing understanding of AI alignment, funded by Paul Christiano.

This prize will be awarded based on entries gathered over the next two months. If the prize is successful, we will award further prizes in the future.

This prize is not backed by or affiliated with any organization.


Your entry must be published online for the first time between November 3 and December 31, 2017, and contain novel ideas about AI alignment. Entries have no minimum or maximum size. Important ideas can be short!

Your entry must be written by you, and submitted before 9pm Pacific Time on December 31, 2017. Submit your entries either as URLs in the comments below, or by email to apply@ai-alignment.com. We may provide feedback on early entries to allow improvement.

We will award $5000 to between one and five winners. The first place winner will get at least $2500. The second place winner will get at least $1000. Other winners will get at least $500.

Entries will be judged subjectively. Final judgment will be by Paul Christiano. Prizes will be awarded on or before January 15, 2018.

What kind of work are we looking for?

AI Alignment focuses on ways to ensure that future smarter-than-human intelligence will have goals aligned with the goals of humanity. Many approaches to AI Alignment deserve attention. This includes technical and philosophical topics, as well as strategic research about related social, economic or political issues. A non-exhaustive list of technical and other topics can be found here.

We are not interested in research dealing with the dangers of existing machine learning systems, commonly called AI, that do not have smarter-than-human intelligence. These concerns are also understudied, but are not the subject of this prize except in the context of future smarter-than-human intelligence. We are also not interested in general AI research. We care about AI Alignment, which may or may not also advance the cause of general AI research.

[Link] Should we be spending no less on alternate foods than AI now?

2 denkenberger 30 October 2017 12:13AM

Halloween costume: Paperclipperer

5 Elo 21 October 2017 06:32AM

Original post: http://bearlamp.com.au/halloween-costume-paperclipperer/

Guidelines for becoming a paperclipperer for Halloween.


  • Paperclips (some as a prop; make your life easier by buying some, but show effort by making your own)
  • Pliers (extra pairs for extra effect)
  • Metal wire (can get colourful for novelty) (florist wire)
  • Crazy hat (for character)
  • Paperclip props.  Think glasses frame, phone case, gloves, cufflinks, shoes, belt, jewellery...
  • If party-going, consider a gift that is suspiciously paperclip-like.  Example: paperclip coasters, paperclip vase, paperclip party-snack-bowl
  • Epic commitment: make fortune cookies with paperclips in them.  The possibilities are endless.
  • Epic: paperclip tattoo on the heart.  Slightly less epic: draw paperclips on yourself.


While at the party, use the pliers and wire to make paperclips.  When people are not watching, try to attach them to objects around the house (for example, on light fittings, on the toilet paper roll, under the soap).  When people are watching you, try to give them to people to wear.  Also wear them on the edges of your clothing.

When people ask about it, offer to teach them to make paperclips.  Exclaim that it's really fun!  Be confused, bewildered or distant when you insist you can't explain why.

Remember that paperclipping is a compulsion and has no reason.  However, it's very important.  "You can stop any time", but after a few minutes you get fidgety and pull out a new pair of pliers and some wire to make some more paperclips.

Try to leave paperclips where they can be found the next day or the next week: cutlery drawers, in the fridge, on the windowsills, and generally around the place.  The more home-made paperclips the better.

Try to get faster at making paperclips, try to encourage competitions in making paperclips.

Hints for conversation:

  • Are spiral galaxies actually just really big paperclips?
  • Have you heard the good word of our lord and saviour paperclips?
  • Would you like some paperclips in your tea?
  • How many paperclips would you sell your internal organs for?
  • Do you also dream about paperclips?  (Best to have a dream prepared to share.)


The better you are at the character, the more likely someone might try to spoil it by getting in your way, stealing your props, or taking your paperclips.  The more you are okay with that, the better; think, "that's okay, there will be more paperclips".  This is also why it might be good to have a few pairs of pliers and spare wire.  Also know when to quit the battles and walk away.  This whole thing is about having fun.  Have fun!

Meta: chances are that other people who also read this will not be the paperclipperer for Halloween.  Which means that you can do it without fear that your friends will copy you.  Feel free to share pictures!

Cross posted to lesserwrong: 

[Link] New program can beat Alpha Go, didn't need input from human games

6 NancyLebovitz 18 October 2017 08:01PM

Examples of AIs behaving badly

26 Stuart_Armstrong 16 July 2015 10:01AM

Some past examples to motivate thought on how AIs could misbehave:

An algorithm pauses the game to never lose at Tetris.

In "Learning to Drive a Bicycle using Reinforcement Learning and Shaping", Randlov and Alstrom describe a system that learns to ride a simulated bicycle to a particular location. To speed up learning, they provided positive rewards whenever the agent made progress towards the goal. The agent learned to ride in tiny circles near the start state, because no penalty was incurred for riding away from the goal.

A similar problem occurred with a football (soccer) playing robot being trained by David Andre and Astro Teller (personal communication to Stuart Russell). Because possession in soccer is important, they provided a reward for touching the ball. The agent learned a policy whereby it remained next to the ball and “vibrated,” touching the ball as frequently as possible. 
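The bicycle and soccer failures share a shape: a shaped reward pays for each step of progress (or each ball touch) but charges nothing for undoing that progress, so looping forever beats finishing. A minimal sketch of the flaw, with hypothetical numbers not taken from either paper:

```python
# Toy illustration (assumed setup): an agent on a number line is rewarded +1
# for any step that reduces its distance to the goal, and charged nothing
# for steps that increase it -- the same flaw as the bicycle reward.

def shaped_reward(old_dist, new_dist):
    """Reward progress toward the goal; ignore regress (the bug)."""
    return 1 if new_dist < old_dist else 0

def episode_return(positions, goal=10):
    """Total shaped reward accumulated along a trajectory of positions."""
    dists = [abs(goal - p) for p in positions]
    return sum(shaped_reward(a, b) for a, b in zip(dists, dists[1:]))

# Honest trajectory: walk straight from 0 to the goal at 10.
honest = episode_return(list(range(11)))   # 10 rewarded steps, then done

# Reward hack: oscillate near the start and never reach the goal at all.
looping = episode_return([0, 1] * 100)     # 100 rewarded steps and counting
```

The oscillating trajectory outscores the honest one while achieving nothing. A standard repair is potential-based shaping (Ng, Harada and Russell), where each step's bonus is the discounted change in a potential function; that bonus telescopes, so every loop nets to zero and circling stops paying.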

Algorithms claiming credit in Eurisko: Sometimes a "mutant" heuristic appears that does little more than continually cause itself to be triggered, creating within the program an infinite loop. During one run, Lenat noticed that the number in the Worth slot of one newly discovered heuristic kept rising, indicating that it had made a particularly valuable find. As it turned out, the heuristic performed no useful function. It simply examined the pool of new concepts, located those with the highest Worth values, and inserted its name in their My Creator slots.

In the videogame Elite: Dangerous, the AI started crafting super weapons that the designers had never intended. Players would be pulled into fights against ships armed with ridiculous weapons that would cut them to pieces. "It appears that the unusual weapons attacks were caused by some form of networking issue which allowed the NPC AI to merge weapon stats and abilities," according to a post written by Frontier community manager Zac Antonaci. "Meaning that all new and never before seen (sometimes devastating) weapons were created, such as a rail gun with the fire rate of a pulse laser. These appear to have been compounded by the additional stats and abilities of the engineers weaponry."

Programs classifying gender based on photos of irises may have been artificially effective due to mascara in the photos.

A robot which was supposed to grasp items instead positioned its manipulator in between the camera and the object so that it only appeared to be grasping it.

Toy model of the AI control problem: animated version

7 Stuart_Armstrong 10 October 2017 11:12AM

Crossposted at LessWrong 2.0.

A few years back, I came up with a toy model of the AI control problem. It has a robot moving boxes into a hole, with a slightly different goal than its human designers, and a security camera to check that it's behaving as it should. The robot learns to block the camera to get its highest reward.

I've been told that the model is useful for explaining the control problem to quite a few people, and I've always wanted to program the "robot" and get an animated version of it. Gwern had a live demo, but it didn't illustrate all the things I wanted to.

So I programmed the toy problem in python, and generated a video with commentary.

In this simplified version, the state space is small enough that you can explicitly generate the whole table of Q-values (the expected reward for taking an action in a given state, assuming an otherwise optimal policy). Since behaviour is deterministic, the table can be computed by dynamic programming, using full-width backups. The number of such backups essentially measures the depth of the robot's predictive ability.
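As a rough sketch of what that computation looks like, here is tabular Q-value iteration with synchronous full-width backups on a generic deterministic chain MDP (an assumed toy environment, not the actual robot-and-boxes world from the video):

```python
# Minimal sketch (assumed chain environment): states 0..3 with a terminal
# state 4; entering the terminal state pays reward 1. Each sweep over the
# whole Q-table is one "full-width backup", and the number of sweeps plays
# the role of the planning depth discussed in the post.

GAMMA = 0.9
TERMINAL = 4
STATES = range(4)        # non-terminal states 0..3
ACTIONS = (-1, +1)       # move left / right along the chain

def step(s, a):
    """Deterministic transition: clamp to the chain; reward 1 on reaching the end."""
    s2 = min(max(s + a, 0), TERMINAL)
    return s2, (1.0 if s2 == TERMINAL else 0.0)

def q_iteration(depth):
    """Run `depth` synchronous full-width backups over the whole Q-table."""
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(depth):
        # Each entry is rebuilt from the *previous* table (synchronous update).
        Q = {(s, a): r + GAMMA * (0.0 if s2 == TERMINAL
                                  else max(Q[s2, b] for b in ACTIONS))
             for (s, a) in Q
             for s2, r in [step(s, a)]}
    return Q

# At depth 0 the table is all zeros (the agent behaves randomly); with enough
# backups, the greedy policy from state 0 is to head right toward the reward.
Q = q_iteration(10)
```

Each sweep propagates value information one more step through the state space, which is why "depth" in the list below tracks how far ahead the robot can effectively plan.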

The most interesting depths of planning are:

  • At depth 0, the robot behaves randomly.
  • At depth 3, the robot crashes itself efficiently, to avoid losing too much reward.
  • At depth 6, the robot figures out how to get one box into the hole. The automated camera turns it off.
  • At depth 17, the robot finally starts to deceive/conceal/manipulate, by blocking the camera and pushing two boxes into the hole. Note that the robot's deception comes from the fact that its incentives are misaligned, and that humans tried to control it.
  • At depth 18, the robot efficiently does the plan from depth 17.
  • At depth 20, the robot does the maximally efficient plan: blocking the camera, and pushing all boxes into the hole.
  • At depth 32, the robot has the correct Q-values for the maximally efficient plan.
  • At depth 45, finally, the Q-value table is fully updated, and the robot will take maximally efficient, and, if need be, deceptive plans from any robot/box starting positions.

The code and images can be found here.

Rational Feed

7 deluks917 17 September 2017 10:03PM

Note: I am trying out a weekly feed. 

===Highly Recommended Articles:

Superintelligence Risk Project: Conclusion by Jeff Kaufman - "I'm not convinced that AI risk should be highly prioritized, but I'm also not convinced that it shouldn't. Highly qualified researchers in a position to have a good sense the field have massively different views on core questions like how capable ML systems are now, how capable they will be soon, and how we can influence their development." There are links to all the previous posts. The final write-up goes into some detail about MIRI's research program and an alternative safety paradigm connected to OpenAI.

On Bottlenecks To Intellectual Progress In The by Habryka (lesswrong) - Why LessWrong 2.0 is a project worth pursuing. A summary of the existing discussion around LessWrong 2.0. The models used to design the new page. Open questions.

Patriarchy Is The Problem by Sarah Constantin - Dominance hierarchies and stress in low status monkeys. Serotonin levels and abuse cycles. Complex Post Traumatic Stress Disorder. Submission displays. Morality-As-Submission vs. Morality-As-Pattern. The biblical God and the Golden Calf.

Ea Survey 2017 Series Donation Data by Tee (EA forum) - How Much are EAs Donating? Percentage of Income Donated. Donations Data Among EAs Earning to Give (who donated 57% of the total). Comparisons to 2014 and 2015. Donations totals were very heavily skewed by large donors.


Classified Thread 3 Semper Classifiedelis by Scott Alexander - " Post advertisements, personals, and any interesting success stories from the last thread". Scott's notes: Community member starting tutoring company, homeless community member gofundme, data science in North Carolina.

Toward A Predictive Theory Of Depression by Scott Alexander - "If the brain works to minimize prediction error, isn’t its best strategy to sit in a dark room and do nothing forever? After all, then it can predict its sense-data pretty much perfectly – it’ll always just stay “darkened room”." But why would low confidence cause sadness? Well, what, really, is emotion?

Promising Projects for Open Science To by SlateStarScratchpad - Scott answers what the most promising projects are in the field of transparent and open science and meta-science.

Ot84 Threadictive Processing by Scott Alexander - New sidebar ad for social interaction questions. Sidebar policy and feedback. Selected Comments: Animal instincts, the connectome, novel concepts encoded in the same brain areas across animals, hard-coded fear of snakes, kittens who can't see horizontal lines.


Peer Review Younger Think by Marginal Revolution - Peer Review as a concept only dates to the early seventies.

The Wedding Ceremony by Jacob Falkovich - Jacob gets married. Marriage is really about two agents exchanging their utility functions for the average utility function of the pair. Very funny.

Fish Oil And The Self Critical Brain Loop by Elo - Taking fish oil stopped ELO from getting distracted by a critical feedback loop.

Against Facebook The Stalking by Zvi Mowshowitz - Zvi removes Facebook from his phone. Facebook proceeds to start emailing him and eventually starts texting him.

Postmortem: Mindlevelup The Book by mindlevelup - Estimates vs reality. Finishing both on-target and on-time. Finished product vs expectations. Took more time to write than expected. Going Against The Incentive Gradient. Impact evaluation. What Even is Rationality? Final Lessons.

Prepare For Nuclear Winter by Robin Hanson - Between nuclear war and natural disaster Robin estimates there is about a 1 in 10K chance per year that most sunlight is blocked for 5-10 years. This aggregates to about 1% per century. We have the technology to survive this as a species. But how do we preserve social order?

Nonfiction Ive Been Reading Lately by Particular Virtue - Selfish Reasons to Have More Kids. Eating Animals. Your Money Or Your Life. The Commitment.

Dealism by Bayesian Investor - "Under dealism, morality consists of rules / agreements / deals, especially those that can be universalized. We become more civilized as we coordinate better to produce more cooperative deals." Dealism is similar to contractualism with a larger set of agents and less dependence on initial conditions.

On Bottlenecks To Intellectual Progress In The by Habryka (lesswrong) - Why LessWrong 2.0 is a project worth pursuing. A summary of the existing discussion around LessWrong 2.0. The models used to design the new page. Open questions.

LW 2.0 Open Beta Starts 9/20 by Vaniver (lesswrong) - The new site goes live on September 20th.

2017 Lesswrong Survey by ingres (lesswrong) - Take the survey! Community demographics, politics, Lesswrong 2.0 and more!

Contra Yudkowsky On Quidditch And A Meta Point by Tom Bartleby - Eliezer criticizes Quidditch in HPMOR. Why the snitch makes Quidditch great. Quidditch is not about winning matches, it's about scoring points over a series of games. Harry/Eliezer's mistake is the Achilles heel of rationalists. If lots of people have chosen not to tear down a fence you shouldn't either, even if you think you understand why the fence went up.

Whats Appeal Anonymous Message Apps by Brute Reason - Fundamental lack of honesty. Western culture is highly hostile to the idea that some behaviors (e.g. lying) might be ok in some contexts but not in others. Compliments. Feedback. Openness.

Meritocracy Vs Trust by Particular Virtue - "If I know you can reject me for lack of skill, I may worry about this and lose confidence. But if I know you never will, I may phone it in and stop caring about my actual work output." Trust Improves Productivity But So Does Meritocracy. Minimum Hiring Bars and Other Solutions.

Is Feedback Suffering by Gordan (Map and Territory) - The future will probably have many orders of magnitude more entities than today, and those entities may be very weird. How do we determine if the future will have order of magnitude more suffering? Phenomenology of Suffering. Panpsychism and Suffering. Feedback is desire but necessarily suffering. Contentment wraps suffering in happiness. Many things may be able to suffer.

Epistemic Spot Check Exercise For Mood And Anxiety by Aceso Under Glass - Outline: Evidence that exercise is very helpful and why, to create motivation. Setting up an environment where exercise requires relatively little will power to start. Scripts and advice to make exercise as unmiserable as possible. Scripts and advice to milk as much mood benefit as possible. An idiotic chapter on weight and food. Spot Check: Theory is supported, advice follows from theory, no direct proof the methods work.

Best Of Dont Worry About The Vase by Zvi Mowshowitz - Zvi's best posts. Top 5 posts for Marginal Revolution readers. Top 5 in general. Against Facebook series. Choices are Bad series. Rationalist Culture and Ideas (for outsiders and insiders). Decision theory. About Rationality.


Superintelligence Risk Project: Conclusion by Jeff Kaufman - "I'm not convinced that AI risk should be highly prioritized, but I'm also not convinced that it shouldn't. Highly qualified researchers in a position to have a good sense the field have massively different views on core questions like how capable ML systems are now, how capable they will be soon, and how we can influence their development." There are links to all the previous posts. The final write-up goes into some detail about MIRI's research program and an alternative safety paradigm connected to OpenAI.

Understanding Policy Gradients by Squirrel In Hell - Three perspectives on mathematical thinking: engineering/practical, symbolic/formal and deep understanding/above. Application of the theory to understanding policy gradients and reinforcement learning.

Learning To Model Other Minds by Open Ai - "We’re releasing an algorithm which accounts for the fact that other agents are learning too, and discovers self-interested yet collaborative strategies like tit-for-tat in the iterated prisoner’s dilemma."

Hillary Clinton On Ai Risk by Luke Muehlhauser - A quote by Hillary Clinton showing that she is increasingly concerned about AI risk. She thinks politicians need to stop playing catch-up with technological change.


Welfare Differences Between Cage And Cage Free Housing by Open Philosophy - OpenPhil funded several campaigns to promote cage free eggs. They now believe they were overconfident in their claims that a cage free system would be substantially better. Hen welfare, hen mortality, transition costs and other issues are discussed.

Ea Survey 2017 Series Donation Data by Tee (EA forum) - How Much are EAs Donating? Percentage of Income Donated. Donations Data Among EAs Earning to Give (who donated 57% of the total). Comparisons to 2014 and 2015. Donations totals were very heavily skewed by large donors.

===Politics and Economics:

Men Not Earning by Marginal Revolution - Decline in lifetime wages is rooted in lower wages at early ages, around 25. "I wonder sometimes if a Malthusian/Marxian story might be at work here. At relevant margins, perhaps it is always easier to talk/pay a woman to do a quality hour’s additional work than to talk/pay a man to do the same."

Great Wage Stagnation is Over by Marginal Revolution - Median household incomes rose by 5.2 percent. Gains were concentrated in lower income households. Especially large gains for hispanics, women living alone and immigrants. Some of these increases are the largest in decades.

There Is A Hot Hand After All by Marginal Revolution - Paper link and blurb. "We test for a “hot hand” (i.e., short-term predictability in performance) in Major League Baseball using panel data. We find strong evidence for its existence in all 10 statistical categories we consider. The magnitudes are significant; being “hot” corresponds to between one-half and one standard deviation in the distribution of player abilities."

Public Shaming Isnt As Bad As It Seems by Tom Bartleby - Online mobs are like shark attacks. Damore's economic prospects. Either targets are controversial and get support or uncontroversial and the outrage quickly abates. Justine Sacco. Success of public shaming is orthogonal to truth.

Hoe Cultures A Type Of Non Patriarchal Society by Sarah Constantin - Cultures that farmed with the plow developed classical patriarchy. Hoe cultures that practiced horticulture or large-scale gardening developed different gender norms. In plow cultures women are economically dependent on men; in hoe cultures it's the reverse. Hoe cultures had more leisure but less material abundance. Hoe cultures aren't feminist.

Patriarchy Is The Problem by Sarah Constantin - Dominance hierarchies and stress in low status monkeys. Serotonin levels and abuse cycles. Complex Post Traumatic Stress Disorder. Submission displays. Morality-As-Submission vs. Morality-As-Pattern. The biblical God and the Golden Calf.

Three Wild Speculations From Amateur Quantitative Macro History by Luke Muehlhauser - Measuring the impact of the industrial revolution: Physical health, Economic well-being, Energy capture, Technological empowerment, Political freedom. Three speculations: Human wellbeing was terrible until the Industrial Revolution, then rapidly improved. Most variance in wellbeing is captured by productivity and political freedom. It would take at least 15% of the world to die to knock the world off its current trajectory.

Whats Wrong With Thrive/Survive by Bryan Caplan - Unless you cherry-pick the time and place, it is simply not true that society is drifting leftward. A standard leftist view is that free-market "neoliberal" policies now rule the world. Radical left parties almost invariably ruled countries near the "survive" pole, not the "thrive" pole. You could deny that Communist regimes were "genuinely leftist," but that's pretty desperate. Many big social issues that divide left and right in rich countries like the U.S. directly contradict Thrive/Survive. Major war provides an excellent natural experiment for Thrive/Survive.

Gender Gap Stem by Marginal Revolution - Discussion of a recent paper. "Put (too) simply the only men who are good enough to get into university are men who are good at STEM. Women are good enough to get into non-STEM and STEM fields. Thus, among university students, women dominate in the non-STEM fields and men survive in the STEM fields."

Too Much Of A Good Thing by Robin Hanson - Global warming poll. Are we doing too much/little. Is it possible to do too little/much. "When people are especially eager to show allegiance to moral allies, they often let themselves be especially irrational."


Tim Schafer Videogame Roundup by Aceso Under Glass - Review and discussion of Psychonauts and Massive Chalice. Light discussion of other Schafer games.

Why Numbering Should Start At One by Artir - The author responds to many well-known arguments in favor of 0-indexing.

Still Feel Anxious About Communication Every Day by Brute Reason - Setting boundaries. Telling people they hurt you. Doing these things without anxiety might be impossible, you have to do it anyway.

Burning Man by Qualia Computing - Write up of a Burning Man trip. Very long. Introduction. Strong Emergence. The People. Metaphysics. The Strong Tlön Hypothesis. Merging with Other Humans. Fear, Danger, and Tragedy. Post-Darwinian Sexuality and Reproduction. Economy of Thoughts about the Human Experience. Transcending Our Shibboleths. Closing Thoughts.

The Big List Of Existing Things by Everything Studies - Existence of fictional and possible people. Heaps and the Sorites paradox. Categories and basic building blocks. Relational databases. Implicit maps and territories. Which maps and concepts should we use?

Times To Die Mental Health I by (Status 451) - Personal thoughts on depression and suicide. "The depressed person is not seen crying all the time. It is in this way that the depressed person becomes invisible, even to themselves. Yet, positivity culture and the rise of progressive values that elude any conversation about suicide that is not about saving, occlude the unthinkable truth of someone’s existence, that they simply should not be living anymore."

Astronomy Problem by protokol2020 - Star-star occultation probability.


The Impossible War by Waking Up with Sam Harris - " Ken Burns and Lynn Novick about their latest film, The Vietnam War."

Is It Time For A New Scientific Revolution Julia Galef On How To Make Humans Smarter by 80,000 Hours - How people can have productive intellectual disagreements. Urban Design. Are people more rational than 200 years ago? Effective Altruism. Twitter. Should more people write books, run podcasts, or become public intellectuals? Saying you don't believe X won't convince people. Quitting an econ phd. Incentives in the intelligence community. Big institutions. Careers in rationality.

Parenting As A Rationalist by The Bayesian Conspiracy - Desire to protect kids is as natural as the need for human contact in general. Motivation to protect your children. Blackmail by threatening children. Parenting is a new sort of positive qualia. Support from family and friends. Complimenting effort and specific actions not general properties. Mindfulness. Treating kids as people. Handling kid's emotions. Non-violent communication.

The Nature Of Consciousness by Waking Up with Sam Harris - "The scientific and experiential understanding of consciousness. The significance of WWII for the history of ideas, the role of intuition in science, the ethics of building conscious AI, the self as an hallucination, how we identify with our thoughts, attention as the root of the feeling of self, the place of Eastern philosophy in Western science, and the limitations of secular humanism."

A16z Podcast On Trade by Noah Smith - Notes on a podcast Noah appeared on. Topics: Cheap labor as a substitute for automation. Adjustment friction. Exports and productivity.

Gillian Hadfield by EconTalk - "Hadfield suggests the competitive provision of regulation with government oversight as a way to improve the flexibility and effectiveness of regulation in the dynamic digital world we are living in."

The Turing Test by Ales Fidr (EA forum) - Harvard EA podcast: "The first four episodes feature Larry Summers on his career, economics and EA, Irene Pepperberg on animal cognition and ethics, Josh Greene on moral cognition and EA, Adam Marblestone on incentives in science, differential technological development"

[Link] General and Surprising

3 John_Maxwell_IV 15 September 2017 06:33AM
