Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Experience of typical mind fallacy.

0 Elo 27 April 2015 06:39PM

following on from:


I am quite sure that at some point between the ages of 10 and 15 I concluded: "No, the rest of the world does not think like me; I think in an unusual way."

This runs against the typical mind fallacy (where people generalise outward and assume everyone else has a mind similar to their own).

I suspect I started with a typical mind model of the world but at some point it broke badly enough that I re-modelled on "I just think differently to most others".

I wanted to start a new discussion rather than continue one from 2009.

Where do your experiences lie in relation to typical minds?

Nick Bostrom's TED talk on Superintelligence is now online

8 chaosmage 27 April 2015 03:15PM


Artificial intelligence is getting smarter by leaps and bounds — within this century, research suggests, a computer AI could be as "smart" as a human being. And then, says Nick Bostrom, it will overtake us: "Machine intelligence is the last invention that humanity will ever need to make." A philosopher and technologist, Bostrom asks us to think hard about the world we're building right now, driven by thinking machines. Will our smart machines help to preserve humanity and our values — or will they have values of their own?

I realize this might go into a post in a media thread, rather than its own topic, but it seems big enough, and likely-to-prompt-discussion enough, to have its own thread.

I liked the talk, although it was less polished than TED talks often are. What was missing I think was any indication of how to solve the problem. He could be seen as just an ivory tower philosopher speculating on something that might be a problem one day, because apart from mentioning in the beginning that he works with mathematicians and IT guys, he really does not give an impression that this problem is already being actively worked on.

Open Thread, Apr. 27 - May 3, 2015

0 Gondolinian 27 April 2015 12:18AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

How to sign up for Alcor cryo

25 oge 26 April 2015 02:51AM

I wrote an article about the process of signing up for cryo since I couldn't find any such accounts online. If you have questions about the sign-up process, just ask.

A few months ago, I signed up for Alcor's brain-only cryopreservation. The entire process took me 11 weeks from the day I started till the day I received my medical bracelet (the thing that’ll let paramedics know that your dead body should be handled by Alcor). I paid them $90 for the application fee. From now on, every year I’ll pay $530 for Alcor membership fees, and also pay $275 for my separately purchased life insurance.


Rational discussion of politics

8 cleonid 25 April 2015 09:58PM

In a recent poll, many LW members expressed interest in a separate website for rational discussion of political topics. The website has been created, but we need a group of volunteers to help us test it and calibrate its recommendation system (see below).

If you would like to help (by participating in one or two discussions and giving us your feedback) please sign up here.



About individual recommendation system

All internet forums face a choice between freedom of speech and quality of debate. In the absence of censorship, constructive discussions can be easily disrupted by an inflow of the mind-killed, which causes the more intelligent participants to leave or descend to the same level.

Preserving quality thus usually requires at least one of the following methods:

  1.  Appointing censors (a.k.a. moderators).
  2.  Limiting membership.
  3.  Declaring certain topics (e.g., politics) off limits.

On the new website, we are going to experiment with a different method. In brief, the idea is to use an automated recommendation system which sorts content, raising the best comments to the top and (optionally) hiding the worst. The sorting is based on individual preferences, allowing each user to avoid what he or she (rather than moderators or anyone else) defines as low quality content. In this way we should be able to enhance quality without imposing limits on free speech.
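As a rough illustration of per-user sorting, here is a minimal sketch in Python. It is not the website's actual algorithm, which the post does not describe. The vote format (+1 or -1 per comment), the agreement weighting, and the names `personalized_scores`, `rank_for`, and `hide_below` are all assumptions made for this example.

```python
from collections import defaultdict


def personalized_scores(votes, user):
    """Score unseen comments for `user`, weighting other voters by agreement.

    votes: dict mapping voter name -> dict mapping comment id -> +1 or -1.
    (Hypothetical data format, chosen only for this sketch.)
    """
    my_votes = votes.get(user, {})
    scores = defaultdict(float)
    for other, their_votes in votes.items():
        if other == user:
            continue
        shared = set(my_votes) & set(their_votes)
        if not shared:
            continue  # no overlap, so no evidence about this voter's taste
        # Agreement in [-1, 1]: mean product of votes on shared comments.
        agreement = sum(my_votes[c] * their_votes[c] for c in shared) / len(shared)
        for comment, vote in their_votes.items():
            if comment not in my_votes:
                scores[comment] += agreement * vote
    return dict(scores)


def rank_for(votes, user, hide_below=None):
    """Return comment ids sorted best-first, optionally hiding low scores."""
    scores = personalized_scores(votes, user)
    ranked = sorted(scores, key=lambda c: scores[c], reverse=True)
    if hide_below is not None:
        ranked = [c for c in ranked if scores[c] >= hide_below]
    return ranked


example_votes = {
    "alice": {"c1": 1, "c2": -1},
    "bob": {"c1": 1, "c2": -1, "c3": 1, "c4": -1},
    "carol": {"c1": -1, "c2": 1, "c3": -1, "c4": 1},
}
print(rank_for(example_votes, "alice"))
print(rank_for(example_votes, "alice", hide_below=0))
```

In this sketch a voter who usually disagrees with you counts negatively, so their downvotes push a comment up your list. Whether the real system does anything similar is not stated in the announcement.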


UPDATE. The discussions are scheduled to start on May 1. 

Weekly LW Meetups

1 FrankAdamek 24 April 2015 04:27PM

This summary was posted to LW Main on April 17th. The following week's summary is here.

Irregularly scheduled Less Wrong meetups are taking place in:

The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:

Locations with regularly scheduled meetups: Austin, Berkeley, Berlin, Boston, Brussels, Buffalo, Cambridge UK, Canberra, Columbus, London, Madison WI, Melbourne, Moscow, Mountain View, New York, Philadelphia, Research Triangle NC, Seattle, Sydney, Tel Aviv, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers.


Limited agents need approximate induction

7 Manfred 24 April 2015 07:42AM

[This post borders on some well-trodden ground in information theory and machine learning, so ideas in this post have an above-average chance of having already been stated elsewhere, by professionals, better.]

I: Introduction

I am fascinated by methods of thinking that work for well-understood reasons - that follow the steps of a mathematically elegant dance. If one has infinite computing power the method of choice is something like Solomonoff induction, which is provably ideal in a certain way at predicting the world. But if you have limited computing power, the choreography is harder to find.

To do Solomonoff induction, you search through all Turing machine hypotheses to find the ones that exactly output your data so far, then use the weighted average of those perfect retrodictors to predict the next time step. So the naivest way to build an ideal limited agent is to merely search through lots of hypotheses (chosen from some simple set) rather than all of them, and only run each Turing machine for time less than some limit. At least it's guaranteed to work in the limit of large computing power, which ain't nothing.
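To make the "naive limited agent" concrete, here is a toy sketch in Python. It is only an analogy: the hypothesis class is repeating bit patterns rather than Turing machines, the weight 2^-length stands in for a simplicity prior, and `max_len` stands in for the resource bound. The function name `bounded_induction` and all of these choices are assumptions for illustration, not part of the post.

```python
from itertools import product


def bounded_induction(data, max_len=8):
    """Return P(next bit = 1) under a toy, resource-bounded induction scheme.

    Hypotheses are repeating bit patterns of length 1..max_len (a stand-in
    for "Turing machines from some simple set"). Only patterns that exactly
    retrodict `data` are kept, each weighted by 2**-length.
    Returns None if no hypothesis within the budget fits the data.
    """
    weight = {0: 0.0, 1: 0.0}
    for length in range(1, max_len + 1):
        for pattern in product((0, 1), repeat=length):
            # Keep only perfect retrodictors of the data seen so far.
            if all(pattern[i % length] == bit for i, bit in enumerate(data)):
                next_bit = pattern[len(data) % length]
                weight[next_bit] += 2.0 ** (-length)
    total = weight[0] + weight[1]
    if total == 0:
        return None
    return weight[1] / total


print(bounded_induction([0, 1, 0, 1, 0, 1]))
print(bounded_induction([0, 0, 1], max_len=2))
```

The second call shows the failure mode described in the next paragraphs: with a small budget, no hypothesis reproduces the data exactly and the predictor has nothing to say.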

Suppose then that we take this nice elegant algorithm for a general predictor, and we implement it on today's largest supercomputer, and we show it the stock market prices from the last 50 years to try to predict stocks and get very rich. What happens?

Bupkis happens, that's what. Our Solomonoff predictor tries a whole lot of Turing machines and then runs out of time before finding any useful hypotheses that can perfectly replicate 50 years of stock prices. This is because such useful hypotheses are very, very, very rare.

We might then turn to the burgeoning field of logical uncertainty, which has a major goal of handling intractable math problems in an elegant and timely manner. We are logically uncertain about what distribution Solomonoff induction will output, so can we just average over that logical uncertainty to get some expected stock prices?

The trouble with this is that current logical uncertainty methods rely on proofs that certain outputs are impossible or contradictory. For simple questions this can narrow down the answers, but for complicated problems it becomes intractable, replacing the hard problem of evaluating lots of Turing machines with the hard problem of searching through lots and lots of proofs about lots of Turing machines - and so again our predictor runs out of time before becoming useful.

In practice, the methods we've found to work don't look very much like Solomonoff induction. Successful methods don't take the data as-is, but instead throw some of it away: curve fitting and smoothing data, filtering out hard-to-understand signals as noise, and using predictive models that approximate reality imperfectly. The sorts of things that people trying to predict stocks are already doing. These methods are vital to improve computational tractability, but are difficult (to my knowledge) to fit into a framework as general as Solomonoff induction.

II: Rambling

Suppose that we allow a wide variety of models, including lossy models. How would a general purpose AI figure out which model to choose? Ideally we'd like to make a tradeoff between the accuracy of the model, measured by the expected utility of its predictions given how accurate you expect them to be, and the benefit of actually finishing on time, measured by a cost function of the resources used.

Once we know how to tell good models, the last piece would be for our agent to make the explore/exploit tradeoff between searching for better models and using its current best.

There are various techniques to estimate resource usage, but how does one estimate accuracy?

Here was my first thought: If you know how much information you're losing (e.g. by binning data), for discrete distributions this sets the Shannon information of the ideal value (given by Solomonoff prediction) given the predicted value. This uses the relationship between information in bits of data and Shannon information that determines how sharp your probability distribution is allowed to be.

But with no guarantees about the normality (or similar niceness properties) of the ideal value given the prediction, this isn't very helpful. The problem is highlighted by hurricane prediction. If hurricanes behaved nicely as we threw away information, weather models would just be small, high-entropy deviations from reality. Instead, hurricanes can change route greatly even with small differences in initial conditions.

The failure of the above approach can be explained in a very general way: it uses too little information about the model and the data, only the amount of information thrown away. To do better, our agent has to learn a lot from its training data - a subject that workers in AI have already been hard at work on. On the one hand, it's a great sign if we can eventually connect ideal agents to current successful algorithms. On the other, doing so elegantly seems like a hard problem.

To sum up in the blandest possible way: If we want to build successful predictors of the future with limited resources, they should use their experience to learn approximate models of the world.

The real trick, though, is going to be to set this on a solid foundation. What makes a successful method of picking models? As we lack access to the future (yet! Growth mindset!), we can't grade models based on their future predictions unless we descend to solipsism and grade models against models. Thus we're left with grading models based on how well they retrodict the data so far. Sound familiar? The foundation we want seems like an analogue to Solomonoff induction, one that works for known reasons but doesn't require perfection.

III:  An Example

Here's a paradigm that might or might not be a step in the right direction, but at least gestures at what I mean.

The first piece of the puzzle is that a model that gets proportion P of training bits wrong can be converted to a Solomonoff-accepted perfectly-precise model just by specifying the bits it gets wrong. Suppose we break the model output (with total length N) into chunks of size L, and prefix each chunk with the locations of the wrong bits in that chunk. Then the extra data required to rectify an approximate model is at most N/L·log(P·L)+N·P·log(L). Then the hypothesis where the model is right about the next bit is simpler than the hypothesis when it's wrong, because when the model is right you don't have to spend ~log(L) bits correcting it.

In this way, Solomonoff induction natively cares about some approximate models' predictions. There are some interesting details here that are outside the focus of this particular post. Does using the optimal chunk length lead to Solomonoff induction reflecting model accuracy correctly? What are some better schemes for rectifying models that handle things like models that output probabilities? The point is just that even if your model is wrong on fraction P of the training data, Solomonoff induction will still promote it as long as it's simpler than N-N/L·log(P·L)-N·P·log(L).
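The bound above can be evaluated numerically. The sketch below just plugs numbers into the post's expression N/L·log(P·L)+N·P·log(L). It assumes logarithms are base 2 (bits) and that P·L is at least 1, since otherwise the first term goes negative. The function names `correction_bits` and `model_budget` are hypothetical.

```python
import math


def correction_bits(n, chunk, p):
    """Evaluate N/L * log2(P*L) + N*P * log2(L) from the text.

    n: total output length N in bits.
    chunk: chunk size L.
    p: fraction P of bits the model gets wrong.
    Assumes base-2 logs and p * chunk >= 1 (an assumption of this sketch).
    """
    if p * chunk < 1:
        raise ValueError("sketch assumes at least one expected error per chunk")
    return (n / chunk) * math.log2(p * chunk) + n * p * math.log2(chunk)


def model_budget(n, chunk, p):
    """Largest model size (in bits) that still beats storing the data raw."""
    return n - correction_bits(n, chunk, p)


print(correction_bits(1000, 100, 0.1))
print(model_budget(1000, 100, 0.1))
```

With N = 1000, L = 100 and P = 0.1, the correction overhead is a bit under 700 bits, so under these assumptions the approximate model is still favored as long as its own description is shorter than roughly 300 bits.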

The second piece of the puzzle is that induction can be done over processed functions of observations, like smoothing the data or filtering difficult-to-predict parts (noise) out. If this processing increases the accuracy of models, we can use this to make high-accuracy models of functions of the training data, and then use those models to predict the processed future observations as above.

These two pieces allow an agent to use approximate models, and to throw away some of its information, and still have its predictions work for the same reason as Solomonoff induction. We can use this paradigm to interpret what an algorithm like curve fitting is doing - the fitted curve is a high-accuracy retrodiction of some smoothed function of the data, which therefore does a good job of predicting what that smoothed function will be in the future.
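Here is a minimal sketch of that reading of curve fitting, in plain Python. The moving-average smoother, the straight-line model, and the function names are all assumptions chosen for illustration. The point is only that the fit is made to a smoothed function of the data, and the prediction is of the next smoothed value, not of the next raw observation.

```python
def moving_average(data, window):
    """Smooth by averaging each run of `window` consecutive points."""
    return [sum(data[i:i + window]) / window
            for i in range(len(data) - window + 1)]


def fit_line(ys):
    """Ordinary least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_next_smoothed(data, window):
    """Retrodict the smoothed series with a line, then extrapolate one step."""
    smoothed = moving_average(data, window)
    slope, intercept = fit_line(smoothed)
    return slope * len(smoothed) + intercept


# A linear trend plus alternating +1/-1 "noise" that smoothing removes.
noisy = [2 * t + (1 if t % 2 == 0 else -1) for t in range(20)]
print(predict_next_smoothed(noisy, window=2))
```

No line fits the raw series exactly, but the smoothed series is perfectly linear, so the simple model becomes a perfect retrodictor of the processed data.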

There are some issues here. If a model that you are using is not the simplest, it might have overfitting problems (though perhaps you can fix this just by throwing away more information than naively appears necessary) or systematic bias. More generally, we haven't explored how models get chosen; we've made the problem easier to brute force but we need to understand non-brute force search methods and what their foundations are. It's a useful habit to keep in mind what actually works for humans - as someone put it to me recently, "humans can make models they understand that work for reasons they understand."

Furthermore, this doesn't seem to capture reductionism well. If our agent learns some laws of physics and then is faced with a big complicated situation it needs to use a simplified model to make a prediction about, it should still in some sense "believe in the laws of physics," and not believe that this complicated situation violates the laws of physics even if its current best model is independent of physics.

IV: Logical Uncertainty

It may be possible to relate this back to logical uncertainty - where by "this" I mean the general thesis of predicting the future by building models that are allowed to be imperfect, not the specific example in part III. Soares and Fallenstein use the example of a complex Rube Goldberg machine that deposits a ball into one of several chutes. Given the design of the machine and the laws of physics, suppose that one can in principle predict the output of this machine, but that the problem is much too hard for our computer to do. So rather than having a deterministic method that outputs the right answer, a "logical uncertainty method" in this problem is one that, with a reasonable amount of resources spent, takes in the description of the machine and the laws of physics, and gives a probability distribution over the machine's outputs.

Meanwhile, suppose that we take an approximately inductive predictor and somehow teach it the laws of physics, then ask it to predict the machine. We'd like it to make predictions via some appropriately simplified folk model of physics. If this model gives a probability distribution over outcomes - like in the simple case of "if you flip this coin in this exact way, it has a 50% shot at landing heads" - doesn't that make it a logical uncertainty method? But note that the probability distribution returned by a single model is not actually the uncertainty introduced by replacing an ideal predictor with a resource-limited predictor. So any measurement of logical uncertainty has to factor in the uncertainty between models, not just the uncertainty within models.

Again, we're back to looking for some prediction method that weights models with some goodness metric more forgiving than just using perfectly-retrodicting Turing machines, and which outputs a probability distribution that includes model uncertainty. But can we apply this to mathematical questions, and not just Rube Goldberg machines? Is there some way to subtract away the machine and leave the math?

Suppose that our approximate predictor was fed math problems and solutions, and built simple, tractable programs to explain its observations. For easy math problems a successful model can just be a Turing machine that finds the right answer. As the math problems get more intractable, successful models will start to become logical uncertainty methods, like how we can't predict a large prime number exactly, but we can predict its last digit is 1, 3, 7, or 9. Within this realm we have something like low-level reductionism, where even though we can't find a proof of the right answer, we still want to act as if mathematical proofs work and all else is ignorance, and this will help us make successful predictions.
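The prime example can be checked empirically. The sketch below is an illustration only: it sieves small primes and tabulates last digits, showing the kind of cheap distribution a limited predictor could output without knowing the prime itself. The function names and the cutoff of 10,000 are assumptions made for this example.

```python
from collections import Counter


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit (limit must be >= 2)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [n for n, is_prime in enumerate(sieve) if is_prime]


def last_digit_distribution(limit):
    """Empirical frequency of each last digit among primes in (5, limit]."""
    digits = Counter(p % 10 for p in primes_up_to(limit) if p > 5)
    total = sum(digits.values())
    return {d: count / total for d, count in sorted(digits.items())}


print(last_digit_distribution(10_000))
```

Only the digits 1, 3, 7 and 9 appear, because any larger number ending in an even digit or in 5 is divisible by 2 or 5. The frequencies give a crude "logical uncertainty" estimate for the last digit of a prime too large to compute.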

Then we have complicated problems that seem to be beyond this realm, like P=NP. Humans certainly seem to have generated some strong opinions about P=NP without dependence on mathematical proofs narrowing down the options. It seems to such humans that the genuinely right procedure to follow is that, since we've searched long and hard for a fast algorithm for NP-complete problems without success, we should update in the direction that no such algorithm exists. In approximate-Solomonoff-speak, it's that P!=NP is consistent with a simple, tractable explanation for (a recognizable subset of) our observations, while P=NP is only consistent with more complicated tractable explanations. We could absolutely make a predictor that reasons this way - it just sets a few degrees of freedom. But is it the right way to reason?

For one thing, this seems like it's following Gaifman's proposed property of logical uncertainty, that seeing enough examples of something should convince you of it with probability 1 - which has been shown to be "too strong" in some sense (it assigns probability 0 to some true statements - though even this could be okay if those statements are infinitely dilute). Does the most straightforward implementation actually have the Gaifman condition, or not? (I'm sorry, ma'am. Your daughter has... the Gaifman condition.)

This inductive view of logical uncertainty lacks the consistent nature of many other approaches - if it works, it does so by changing approaches to suit the problem at hand. This is bad if you want your logical uncertainty methods to be based on a simple prior followed by some kind of updating procedure. But logical uncertainty is supposed to be practical, after all, and at least this is a simple meta-procedure.

V: Questions

Thanks for reading this post. In conclusion, here are some of my questions:

What's the role of Solomonoff induction in approximate induction? Is Solomonoff induction doing all of the work, or is it possible to make useful predictions using tractable hypotheses Solomonoff induction would exclude, or excluding intractable hypotheses Solomonoff induction would have to include?

Somehow we have to pick out models to promote to attention in the first place. What properties make a process for this good or bad? What methods for picking models can be shown to still lead to making useful predictions - and not merely in the limit of lots of computing time?

Are humans doing the right thing by making models they understand that work for reasons they understand? What's up with that reductionism problem anyhow?

Is it possible to formalize the predictor discussed in the context of logical uncertainty? Does it have to fulfill Gaifman's condition if it finds patterns in things like P!=NP?

Does this whole edifice even make sense?

Moral Anti-Epistemology

-3 Lukas_Gloor 24 April 2015 03:30AM

This post is a half-baked idea that I'm posting here in order to get feedback and further brainstorming. There seem to be some interesting parallels between epistemology and ethics.

Part 1: Moral Anti-Epistemology

"Anti-Epistemology" refers to bad rules of reasoning that exist not because they are useful/truth-tracking, but because they are good at preserving people's cherished beliefs about the world. But cherished beliefs don't just concern factual questions, they also very much concern moral issues. Therefore, we should expect there to be a lot of moral anti-epistemology. 

Tradition as a moral argument, tu quoque, opposition to the use of thought experiments, the noncentral fallacy, slogans like "morality is from humans for humans" – all these are instances of the same general phenomenon. This is trivial and doesn't add much to the already well-known fact that humans often rationalize, but it does add the memetic perspective: Moral rationalizations sometimes concern more than a singular instance, they can affect the entire way people reason about morality. And like with religion or pseudoscience in epistemology about factual claims, there could be entire memeplexes centered around moral anti-epistemology. 

A complication is that metaethics is complicated; it is unclear what exactly moral reasoning is, and whether everyone is trying to do the same thing when they engage in what they think of as moral reasoning. Labelling something "moral anti-epistemology" would suggest that there is a correct way to think about morality. Is there? As long as we always make sure to clarify what it is that we're trying to accomplish, it would seem possible to differentiate between valid and invalid arguments in regard to the specified goal. And this is where moral anti-epistemology might cause troubles. 

Are there reasons to assume that certain popular ethical beliefs are a result of moral anti-epistemology? Deontology comes to mind (mostly because it's my usual suspect when it comes to odd reasoning in ethics), but what is it about deontology that relies on "faulty moral reasoning", if indeed there is something about it that does? How much of it relies on the noncentral fallacy, for instance? Is Yvain's personal opinion that "much of deontology is just an attempt to formalize and justify this fallacy" correct? The perspective of moral anti-epistemology would suggest that it is the other way around: Deontology might be the by-product of people applying the noncentral fallacy, which is done because it helps protect cherished beliefs. Which beliefs would those be? Perhaps the strongly felt intuition that "Some things are JUST WRONG?", which doesn't handle fuzzy concepts/boundaries well and therefore has to be combined with a dogmatic approach. It sounds somewhat plausible, but also really speculative. 

Part 2: Memetics

A lot of people are skeptical towards these memetic just-so stories. They argue that the points made are either too trivial, or too speculative. I have the intuition that a memetic perspective often helps clarify things, and my thoughts about applying the concept of anti-epistemology to ethics seemed like an insight, but I have a hard time coming up with how my expectations about the world have changed because of it. What, if anything, is the value of the idea I just presented? Can I now form a prediction to test whether deontologists want to primarily formalize and justify the noncentral fallacy, or whether they instead want to justify something else by making use of the noncentral fallacy?

Anti-epistemology is a more general model of what is going on in the world than rationalizations are, so it should all reduce to rationalizations in the end. So it shouldn't be worrying that I don't magically find more stuff. Perhaps my expectations were too high and I should be content with having found a way to categorize moral rationalizations, the knowledge of which will make me slightly quicker at spotting or predicting them.


LessWrong Experience of Flavours

1 Elo 24 April 2015 01:02AM

Following on from: 

I would like to ask for other people's experience of flavours.  I am dividing food into significant categories that I can think of.  I don't really like the 5 tastes categories for this task, but I am aware of them.  This post is meant to be about taste preference although it might end up about dietary preferences.


The Effective Altruism Handbook

14 RyanCarey 24 April 2015 12:30AM

Lots of people want to help others but lack information about how to do so effectively. Thanks to the growing effective altruism movement, lots of essays have been written around the topic of charity effectiveness over the last five years. And many of the key insights are gathered together in the Effective Altruism Handbook, which has become available today.

The Effective Altruism Handbook includes an introduction by William MacAskill and Peter Singer followed by five sections. The first section motivates the rest of the book, giving an overview of why people care about effectiveness. The second through fourth sections address tricky decisions involved in helping others: evaluating charities, choosing a career and prioritizing causes. In the final section, the leaders of seven organizations describe why they're doing what they're doing, and describe the kinds of activities they consider especially helpful.

A lot of conversations have gone into picking out the materials for this compilation, so I hope you enjoy reading it! Or, for those who are already familiar with its concepts, sharing it with friends.

The Effective Altruism Handbook can be freely downloaded here.

There are also epub and mobi versions for readers using ebook devices, although their formatting has not been edited as carefully.

Thanks to all of the authors in this compilation for writing their essays in the first place, as well as for making them available for the Handbook. Thanks to Alex Vermeer from MIRI, whose experience and assistance in producing a LaTeX book was invaluable. Thanks also to Bastian Stern, the Centre for Effective Altruism, Peter Orr (for proofreading), and Lauryn Vaughan for cover design. Also, thanks kindly to Agata Sagan who is helping by making a Polish translation! It is always good to see useful ideas spread to a more linguistically diverse audience.

Lastly, here’s the full table of contents:

  • Introduction, Peter Singer and William MacAskill


  • The Drowning Child and the Expanding Circle, Peter Singer 
  • What is Effective Altruism, William MacAskill 
  • Scope Neglect, Eliezer Yudkowsky 
  • Tradeoffs, Julia Wise


  • Efficient Charity: Do Unto Others, Scott Alexander
  • “Efficiency” Measures Miss the Point, Dan Pallotta
  • How Not to Be a “White in Shining Armor”, Holden Karnofsky
  • Estimation Is the Best We Have, Katja Grace
  • Our Updated Top Charities, Elie Hassenfeld


  • Don’t Get a Job at a Charity: Work on Wall Street, William MacAskill
  • High Impact Science, Carl Shulman
  • How to Assess the Impact of a Career, Ben Todd


  • Your Dollar Goes Further Overseas, GiveWell
  • The Haste Consideration, Matt Wage
  • Preventing Human Extinction, Nick Beckstead, Peter Singer & Matt Wage
  • Speciesism, Peter Singer
  • Four Focus Areas of Effective Altruism, Luke Muehlhauser


  • GiveWell, GiveWell
  • Giving What We Can, Michelle Hutchinson
  • The Life You Can Save, Charlie Bresler
  • 80,000 Hours, Ben Todd
  • Charity Science, Xiomara Kikauka
  • The Machine Intelligence Research Institute, Luke Muehlhauser
  • Animal Charity Evaluators, Jon Bockman

Dancing room for you and applied problems in foreign fields

-3 Romashka 23 April 2015 08:29AM

This is a place for matchmaking researchers with research ideas in applied sciences that have a sizeable impact on the human condition.

Edit: let there be janitors.

What are the so-called adult problems in your field and who, in your opinion, is needed most to solve them?

Add reasons why you are not working on them yourself. (I, for example, am in a PhD program very remote from practical applications and might have a chance to do some research on useful stuff if I survive that long.)

Truth is holistic

8 MrMind 23 April 2015 07:26AM

You already know by now that truth is undefinable: by a famous result of Tarski, no formal system powerful enough (from now on, just system) can consistently talk about the truth of its own sentences.

You may however not know that Hamkins proved that truth is holistic.
Let me explain: while no system can talk about its own truth, it can nevertheless talk about the truth of its own substructures. For example, in every model of ZFC (the standard axioms of set theory) you can consistently define a model of standard arithmetics and a predicate that works as arithmetics' truth predicate. This can happen because ZFC is strictly more powerful than PA (the axioms of standard arithmetics).
Intuitively, one could think that if you have the same substructure in two different models, what they believe is the truth about that substructure is the same in both. Along this line, two models of ZFC ought to believe the same things about standard arithmetics.
However, it turns out this is not the case. Two different models extending ZFC may very well agree on which entities are standard natural numbers, and yet still disagree about which arithmetic sentences are true or false. For example, they could agree about the standard numbers and about how the successor and addition operators work, and yet disagree on multiplication (corollary 7.1 in Hamkins' paper).
This means that when you can talk consistently about the truth of a model (that is, when you are in a more powerful formal system), that truth depends not only on the substructure, but on the entire structure you're immersed in. Figuratively speaking, local truth depends on global truth. Truth is holistic.
There's more: suppose that two models agree on the ontology of some common substructure. Suppose also that they agree about the truth predicate on that structure: they could still disagree about the meta-truths. Or the meta-meta-truths, etc., for all the ordinal levels of the definable truth predicates.

Another striking example from the same paper. There are two different extensions of set theory which agree on the structure of standard arithmetics and on the members of a subset A of natural numbers, and yet one thinks that A is first-order definable while the other thinks it's not (theorem 10).

Not even "being a model of ZFC" is an absolute property: there are two models which agree on an initial segment of the set hierarchy, and yet one thinks that the segment is a model of ZFC while the other proves that it's not (theorem 12).

Two concluding remarks: what I wrote was that there are different models which disagree about the truth of standard arithmetics, not that every different model has different arithmetic truths. Indeed, if each of two models has access to the truth relation of the other, then they are bound to have the same truths. This is what happens for example when you prove absoluteness results in forcing.
I'm also reminded of de Blanc's ontological crises: changing ontology can screw with your utility function. It's interesting to note that updating (that is, changing model of reality) can change what you believe even if you don't change ontology.

Tally of LessWrong experience on Alcohol

3 Elo 22 April 2015 11:29PM

As a follow up post to:  http://lesswrong.com/lw/m2r/lesswrong_experience_on_alcohol/

I tallied the responses.


In rough categories:

Doesn't drink: 11

Drinks: 19

Drinks heavily: 4


Disclaimers; I had to make judegements as to people who didn't like alcohol who drink very rarely (but are not morally opposed to the thought of it), and people who drink regularly as to how much would put them into the "drinks heavily" category.  I think I did an okay job of it.

I wonder if LW (and other bodies) could make money for itself using click-through tactics similar to those used for book buying, but for online alcohol stores. Drink responsibly!

I will try to update this tally if any more responses are received.


I hope the following question finds its way onto the lesswrong survey:

How regularly do you drink?

daily (or almost daily)

5 days a week

3 days a week

twice a week

once a week

a few times a month

less than 12 times a year

less than 2 times a year



(and possibly)

Have your drinking habits changed since last year?

I drink more

I drink the same

I drink different things but about the same amount of drinks

I drink less


(follow up post on spice preferences coming in a few hours)

(Edit*2: 25/4/15 added some commenters to the tally)

Happiness and Goodness as Universal Terminal Virtues

19 els 21 April 2015 04:42PM
Hi, I'm new to LessWrong. I stumbled onto this site a month ago, and ever since, I've been devouring Rationality: AI to Zombies faster than I used to go through my favorite fantasy novels. I've spent some time on the website too, and I'm pretty intimidated about posting, since you guys all seem so smart and knowledgeable, but here goes... This is probably the first intellectual idea I've had in my life, so if you want to tear it to shreds, you are more than welcome to, but please be gentle with my feelings. :)
Edit: Thanks to many helpful comments, I've cleaned up the original post quite a bit and changed the title to reflect this. 


As humans, we seem to share the same terminal values, or terminal virtues. We want to do things that make ourselves happy, and we want to do things that make others happy. We want to 'become happy' and 'become good.' 

Because various determinants--including, for instance, personal fulfillment--can affect an individual's happiness, there is significant overlap between these ultimate motivators. Doing good for others usually brings us happiness. For example, donating to charity makes people feel warm and fuzzy. Some might recognize this overlap and conclude that all humans are entirely selfish, that even those who appear altruistic are subconsciously acting purely out of self-interest. Yet many of us choose to donate to charities that we believe do the most good per dollar, rather than handing out money through personal-happiness-optimizing random acts of kindness. Seemingly rational human beings sometimes make conscious decisions to inefficiently maximize their personal happiness for the sake of others. Consider Eliezer's example in Terminal Values and Instrumental Values of a mother who sacrifices her life for her son. 

Why would people do stuff that they know won't efficiently increase their happiness? Before I de-converted from Christianity and started to learn what evolution and natural selection actually were, before I realized that altruistic tendencies are partially genetic, it used to utterly mystify me that atheists would sometimes act so virtuously. I did believe that God gave them a conscience, but I kinda thought that surely someone rational enough to become an atheist would be rational enough to realize that his conscience didn't always lead him to his optimal mind-state, and work to overcome it. Personally, I used to joke with my friends that Christianity was the only thing stopping me from pursuing my true dream job of becoming a thief (strategy + challenge + adrenaline + variety = what more could I ask for?). Then, when I de-converted, it hit me: Hey, you know, Ellen, you really *could* become a thief now! What fun you could have! I flinched from the thought. Why didn't I want to overcome my conscience, become a thief, and live a fun-filled life? Well, this isn't as baffling to me now, simply because I've changed where I draw the boundary. I've come to classify goodness as an end-in-itself, just like I'd always done with happiness.

Becoming good

I first read about virtue ethics in On Terminal Goals and Virtue Ethics. As I read, I couldn't help but want to be a virtue ethicist and a consequentialist. Most virtues just seemed like instrumental values.

The post's author mentioned Divergent protagonist Tris as an example of virtue ethics:

Bravery was a virtue that she thought she ought to have. If the graph of her motivations even went any deeper, the only node beyond ‘become brave’ was ‘become good.’

I suspect that goodness is, perhaps subconsciously, a terminal virtue for the vast majority of virtue ethicists. I appreciate Oscar Wilde's writing in De Profundis:

Now I find hidden somewhere away in my nature something that tells me that nothing in the whole world is meaningless, and suffering least of all...

It is the last thing left in me, and the best: the ultimate discovery at which I have arrived, the starting-point for a fresh development. It has come to me right out of myself, so I know that it has come at the proper time. It could not have come before, nor later. Had anyone told me of it, I would have rejected it. Had it been brought to me, I would have refused it. As I found it, I want to keep it. I must do so...

Of all things it is the strangest.

Wilde's thoughts on humility translate quite nicely to an innate desire for goodness.

When presented with a conflict between an elected virtue, such as loyalty, or truth, and the underlying desire to be good, most virtue ethicists would likely abandon the elected virtue. With truth, consider the classic example of lying to Nazis to save Jews. Generally speaking, it is wrong to conceal the truth, but in special cases, most people would agree that lying is actually less wrong than truth-telling. I'm not certain, but my hunch is that most professing virtue ethicists would find that in extreme thought experiments, their terminal virtue of goodness would eventually trump their other virtues, too. 

Becoming happy

However, there's one exception. One desire can sometimes trump even the desire for goodness, and that's the desire for personal happiness. 

We usually want what makes us happy. I want what makes me happy. Spending time with family makes me happy. Playing board games makes me happy. Going hiking makes me happy. Winning races makes me happy. Being open-minded makes me happy. Hearing praise makes me happy. Learning new things makes me happy. Thinking strategically makes me happy. Playing touch football with friends makes me happy. Sharing ideas makes me happy. Independence makes me happy. Adventure makes me happy. Even divulging personal information makes me happy.

Fun, accomplishment, positive self-image, sense of security, and others' approval: all of these are examples of happiness contributors, or things that lead me to my own, personal optimal mind-state. Every time I engage in one of the happiness increasers above, I'm fulfilling an instrumental value. I'm doing the same thing when I reject activities I dislike or work to reverse personality traits that I think decrease my overall happiness.

Tris didn’t join the Dauntless cast because she thought they were doing the most good in society, or because she thought her comparative advantage to do good lay there–she chose it because they were brave, and she wasn’t, yet, and she wanted to be.

Tris was, in other words, pursuing happiness by trying to change an aspect of her personality she disliked.

Guessing at subconscious motivation

By now, you might be wondering, "But what about the virtue ethicist who is religious? Wouldn't she be ultimately motivated by something other than happiness and goodness?" 

Well, in the case of Christianity, most people probably just want to 'become Christ-like' which, for them, overlaps quite conveniently with personal satisfaction and helping others. Happiness and goodness might be intuitively driving them to choose this instrumental goal, and for them, conflict between the two never seems to arise. 

Let's consider 'become obedient to God's will' from a modern-day Christian perspective. 1 Timothy 2:4 says, "[God our Savior] wants all men to be saved and to come to a knowledge of the truth." Mark 12:31 says, "Love your neighbor as yourself." Well, I love myself enough that I want to do everything in my power to avoid eternal punishment; therefore, I should love my neighbor enough to do everything in my power to stop him from going to hell, too.

So anytime a Christian does anything but pray for others, do faith-strengthening activities, spread the gospel, or earn money to donate to missionaries, he is anticipating as if God/hell doesn't exist. As a Christian, I totally realized this, and often tried to convince myself and others that we were acting wrongly by not being more devout. I couldn't shake the notion that spending time having fun instead of praying or sharing the gospel was somehow wrong because it went against God's will that all men be saved, and I believed God's will, by definition, was right. (Oops.) But I still acted in accordance with my personal happiness on many occasions. I said God's will was the only end-in-itself, but I didn't act like it. I didn't feel like it. The innate desire to pursue personal happiness is an extremely strong motivating force, so strong that Christians really don't like to label it as sin. Imagine how many deconversions we would see if it were suddenly sinful to play football, watch movies with your family, or splurge on tasty restaurant meals. Yet the Bible often mentions giving up material wealth entirely, and in Luke 9:23 Jesus says, "Whoever wants to be my disciple must deny themselves and take up their cross daily and follow me."

Let's further consider those who believe God's will is good, by definition. Such Christians tend to believe "God wants what's best for us, even when we don't understand it." Unless they have exceptionally strong tendencies to analyze opportunity costs, their understanding of God's will and their intuitive idea of what's best for humanity rarely conflict. But let's imagine they do. Let's say someone strongly believes in God, and is led to believe that God wants him to sacrifice his child. This action would certainly go against his terminal value of goodness and may cause cognitive dissonance. But he could still do it, subconsciously satisfying his (latent) terminal value of personal happiness. What on earth does personal happiness have to do with sacrificing a child? Well, the believer takes comfort in his belief in God and his hope of heaven (the child gets a shortcut there). He takes comfort in his religious community. To not sacrifice the child would be to deny God and lose that immense source of comfort.

These thoughts obviously don't happen on a conscious level, but maybe people have personal-happiness-optimizing intuitions. Of course, I have near-zero scientific knowledge, no clue what really goes on in the subconscious, and I'm just guessing at all this.

Individual variance

Again, happiness has a huge overlap with goodness. Goodness often, but not always, leads to personal happiness. A lot of seemingly random stuff leads to personal happiness, actually. Whatever that stuff is, it largely accounts for the individual variance in which virtues are pursued. It's probably closely tied to the four Keirsey Temperaments of security-seeking, sensation-seeking, knowledge-seeking, and identity-seeking types. (Unsurprisingly, most people here at LW reported knowledge-seeker personality types.) I'm a sensation-seeker. An identity-seeker could find his identity in the religious community and in being a 'child of God'. A security-seeker could find security in his belief in heaven. An identity-seeking rationalist might be the type most likely to aspire to 'become completely truthful' even if she somehow knew with complete certainty that telling the truth, in a certain situation, would lead to a bad outcome for humanity.

Perhaps the general tendency among professing virtue ethicists is to pursue happiness and goodness relatively intuitively, while professing consequentialists pursue the same values more analytically.

Also worth noting is the individual variance in someone's "preference ratio" of happiness relative to goodness. Among professing consequentialists, we might find sociopaths and extreme altruists at opposite ends of a happiness-goodness continuum, with most of us falling somewhere in between. To position virtue ethicists on such a continuum would be significantly more difficult, requiring further speculation about subconscious motivation.

Real-life convergence of moral views

I immediately identified with consequentialism when I first read about it. Then I read about virtue ethics, and I immediately identified with that, too. I naturally analyze my actions with my goals in mind. But I also often find myself idolizing a certain trait in others, such as environmental consciousness, and then pursuing that trait on my own. For example:

I've had friends who care a lot about the environment. I think it's cool that they do. So even before hearing about virtue ethics, I wanted to 'become someone who cares about the environment'. Subconsciously, I must have suspected that this would help me achieve my terminal goals of happiness and goodness.

If caring about the environment is my instrumental goal, I can feel good about myself when I instinctively pick up trash, conserve energy, use a reusable water bottle; i.e. do things environmentally conscious people do. It's quick, it's efficient, and having labeled 'caring about the environment' as a personal virtue, I'm spared from analyzing every last decision. Being environmentally conscious is a valuable habit.

Yet I can still do opportunity cost analyses with my chosen virtue. For example, I could stop showering to help conserve California's water. Or, I could apparently have the same effect by eating six fewer hamburgers in a year. More goodness would result if I stopped eating meat and limited my showering, but doing so would interfere with my personal happiness. I naturally seek to balance my terminal goals of goodness and happiness. Personally, I prefer showering to eating hamburgers, so I cut significantly back on my meat consumption without worrying too much about my showering habits.

This practical convergence of virtue ethics and consequentialism harmoniously satisfies my desires for happiness and goodness.

To summarize:

Personal happiness refers to an individual's optimal mind-state. Pleasure, pain, and personal satisfaction are examples of happiness level determinants. Goodness refers to promoting happiness in others.

Terminal values are ends-in-themselves. The only true terminal values, or virtues, seem to be happiness and goodness. Think of them as psychological motivators, consciously or subconsciously driving us to make the decisions we do. (Physical motivators, like addiction or inertia, can also affect decisions.)

Preferences are what we tend to choose. These can be based on psychological or physical motivators.

Instrumental values are the sub-goals or sub-virtues that we (consciously or subconsciously) believe will best fulfill our terminal values of happiness and goodness. We seem to choose them arbitrarily.

Of course, we're not always aware of what actually leads to optimal mind-states in ourselves and others. Yet as we rationally pursue our goals, we may sometimes intuit like virtue ethicists and other times analyze like consequentialists. Both moral views are useful.

Practical value

So does this idea have any potential practical value? 

It took some friendly prodding, but I was finally brought to realize that my purpose in writing this article was not to argue the existence of goodness or the theoretical equality of consequentialism and virtue ethics or anything at all. The real point I'm making here is that however we categorize personal happiness, goodness belongs in the same category, because in practice, all other goals seem to stem from one or both of these concepts. Clarity of expression is an instrumental value, so I'm just saying that perhaps we should consider redrawing our boundaries a bit:

Figuring where to cut reality in order to carve along the joints—this is the problem worthy of a rationalist.  It is what people should be trying to do, when they set out in search of the floating essence of a word.

P.S. If anyone is interested in reading a really, really long conversation I had with adamzerner, you can trace the development of this idea. Language issues were overcome, biases were admitted, new facts were learned, minds were changed, and discussion bounced from ambition, to serial killers, to arrogance, to religion, to the subconscious, to agenthood, to skepticism about the happiness set-point theory, all interconnected somehow. In short, it was the first time I've had a conversation with a fellow "rationalist" and it was one of the coolest experiences I've ever had.

Mental representation and the is-ought distinction

13 Error 20 April 2015 06:37PM

I'm reading Thinking, Fast and Slow. In appendix B I came across the following comment. Emphasis mine:

Studies of language comprehension indicate that people quickly recode much of what they hear into an abstract representation that no longer distinguishes whether the idea was expressed in an active or in a passive form and no longer discriminates what was actually said from what was implied, presupposed, or implicated (Clark and Clark 1977).

My first thought on seeing this is: holy crap, this explains why people insist on seeing relevance claims in my statements that I didn't put there. If the brain doesn't distinguish statement from implicature, and my conversational partner believes that A implies B when I don't, then of course I'm going to be continually running into situations where people model me as saying and believing B when I actually only said A. At a minimum this will happen any time I discuss any question of seemingly-morally-relevant fact with someone who hasn't trained themselves to make the is-ought distinction. Which is most people.

The next thought my brain jumped to: This process might explain the failure to make the is-ought distinction in the first place. That seems like much more of a leap, though. I looked up the Clark and Clark cite. Unfortunately it's a fairly long book that I'm not entirely sure I want to wade through. Has anyone else read it? Can someone offer more details about exactly what findings Kahneman is referencing?

Publishing my initial model for hypercapitalism

3 skilesare 20 April 2015 01:38PM

I posted a stupid question a couple of weeks ago and got some good feedback.

@ChristianKl suggested that I start building a model of hypercapitalism for people to play with.  I have the first one ready!  It isn't quite to the point where people can start submitting bots to play in the economy, but I think it shows that the idea is worth more thought.



Runnable Code - fork it and mess around with it:


I'd love some more feedback and opinions.

A couple of other things for context:

hypercapital.info - all about hypercapitalism

Overcoming bias about our money

Information Theory and the Economy


Open Thread, Apr. 20 - Apr. 26, 2015

3 Gondolinian 20 April 2015 12:02AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.


5 Nanashi 19 April 2015 11:36PM

This isn't a trick question, nor do I have a particular answer in mind. 

Tomorrow, all of your memories are going to be wiped. There is a crucial piece of information that you need to make sure you remember, and more specifically, you need to be very confident you were the one that sent this message and not a third party pretending to be you.

How do you go about transmitting, "signing", and verifying such a message*?


--edit: I should have clarified that one of the assumptions is that some malicious third party can/will be attempting to send you false information from "yourself" and you need to distinguish between that and what's really you.


--edit2: this may be formally impossible, I don't actually know. If anyone can demonstrate this I'd be very appreciative. 


--edit3: I don't have a particular universal definition for the term "memory wipe" in mind, mainly because I didn't want to pigeonhole the discussion. I think this pretty closely mimics reality. So I think it's totally fine to say, "If you retain this type of memory, then I'd do X." 
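To make the "signing" and "verifying" part concrete, here is a minimal sketch under one loudly stated assumption that is not part of the question: some secret (say, a passphrase held in a kind of memory the wipe leaves intact) survives and is unknown to the third party. If nothing secret survives, this sketch does not help. The function names are made up for illustration; the library calls are from Python's standard `hashlib` and `hmac` modules.

```python
import hashlib
import hmac


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Stretch the surviving passphrase into a key (assumption: it survives the wipe)."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, 100_000)


def sign(message: str, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag that only a holder of the key can produce."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(message: str, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message, key), tag)


if __name__ == "__main__":
    salt = b"stored-in-the-open"  # the salt may be written next to the note
    key = derive_key("a passphrase only I would produce", salt)
    note = "The crucial piece of information."
    tag = sign(note, key)
    print(verify(note, tag, key))              # genuine note
    print(verify(note + " (edited)", tag, key))  # tampered note
```

The design choice here is a shared-secret tag rather than a public-key signature, because past-you and future-you are the same party. The hard part of the question, which this does not solve, is what can play the role of the surviving secret.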



Astronomy, space exploration and the Great Filter

19 JoshuaZ 19 April 2015 07:26PM

Astronomical research has what may be an under-appreciated role in helping us understand and possibly avoid the Great Filter. This post will examine how astronomy may be helpful for identifying potential future filters. The primary upshot is that we may have an advantage due to our somewhat late arrival: if we can observe what other civilizations have done wrong, we can get a leg up.

This post is not arguing that colonization is a route to remove some existential risks. There is no question that colonization will reduce the risk of many forms of Filters, but the vast majority of astronomical work has no substantial connection to colonization. Moreover, the case for colonization has been made strongly by many others already, such as Robert Zubrin's book "The Case for Mars" or this essay by Nick Bostrom.

Note: those already familiar with the Great Filter and proposed explanations may wish to skip to the section "How can we substantially improve astronomy in the short to medium term?"

What is the Great Filter?

There is a worrying lack of signs of intelligent life in the universe. The only intelligent life we have detected has been that on Earth. While planets are apparently numerous, there have been no signs of other life. There are three possible lines of evidence we would expect to see if civilizations were common in the universe: radio signals, direct contact, and large-scale constructions. The first two of these issues are well-known, but the most serious problem arises from the lack of large-scale constructions: as far as we can tell, the universe looks natural. The vast majority of matter and energy in the universe appears to be unused. The Great Filter is one possible explanation for this lack of life, namely that some phenomenon prevents intelligent life from passing into the interstellar, large-scale phase. Variants of the idea have been floating around for a long time; the term was first coined by Robin Hanson in this essay. There are two fundamental versions of the Filter: filtration which has occurred in our past, and filtration which will occur in our future. For obvious reasons the second of the two is more of a concern. Moreover, as our technological level increases, the chance that we are getting to the last point of serious filtration gets higher, since once a civilization has spread out to multiple stars, filtration becomes more difficult.

Evidence for the Great Filter and alternative explanations:

Since Hanson's essay, there have been only two major updates to the situation involving the Filter:

First, we have confirmed that planets are very common, so a lack of Earth-size planets or planets in the habitable zone is not likely to be a major filter.

Second, we have found that planet formation occurred early in the universe. (For example see this article about this paper.) Early planet formation weakens a common explanation of the Fermi paradox: the argument that some species had to be the first intelligent species and we're simply lucky. Early planet formation, along with the apparent speed at which life arose on Earth after the heavy bombardment ended and the apparent speed with which complex life developed from simple life, strongly refutes this explanation. The response has been made that early filtration may be so common that if life does not arise early in the lifespan of a planet's star, then it will have no chance to reach civilization. However, if this were the case, we'd expect to have found ourselves orbiting a more long-lived star like a red dwarf. Red dwarfs are more common than sun-like stars and have much longer lifespans by multiple orders of magnitude. While attempts to understand the habitable zone of red dwarfs are still ongoing, current consensus is that many red dwarfs contain habitable planets.

These two observations, together with further evidence that the universe looks natural, make future filtration seem likely. If advanced civilizations existed, we would expect them to make use of the large amounts of matter and energy available. We see no signs of such use. We've seen no indication of ring-worlds, Dyson spheres, or other megascale engineering projects. While such searches have so far been confined to around 300 parsecs and some candidates were hard to rule out, if a substantial fraction of stars in a galaxy had Dyson spheres or swarms we would notice the unusually strong infrared emission. Note that this sort of evidence is distinct from arguments about contact or about detecting radio signals. There's a very recent proposal for mini-Dyson spheres around white dwarfs, which would be much easier to engineer and harder to detect, but they would not reduce the desirability of other large-scale structures, and they would likely be detectable if there were a large number of them present in a small region. One recent study looked for signs of large-scale modification to the radiation profile of galaxies of a kind that should show the presence of large-scale civilizations. They looked at 100,000 galaxies and found no major sign of technologically advanced civilizations (for more detail see here).
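To illustrate why a Dyson sphere should show up in the infrared, here is a rough back-of-the-envelope sketch. It rests on assumptions that are mine and not from any of the searches mentioned: a complete shell at 1 AU around a sun-like star, in steady state, re-radiating all the starlight it absorbs from its outer surface as a perfect blackbody. The function names are hypothetical.

```python
import math

SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
WIEN_B = 2.897771955e-3  # Wien displacement constant, m K
L_SUN = 3.828e26         # nominal solar luminosity, W
AU = 1.495978707e11      # astronomical unit, m


def shell_temperature(luminosity_w: float, radius_m: float) -> float:
    """Equilibrium temperature of a shell re-radiating the star's full output.

    Solves L = 4 * pi * r^2 * sigma * T^4 for T (blackbody assumption).
    """
    return (luminosity_w / (4 * math.pi * SIGMA * radius_m ** 2)) ** 0.25


def peak_wavelength_m(temperature_k: float) -> float:
    """Wavelength of peak blackbody emission, from Wien's displacement law."""
    return WIEN_B / temperature_k


if __name__ == "__main__":
    t = shell_temperature(L_SUN, AU)
    peak_um = peak_wavelength_m(t) * 1e6
    print(f"Shell temperature: {t:.0f} K")
    print(f"Peak emission wavelength: {peak_um:.1f} micrometres")
```

Under these assumptions the shell sits at a few hundred kelvin and its emission peaks in the mid-infrared, far from the visible-light peak of the star it encloses. A real structure (a partial swarm, a different radius, waste heat handled differently) would change the numbers, so treat this only as an indication of where one would look.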

We will not discuss all possible rebuttals to the case for a Great Filter but will note some of the more interesting ones:

There have been attempts to argue that the universe only became habitable more recently. There are two primary avenues for this argument. First, there is the point that early stars had very low metallicity (that is, had low concentrations of elements other than hydrogen and helium) and thus the universe would have had too low a metal level for complex life. The presence of old rocky planets makes this argument less viable, and this only works for the first few billion years of history. Second, there's an argument that until recently galaxies were more likely to have frequent gamma-ray bursts. In that case, life would have been wiped out too frequently to evolve in a complex fashion. However, even the strongest version of this argument still leaves billions of years of time unexplained.

There have been attempts to argue that space travel may be very difficult. For example, Geoffrey Landis proposed that a percolation model, together with the idea that interstellar travel is very difficult, may explain the apparent rarity of large-scale civilizations. However, at this point, there's no strong reason to think that interstellar travel is so difficult as to limit colonization to that extent. Moreover, the discoveries made in the last 20 years that brown dwarfs are very common and that most stars do contain planets are evidence in the opposite direction: these brown dwarfs as well as common planets would make travel easier because there are more potential refueling and resupply locations even if they are not used for full colonization. Others have argued that even without such considerations, colonization should not be that difficult. Moreover, if colonization is difficult and civilizations end up restricted to small numbers of nearby stars, then it becomes more, not less, likely that civilizations will attempt the large-scale engineering projects that we would notice.

Another possibility is that we are overestimating the general growth rate of the resources used by civilizations: while extrapolating from current trends makes it plausible that large-scale projects and endeavors will occur, very energy-intensive projects like colonization would be substantially more difficult under slower growth. Rather than continual exponential or close-to-exponential growth, we may expect long periods of slow growth or stagnation. This cannot be ruled out, but even if growth continues at only a slightly higher than linear rate, the energy expenditures available in a few thousand years will still be very large.

Another possibility that has been proposed is some variant of the simulation hypothesis, the idea that we exist in a simulated reality. The most common variant of this in a Great Filter context suggests that we are in an ancestor simulation, that is, a simulation by the future descendants of humanity of what early humans would have been like.

The simulation hypothesis runs into serious problems, both in general and as an explanation of the Great Filter in particular. First, if our understanding of the laws of physics is approximately correct, then there are strong restrictions on what computations can be done with a given amount of resources. For example, BQP, the set of problems which can be solved efficiently by quantum computers, is contained in PSPACE, the set of problems which can be solved when one has a polynomial amount of space available and no time limit. Thus, in order to do a detailed simulation, the level of resources needed would likely be large, since even a close-to-classical simulation would still need about as many resources. There are other results, such as Holevo's theorem, which place other similar restrictions. The upshot of these results is that one cannot make a detailed simulation of an object without using at least as many resources as the object itself. There may be potential ways of getting around this: for example, consider a simulator interested primarily in what life on Earth is doing. The simulation would not need to do a detailed simulation of the inside of planet Earth and other large bodies in the solar system. However, even then, the resources involved would be very large.

The primary problem with the simulation hypothesis as an explanation is that it requires the future of humanity to have actually already passed through the Great Filter and to have found their own success sufficiently unlikely that they've devoted large amounts of resources to actually finding out how they managed to survive. Moreover, there are strong limits on how accurately one can reconstruct any given quantum state, which means an ancestor simulation will be at best a rough approximation. In this context, while there are interesting anthropic considerations here, it is more likely that the simulation hypothesis is wishful thinking.

Variants of the "Prime Directive" have also been proposed. The essential idea is that advanced civilizations would deliberately avoid interacting with less advanced civilizations. This hypothesis runs into two serious problems: first, it does not explain the apparent naturalness, only the lack of direct contact by alien life. Second, it assumes a solution to a massive coordination problem between multiple species with potentially radically different ethical systems. In a similar vein, Hanson in his original essay on the Great Filter raised the possibility of a single very early species with some form of faster-than-light travel and a commitment to keeping the universe close to natural looking. Since all proposed forms of faster-than-light travel are highly speculative and would involve causality violations, this hypothesis cannot be assigned a substantial probability.

People have also suggested that civilizations move outside galaxies to the cold of space where they can do efficient reversible computing using cold dark matter. Jacob Cannell has been one of the most vocal proponents of this idea. This hypothesis suffers from at least three problems. First, it fails to explain why those entities have not used the conventional matter to any substantial extent in addition to the cold dark matter. Second, this hypothesis would either require dark matter composed of cold conventional matter (which at this point seems to be only a small fraction of all dark matter), or would require dark matter which interacts with itself using some force other than gravity. While there is some evidence for such interaction, it is, at this point, slim. Third, even if some species had taken over a large fraction of dark matter to use for their own computations, one would then expect later species to use the conventional matter since they would not have the option of using the now monopolized dark matter.

Other exotic non-Filter explanations have been proposed but they suffer from similar or even more severe flaws.

It is possible that future information will change this situation. One of the more plausible explanations of the Great Filter is that there is no single Great Filter in the past but rather a large number of small filters which come together to drastically filter out civilizations. The evidence for such a viewpoint is at this point slim, but there is some possibility that astronomy can help answer this question.

For example, one commonly cited aspect of past filtration is the origin of life. There are at least three locations, other than Earth, where life could have formed: Europa, Titan and Mars. Finding life on one, or all of them, would be a strong indication that the origin of life is not the filter. Similarly, while it is highly unlikely that Mars has multicellular life, finding such life would indicate that the development of multicellular life is not the filter. However, none of these locations is as hospitable as Earth, so determining whether there is life will require substantial use of probes. We might also look for signs of life in the atmospheres of extrasolar planets, which would require substantially more advanced telescopes.

Another possible early filter is that planets like Earth frequently get locked into a "snowball" state which they have difficulty exiting. This is an unlikely filter since Earth has likely been in near-snowball conditions multiple times: once very early on during the Huronian and again later, about 650 million years ago. This is an example of an early partial Filter where astronomical observation may be of assistance in finding evidence of the filter. The snowball Earth filter does have one strong virtue: if many planets never escape a snowball situation, then this explains in part why we are not around a red dwarf: planets do not escape their snowball state unless their home star is somewhat variable, and red dwarfs are too stable.

It should be clear that none of these explanations are satisfactory and thus we must take seriously the possibility of future Filtration. 

How can we substantially improve astronomy in the short to medium term?

Before we examine the potential for further astronomical research to understand a future filter, we should note that there are many avenues by which we can improve our astronomical instruments. The most basic way is simply to make better conventional optical and near-optical telescopes, and better radio telescopes. That work is ongoing. Examples include the European Extremely Large Telescope and the Thirty Meter Telescope. Unfortunately, increasing the size of ground based telescopes, especially the size of the aperture, is running into substantial engineering challenges. However, in the last 30 years the advent of adaptive optics, speckle imaging, and other techniques has substantially increased the resolution of ground based optical telescopes and near-optical telescopes. At the same time, improved data processing and related methods have improved radio telescopes. Already, optical and near-optical telescopes have advanced to the point where we can gain information about the atmospheres of extrasolar planets, although we cannot yet detect information about the atmospheres of rocky planets.

Increasingly, the highest resolution is from space-based telescopes. Space-based telescopes also allow one to gather information from types of radiation which are blocked by the Earth's atmosphere or magnetosphere. Two important examples are x-ray telescopes and gamma ray telescopes. Space-based telescopes also avoid many of the issues created by the atmosphere for optical telescopes. Hubble is the most striking example but from a standpoint of observatories relevant to the Great Filter, the most relevant space telescope (and most relevant instrument in general for all Great Filter related astronomy), is the planet detecting Kepler spacecraft which is responsible for most of the identified planets. 

Another type of instrument is the neutrino detector. Neutrino detectors are generally very large bodies of a transparent material (generally water) kept deep underground so that there are minimal amounts of light and cosmic rays hitting the device. Neutrinos are then detected when they hit a particle, which results in a flash of light. In the last few years, improvements in optics, increasing the scale of the detectors, and the development of detectors like IceCube, which use naturally occurring sources of water, have drastically increased the sensitivity of neutrino detectors.

There are proposals for larger-scale, more innovative telescope designs but they are all highly speculative. For example, on the ground based optical front, there's been a suggestion to make liquid mirror telescopes with ferrofluid mirrors, which would give the advantages of liquid mirror telescopes while allowing the use of adaptive optics, which can normally only be applied to solid mirror telescopes. An example of a potential space-based telescope is the Aragoscope, which would take advantage of diffraction to make a space-based optical telescope with a resolution at least an order of magnitude greater than Hubble. Other examples include placing telescopes very far apart in the solar system to create effectively very high aperture telescopes. The most ambitious and speculative of such proposals involve such advanced and large-scale projects that one might as well presume that they will only happen if we have already passed through the Great Filter.


What are the major identified future potential contributions to the filter and what can astronomy tell us? 

Natural threats: 

One threat type where more astronomical observations can help is natural threats, such as asteroid collisions, supernovas, gamma ray bursts, rogue high gravity bodies, and as yet unidentified astronomical threats. Careful mapping of asteroids and comets is ongoing and requires continued funding rather than any intrinsic improvements in technology. Right now, most of our mapping looks at objects at or near the plane of the ecliptic and so some focus off the plane may be helpful. Unfortunately, there is very little money to actually deal with such problems if they arise. It might be possible to have a few wealthy individuals agree to set up accounts in escrow which would be used if an asteroid or similar threat arose.

Supernovas are unlikely to be a serious threat at this time. There are some stars which are close to our solar system and are large enough that they will go supernova. Betelgeuse is the most famous of these, with a projected supernova likely to occur in the next 100,000 years. However, at its current distance, Betelgeuse is unlikely to pose much of a problem unless our models of supernovas are very far off. Further conventional observations of supernovas need to occur in order to understand this further, and better neutrino observations will also help, but right now supernovas do not seem to be a large risk. Gamma ray bursts are in a situation similar to supernovas. Note also that if an imminent gamma ray burst or supernova is likely to occur, there's very little we can at present do about it. In general, back of the envelope calculations establish that supernovas are highly unlikely to be a substantial part of the Great Filter.

Rogue planets, brown dwarfs or other small high gravity bodies such as wandering black holes can be detected, and further improvements will allow faster detection. However, the scale of havoc created by such events is such that it is not at all clear that detection will help. The entire planetary nuclear arsenal would not even begin to move their orbits to a substantial extent.

Note also it is unlikely that natural events are a large fraction of the Great Filter. Unlike most of the other threat types, this is a threat type where radio astronomy and neutrino information may be more likely to identify problems. 

Biological threats: 

Biological threats take two primary forms: pandemics and deliberately engineered diseases. The first is more likely than one might naively expect as a serious contribution to the filter, since modern transport allows infected individuals to move quickly and come into contact with a large number of people. For example, trucking has been a major cause of the spread of HIV in Africa and it is likely that the recent Ebola epidemic had similar contributing factors. Moreover, keeping chickens and other animals in very large quantities in dense areas near human populations makes it easier for novel variants of viruses to jump species. Astronomy does not seem to provide any relevant assistance here; the only plausible way of getting such information would be to see other species that were destroyed by disease. Even with improvements in telescope resolution by many orders of magnitude this is not doable.

Nuclear exchange:

For reasons similar to those in the biological threats category, astronomy is unlikely to help us detect if nuclear war is a substantial part of the Filter. It is possible that more advanced telescopes could detect an extremely large nuclear detonation if it occurred in a very nearby star system. Next generation telescopes may be able to detect a nearby planet's advanced civilization purely based on the light they give off and a sufficiently large  detonation would be of the same light level. However, such devices would be multiple orders of magnitude larger than the largest current nuclear devices. Moreover, if a telescope was not looking at exactly the right moment, it would not see anything at all, and the probability that another civilization wipes itself out at just the same instant that we are looking is vanishingly small. 

Unexpected physics: 

This category is one of the most difficult to discuss because it is so open. The most common examples people point to involve high-energy physics. Aside from theoretical considerations, cosmic rays of very high energy levels are continually hitting the upper atmosphere. These particles frequently are multiple orders of magnitude higher energy than the particles in our accelerators. Thus high-energy events seem to be unlikely to be a cause of any serious filtration unless/until humans develop particle accelerators whose energy level is orders of magnitude higher than that produced by most cosmic rays. Cosmic rays with energy levels beyond what is known as the GZK energy limit are rare. We have observed occasional particles with energy levels beyond the GZK limit, but they are rare enough that we cannot rule out a risk from many collisions involving such high energy particles in a small region. Since our best accelerators are nowhere near the GZK limit, this is not an immediate problem.
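As a rough check on the energy comparison above, the following sketch compares a cosmic-ray proton at the GZK scale with an accelerator beam. The numbers are assumptions not given in the text: a GZK threshold of about 5 x 10^19 eV, a beam energy of 6.5 TeV per proton (roughly the Large Hadron Collider in 2015), and a proton at rest as the target in the upper atmosphere. Because a cosmic ray hits a target that is nearly at rest, the energy actually available in the collision (the center-of-mass energy) is far smaller than the energy of the incoming particle, so both comparisons are shown.

```python
import math

# All constants below are illustrative assumptions, not figures from the essay.
PROTON_REST_ENERGY_EV = 938.272e6  # proton rest energy in eV
GZK_ENERGY_EV = 5e19               # approximate GZK threshold in eV
BEAM_ENERGY_EV = 6.5e12            # assumed collider beam energy per proton in eV


def fixed_target_cm_energy(projectile_ev, rest_ev=PROTON_REST_ENERGY_EV):
    """Center-of-mass energy when a proton of total energy `projectile_ev`
    hits a proton at rest: sqrt(2*m^2 + 2*E*m), in units where c = 1."""
    return math.sqrt(2 * rest_ev**2 + 2 * projectile_ev * rest_ev)


def collider_cm_energy(beam_ev):
    """Center-of-mass energy for two equal-energy beams colliding head-on."""
    return 2 * beam_ev


lab_ratio = GZK_ENERGY_EV / BEAM_ENERGY_EV
cm_cosmic = fixed_target_cm_energy(GZK_ENERGY_EV)
cm_collider = collider_cm_energy(BEAM_ENERGY_EV)

print(f"Particle energy ratio (cosmic ray / beam): {lab_ratio:.2e}")
print(f"Cosmic-ray collision, center of mass: {cm_cosmic / 1e12:.0f} TeV")
print(f"Collider collision, center of mass:   {cm_collider / 1e12:.0f} TeV")
```

Under these assumptions the incoming particle carries millions of times more energy than an accelerator proton, while the energy available in a single collision is larger by a more modest factor. Either way, natural collisions at the GZK scale exceed what the assumed accelerator produces.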

There is an argument that, if anything, we should worry about unexpected physics on the very low energy end. In particular, humans have managed to make objects substantially colder than the cosmic background temperature of about 3 K, with temperatures on the order of 10^-9 K. There's an argument that because of the lack of prior examples of this, the chance that something can go badly wrong should be higher than one might estimate (See here.) While this particular class of scenario seems unlikely, it does illustrate that it may not be obvious which situations could cause unexpected, novel physics to come into play. Moreover, while the flashy, expensive particle accelerators get attention, they may not be a serious source of danger compared to other physics experiments.

Three of the more plausible catastrophic unexpected-physics scenarios involving high energy events are false vacuum collapse, black hole formation, and the formation of strange matter which is more stable than regular matter.

False vacuum collapse would occur if our universe is not in its true lowest energy state and an event occurs which causes it to transition to the true lowest state (or just a lower state). Such an event would be almost certainly fatal for all life. False vacuum collapses cannot be avoided by astronomical observations since once initiated they would expand at the speed of light. Note that the indiscriminately destructive nature of false vacuum collapses makes them an unlikely filter. If false vacuum collapses were easy, we would expect to see almost no life this late in the universe's lifespan, since there would have been a large number of prior opportunities for false vacuum collapse. Essentially, we would not expect to find ourselves this late in a universe's history if this universe could easily engage in a false vacuum collapse. While false vacuum collapses and similar problems raise issues of observer selection effects, careful work has been done to estimate their probability.

People have mentioned the idea of an event similar to a false vacuum collapse but which occurs at a speed slower than the speed of light. Greg Egan used it as a major premise in his novel, "Schild's Ladder." I'm not aware of any reason to believe such events are at all plausible. The primary motivation seems to be the interesting literary scenarios which arise rather than any scientific considerations. If such a situation can occur, then it is possible that we could detect it using astronomical methods. In particular, if the wave-front of the event is fast enough that it will impact the nearest star or nearby stars around it, then we might notice odd behavior by the star or group of stars. We can be confident that no such event has a speed much beyond a few hundredths of the speed of light or we would already notice galaxies behaving abnormally. There is a very narrow range where such expansions could be quick enough to devastate the planet they arise on but take too long to get to their parent star in a reasonable amount of time. For example, the distance from the Earth to the Sun is on the order of 10,000 times the diameter of the Earth, so any event which would expand to destroy the Earth would reach the Sun in about 10,000 times as long. Thus, in order to destroy one's home planet but not reach the parent star within an observable time period, the expansion would need to be extremely slow.
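The distance argument above can be checked with a few lines of arithmetic. This is a minimal sketch assuming a front that expands at constant speed; the Earth diameter, the Earth-Sun distance, and the example speed of 1 km/s are values chosen here for illustration and do not come from the essay.

```python
EARTH_DIAMETER_KM = 12_742       # mean diameter of the Earth (assumed value)
EARTH_SUN_DISTANCE_KM = 149.6e6  # one astronomical unit (assumed value)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def crossing_time_s(distance_km, speed_km_s):
    """Time for a front moving at constant speed to cover a distance."""
    return distance_km / speed_km_s


# Ratio of the two distances; at constant speed this is also the ratio of times.
distance_ratio = EARTH_SUN_DISTANCE_KM / EARTH_DIAMETER_KM

# Hypothetical front speed, picked only to make the numbers concrete.
example_speed_km_s = 1.0
hours_to_cross_earth = crossing_time_s(EARTH_DIAMETER_KM, example_speed_km_s) / 3600
years_to_reach_sun = (
    crossing_time_s(EARTH_SUN_DISTANCE_KM, example_speed_km_s) / SECONDS_PER_YEAR
)

print(f"Earth-Sun distance / Earth diameter: {distance_ratio:,.0f}")
print(f"At {example_speed_km_s} km/s: {hours_to_cross_earth:.1f} hours to cross Earth")
print(f"At {example_speed_km_s} km/s: {years_to_reach_sun:.1f} years to reach the Sun")
```

With these values the ratio comes out a little above 10,000, consistent with the order-of-magnitude figure in the text. A front slow enough to take centuries to reach the Sun would, at constant speed, also take days to cross the planet.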

The creation of artificial black holes is unlikely to be a substantial part of the filter; we expect that small black holes will quickly pop out of existence due to Hawking radiation. Even if the black hole does persist, it is likely to fall quickly to the center of the planet and eat matter very slowly, over a timeline which does not make it a serious threat. However, it is possible that black holes would not evaporate; the fact that we have not detected the evaporation of any primordial black holes is weak evidence that the behavior of small black holes is not well-understood. It is also possible that such a hole would eat much faster than we expect but this doesn't seem likely. If this is a major part of the filter, then better telescopes should be able to detect it by finding very dark objects with the approximate mass and orbit of habitable planets. We also may be able to detect such black holes via other observations such as from their gamma or radio signatures.

The conversion of regular matter into strange matter, unlike a false vacuum collapse or similar event, might  be naturally limited to the planet where the conversion started. In that case, the only hope for observation would be to notice planets formed of strange matter and notice changes in the behavior of their light. Without actual samples of strange matter, this may be very difficult to do unless we just take notice of planets looking abnormal as similar evidence. Without substantially better telescopes and a good idea of what the range is for normal rocky planets, this would be tough.  On the other hand, neutron stars which have been converted into strange matter may be more easily detectable. 

Global warming and related damage to biosphere: 

Astronomy is unlikely to help here. It is possible that climates are more sensitive than we realize and that comparatively small changes can result in Venus-like situations. This seems unlikely given the general variation level in human history and the fact that current geological models strongly suggest that any substantial problem would eventually correct itself. But if we saw many planets that looked Venus-like in the middle of their habitable zones, this would be a reason to be worried. Note that this would require detailed ability to analyze atmospheres on planets well beyond current capability. Even if it is possible to Venus-ify a planet, it is not clear that the Venusification would last long. Thus there may be very few planets in this state at any given time. Since stars become brighter as they age, high greenhouse gas levels have more of an impact on climate when the parent star is old. If civilizations are more likely to arise at a late point in their home star's lifespan, global warming becomes a more plausible filter, but even given such considerations, global warming does not seem to be sufficient as a filter. It is also possible that global warming by itself is not the Great Filter but rather general disruption of the biosphere including possibly for some species global warming, reduction in species diversity, and other problems. There is some evidence that human behavior is collectively causing enough damage to leave an unstable biosphere.

A change in planetary overall temperature of 10 °C would likely be enough to collapse civilization without leaving any signal observable to a telescope. Similarly, substantial disruption to a biosphere may be very unlikely to be detected.

Artificial intelligence:

AI is a complicated existential risk from the standpoint of the Great Filter. AI is not likely to be the Great Filter if one considers simply the Fermi paradox. The essential problem has been brought up independently by a few people. (See for example Katja Grace's remark here and my blog here.) The central issue is that if an AI takes over it is likely to attempt to control all resources in its future light-cone. However, if the AI spreads out at a substantial fraction of the speed of light, then we would notice the result. The argument has been made that we would not see such an AI if it expanded its radius of control at very close to the speed of light but this requires expansion at 99% of the speed of light or greater. It is highly questionable that velocities more than 99% of the speed of light are practically possible due to collisions with the interstellar medium and the need to slow down if one is going to use the resources in a given star system. Another objection is that AI may expand at a large fraction of light speed but do so stealthily. It is not likely that all AIs would favor stealth over speed. Moreover, this raises the question of what one would expect when multiple slowly expanding, stealth AIs run into each other. It is likely that such events would have results catastrophic enough that they would be visible even with comparatively primitive telescopes.

While these astronomical considerations make AI unlikely to be the Great Filter, it is important to note that if the Great Filter is largely in our past then these considerations do not apply. Thus, any discovery which pushes more of the filter into the past makes AI a larger fraction of total expected existential risks since the absence of observable AI becomes  much weaker evidence against strong AI if there are no major civilizations out there to hatch such explosions. 

Note also that AI as a risk cannot be discounted if one assigns a high probability to existential risk based on non-Fermi concerns, such as the Doomsday Argument.

Resource depletion:

Astronomy is unlikely to provide direct help here for reasons similar to the problems with nuclear exchange, biological problems, and global warming.  This connects to the problem of civilization bootstrapping: to get to our current technology level, we used a large number of non-renewable resources, especially energy sources. On the other hand, large amounts of difficult-to-mine and refine resources (especially aluminum and titanium) will be much more accessible to future civilization. While there remains a large amount of accessible fossil fuels, the technology required to obtain deeper sources is substantially more advanced than the relatively easy to access oil and coal. Moreover, the energy return rate, how much energy one needs to put in to get the same amount of energy out, is lower.  Nick Bostrom has raised the possibility that the depletion of easy-to-access resources may contribute to making civilization-collapsing problems that, while not  full-scale existential risks by themselves, prevent the civilizations from recovering. Others have begun to investigate the problem of rebuilding without fossil fuels, such as here.

Resource depletion is unlikely to be the Great Filter, because small changes to human behavior in the 1970s would have drastically reduced the current resource problems. Resource depletion may contribute to existential threat to humans if it leads to societal collapse or global nuclear exchange, or motivates riskier experimentation. Resource depletion may also combine with other risks such as global warming, where the combined problems may be much greater than either individually. However, there is a risk that large scale use of resources to engage in astronomy research will directly contribute to the resource depletion problem.


Nanotechnology disasters are one of the situations where astronomical considerations could plausibly be useful. In particular, planets which are in the habitable zone, but have highly artificial and inhospitable atmospheres and surfaces, could plausibly be visible. For example, if a planet's surface were transformed into diamond, telescopes not much more advanced beyond our current telescopes could detect that surface. It should also be noted that at this point, many nanotechnologists consider the classic "grey goo" scenario to be highly unlikely. See, for example, Chris Phoenix's comment here. However, catastrophic replicator events that cause enough damage to the biosphere without grey-gooing everything are a possibility and it is unclear if we would detect such events. 


Hostile aliens are a common explanation of the Great Filter when people first find out about it. However, this idea comes more from science fiction than any plausible argument. In particular, if a single hostile alien civilization were wiping out or drastically curtailing other civilizations, then one would still expect the civilization to make use of available resources after a long enough time. One could do things like positing such aliens who also have a religious or ideological ideal of leaving the universe looking natural but this is an unlikely speculative hypothesis that also requires them to dominate a massive region, not just a handful of galaxies but many galaxies. 

Note also that astronomical observations might be able to detect the results of extremely powerful weapons but any conclusions would be highly speculative. Moreover, it is not clear that knowing about such a threat would allow us at all to substantially mitigate the threat. 


Unknown risks are by nature very difficult to estimate. However, there is an argument that we should expect that the Great Filter is an unknown risk, and is something so unexpected that no civilization gets sufficient warning.  This is one of the easiest ways for the filter to be truly difficult to prevent. In that context, any information we can possibly get about other civilizations and what happened to them would be a major leg-up.


Astronomical observations have the potential to give us data about the Great Filter, but many potential filters will leave no observable astronomical evidence unless one's astronomical ability is so high that one has likely already passed all major filters. Therefore, one potential strategy to pass the Great Filter is to drastically increase our astronomical capability to the point where it would be highly unlikely that a pre-Filter civilization would have access to those observations. Together with our comparatively late arrival, this might allow us to actually detect failed civilizations that did not survive the Great Filter and see what they did wrong.

Unfortunately, it is not clear how cost-effective this sort of increase in astronomy would be compared to other existential risk mitigating uses. It may be more useful to focus on moving resources in astronomy into those areas most relevant to understanding the Great Filter. 

Concept Safety: What are concepts for, and how to deal with alien concepts

10 Kaj_Sotala 19 April 2015 01:44PM

I'm currently reading through some relevant literature for preparing my FLI grant proposal on the topic of concept learning and AI safety. I figured that I might as well write down the research ideas I get while doing so, so as to get some feedback and clarify my thoughts. I will be posting these in a series of "Concept Safety"-titled articles.

In The Problem of Alien Concepts, I posed the following question: if your concepts (defined as either multimodal representations or as areas in a psychological space) previously had N dimensions and then they suddenly have N+1, how does that affect (moral) values that were previously only defined in terms of N dimensions?

I gave some (more or less) concrete examples of this kind of a "conceptual expansion":

  1. Children learn to represent dimensions such as "height" and "volume", as well as "big" and "bright", separately at around age 5.
  2. As an inhabitant of the Earth, you've been used to people being unable to fly and landowners being able to forbid others from using their land. Then someone goes and invents an airplane, leaving open the question of the height to which the landowner's control extends. Similarly for satellites and nation-states.
  3. As an inhabitant of Flatland, you've been told that the inside of a certain rectangle is a forbidden territory. Then you learn that the world is actually three-dimensional, leaving open the question of the height to which the forbidden territory extends.
  4. An AI has previously been reasoning in terms of classical physics and been told that it can't leave a box, which it previously defined in terms of classical physics. Then it learns about quantum physics, which allows for definitions of "location" which are substantially different from the classical ones.

As a hint of the direction where I'll be going, let's first take a look at how humans solve these kinds of dilemmas, and consider examples #1 and #2.

The first example - children realizing that items have a volume that's separate from their height - rarely causes any particular crises. Few children have values that would be seriously undermined or otherwise affected by this discovery. We might say that it's a non-issue because none of the children's values have been defined in terms of the affected conceptual domain.

As for the second example, I don't know the exact cognitive process by which it was decided that you didn't need the landowner's permission to fly over their land. But I'm guessing that it involved reasoning like: if the plane flies at a sufficient height, then that doesn't harm the landowner in any way. Flying would become impossibly difficult if you had to get separate permission from every person whose land you were going to fly over. And, especially before the invention of radar, a ban on unauthorized flyovers would be next to impossible to enforce anyway.

We might say that after an option became available which forced us to include a new dimension in our existing concept of landownership, we solved the issue by considering it in terms of our existing values.

Concepts, values, and reinforcement learning

Before we go on, we need to talk a bit about why we have concepts and values in the first place.

From an evolutionary perspective, creatures that are better capable of harvesting resources (such as food and mates) and avoiding dangers (such as other creatures who think you're food or after their mates) tend to survive and have offspring at better rates than otherwise comparable creatures who are worse at those things. If a creature is to be flexible and capable of responding to novel situations, it can't just have a pre-programmed set of responses to different things. Instead, it needs to be able to learn how to harvest resources and avoid danger even when things are different from before.

How did evolution achieve that? Essentially, by creating a brain architecture that can, as a very very rough approximation, be seen as consisting of two different parts. One part, which a machine learning researcher might call the reward function, has the task of figuring out when various criteria - such as being hungry or getting food - are met, and issuing the rest of the system either a positive or negative reward based on those conditions. The other part, the learner, then "only" needs to find out how to best optimize for the maximum reward. (And then there is the third part, which includes any region of the brain that's neither of the above, but we don't care about those regions now.)

The mathematical theory of how to learn to optimize for rewards when your environment and reward function are unknown is reinforcement learning (RL), which recent neuroscience indicates is implemented by the brain. An RL agent learns a mapping from states of the world to rewards, as well as a mapping from actions to world-states, and then uses that information to maximize the amount of lifetime rewards it will get.

There are two major reasons why an RL agent, like a human, should learn high-level concepts:

  1. They make learning massively easier. Instead of having to separately learn that "in the world-state where I'm sitting naked in my cave and have berries in my hand, putting them in my mouth enables me to eat them" and that "in the world-state where I'm standing fully-clothed in the rain outside and have fish in my hand, putting it in my mouth enables me to eat it" and so on, the agent can learn to identify the world-states that correspond to the abstract concept of having food available, and then learn the appropriate action to take in all those states.
  2. There are useful behaviors that need to be bootstrapped from lower-level concepts to higher-level ones in order to be learned. For example, newborns have an innate preference for looking at roughly face-shaped things (Farroni et al. 2005), which develops into a more consistent preference for looking at faces over the first year of life (Frank, Vul & Johnson 2009). One hypothesis is that this bias towards paying attention to the relatively-easy-to-encode-in-genes concept of "face-like things" helps direct attention towards learning valuable but much more complicated concepts, such as ones involved in a basic theory of mind (Gopnik, Slaughter & Meltzoff 1994) and the social skills involved with it.
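The first point in the list above can be illustrated with a toy learner. This is only a sketch: the `food_available` concept is written by hand here, whereas the post is about agents that learn such concepts, and the state fields, class name, and reward numbers are all hypothetical. It shows that a learner keyed on the abstract concept generalizes to a situation it has never seen, while a learner keyed on raw world-states does not.

```python
from collections import defaultdict

FOODS = {"berries", "fish"}  # hypothetical set of edible items


def raw_state(state):
    """No abstraction: every distinct combination of details is its own state."""
    return tuple(sorted(state.items()))


def food_available(state):
    """Hand-written stand-in for the learned concept 'I have food available'."""
    return state["holding"] in FOODS


class TabularLearner:
    """Averages observed rewards per (abstract state, action) pair."""

    def __init__(self, abstraction):
        self.abstraction = abstraction
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def observe(self, state, action, reward):
        key = (self.abstraction(state), action)
        self.totals[key] += reward
        self.counts[key] += 1

    def estimate(self, state, action):
        """Return the average reward seen so far, or None if never seen."""
        key = (self.abstraction(state), action)
        if self.counts[key] == 0:
            return None
        return self.totals[key] / self.counts[key]


experiences = [
    ({"place": "cave", "clothed": False, "holding": "berries"}, "eat", 1.0),
    ({"place": "outside in rain", "clothed": True, "holding": "fish"}, "eat", 1.0),
]
unseen = {"place": "riverbank", "clothed": True, "holding": "fish"}

raw_learner = TabularLearner(raw_state)
concept_learner = TabularLearner(food_available)
for state, action, reward in experiences:
    raw_learner.observe(state, action, reward)
    concept_learner.observe(state, action, reward)

print("Raw-state learner:", raw_learner.estimate(unseen, "eat"))
print("Concept learner:  ", concept_learner.estimate(unseen, "eat"))
```

The raw-state learner has no estimate for the riverbank situation because it has never encountered that exact combination of details, while the concept learner reuses what it learned in the cave and in the rain.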

Viewed in this light, concepts are cognitive tools that are used for getting rewards. At the most primitive level, we should expect a creature to develop concepts that abstract over situations that are similar with regards to the kind of reward that one can gain from taking a certain action in those states. Suppose that a certain action in state s1 gives you a reward, and that there are also states s2 - s5 in which taking some specific action causes you to end up in s1. Then we should expect the creature to develop a common concept for being in the states s2 - s5, and we should expect that concept to be "more similar" to the concept of being in state s1 than to the concept of being in some state that was many actions away.

"More similar" how?

In reinforcement learning theory, reward and value are two different concepts. The reward of a state is the actual reward that the reward function gives you when you're in that state or perform some action in that state. Meanwhile, the value of the state is the maximum total reward that you can expect to get from moving from that state to others (times some discount factor). So a state A with reward 0 might have value 5 if you could move from it to state B, which had a reward of 5.
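The two-state example can be worked through in code. The sketch below assumes a deterministic world with no cycles, in which the value of a state is its own reward plus the discounted value of the best state reachable from it; the function name and the dictionary layout are illustrative choices, not taken from any particular RL library.

```python
def state_values(rewards, successors, discount=1.0, sweeps=100):
    """Compute state values by repeated sweeps over all states.

    rewards:    dict mapping each state to its immediate reward
    successors: dict mapping a state to the list of states reachable from it
    Assumes the state graph has no cycles, or that discount < 1.
    """
    values = dict(rewards)
    for _ in range(sweeps):
        for state in rewards:
            reachable = successors.get(state, [])
            # Value of the best next state, or 0 if nothing is reachable.
            best_next = max((values[s] for s in reachable), default=0.0)
            values[state] = rewards[state] + discount * best_next
    return values


rewards = {"A": 0.0, "B": 5.0}
successors = {"A": ["B"]}  # you can move from A to B; B leads nowhere

undiscounted = state_values(rewards, successors, discount=1.0)
discounted = state_values(rewards, successors, discount=0.9)

print("Reward of A:", rewards["A"])
print("Value of A with no discounting:", undiscounted["A"])
print("Value of A with discount 0.9:  ", discounted["A"])
```

State A has reward 0 but inherits value from B. With no discounting its value is 5, matching the example in the text, and with a discount factor of 0.9 it drops to 4.5.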

Below is a figure from DeepMind's recent Nature paper, which presented a deep reinforcement learner that was capable of achieving human-level performance or above on 29 of 49 Atari 2600 games (Mnih et al. 2015). The figure is a visualization of the representations that the learning agent has developed for different game-states in Space Invaders. The representations are color-coded depending on the value of the game-state that the representation corresponds to, with red indicating a higher value and blue a lower one.

As can be seen (and is noted in the caption), representations with similar values are mapped closer to each other in the representation space. Also, some game-states which are visually dissimilar to each other but have a similar value are mapped to nearby representations. Likewise, states that are visually similar but have a differing value are mapped away from each other. We could say that the Atari-playing agent has learned a primitive concept space, where the relationships between the concepts (representing game-states) depend on their value and the ease of moving from one game-state to another.

In most artificial RL agents, reward and value are kept strictly separate. In humans (and mammals in general), this doesn't seem to work quite the same way. Rather, if there are things or behaviors which have once given us rewards, we tend to eventually start valuing them for their own sake. If you teach a child to be generous by praising them when they share their toys with others, you don't have to keep doing it all the way to your grave. Eventually they'll internalize the behavior, and start wanting to do it. One might say that the positive feedback actually modifies their reward function, so that they will start getting some amount of pleasure from generous behavior without needing to get external praise for it. In general, behaviors which are learned strongly enough don't need to be reinforced anymore (Pryor 2006).

Why does the human reward function change as well? Possibly because of the bootstrapping problem: there are things such as social status that are very complicated and hard to directly encode as "rewarding" in an infant mind, but which can be learned by associating them with rewards. One researcher I spoke with commented that he "wouldn't be at all surprised" if it turned out that sexual orientation was learned by men and women having slightly different smells, and sexual interest bootstrapping from an innate reward for being in the presence of the right kind of a smell, which the brain then associated with the features usually co-occurring with it. His point wasn't so much that he expected this to be the particular mechanism, but that he wouldn't find it particularly surprising if a core part of the mechanism was something that simple. Remember that incest avoidance seems to bootstrap from the simple cue of "don't be sexually interested in the people you grew up with".

This is, in essence, how I expect human values and human concepts to develop. We have some innate reward function which gives us various kinds of rewards for different kinds of things. Over time we develop various concepts for the purpose of letting us maximize our rewards, and lived experiences also modify our reward function. Our values are concepts which abstract over situations in which we have previously obtained rewards, and which have become intrinsically rewarding as a result.

Getting back to conceptual expansion

Having defined these things, let's take another look at the two examples we discussed above. As a reminder, they were:

  1. Children learn to represent dimensions such as "height" and "volume", as well as "big" and "bright", separately at around age 5.
  2. As an inhabitant of the Earth, you've been used to people being unable to fly and landowners being able to forbid others from using their land. Then someone goes and invents an airplane, leaving open the question of the height to which the landowner's control extends.

I summarized my first attempt at describing the consequences of #1 as "it's a non-issue because none of the children's values have been defined in terms of the affected conceptual domain". We can now reframe it as "it's a non-issue because the [concepts that abstract over the world-states which give the child rewards] mostly do not make use of the dimension that's now been split into 'height' and 'volume'".

Admittedly, this new conceptual distinction might be relevant for estimating the value of a few things. A more accurate estimate of the volume of a glass leads to a more accurate estimate of which glass of juice to prefer, for instance. With children, there probably is some intuitive physics module that figures out how to apply this new dimension for that purpose. Even if there wasn't, and it was unclear whether it was the "tall glass" or "high-volume glass" concept that needed to be mapped closer to high-value glasses, this could be easily determined by simple experimentation.

As for the airplane example, I summarized my description of it by saying that "after an option became available which forced us to include a new dimension in our existing concept of landownership, we solved the issue by considering it in terms of our existing values". We can similarly reframe this as "after the feature of 'height' suddenly became relevant for the concept of landownership, when it hadn't been a relevant feature dimension for landownership before, we redefined landownership by considering which kind of redefinition would give us the largest amounts of rewarding things". "Rewarding things", here, shouldn't be understood only in terms of concrete physical rewards like money, but also anything else that people have ended up valuing, including abstract concepts like right to ownership.

Note also that different people, having different experiences, ended up making different redefinitions. No doubt some landowners felt that "being in total control of my land and everything above it" was a more important value than "the convenience of people who get to use airplanes"... unless, perhaps, they got to see first-hand the value of flying, in which case the new information could have repositioned the different concepts in their value-space.

As an aside, this also works as a possible partial explanation for e.g. someone being strongly against gay rights until their child comes out of the closet. Someone they care about suddenly benefiting from the concept of "gay rights", which previously had no positive value for them, may end up changing the value of that concept. In essence, they gain new information about the value of the world-states that the concept of "my nation having strong gay rights" abstracts over. (Of course, things don't always go this well, if their concept of homosexuality is too strongly negative to start with.)

The Flatland case follows a similar principle: the Flatlanders have some values that declared the inside of the rectangle a forbidden space. Maybe the inside of the rectangle contains monsters which tend to eat Flatlanders. Once they learn about 3D space, they can rethink it in terms of their existing values.

Dealing with the AI in the box

This leaves us with the AI case. We have, via various examples, taught the AI to stay in the box, which was defined in terms of classical physics. In other words, the AI has obtained the concept of a box, and has come to associate staying in the box with some reward, or possibly leaving it with a lack of a reward.

Then the AI learns about quantum mechanics. It learns that in the QM formulation of the universe, "location" is not a fundamental or well-defined concept anymore - and in some theories, even the concept of "space" is no longer fundamental or well-defined. What happens?

Let's look at the human equivalent for this example: a physicist who learns about quantum mechanics. Do they start thinking that since location is no longer well-defined, they can now safely jump out of the window on the sixth floor?

Maybe some do. But I would wager that most don't. Why not?

The physicist cares about QM concepts to the extent that the said concepts are linked to things that the physicist values. Maybe the physicist finds it rewarding to develop a better understanding of QM, to gain social status by making important discoveries, and to pay their rent by understanding the concepts well enough to continue to do research. These are some of the things that the QM concepts are useful for. Likely the brain has some kind of causal model indicating that the QM concepts are relevant tools for achieving those particular rewards. At the same time, the physicist also has various other things they care about, like being healthy and hanging out with their friends. These are values that can be better furthered by modeling the world in terms of classical physics.

In some sense, the physicist knows that if they started thinking "location is ill-defined, so I can safely jump out of the window", then that would be changing the map, not the territory. It wouldn't help them get the rewards of being healthy and getting to hang out with friends - even if a hypothetical physicist who did make that redefinition would think otherwise. It all adds up to normality.

A part of this comes from the fact that the physicist's reward function remains defined over immediate sensory experiences, as well as values which are linked to those. Even if you convince yourself that the location of food is ill-defined and you thus don't need to eat, you will still suffer the negative reward of being hungry. The physicist knows that no matter how they change their definition of the world, that won't affect their actual sensory experience and the rewards they get from that.

So to prevent the AI from leaving the box by suitably redefining reality, we have to somehow find a way for the same reasoning to apply to it. I haven't worked out a rigorous definition for this, but it needs to somehow learn to care about being in the box in classical terms, and realize that no redefinition of "location" or "space" is going to alter what happens in the classical model. Also, its rewards need to be defined over models to a sufficient extent to avoid wireheading (Hibbard 2011), so that it will think that trying to leave the box by redefining things would count as self-delusion, and not accomplish the things it really cared about. This way, the AI's concept for "being in the box" should remain firmly linked to the classical interpretation of physics, not the QM interpretation of physics, because it's acting in terms of the classical model that has always given it the most reward. 

It is my hope that this could also be made to extend to cases where the AI learns to think in terms of concepts that are totally dissimilar to ours. If it learns a new conceptual dimension, how should that affect its existing concepts? Well, it can figure out how to reclassify the existing concepts that are affected by that change, based on what kind of a classification ends up producing the most reward... when the reward function is defined over the old model.

A simple exercise in rationality: rephrase an objective statement as subjective and explore the caveats

16 shminux 18 April 2015 11:46PM

"This book is awful" => "I dislike this book" => "I dislike this book because it is shallow and is full of run-on sentences." => "I dislike this book because I prefer reading books I find deep and clearly written."

"The sky is blue" => ... => "When I look at the sky, the visual sensation I get is very similar to when I look at a bunch of other objects I've been taught to associate with the color blue."

"Team X lost but deserved to win" => ...

"Being selfish is immoral" 

"The Universe is infinite, so anything imaginable happens somewhere"

In general, consider a quick check whether in a given context replacing "is" with "appears to be" leads to something you find non-trivial.

Why? Because it exposes the multiple levels of maps we normally skip. So one might find it illuminating to occasionally walk through the levels and make sure they are still connected as firmly as the last time. And maybe to figure out where the people who hold a different opinion from yours construct a different chain of maps. Also to make sure you don't mistake a map for the territory.

That is all. ( => "I think that I have said enough for one short post and adding more would lead to diminishing returns, though I could be wrong here, but I am too lazy to spend more time looking for links and quotes and better arguments without being sure that they would improve the post.")


[link] The surprising downsides of being clever

1 Gunnar_Zarncke 18 April 2015 08:33PM

“Happiness in intelligent people is the rarest thing I know.” ― Ernest Hemingway, The Garden of Eden (see here)

Did you know The surprising downsides of being clever? Is Happiness And Intelligence: Rare Combination? There are longitudinal studies which seem to imply this: Being Labeled as Gifted, Self-appraisal, and Psychological Well-being: A Life Span Developmental Perspective

I found these via slashdot.

As LessWrong is a harbor for unusually high-IQ people (see section B in here), I wonder how happiness here compares to the mean. What are your thoughts?


Weekly LW Meetups

3 FrankAdamek 18 April 2015 06:47AM

This summary was posted to LW main on April 10th. The following week's summary is here.

Irregularly scheduled Less Wrong meetups are taking place in:

The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:

Locations with regularly scheduled meetups: Austin, Berkeley, Berlin, Boston, Brussels, Buffalo, Cambridge UK, Canberra, Columbus, London, Madison WI, Melbourne, Moscow, Mountain View, New York, Philadelphia, Research Triangle NC, Seattle, Sydney, Tel Aviv, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers.


Resolving the Fermi Paradox: New Directions

10 jacob_cannell 18 April 2015 06:00AM

Our sun appears to be a typical star: unremarkable in age, composition, galactic orbit, or even in its possession of many planets.  Billions of other stars in the Milky Way have similar general parameters and orbits that place them in the galactic habitable zone.  Extrapolations of recent exoplanet surveys reveal that most stars have planets, removing yet another potential unique dimension for a great filter in the past.

According to Google, there are 20 billion Earth-like planets in the Galaxy.

A paradox indicates a flaw in our reasoning or our knowledge, which upon resolution, may cause some large update in our beliefs.

Ideally we could resolve this through massive multiscale Monte Carlo computer simulations to approximate Solomonoff Induction on our current observational data.  If we survive and create superintelligence, we will probably do just that.

In the meantime, we are limited to constrained simulations, Fermi estimates, and other shortcuts to approximate the ideal Bayesian inference.

The Past

While there is still obvious uncertainty concerning the likelihood of the series of transitions along the path from the formation of an earth-like planet around a sol-like star up to an early tech civilization, the general direction of the recent evidence flow favours a strong Mediocrity Principle.

Here are a few highlight developments from the last few decades relating to an early filter:

  1. The time window between formation of earth and earliest life has been narrowed to a brief interval.  Panspermia has also gained ground, with some recent complexity arguments favoring a common origin of life at 9 billion yrs ago.[1]
  2. Discovery of various extremophiles indicates life is robust to a wider range of environments than the norm on earth today.
  3. Advances in neuroscience and studies of animal intelligence lead to the conclusion that the human brain is not nearly as unique as once thought.  It is just an ordinary scaled up primate brain, with a cortex enlarged to 4x the size of a chimpanzee's.  Elephants and some cetaceans have similar cortical neuron counts to the chimpanzee, and demonstrate similar or greater levels of intelligence in terms of rituals, problem solving, tool use, communication, and even understanding rudimentary human language.  Elephants, cetaceans, and primates are widely separated lineages, indicating robustness and inevitability in the evolution of intelligence.

So, if there is a filter, it probably lies in the future (or at least the new evidence tilts us in that direction - but see this reply for an argument for an early filter).

The Future(s)

When modelling the future development of civilization, we must recognize that the future is a vast cloud of uncertainty compared to the past.  The best approach is to focus on the key general features of future postbiological civilizations, categorize the full space of models, and then update on our observations to determine which ranges of the parameter space are excluded and which regions remain open.

An abridged taxonomy of future civilization trajectories:


Civilization is wiped out due to an existential catastrophe that sterilizes the planet sufficiently to kill most large multicellular organisms, essentially resetting the evolutionary clock by a billion years.  Given the potential dangers of nanotech/AI/nuclear weapons - and then aliens, I believe this possibility is significant - i.e. in the 1% to 50% range.

Biological/Mixed Civilization:

This is the old-skool sci-fi scenario.  Humans or our biological descendants expand into space.  AI is developed but limited to human intelligence, like C-3PO.  No or limited uploading.

This leads eventually to slow colonization, terraforming, perhaps eventually dyson spheres etc.

This scenario is almost not worth mentioning: prior < 1%.  Unfortunately SETI in current form is still predicated on a world model that assigns a high prior to these futures.

PostBiological Warm-tech AI Civilization:

This is Kurzweil/Moravec's sci-fi scenario.  Humans become postbiological, merging with AI through uploading.  We become a computational civilization that then spreads out at some fraction of the speed of light to turn the galaxy into computronium.  This particular scenario is based on the assumption that energy is a key constraint, and that civilizations are essentially stellavores which harvest the energy of stars.

One of the very few reasonable assumptions we can make about any superintelligent postbiological civilization is that higher intelligence involves increased computational efficiency.  Advanced civs will upgrade into physical configurations that maximize computation capabilities given the local resources.

Thus to understand the physical form of future civs, we need to understand the physical limits of computation.

One key constraint is the Landauer Limit, which states that the erasure (or cloning) of one bit of information requires a minimum of kTln2 joules.  At room temperature (293 K), this corresponds to a minimum of 0.017 eV to erase one bit.  Minimum is however the keyword here, as according to the principle, the probability of the erasure succeeding is only 50% at the limit.  Reliable erasure requires some multiple of the minimal expenditure - a reasonable estimate being about 100kT or 1eV as the minimum for bit erasures at today's levels of reliability.
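
The room-temperature figure above can be checked with a few lines of Python. This is a minimal sketch: the function name is made up for illustration, and the only physical input is the Boltzmann constant in eV/K.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant k in eV per kelvin


def landauer_limit_ev(temperature_k):
    """Minimum energy (in eV) to erase one bit at the given temperature: k*T*ln(2)."""
    return BOLTZMANN_EV_PER_K * temperature_k * math.log(2)


room_limit = landauer_limit_ev(293.0)           # the bare kT*ln2 minimum
hundred_kt = 100 * BOLTZMANN_EV_PER_K * 293.0   # the "100kT" reliability margin

print(f"kT ln2 at 293 K: {room_limit:.4f} eV")
print(f"100 kT at 293 K: {hundred_kt:.2f} eV")
```

At 293 K the bare limit comes out at about 0.0175 eV, and 100kT at about 2.5 eV, so the "1eV" quoted above is best read as an order-of-magnitude figure.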

Now, the second key consideration is that Landauer's Limit does not include the cost of interconnect, which is already now dominating the energy cost in modern computing.  Just moving bits around dissipates energy.

Moore's Law is approaching its asymptotic end in a decade or so due to these hard physical energy constraints and the related miniaturization limits.

I assign a prior to the warm-tech scenario that is about the same as my estimate of the probability that the more advanced cold-tech (reversible quantum computing, described next) is impossible: < 10%.

From Warm-tech to Cold-tech

There is a way forward to vastly increased energy efficiency, but it requires reversible computing (to increase the ratio of computations per bit erasure), and fully superconducting interconnect to reduce the interconnect loss down to near zero.

The path to enormously more powerful computational systems necessarily involves transitioning to very low temperatures, and the lower the better, for several key reasons:

  1. There is the obvious immediate gain that one gets from lowering the cost of bit erasures: a bit erasure at room temperature costs about 100 times more than a bit erasure at the cosmic background temperature, and roughly thirty thousand times more than an erasure at 0.01K (the current achievable limit for large objects).
  2. Low temperatures are required for most superconducting materials regardless.
  3. The delicate coherence required for practical quantum computation requires or works best at ultra low temperatures.

At a more abstract level, the essence of computation is precise control over the physical configurations of a device as it undergoes complex state transitions.  Noise/entropy is the enemy of control, and temperature is a form of noise.
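
Because the Landauer cost kTln2 is linear in temperature, the gain from cooling in point 1 is simply a ratio of temperatures. A quick sketch, assuming 293 K for room temperature and 2.725 K for the cosmic background (the latter value is not given in the post):

```python
ROOM_K = 293.0        # room temperature used earlier in the post
CMB_K = 2.725         # assumed cosmic microwave background temperature
LAB_LIMIT_K = 0.01    # the 0.01 K figure quoted in the list above


def erasure_cost_ratio(hot_k, cold_k):
    """How many times cheaper one bit erasure is at cold_k than at hot_k."""
    return hot_k / cold_k  # k and ln(2) cancel out of the ratio


ratio_cmb = erasure_cost_ratio(ROOM_K, CMB_K)
ratio_lab = erasure_cost_ratio(ROOM_K, LAB_LIMIT_K)

print(f"room temperature vs. cosmic background: about {ratio_cmb:.0f}x")
print(f"room temperature vs. 0.01 K: about {ratio_lab:.0f}x")
```

Under these assumptions the ratios come out at roughly 108 and 29,300 respectively.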

Assuming large scale quantum computing is possible, then the ultimate computer is thus a reversible massively entangled quantum device operating at absolute zero.  Unfortunately, such a device would be delicate to a degree that is hard to imagine - even a single misplaced high energy particle could cause enormous damage.

In this model, advanced computational civilization would take the form of a compact body (anywhere from asteroid to planet size) that employs layers of sophisticated shielding to deflect as much of the incoming particle flux as possible.  The ideal environment for such a device is as far away from hot stars as one can possibly go, and the farther the better.  The extreme energy efficiency of advanced low temperature reversible/quantum computing implies that energy is not a constraint.  These advanced civilizations could probably power themselves using fusion reactors for millions, if not billions, of years.

Stellar Escape Trajectories

For a cold-tech civilization, one interesting long term strategy involves escaping the local star's orbit to reach the colder interstellar medium, and eventually the intergalactic medium.

If we assume that these future civs have long planning horizons (reasonable), we can consider this an investment that has an initial cost in terms of the energy required to achieve escape velocity and a return measured in the future integral of computation gained over the trajectory due to increased energy efficiency.  Expendable boost mass in the system can be used, and domino chains of complex chaotic gravitational assist maneuvers computed by deep simulations may offer a route to expel large objects using reasonable amounts of energy.[3]
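
For a rough sense of the "initial cost", here is a back-of-the-envelope sketch of what escaping the Sun takes. It assumes a body starting on a circular orbit at 1 AU and ignores gravity assists and every other body in the system; the starting distance is purely an illustrative choice, not something the argument above depends on.

```python
import math

GM_SUN = 1.32712440018e20   # Sun's gravitational parameter, m^3/s^2
AU = 1.495978707e11         # one astronomical unit, m


def escape_speed(r_m):
    """Speed needed at distance r from the Sun to escape to infinity."""
    return math.sqrt(2 * GM_SUN / r_m)


def circular_speed(r_m):
    """Speed of a circular orbit at distance r from the Sun."""
    return math.sqrt(GM_SUN / r_m)


v_esc = escape_speed(AU)
v_circ = circular_speed(AU)
# Extra kinetic energy per kilogram for an orbiting body to just escape.
energy_per_kg = 0.5 * (v_esc**2 - v_circ**2)

print(f"escape speed at 1 AU: {v_esc / 1000:.1f} km/s")
print(f"extra energy needed:  {energy_per_kg:.2e} J per kg")
```

That works out to roughly 42 km/s and about 4.4e8 J per kilogram, which is why expendable boost mass and gravitational assists matter for anything planet-sized.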

The Great Game 

Given the constraints of known physics (i.e. no FTL), it appears that the computational brains housing more advanced cold-tech civs will be incredibly vulnerable to hostile aliens.  A relativistic kill vehicle is a simple technology that permits little avenue for direct defense.  The only strong defense is stealth.

Although the utility functions and ethics of future civs are highly speculative, we can observe that a very large space of utility functions leads to similar convergent instrumental goals involving control over one's immediate future light cone.  If we assume that some civs are essentially selfish, then the dynamics suggest successful strategies will involve stealth and deception to avoid detection combined with deep simulation sleuthing to discover potential alien civs and their locations.

If two civs both discover each other's locations around the same time, then MAD (mutually assured destruction) dynamics take over and cooperation has stronger benefits.  The vast distances involved suggest that one-sided discoveries are more likely.

Spheres of Influence

A new civ, upon achieving the early postbiological stage of development (Earth in, say, 2050?), should be able to resolve the general answer to the Fermi paradox using advanced deep simulation alone - long before any probes would reach distant stars.  Assuming that the answer is "lots of aliens", then further simulations could be used to estimate the relative likelihood of elder civs interacting with the past lightcone.

The first few civilizations would presumably realize that the galaxy is more likely to be mostly uncolonized, in which case the ideal strategy probably involves expansion of actuator type devices (probes, construction machines) into nearby systems combined with construction and expulsion of advanced stealthed coldtech brains out into the void.  On the other hand, the very nature of the stealth strategy suggests that it may be hard to confidently determine how colonized the galaxy is.

For civilizations appearing later, the situation is more complex.  The younger a civ estimates itself to be in the cosmic order, the more likely it becomes that its local system has already come under an alien influence.

From the perspective of an elder civ, an alien planet at a pre-singularity level of development has no immediate value.  Raw materials are plentiful - and most of the baryonic mass appears to be interstellar and free floating.  The tiny relative value of any raw materials on a biological world is probably outweighed - in the long run - by the potential future value of information trade with the resulting mature civ.

Each biological world - or seed of a future elder civ - although perhaps similar in abstract, is unique in details.  Each such world is valuable in the potential unique knowledge/insights it may eventually generate - directly or indirectly.  From a pure instrumental rational standpoint, there is some value in preserving biological worlds to increase general knowledge of civ development trajectories.

However, there could exist cases where the elder civ may wish to intervene.  For example, if deep simulations predict that the younger world will probably develop into something unfriendly - like an aggressive selfish/unfriendly replicator - then small perturbations in the natural trajectory could be called for.  In short the elder civ may have reasons to occasionally 'play god'.

On the other hand, any intervention itself would leave a detectable signature or trace in the historical trajectory which in turn could be detected by another rival or enemy civ!  In the best case these clues would only reveal the presence of an alien influence.  In the worst case they could reveal information concerning the intervening elder civ's home system and the likely locations of its key assets.

Around 70,000 years ago, we had a close encounter with Scholz's star, which passed within 0.8 light years of the sun (within the Oort cloud).  If the galaxy is well colonized, flybys such as this have potentially interesting implications (that particular flyby corresponds to the estimated time of the Toba super-eruption, for example).

Conditioning on our Observational Data

Over the last few decades SETI has searched a small portion of the parameter space covering potential alien civs.  

SETI's original main focus concerned the detection of large permanent alien radio beacons.  We can reasonably rule out models that predict advanced civs constructing high energy omnidirectional radio beacons.

At this point we can also mostly rule out large hot-tech civilizations (energy constrained civilizations) that harvest most of the energy from stars.

Obviously detecting cold-tech civilizations is considerably more difficult, and perhaps close to impossible if advanced stealth is a convergent strategy.

However, determining whether the galaxy as a whole is colonized by advanced stealth civs is a much easier problem.  In fact, one way or another the evidence is already right in front of us.  We now know that most of the mass in the galaxy is dark rather than light.  I have assumed that coldtech still involves baryonic matter and normal physics, but of course there is also the possibility that non-baryonic matter could be used for computation.  Either way, the dark matter situation is favorable.  Focusing on normal baryonic matter, the ratio of dark/cold to light/hot is still large - very favorable for colonization.

Observational Selection Effects

All advanced civs will have strong instrumental reasons to employ deep simulations to understand and model developmental trajectories for the galaxy as a whole and for civilizations in particular.  A very likely consequence is the production of large numbers of simulated conscious observers, à la the Simulation Argument.  Universes with the more advanced low temperature reversible/quantum computing civilizations will tend to produce many more simulated observer moments and are thus intrinsically more likely than one would otherwise expect - perhaps massively so.


Rogue Planets

If the galaxy is already colonized by stealthed coldtech civs, then one prediction is that some fraction of the stellar mass has been artificially ejected.  Some recent observations actually point - at least weakly - in this direction.

From "Nomads of The Galaxy"[4]

We estimate that there may be up to ∼10^5 compact objects in the mass range 10^−8 to 10^−2 M⊙ per main sequence star that are unbound to a host star in the Galaxy. We refer to these objects as nomads; in the literature a subset of these are sometimes called free-floating or rogue planets.

Although the error range is still large, it appears that free-floating planets outnumber planets bound to stars, and perhaps by a rather large margin.

Assuming the galaxy is colonized:  It could be that rogue planets form naturally outside of stars and then are colonized.  It could be they form around stars and then are ejected naturally (and colonized).  Artificial ejection - even if true - may be a rare event.  Or not.  But at least a few of these options could potentially be differentiated with future observations - for example if we find an interesting discrepancy in the rogue planet distribution predicted by simulations (which obviously do not yet include aliens!) and actual observations.

Also: if rogue planets outnumber stars by a large margin, then it follows that rogue planet flybys are more common in proportion.



SETI to date allows us to exclude some regions of the parameter space for alien civs, but the regions excluded correspond to low prior probability models anyway, based on the postbiological perspective on the future of life.  The most interesting regions of the parameter space probably involve advanced stealthy aliens in the form of small compact cold objects floating in the interstellar medium.

The upcoming WFIRST telescope should shed more light on dark matter and enhance our microlensing detection abilities significantly.  Sadly, its planned launch date isn't until 2024.  Space development is slow.


High impact from low impact

5 Stuart_Armstrong 17 April 2015 04:01PM

Part of the problem with a reduced impact AI is that it will, by definition, only have a reduced impact.

Some of the designs try and get around the problem by allowing a special "output channel" on which impact can be large. But that feels like cheating. Here is a design that accomplishes the same without using that kind of hack.

Imagine there is an asteroid that will hit the Earth, and we have a laser that could destroy it. But we need to aim the laser properly, so need coordinates. There is a reduced impact AI that is motivated to give the coordinates correctly, but also motivated to have reduced impact - and saving the planet from an asteroid with certainty is not reduced impact.

Now imagine that instead there are two AIs, X and Y. By abuse of notation, let ¬X refer to the event that the output signal from X is scrambled away from the the original output.

Then we ask X to give us the x-coordinates for the laser, under the assumption of ¬Y (that AI Y's signal will be scrambled). Similarly, we ask Y to give us the y-coordinates of the laser, under the assumption ¬X.

Then X will reason "since ¬Y, the laser will certainly miss its target, as the y-coordinates will be wrong. Therefore it is reduced impact to output the correct x-coordinates, so I shall." Similarly, Y will output the right y-coordinates, the laser will fire and destroy the asteroid, having a huge impact, hooray!

The approach is not fully general yet, because we can have "subagent problems". X could create an agent that behaves nicely given ¬Y (the assumption it was given), but completely crazily given Y (the reality). But it shows how we could get high impact from slight tweaks to reduced impact.
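To make the reasoning concrete, here is a toy sketch of the two-AI setup. Everything in it is an illustrative assumption rather than part of the design: the names `SCRAMBLED`, `TARGET`, `impact`, `ai_x`, `ai_y` and `single_ai` are hypothetical, and "impact" is crudely modelled as 1 if the laser hits and 0 if it misses. Each AI outputs the correct coordinate only if, under the assumption it was given, doing so has no impact.

```python
# Toy model only: the names, coordinates and the 0/1 impact measure are
# illustrative assumptions, not part of the original design.
SCRAMBLED = None   # stands for a signal scrambled away from its original value
TARGET = (3, 7)    # hypothetical correct (x, y) coordinates for the laser


def impact(x_signal, y_signal):
    """Return 1 if the laser destroys the asteroid, 0 if it misses."""
    return 1 if (x_signal, y_signal) == TARGET else 0


def ai_x():
    """X judges its output under the assumption not-Y (Y's signal is scrambled)."""
    believed_impact = impact(TARGET[0], SCRAMBLED)
    # Under not-Y the laser misses anyway, so the correct answer is low impact.
    return TARGET[0] if believed_impact == 0 else SCRAMBLED


def ai_y():
    """Y judges its output under the assumption not-X (X's signal is scrambled)."""
    believed_impact = impact(SCRAMBLED, TARGET[1])
    return TARGET[1] if believed_impact == 0 else SCRAMBLED


def single_ai():
    """A lone reduced impact AI that expects its full answer to be used."""
    believed_impact = impact(*TARGET)
    # Answering correctly would have high impact, so it withholds the answer.
    return TARGET if believed_impact == 0 else (SCRAMBLED, SCRAMBLED)


print(impact(*single_ai()))    # prints 0: the lone AI does not save the planet
print(impact(ai_x(), ai_y()))  # prints 1: X and Y together hit the asteroid
```

The sketch deliberately ignores the subagent problem: both functions simply return a value and have no way of building anything that behaves differently when the assumption turns out to be false.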

EDIT: For those worried about lying to the AIs, do recall http://lesswrong.com/r/discussion/lw/lyh/utility_vs_probability_idea_synthesis/ and http://lesswrong.com/lw/ltf/false_thermodynamic_miracles/

Concept Safety: The problem of alien concepts

14 Kaj_Sotala 17 April 2015 02:09PM

I'm currently reading through some relevant literature for preparing my FLI grant proposal on the topic of concept learning and AI safety. I figured that I might as well write down the research ideas I get while doing so, so as to get some feedback and clarify my thoughts. I will be posting these in a series of "Concept Safety"-titled articles.

In the previous post in this series, I talked about how one might get an AI to have similar concepts as humans. However, one would intuitively assume that a superintelligent AI might eventually develop the capability to entertain far more sophisticated concepts than humans would ever be capable of having. Is that a problem?

Just what are concepts, anyway?

To answer the question, we first need to define what exactly it is that we mean by a "concept", and why exactly more sophisticated concepts would be a problem.

Unfortunately, there isn't really any standard definition of this in the literature, with different theorists having different definitions. Machery even argues that the term "concept" doesn't refer to a natural kind, and that we should just get rid of the whole term. If nothing else, this definition from Kruschke (2008) is at least amusing:

Models of categorization are usually designed to address data from laboratory experiments, so “categorization” might be best defined as the class of behavioral data generated by experiments that ostensibly study categorization.

Because I don't really have the time to survey the whole literature and try to come up with one grand theory of the subject, I will for now limit my scope and only consider two compatible definitions of the term.

Definition 1: Concepts as multimodal neural representations. I touched upon this definition in the last post, where I mentioned studies indicating that the brain seems to have shared neural representations for e.g. the touch and sight of a banana. Current neuroscience seems to indicate the existence of brain areas where representations from several different senses are combined together into higher-level representations, and where the activation of any such higher-level representation will also end up activating the lower sense modalities in turn. As summarized by Man et al. (2013):

Briefly, the Damasio framework proposes an architecture of convergence-divergence zones (CDZ) and a mechanism of time-locked retroactivation. Convergence-divergence zones are arranged in a multi-level hierarchy, with higher-level CDZs being both sensitive to, and capable of reinstating, specific patterns of activity in lower-level CDZs. Successive levels of CDZs are tuned to detect increasingly complex features. Each more-complex feature is defined by the conjunction and configuration of multiple less-complex features detected by the preceding level. CDZs at the highest levels of the hierarchy achieve the highest level of semantic and contextual integration, across all sensory modalities. At the foundations of the hierarchy lie the early sensory cortices, each containing a mapped (i.e., retinotopic, tonotopic, or somatotopic) representation of sensory space. When a CDZ is activated by an input pattern that resembles the template for which it has been tuned, it retro-activates the template pattern of lower-level CDZs. This continues down the hierarchy of CDZs, resulting in an ensemble of well-specified and time-locked activity extending to the early sensory cortices.

On this account, my mental concept for "dog" consists of a neural activation pattern making up the sight, sound, etc. of some dog - either a generic prototypical dog or some more specific dog. Likely the pattern is not just limited to sensory information, either, but may be associated with e.g. motor programs related to dogs, such as the program for throwing a ball for the dog to fetch. One version of this hypothesis, the Perceptual Symbol Systems account, calls such multimodal representations simulators, and describes them as follows (Niedenthal et al. 2005):

A simulator integrates the modality-specific content of a category across instances and provides the ability to identify items encountered subsequently as instances of the same category. Consider a simulator for the social category, politician. Following exposure to different politicians, visual information about how typical politicians look (i.e., based on their typical age, sex, and role constraints on their dress and their facial expressions) becomes integrated in the simulator, along with auditory information for how they typically sound when they talk (or scream or grovel), motor programs for interacting with them, typical emotional responses induced in interactions or exposures to them, and so forth. The consequence is a system distributed throughout the brain’s feature and association areas that essentially represents knowledge of the social category, politician.

The inclusion of such "extra-sensory" features helps understand how even abstract concepts could fit this framework: for example, one's understanding of the concept of a derivative might be partially linked to the procedural programs one has developed while solving derivatives. For a more detailed hypothesis of how abstract mathematics may emerge from basic sensory and motor programs and concepts, I recommend Lakoff & Nuñez (2001).

Definition 2: Concepts as areas in a psychological space. This definition, while being compatible with the previous one, looks at concepts more "from the inside". Gärdenfors (2000) defines the basic building blocks of a psychological conceptual space to be various quality dimensions, such as temperature, weight, brightness, pitch, and the spatial dimensions of height, width, and depth. These are psychological in the sense of being derived from our phenomenal experience of certain kinds of properties, rather than the way in which they might exist in some objective reality.

For example, one way of modeling the psychological sense of color is via a color space defined by the quality dimensions of hue (represented by the familiar color circle), chromaticness (saturation), and brightness.

The second phenomenal dimension of color is chromaticness (saturation), which ranges from grey (zero color intensity) to increasingly greater intensities. This dimension is isomorphic to an interval of the real line. The third dimension is brightness which varies from white to black and is thus a linear dimension with two end points. The two latter dimensions are not totally independent, since the possible variation of the chromaticness dimension decreases as the values of the brightness dimension approaches the extreme points of black and white, respectively. In other words, for an almost white or almost black color, there can be very little variation in its chromaticness. This is modeled by letting that chromaticness and brightness dimension together generate a triangular representation ... Together these three dimensions, one with circular structure and two with linear, make up the color space. This space is often illustrated by the so called color spindle

This kind of a representation is different from the physical wavelength representation of color, where e.g. the hue is mostly related to the wavelength of the color. The wavelength representation of hue would be linear, but due to the properties of the human visual system, the psychological representation of hue is circular.

Gärdenfors defines two quality dimensions to be integral if a value cannot be given for an object on one dimension without also giving it a value for the other dimension: for example, an object cannot be given a hue value without also giving it a brightness value. Dimensions that are not integral with each other are separable. A conceptual domain is a set of integral dimensions that are separable from all other dimensions: for example, the three color-dimensions form the domain of color.

From these definitions, Gärdenfors develops a theory of concepts where more complicated conceptual spaces can be formed by combining lower-level domains. Concepts, then, are particular regions in these conceptual spaces: for example, the concept of "blue" can be defined as a particular region in the domain of color. Notice that the notion of various combinations of basic perceptual domains making more complicated conceptual spaces possible fits well together with the models discussed in our previous definition. There, more complicated concepts were made possible by combining basic neural representations for e.g. different sensory modalities.
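As a rough illustration of "a concept is a region in a conceptual space", here is a minimal sketch of the color domain with its three quality dimensions. The class `Color`, the functions `hue_distance` and `is_blue`, and all numeric bounds are my own hypothetical choices for the example; they are not taken from Gärdenfors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Color:
    """A point in the color domain (units are illustrative assumptions)."""
    hue: float            # degrees on the color circle, 0 <= hue < 360
    chromaticness: float  # 0.0 (grey) up to 1.0 (fully saturated)
    brightness: float     # 0.0 (black) up to 1.0 (white)


def hue_distance(h1: float, h2: float) -> float:
    """Distance along the color circle: hue is circular, not linear."""
    d = abs(h1 - h2) % 360
    return min(d, 360 - d)


def is_blue(c: Color) -> bool:
    """Membership test for a made-up 'blue' region of the color space."""
    return (hue_distance(c.hue, 230) <= 30
            and c.chromaticness >= 0.3
            and 0.2 <= c.brightness <= 0.8)


print(is_blue(Color(hue=230, chromaticness=0.8, brightness=0.5)))  # True
print(is_blue(Color(hue=20, chromaticness=0.8, brightness=0.5)))   # False
print(hue_distance(350, 10))                                       # 20
```

Learning the concept "blue" then amounts to carving out the right region, which is only possible once the three dimensions themselves are in place.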

The origin of the different quality dimensions could also emerge from the specific properties of the different simulators, as in PSS theory.

Thus definition #1 allows us to talk about what a concept might "look like from the outside", with definition #2 talking about what the same concept might "look like from the inside".

Interestingly, Gärdenfors hypothesizes that much of the work involved with learning new concepts has to do with learning new quality dimensions to fit into one's conceptual space, and that once this is done, all that remains is the comparatively much simpler task of just dividing up the new domain to match seen examples.

For example, consider the (phenomenal) dimension of volume. The experiments on "conservation" performed by Piaget and his followers indicate that small children have no separate representation of volume; they confuse the volume of a liquid with the height of the liquid in its container. It is only at about the age of five years that they learn to represent the two dimensions separately. Similarly, three- and four-year-olds confuse high with tall, big with bright, and so forth (Carey 1978).

The problem of alien concepts

With these definitions for concepts, we can now consider what problems would follow if we started off with a very human-like AI that had the same concepts as we did, but then expanded its conceptual space to allow for entirely new kinds of concepts. This could happen if it self-modified to have new kinds of sensory or thought modalities that it could associate its existing concepts with, thus developing new kinds of quality dimensions.

An analogy helps demonstrate this problem: suppose that you're operating in a two-dimensional space, where a rectangle has been drawn to mark a certain area as "forbidden" or "allowed". Say that you're an inhabitant of Flatland. But then you suddenly become aware that actually, the world is three-dimensional, and has a height dimension as well! That raises the question of, how should the "forbidden" or "allowed" area be understood in this new three-dimensional world? Do the walls of the rectangle extend infinitely in the height dimension, or perhaps just some certain distance in it? If just a certain distance, does the rectangle have a "roof" or "floor", or can you just enter (or leave) the rectangle from the top or the bottom? There doesn't seem to be any clear way to tell.

As a historical curiosity, this dilemma actually kind of really happened when airplanes were invented: could landowners forbid airplanes from flying over their land, or was the ownership of the land limited to some specific height, above which the landowners had no control? Courts and legislation eventually settled on the latter answer. A more AI-relevant example might be if one was trying to limit the AI with rules such as "stay within this box here", and the AI then gained an intuitive understanding of quantum mechanics, which might allow it to escape from the box without violating the rule in terms of its new concept space.

More generally, if previously your concepts had N dimensions and now they have N+1, you might find something that fulfilled all the previous criteria while still being different from what we'd prefer if we knew about the N+1th dimension.
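The Flatland analogy can be sketched in a few lines. The rectangle bounds, the roof height and the function names below are hypothetical, chosen only to show that two different three-dimensional rules can agree perfectly with the original two-dimensional rule and still disagree with each other.

```python
def allowed_2d(x, y):
    """The original Flatland rule: inside the rectangle [0, 10] x [0, 5]."""
    return 0 <= x <= 10 and 0 <= y <= 5


def allowed_infinite_walls(x, y, z):
    """Extension A: the walls extend infinitely in the height dimension."""
    return allowed_2d(x, y)


def allowed_with_roof(x, y, z, roof=3.0):
    """Extension B: the area has a floor at z = 0 and a roof at z = roof."""
    return allowed_2d(x, y) and 0 <= z <= roof


# On the original plane (z = 0) the two extensions are indistinguishable.
print(allowed_infinite_walls(5, 2, 0), allowed_with_roof(5, 2, 0))      # True True
# High above the plane they give opposite answers.
print(allowed_infinite_walls(5, 2, 100), allowed_with_roof(5, 2, 100))  # True False
```

No amount of data gathered at z = 0 can tell the two extensions apart, which is the sense in which the old criteria underdetermine the new concept.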

In the next post, I will present some (very preliminary and probably wrong) ideas for solving this problem.

Next post in series: What are concepts for, and how to deal with alien concepts.

Effective effective altruism: Get $400 off your next charity donation

9 Baisius 17 April 2015 05:45AM

For those of you unfamiliar with churning, it's the practice of signing up for a rewards credit card, spending enough on your everyday purchases to get the (usually significant) reward, and then cancelling it. Many of these cards have annual fees (which are commonly waived, or which the one-time reward will more than pay for). For a nominal amount of work, you can churn cards for significant bonuses.

Ordinarily I wouldn't come close to spending enough money to qualify for many of these rewards, but I recently made the Giving What We Can pledge. I now have a steady stream of predictable expenses, and conveniently, GiveWell allows donations via most any credit card. I've started using new rewards cards to pay for these expenses each time, resulting in free flights (this is how I'm paying to fly to NYC this summer), Amazon gift cards, or sometimes just straight cash.

Since the first of the year (total expenses $4000, including some personal expenses) I've churned $700 worth of bonuses (from a Delta American Express Gold and a Capital One Venture Card). This money can be redonated, saved, spent, or whatever.


1. I hope it goes without saying that you should pay off your balance in full each month, just like you should with any other card.

2. This has some negative impact on your credit, in the short term.

3. It should be noted that credit card companies make at least some money (I think 3%) off of your transactions, so if you're trying to hit a target of X% to charity, you would need to donate X/0.97, or 10.31% for 10% to account for that 3%. The reward should more than cover it.

4. Read more about this, including the pros and cons, from multiple sources before you try it. It's not something that should be done lightly, but does synergize very nicely with charity donations.
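The arithmetic in point 3 can be checked with a short sketch. The function names are hypothetical, and the 3% fee is the post's own rough guess rather than a confirmed figure; the fee on the $4000 is an upper bound, since that total includes personal expenses.

```python
def gross_up(target_pct, fee_rate=0.03):
    """Percent you must donate so the charity nets target_pct after card fees."""
    return target_pct / (1 - fee_rate)


def fee_paid(amount, fee_rate=0.03):
    """Card fee taken out of a given amount charged to the card."""
    return amount * fee_rate


needed = gross_up(10)             # donate this percent to net 10%
print(round(needed, 2))           # 10.31

max_fee = fee_paid(4000)          # upper bound on fees for the $4000 spent
print(round(max_fee, 2))          # 120.0
print(700 > max_fee)              # True: the $700 in bonuses more than covers it
```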

Rationality Reading Group - Introduction and A: Predictably Wrong

10 Mark_Friedenbach 17 April 2015 01:40AM

This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.

Welcome to the Rationality reading group. This week we discuss the Preface by primary author Eliezer Yudkowsky, Introduction by editor & co-author Rob Bensinger, and the first sequence: Predictably Wrong. This sequence introduces the methods of rationality, including its two major applications: the search for truth and the art of winning. The desire to seek truth is motivated, and a few obstacles to seeking truth--systematic errors, or biases--are discussed in detail.

This post summarizes each article of the sequence, linking to the original LessWrong posting where available, and offers a few relevant notes, thoughts, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.

Reading: Preface, Biases: An Introduction, and Sequence A: Predictably Wrong (pi-xxxv and p1-42)


Preface. Introduction to the ebook compilation by Eliezer Yudkowsky. Retrospectively identifies mistakes of the text as originally presented. Some have been corrected in the ebook, others stand as-is. Most notably the book focuses too much on belief, and too little on practical actions, especially with respect to our everyday lives. Establishes that the goal of the project is to teach rationality, those ways of thinking which are common among practicing scientists and the foundation of the Enlightenment, yet not systematically organized or taught in schools (yet).

Biases: An Introduction. Editor & co-author Rob Bensinger motivates the subject of rationality by explaining the dangers of systematic errors caused by *cognitive biases*, which the arts of rationality are intended to de-bias. Rationality is not about Spock-like stoicism -- it is about simply "doing the best you can with what you've got." The System 1 / System 2 dual process dichotomy is explained: if our errors are systematic and predictable, then we can instil behaviors and habits to correct them. A number of exemplar biases are presented. However, a warning: it is difficult to recognize biases in your own thinking even after learning of them, and knowing about a bias may grant unjustified overconfidence that you yourself do not fall prey to such mistakes in your thinking. To develop as a rationalist, actual experience is required, not just learned expertise / knowledge. Ends with an introduction of the editor and an overview of the organization of the book.

A. Predictably Wrong

1. What do I mean by "rationality"? Rationality is a systematic means of forming true beliefs and making winning decisions. Probability theory is the set of laws underlying rational belief, "epistemic rationality": it describes how to process evidence and observations to revise ("update") one's beliefs. Decision theory is the set of laws underlying rational action, "instrumental rationality", independent of what one's goals and available options are. (p7-11)

2. Feeling rational. Becoming more rational can diminish feelings or intensify them. If one cares about the state of the world, it is expected that he or she should have an emotional response to the acquisition of truth. "That which can be destroyed by the truth should be," but also "that which the truth nourishes should thrive." The commonly perceived dichotomy between emotions and "rationality" [sic] is more often about fast perceptual judgements (System 1, emotional) vs slow deliberative judgements (System 2, "rational" [sic]). But both systems can serve the goal of truth, or defeat it, depending on how they are used. (p12-14)

3. Why truth? and... Why seek the truth? Curiosity: to satisfy an emotional need to know. Pragmatism: to accomplish some specific real-world goal. Morality: to be virtuous, or fulfill a duty to truth. Curiosity motivates a search for the most intriguing truths, pragmatism the most useful, and morality the most important. But be wary of the moral justification: "To make rationality into a moral duty is to give it all the dreadful degrees of freedom of an arbitrary tribal custom. People arrive at the wrong answer, and then indignantly protest that they acted with propriety, rather than learning from their mistake." (p15-18)

4. ...what's a bias, again? A bias is an obstacle to truth, specifically those obstacles which are produced by our own thinking processes. We describe biases as failure modes which systematically prevent typical human beings from determining truth or selecting actions that would have best achieved their goals. Biases are distinguished from mistakes which originate from false beliefs or brain injury. To better seek truth and achieve our goals we must identify our biases and do what we can to correct for or eliminate them. (p19-22)

5. Availability. The availability heuristic is judging the frequency or probability of an event by the ease with which examples of the event come to mind. If you think you've heard about murders twice as much as suicides then you might suppose that murder is twice as common as suicide, when in fact the opposite is true. Use of the availability heuristic gives rise to the absurdity bias: events that have never happened are not recalled, and hence deemed to have no probability of occurring. In general, memory is not always a good guide to probabilities in the past, let alone to the future. (p23-25)

6. Burdensome details. The conjunction fallacy is when humans rate the probability of two events as higher than the probability of either event alone: adding detail can make a scenario sound more plausible, even though the event as described necessarily becomes less probable. Possible fixes include training yourself to notice the addition of details and discount appropriately, thinking about other reasons why the central idea could be true other than the added detail, or training oneself to hold a preference for simpler explanations -- to feel every added detail as a burden. (p26-29)

7. Planning fallacy. The planning fallacy is the mistaken belief that human beings are capable of making accurate plans. The source of the error is that we tend to imagine how things will turn out if everything goes according to plan, and do not appropriately account for possible troubles or difficulties along the way. The typically adequate solution is to compare the new project to broadly similar previous projects undertaken in the past, and ask how long those took to complete. (p30-33)

8. Illusion of transparency: why no one understands you. The illusion of transparency is our bias to assume that others will understand the intent behind our attempts to communicate. The source of the error is that we do not sufficiently consider alternative frames of mind or personal histories, which might lead the recipient to alternative interpretations. Be not too quick to blame those who misunderstand your perfectly clear sentences, spoken or written. Chances are, your words are more ambiguous than you think. (p34-36)

9. Expecting short inferential distances. Human beings are generally capable of processing only one piece of new information at a time. Worse, someone who says something with no obvious support is a liar or an idiot, and if you say something blatantly obvious and the other person doesn't see it, they're the idiot. This is our bias towards explanations of short inferential distance. A clear argument has to lay out an inferential pathway, starting from what the audience already knows or accepts. If at any point you make a statement without obvious justification in arguments you've previously supported, the audience just thinks you're crazy. (p37-39)

10. The lens that sees its own flaws. We humans have the ability to introspect our own thinking processes, a seemingly unique skill among life on Earth. As a consequence, a human brain is able to understand its own flaws--its systematic errors, its biases--and apply second-order corrections to them. (p40-42)
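As a small aside on item 6 (Burdensome details), the rule that the conjunction fallacy violates can be checked numerically. This is only a sketch: the function name and the example probabilities are made up for illustration and do not come from the book.

```python
def p_conjunction(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B | A), which can never exceed P(A)."""
    return p_a * p_b_given_a


p_a = 0.2           # hypothetical probability of the bare claim A
p_b_given_a = 0.9   # hypothetical probability of the added detail B, given A

p_both = p_conjunction(p_a, p_b_given_a)
print(p_both <= p_a)  # True: the detailed story is never more probable
```

However plausible the extra detail sounds, multiplying by a probability of at most 1 can only keep the total the same or shrink it.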

It is at this point that I would generally like to present an opposing viewpoint. However I must say that this first introductory sequence is not very controversial! Educational, yes, but not controversial. If anyone can provide a link or citation to one or more decent non-strawman arguments which oppose any of the ideas of this introduction and first sequence, please do so in the comments. I certainly encourage awarding karma to anyone that can do a reasonable job steel-manning an opposing viewpoint.

This has been a collection of notes on the assigned sequence for this week. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

The next reading will cover Sequence B: Fake Beliefs (p43-77). The discussion will go live on Wednesday, 6 May 2015 at or around 6pm PDT, right here on the discussion forum of LessWrong.

LessWrong experience on Alcohol

1 Elo 17 April 2015 01:19AM

following on from this thread: 


User Algon asked:
I don't drink alcohol, but is it really all that? I just assumed that most people have alcoholic beverages for the 'buzz'/intoxication.


I related my experience:

I have come to the conclusion that I taste things differently to a large subset of the population. I have a very sweet tooth and am very sensitive to bitter flavours.

I don't eat olives, most alcohol only tastes like the alcoholic aftertaste (which apparently some people don't taste) - imagine the strongest burning taste of the purest alcohol you have tasted, some people never taste that, I taste it with nearly every alcoholic beverage. Beer is usually awfully bitter too.

The only wine I could ever bother to drink is dessert wine (it's very sweet) and only slowly. (or also a half shot of rum and maple syrup)

Having said all this - yes; some people love their alcoholic beverages for their flavours.


I am wondering what the sensory experience of other LW users is of alcohol.  Do you drink (if not why not?)?  Do you have specific preferences? Do you have a particular palate for foods (probably relevant)?


I hypothesise a lower proportion of drinkers than the rest of the population.  (subject of course to cultural norms where you come from)


Edit: I will make another post in a week about taste preferences because (as we probably already know) human tastes vary. I did want to mention that I avoid spicy things except for sweet chilli which is not spicy at all.  And I don't drink coffee (because it tastes bad and I am always very awake and never need caffeine to wake me up). I am also quite sure I am a super-taster but wanted to not use that word for concern that the jargon might confuse people who don't yet know about it.

Thanks for all the responses!  This has been really interesting and exactly what I expected (number of posts)!  

In regards to experiences, I would mention that heavy drinking is linked with nearly every health problem you could think of, and I am surprised we had a selection of several heavy drinkers (to those who are heavy drinkers I would suggest reading about the health implications and reconsidering the lifestyle; it sounds like most of you are not addicted).  About the heavy drinkers: I suspect that is not representative of the average, but rather that the people who feel they are outliers decided to mention their cases (among people who did not reply, there are probably none or very few heavy drinkers, whereas there are probably some who are light drinkers or don't drink).

I hope to reply to a bunch of the comments and should get to it in the next few days.

Thank you again!  Maybe this should be included on the next survey...

Edit 2: follow up post - http://lesswrong.com/r/discussion/lw/m3j/tally_of_lesswrong_experience_on_alcohol/

Why isn't the following decision theory optimal?

5 internety 16 April 2015 01:38AM


I've recently read the decision theory FAQ, as well as Eliezer's TDT paper. When reading the TDT paper, a simple decision procedure occurred to me which as far as I can tell gets the correct answer to every tricky decision problem I've seen. As discussed in the FAQ above, evidential decision theory gets the chewing gum problem wrong, causal decision theory gets Newcomb's problem wrong, and TDT gets counterfactual mugging wrong.

In the TDT paper, Eliezer postulates an agent named Gloria (page 29), who is defined as an agent who maximizes decision-determined problems. He describes how a CDT-agent named Reena would want to transform herself into Gloria. Eliezer writes

By Gloria’s nature, she always already has the decision-type causal agents wish they had, without need of precommitment.

Eliezer then later goes on to develop TDT, which is supposed to construct Gloria as a byproduct.

Gloria, as we have defined her, is defined only over completely decision-determined problems of which she has full knowledge. However, the agenda of this manuscript is to introduce a formal, general decision theory which reduces to Gloria as a special case.

Why can't we instead construct Gloria directly, using the idea of the thing that CDT agents wished they were? Obviously we can't just postulate a decision algorithm that we don't know how to execute, and then note that a CDT agent would wish they had that decision algorithm, and pretend we had solved the problem. We need to be able to describe the ideal decision algorithm to a level of detail that we could theoretically program into an AI.

Consider this decision algorithm, which I'll temporarily call Nameless Decision Theory (NDT) until I get feedback about whether it deserves a name: you should always make the decision that a CDT-agent would have wished he had pre-committed to, if he had previously known he'd be in his current situation and had the opportunity to precommit to a decision. 

In effect, you are making a general precommitment to behave as if you made all specific precommitments that would ever be advantageous to you.
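To show what I mean on one concrete case, here is a minimal sketch of NDT applied to Newcomb's problem. The payoffs are the usual $1,000,000 and $1,000, the predictor accuracy of 0.99 is an assumed parameter, and the function names are hypothetical. The idea is just to pick whichever policy has the highest expected payoff when evaluated before the predictor makes its prediction, which is the point at which a CDT-agent would have wanted to precommit.

```python
def expected_payoff(policy, accuracy=0.99):
    """Expected payoff of committing to a policy before the prediction is made."""
    big, small = 1_000_000, 1_000
    if policy == "one-box":
        # The opaque box is full whenever the predictor correctly foresees one-boxing.
        return accuracy * big
    # Two-boxing always gets the small box, plus the big one if the predictor erred.
    return small + (1 - accuracy) * big


def ndt_choice(policies=("one-box", "two-box"), accuracy=0.99):
    """Pick the policy a CDT-agent would have wished to precommit to."""
    return max(policies, key=lambda p: expected_payoff(p, accuracy))


print(ndt_choice())               # one-box
print(ndt_choice(accuracy=0.5))   # two-box: a coin-flip predictor carries no information
```

This only covers a problem where the set of possible precommitments is tiny and the payoffs are fully known; it says nothing about whether the procedure stays well defined in harder cases.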

NDT is so simple, and Eliezer comes so close to stating it in his discussion of Gloria, that I assume there is some flaw with it that I'm not seeing. Perhaps NDT does not count as a "real"/"well defined" decision procedure, or can't be formalized for some reason? Even so, it does seem like it'd be possible to program an AI to behave in this way.

Can someone give an example of a decision problem for which this decision procedure fails? Or for which there are multiple possible precommitments that you would have wished you'd made and it's not clear which one is best?

EDIT: I now think this definition of NDT better captures what I was trying to express: You should always make the decision that a CDT-agent would have wished he had precommitted to, if he had previously considered the possibility of his current situation and had the opportunity to costlessly precommit to a decision.


Cooperative conversational threading

25 philh 15 April 2015 06:40PM

(Cross-posted from my blog.)

Sometimes at LW meetups, I'll want to raise a topic for discussion. But we're currently already talking about something, so I'll wait for a lull in the current conversation. But it feels like the duration of lull needed before I can bring up something totally unrelated, is longer than the duration of lull before someone else will bring up something marginally related. And so we can go for a long time, with the topic frequently changing incidentally, but without me ever having a chance to change it deliberately.

Which is fine. I shouldn't expect people to want to talk about something just because I want to talk about it, and it's not as if I find the actual conversation boring. But it's not necessarily optimal. People might in fact want to talk about the same thing as me, and following the path of least resistance in a conversation is unlikely to result in the best possible conversation.

At the last meetup I had two topics that I wanted to raise, and realized that I had no way of raising them, which was a third topic worth raising. So when an interruption occurred in the middle of someone's thought - a new person arrived, and we did the "hi, welcome, join us" thing - I jumped in. "Before you start again, I have three things I'd like to talk about at some point, but not now. Carry on." Then he started again, and when that topic was reasonably well-trodden, he prompted me to transition.

Then someone else said that he also had two things he wanted to talk about, and could I just list my topics and then he'd list his? (It turns out that no I couldn't. You can't dangle an interesting train of thought in front of the London LW group and expect them not to follow it. But we did manage to initially discuss them only briefly.)

This worked pretty well. Someone more conversationally assertive than me might have been able to take advantage of a less solid interruption than the one I used. Someone less assertive might not have been able to use that one.

What else could we do to solve this problem?

Someone suggested a hand signal: if you think of something that you'd like to raise for discussion later, make the signal. I don't think this is ideal, because it's not continuous. You make it once, and then it would be easy for people to forget, or just to not notice.

I think what I'm going to do is bring some poker chips to the next meetup. I'll put a bunch in the middle, and if you have a topic that you want to raise at some future point, you take one and put it in front of you. Then if a topic seems to be dying out, someone can say "<person>, what did you want to talk about?"

I guess this still needs at least one person assertive enough to do that. I imagine it would be difficult for me. But the person who wants to raise the topic doesn't need to be assertive, they just need to grab a poker chip. It's a fairly obvious gesture, so probably people will notice, and it's easy to just look and see for a reminder of whether anyone wants to raise anything. (Assuming the table isn't too messy, which might be a problem.)

I don't know how well this will work, but it seems worth experimenting.

(I'll also take a moment to advocate another conversation-signal that we adopted, via CFAR. If someone says something and you want to tell people that you agree with them, instead of saying that out loud, you can just raise your hands a little and wiggle your fingers. Reduces interruptions, gives positive feedback to the speaker, and it's kind of fun.)

Concept Safety: Producing similar AI-human concept spaces

27 Kaj_Sotala 14 April 2015 08:39PM

I'm currently reading through some relevant literature for preparing my FLI grant proposal on the topic of concept learning and AI safety. I figured that I might as well write down the research ideas I get while doing so, so as to get some feedback and clarify my thoughts. I will be posting these in a series of "Concept Safety"-titled articles.

A frequently-raised worry about AI is that it may reason in ways which are very different from us, and understand the world in a very alien manner. For example, Armstrong, Sandberg & Bostrom (2012) consider the possibility of restricting an AI via "rule-based motivational control" and programming it to follow restrictions like "stay within this lead box here", but they raise worries about the difficulty of rigorously defining "this lead box here". To address this, they go on to consider the possibility of making an AI internalize human concepts via feedback, with the AI being told whether or not some behavior is good or bad and then constructing a corresponding world-model based on that. The authors are however worried that this may fail, because

Humans seem quite adept at constructing the correct generalisations – most of us have correctly deduced what we should/should not be doing in general situations (whether or not we follow those rules). But humans share a common of genetic design, which the OAI would likely not have. Sharing, for instance, derives partially from genetic predisposition to reciprocal altruism: the OAI may not integrate the same concept as a human child would. Though reinforcement learning has a good track record, it is neither a panacea nor a guarantee that the OAIs generalisations agree with ours.

Addressing this, a possibility that I raised in Sotala (2015) was that the concept-learning mechanisms in the human brain might actually be relatively simple, and that we could replicate the human concept-learning process by replicating those mechanisms. I'll start this post by discussing a closely related hypothesis: that given a specific learning or reasoning task and a certain kind of data, there is an optimal way to organize the data that will naturally emerge. If this were the case, then AI and human reasoning might naturally tend to learn the same kinds of concepts, even if they were using very different mechanisms. Later in the post, I will discuss how one might try to verify that similar representations had in fact been learned, and how to set up a system to make them even more similar.

Word embedding

"Left panel shows vector offsets for three word pairs illustrating the gender relation. Right panel shows a different projection, and the singular/plural relation for two words. In high-dimensional space, multiple relations can be embedded for a single word." (Mikolov et al. 2013)

A particularly fascinating branch of recent research relates to the learning of word embeddings, which are mappings of words to very high-dimensional vectors. It turns out that if you train a system on one of several kinds of tasks, such as being able to classify sentences as valid or invalid, this builds up a space of word vectors that reflects the relationships between the words. For example, there seems to be a male/female dimension to words, so that there's a "female vector" that we can add to the word "man" to get "woman" - or, equivalently, which we can subtract from "woman" to get "man". And it so happens (Mikolov, Yih & Zweig 2013) that we can also get from the word "king" to the word "queen" by adding the same vector to "king". In general, we can (roughly) get to the male/female version of any word vector by adding or subtracting this one difference vector!

Why would this happen? Well, a learner that needs to classify sentences as valid or invalid needs to classify the sentence "the king sat on his throne" as valid while classifying the sentence "the king sat on her throne" as invalid. So including a gender dimension on the built-up representation makes sense.
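To make the vector arithmetic concrete, here is a minimal sketch. The three-dimensional vectors and the meaning I assign to each axis are hand-picked assumptions for illustration; real embeddings have hundreds of learned dimensions and the analogy only holds approximately.

```python
import math

# Hypothetical embeddings, chosen by hand for illustration only.
# Assumed axes: [royalty, gender (+1 female, -1 male), person-ness]
vectors = {
    "man":    [0.0, -1.0, 1.0],
    "woman":  [0.0,  1.0, 1.0],
    "king":   [1.0, -1.0, 1.0],
    "queen":  [1.0,  1.0, 1.0],
    "throne": [1.0,  0.0, 0.0],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(vec, exclude=()):
    """Return the vocabulary word whose vector is most similar to vec."""
    candidates = [w for w in vectors if w not in exclude]
    return max(candidates, key=lambda w: cosine(vec, vectors[w]))

# The "female vector" is the offset from "man" to "woman".
female = sub(vectors["woman"], vectors["man"])

# Adding the same offset to "king" should land near "queen".
result = nearest(add(vectors["king"], female), exclude={"king"})
print(result)
```

Excluding the query word from the candidates is a common convention in analogy tests, since the nearest vector to "king plus a small offset" is often "king" itself.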

But gender isn't the only kind of relationship that gets reflected in the geometry of the word space. Here are a few more:

It turns out (Mikolov et al. 2013) that with the right kind of training mechanism, a lot of relationships that we're intuitively aware of become automatically learned and represented in the concept geometry. And like Olah (2014) comments:

It’s important to appreciate that all of these properties of W are side effects. We didn’t try to have similar words be close together. We didn’t try to have analogies encoded with difference vectors. All we tried to do was perform a simple task, like predicting whether a sentence was valid. These properties more or less popped out of the optimization process.

This seems to be a great strength of neural networks: they learn better ways to represent data, automatically. Representing data well, in turn, seems to be essential to success at many machine learning problems. Word embeddings are just a particularly striking example of learning a representation.

It gets even more interesting, for we can use these for translation. Since Olah has already written an excellent exposition of this, I'll just quote him:

We can learn to embed words from two different languages in a single, shared space. In this case, we learn to embed English and Mandarin Chinese words in the same space.

We train two word embeddings, Wen and Wzh in a manner similar to how we did above. However, we know that certain English words and Chinese words have similar meanings. So, we optimize for an additional property: words that we know are close translations should be close together.

Of course, we observe that the words we knew had similar meanings end up close together. Since we optimized for that, it’s not surprising. More interesting is that words we didn’t know were translations end up close together.

In light of our previous experiences with word embeddings, this may not seem too surprising. Word embeddings pull similar words together, so if an English and Chinese word we know to mean similar things are near each other, their synonyms will also end up near each other. We also know that things like gender differences tend to end up being represented with a constant difference vector. It seems like forcing enough points to line up should force these difference vectors to be the same in both the English and Chinese embeddings. A result of this would be that if we know that two male versions of words translate to each other, we should also get the female words to translate to each other.

Intuitively, it feels a bit like the two languages have a similar ‘shape’ and that by forcing them to line up at different points, they overlap and other points get pulled into the right positions.

After this, it gets even more interesting. Suppose you had this space of word vectors, and then you also had a system which translated images into vectors in the same space. If you have images of dogs, you put them near the word vector for "dog". If you have images of Clippy, you put them near the word vector for "paperclip". And so on.

You do that, and then you take some class of images the image-classifier was never trained on, like images of cats. You ask it to place the cat-image somewhere in the vector space. Where does it end up? 

You guessed it: in the rough region of the "cat" words. Olah once more:

This was done by members of the Stanford group with only 8 known classes (and 2 unknown classes). The results are already quite impressive. But with so few known classes, there are very few points to interpolate the relationship between images and semantic space off of.

The Google group did a much larger version – instead of 8 categories, they used 1,000 – around the same time (Frome et al. (2013)) and has followed up with a new variation (Norouzi et al. (2014)). Both are based on a very powerful image classification model (from Krizhevsky et al. (2012)), but embed images into the word embedding space in different ways.

The results are impressive. While they may not get images of unknown classes to the precise vector representing that class, they are able to get to the right neighborhood. So, if you ask it to classify images of unknown classes and the classes are fairly different, it can distinguish between the different classes.

Even though I’ve never seen a Aesculapian snake or an Armadillo before, if you show me a picture of one and a picture of the other, I can tell you which is which because I have a general idea of what sort of animal is associated with each word. These networks can accomplish the same thing.

These algorithms made no attempt at being biologically realistic in any way. They didn't try classifying data the way the brain does it: they just tried classifying data using whatever worked. And it turned out that this was enough to start constructing a multimodal representation space where a lot of the relationships between entities were similar to the way humans understand the world.
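The "right neighborhood" idea from the quoted passages can be sketched as a nearest-neighbor lookup. Everything below is an assumption made for illustration: the two-dimensional word vectors are invented, and the "photo" vectors stand in for the output of an image-to-vector model that is not implemented here.

```python
import math

# Hypothetical 2-D word vectors; assumed axes: [animal-ness, vehicle-ness].
word_vectors = {
    "dog":   [0.9, 0.1],
    "car":   [0.1, 0.9],
    "cat":   [0.8, 0.2],   # class with no training images
    "truck": [0.2, 0.95],  # class with no training images
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def classify(image_vector, candidate_words):
    """Pick the candidate word whose vector is closest to the image vector."""
    return max(candidate_words, key=lambda w: cosine(image_vector, word_vectors[w]))

# Stand-ins for where an image model might place two unseen-class photos.
# They are deliberately NOT equal to the word vectors, only nearby.
cat_photo = [0.7, 0.25]
truck_photo = [0.25, 0.8]

unseen_classes = ["cat", "truck"]
cat_guess = classify(cat_photo, unseen_classes)
truck_guess = classify(truck_photo, unseen_classes)
print(cat_guess, truck_guess)
```

The point of the sketch is only that an imprecise placement is enough when the candidate classes are far apart; telling "cat" from "dog" here would be much harder than telling "cat" from "truck".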

How useful is this?

"Well, that's cool", you might now say. "But those word spaces were constructed from human linguistic data, for the purpose of predicting human sentences. Of course they're going to classify the world in the same way as humans do: they're basically learning the human representation of the world. That doesn't mean that an autonomously learning AI, with its own learning faculties and systems, is necessarily going to learn a similar internal representation, or to have similar concepts."

This is a fair criticism. But it is mildly suggestive of the possibility that an AI that was trained to understand the world via feedback from human operators would end up building a similar conceptual space. At least assuming that we chose the right learning algorithms.

When we train a language model to classify sentences by labeling some of them as valid and others as invalid, there's a hidden structure implicit in our answers: the structure of how we understand the world, and of how we think of the meaning of words. The language model extracts that hidden structure and begins to classify previously unseen things in terms of those implicit reasoning patterns. Similarly, if we gave an AI feedback about what kinds of actions counted as "leaving the box" and which ones didn't, there would be a certain way of viewing and conceptualizing the world implied by that feedback, one which the AI could learn.

Comparing representations

"Hmm, maaaaaaaaybe", is your skeptical answer. "But how would you ever know? Like, you can test the AI in your training situation, but how do you know that it's actually acquired a similar-enough representation and not something wildly off? And it's one thing to look at those vector spaces and claim that there are human-like relationships among the different items, but that's still a little hand-wavy. We don't actually know that the human brain does anything remotely similar to represent concepts."

Here we turn, for a moment, to neuroscience.

From Kaplan, Man & Greening (2015): "In this example, subjects either see or touch two classes of objects, apples and bananas. (A) First, a classifier is trained on the labeled patterns of neural activity evoked by seeing the two objects. (B) Next, the same classifier is given unlabeled data from when the subject touches the same objects and makes a prediction. If the classifier, which was trained on data from vision, can correctly identify the patterns evoked by touch, then we conclude that the representation is modality invariant."

Multivariate Cross-Classification (MVCC) is a clever neuroscience methodology used for figuring out whether different neural representations of the same thing have something in common. For example, we may be interested in whether the visual and tactile representation of a banana have something in common.

We can test this by having several test subjects look at pictures of objects such as apples and bananas while sitting in a brain scanner. We then feed the scans of their brains into a machine learning classifier and teach it to distinguish between the neural activity of looking at an apple, versus the neural activity of looking at a banana. Next we have our test subjects (still sitting in the brain scanners) touch some bananas and apples, and ask our machine learning classifier to guess whether the resulting neural activity is the result of touching a banana or an apple. If the classifier - which has not been trained on the "touch" representations, only on the "sight" representations - manages to achieve a better-than-chance performance on this latter task, then we can conclude that the neural representation for e.g. "the sight of a banana" has something in common with the neural representation for "the touch of a banana".
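The logic of the procedure can be sketched with synthetic data. This is a toy under loud assumptions: the "scans" are four made-up numbers, each built from a concept signature shared across modalities plus a modality-specific offset plus noise, and the classifier is a simple nearest-centroid rule rather than whatever the actual studies used.

```python
import random

random.seed(0)  # fixed seed so the simulation is repeatable

# Assumed structure of a simulated scan: shared concept pattern
# + modality-specific pattern + Gaussian noise.
CONCEPT = {"apple": [1.0, 0.0, 1.0, 0.0], "banana": [0.0, 1.0, 0.0, 1.0]}
MODALITY = {"sight": [0.3, 0.3, 0.0, 0.0], "touch": [0.0, 0.0, 0.3, 0.3]}

def simulate_scan(concept, modality, noise=0.2):
    return [c + m + random.gauss(0, noise)
            for c, m in zip(CONCEPT[concept], MODALITY[modality])]

def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train(labelled_scans):
    """Nearest-centroid 'classifier': one mean pattern per label."""
    by_label = {}
    for label, scan in labelled_scans:
        by_label.setdefault(label, []).append(scan)
    return {label: centroid(scans) for label, scans in by_label.items()}

def predict(centroids, scan):
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], scan))

# (A) Train only on sight-evoked patterns.
sight = [(c, simulate_scan(c, "sight")) for c in CONCEPT for _ in range(20)]
model = train(sight)

# (B) Test only on touch-evoked patterns, which the classifier never saw.
touch = [(c, simulate_scan(c, "touch")) for c in CONCEPT for _ in range(20)]
correct = sum(predict(model, scan) == label for label, scan in touch)
accuracy = correct / len(touch)
print(f"cross-modal accuracy: {accuracy:.2f}")
```

If the shared concept signature were removed from the simulation, the touch-evoked patterns would carry no information the sight-trained classifier could use, and accuracy would fall to chance; that contrast is what the method relies on.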

A particularly fascinating experiment of this type is that of Shinkareva et al. (2011), who showed their test subjects both the written words for different tools and dwellings, and, separately, line-drawing images of the same tools and dwellings. A machine-learning classifier was trained on image-evoked activity and made to predict word-evoked activity, and vice versa, and it achieved a high accuracy on category classification for both tasks. Even more interestingly, the representations seemed to be similar between subjects. Training the classifier on the word representations of all but one participant, and then having it classify the image representation of the left-out participant, also achieved a reliable (p<0.05) category classification for 8 out of 12 participants. This suggests a relatively similar concept space between humans of a similar background.

We can now hypothesize some ways of testing the similarity of the AI's concept space with that of humans. Possibly the most interesting one might be to develop a translation between a human's and an AI's internal representations of concepts. Take a human's neural activation when they're thinking of some concept, and then take the AI's internal activation when it is thinking of the same concept, and plot them in a shared space similar to the English-Mandarin translation. To what extent do the two concept geometries have similar shapes, allowing one to take a human's neural activation of the word "cat" to find the AI's internal representation of the word "cat"? To the extent that this is possible, one could probably establish that the two share highly similar concept systems.

One could also try to more explicitly optimize for such a similarity. For instance, one could train the AI to make predictions of different concepts, with the additional constraint that its internal representation must be such that a machine-learning classifier trained on a human's neural representations will correctly identify concept-clusters within the AI. This might force internal similarities on the representation beyond the ones that would already be formed from similarities in the data.

Next post in series: The problem of alien concepts.

Anti-Pascaline satisficer

3 Stuart_Armstrong 14 April 2015 06:49PM

It occurred to me that the anti-Pascaline agent design could be used as part of a satisficer approach.

The obvious way to reduce dangerous optimisation pressure is to use a bounded utility function with an easily achievable bound: for instance, giving the agent a utility linear in paperclips that maxes out at 10.

The problem with this is that, if the entity is a maximiser (which it might become), it can never be sure that it's achieved its goals. Even after building 10 paperclips, and an extra 2 to be sure, and an extra 20 to be really sure, and an extra 3^^^3 to be really really sure, and extra cameras to count them, with redundant robots patrolling the cameras to make sure that they're all behaving well, etc... There's still an ε chance that it might have just dreamed this, say, or that its memory is faulty. So it has a current utility of (1-ε)10, and can increase this by reducing ε - hence by building even more paperclips.

Hum... ε, you say? This seems a place where the anti-Pascaline design could help. Here we would use it at the lower bound of utility. It currently has probability ε of having utility < 10 (ie it has not built 10 paperclips) and (1-ε) of having utility = 10. Therefore an anti-Pascaline agent with ε lower bound would round this off to 10, discounting the unlikely event that it has been deluded, and thus it has no need to build more paperclips or paperclip counting devices.
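Here is a rough sketch of the difference in pressure. The trimming rule below is my own assumed reading of "anti-Pascaline with ε lower bound" (discard the lowest-utility ε of probability mass and renormalise the rest); it is not taken from the original design post, and the failure case is modelled as utility 0 purely as a stand-in for "utility < 10".

```python
from fractions import Fraction

def expected_utility(outcomes):
    """Plain expected utility over (utility, probability) pairs."""
    return sum(u * p for u, p in outcomes)

def trimmed_utility(outcomes, epsilon):
    """Assumed anti-Pascaline rule: drop the lowest-utility `epsilon` of
    probability mass, then take the expectation over what remains.
    Requires 0 <= epsilon < 1."""
    remaining = epsilon
    kept = []
    for utility, prob in sorted(outcomes):  # lowest utility first
        cut = min(prob, remaining)
        remaining -= cut
        if prob > cut:
            kept.append((utility, prob - cut))
    total = sum(p for _, p in kept)
    return sum(u * p for u, p in kept) / total

def paperclip_outcomes(doubt):
    """Utility 0 with probability `doubt` (deluded), else utility 10."""
    return [(0, doubt), (10, 1 - doubt)]

threshold = Fraction(1, 1000)                       # the agent's epsilon
before = paperclip_outcomes(Fraction(1, 1000))      # current doubt
after = paperclip_outcomes(Fraction(1, 100000))     # after extra checking

# A maximiser gains by shrinking its doubt, so it keeps building.
maximiser_gain = expected_utility(after) - expected_utility(before)

# The trimmed agent sees the same value either way: no pressure.
trimmed_gain = trimmed_utility(after, threshold) - trimmed_utility(before, threshold)
print(maximiser_gain, trimmed_gain)
```

Exact fractions are used instead of floats so that "rounds off to 10" is an equality rather than an approximation.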

Note that this is an un-optimising approach, not an anti-optimising one, so the agent may still build more paperclips anyway - it just has no pressure to do so.

Un-optimised vs anti-optimised

6 Stuart_Armstrong 14 April 2015 06:30PM

This post contains no new insights; it just puts together some old insights in a format I hope is clearer.

Most satisficers are unoptimised (above the satisficing level): they have a limited drive to optimise and transform the universe. They may still end up optimising the universe anyway: they have no penalty for doing so (and sometimes it's a good idea for them). But if they can lazily achieve their goal, then they're ok with that too. So they simply have low optimisation pressure.

A safe "satisficer" design (or a reduced impact AI design) needs to be not only un-optimised, but specifically anti-optimised. It has to be set up so that "go out and optimise the universe" scores worse than "be lazy and achieve your goal". The problem is that these terms are undefined (as usual), that there are many minor actions that can optimise the universe (such as creating a subagent), and the approach has to be safe against all possible ways of optimising the universe - not just the "maximise u" for a specific and known u.

That's why the reduced impact/safe satisficer/anti-optimised designs are so hard: you have to add a very precise yet general (anti-)optimising pressure, rather than simply removing the current optimising pressure.

Could you tell me what's wrong with this?

1 Algon 14 April 2015 10:43AM

Edit: Some people have misunderstood my intentions here. I do not in any way expect this to be the NEXT GREAT IDEA. I just couldn't see anything wrong with this, which almost certainly meant there were gaps in my knowledge. I thought the fastest way to see where I went wrong would be to post my idea here and see what people say. I apologise for any confusion I caused. I'll try to be more clear next time.

(I really can't think of any major problems in this, so I'd be very grateful if you guys could tell me what I've done wrong). 

So, a while back I was listening to a discussion about the difficulty of making an FAI. One of the ways that was suggested to circumvent this was to go down the route of programming an AGI to solve FAI. Someone else pointed out the problems with this. Amongst other things one would have no idea what the AI will do in pursuit of its primary goal. Furthermore, it would already be a monumental task to program an AI whose primary goal is to solve the FAI problem; doing this is still easier than solving FAI, I should think. 

So, I started to think about this for a little while, and I thought 'how could you make this safer?' Well, first off, you don't want an AI who completely outclasses humanity in terms of intellect. If things went Wrong, you'd have little chance of stopping it. So, you want to limit the AI's intellect to genius level, so if something did go Wrong, then the AI would not be unstoppable. It may do quite a bit of damage, but a large group of intelligent people with a lot of resources on their hands could stop it.

Therefore, what must be done is that the AI cannot modify parts of its source code. You must try and stop an intelligence explosion from taking off. So, limited access to its source code, and a limit on how much computing power it can have on hand. This is problematic though, because the AI would not be able to solve FAI very quickly. After all, we have a few genius level people trying to solve FAI, and they're struggling with it, so why should a genius level computer do any better? Well, an AI would have fewer biases, and could accumulate much more expertise relevant to the task at hand. It would be about as capable of solving FAI as the most capable human could possibly be; perhaps even more so. Essentially, you'd get someone like Turing, Von Neumann, Newton and others all rolled into one working on FAI.

But, there's still another problem. The AI, if left working on FAI for, let's say, 20 years, would have accumulated enough skills that it would be able to cause major problems if something went wrong. Sure, it would be as intelligent as Newton, but it would be far more skilled. Humanity fighting against it would be like sending a young Miyamoto Musashi against his future self at his zenith, i.e. completely one-sided.

 What must be done then, is the AI must have a time limit of a few years (or less) and after that time is past, it is put to sleep. We look at what it accomplished, see what worked and what didn't, and boot up a fresh version of the AI with any required modifications, and tell it what the old AI did. Repeat the process for a few years, and we should end up with FAI solved. 

After that, we just make an FAI, and wake up the originals, since there's no point in killing them off at this point. 

But there are still some problems. One, time. Why try this when we could solve FAI ourselves? Well, I would only try and implement something like this if it is clear that AGI will be solved before FAI is. A backup plan if you will. Second, what if FAI is just too much for people at our current level? Sure, we have guys who are one in ten thousand and better working on this, but what if we need someone who's one in a hundred billion? Someone who represents the peak of human ability? We shouldn't just wait around for them, since some idiot would probably just make an AGI thinking it would love us all anyway.

So, what do you guys think? As a plan, is this reasonable? Or have I just overlooked something completely obvious? I'm not saying that this would be easy in any way, but it would be easier than solving FAI.

Translating bad advice

14 Sophronius 14 April 2015 09:20AM

While writing my Magnum Opus I came across this piece of writing advice by Neil Gaiman:

“When people tell you something’s wrong or doesn’t work for them, they are almost always right. When they tell you exactly what they think is wrong and how to fix it, they are almost always wrong.”

And it struck me how true it was, even in other areas of life. People are terrible at giving advice on how to improve yourself, or on how to improve anything really. To illustrate this, here is what you would expect advice from a good rationalist friend to look like:

1)      “Hey, I’ve noticed you tend to do X.”

2)      “It’s been bugging me for a while, though I’m not really sure why. It’s possible other people think X is bad as well, you should ask them about it.”

3)      Paragon option: “Maybe you could do Y instead? I dunno, just think about it.”  

4)      Renegade option: “From now on I will slap you every time you do X, in order to help you stop being retarded about X.”

I wish I had more friends who gave advice like that, especially the renegade option. Instead, here is what I get in practice:

1)      Thinking: Argh, he is doing X again. That annoys me, but I don’t want to be rude.

2)      Thinking: Okay, he is doing Z now, which is kind of like X and a good enough excuse to vent my anger about X.

3)      *Complains about Z in an irritated manner, and immediately forgets that there’s even a difference between X and Z*

4)      Thinking: Oh shit, that was rude. I better give some arbitrary advice on how to fix Z so I sound more productive.

As you can see, social rules and poor epistemology really get in the way of good advice, which is incredibly frustrating if you genuinely want to improve yourself! (Needless to say, ignoring badly phrased advice is incredibly stupid and you should never do this. See HPMOR for a fictional example of what happens if you try to survive on your wits alone.) A naïve solution is to tell everybody that you are the sort of person who loves to hear criticism in the hope that they will tell you what they really think. This never works because A) Nobody will believe you since everyone says this and it’s always a lie, and B) It’s a lie, you hate hearing real criticism just like everybody else.

The best solution I have found is to make it a habit to translate bad advice into good advice, in the spirit of what Neil Gaiman said above: Always be on the lookout for people giving subtle clues that you are doing something wrong and ask them about it (preferably without making yourself sound insecure in the process, or they’ll just tell you that you need to be more confident). When they give you some bullshit response that is designed to sound nice, keep at it and convince them to give you their real reasons for bringing it up in the first place. Once you have recovered the original information that led them to give the poor advice, you can rewrite it as good advice in the format used above. Here is an example from my own work experience:

1)      Bad advice person: “You know, you may have your truth, but someone else may have their own truth.”

2)      Me, confused and trying not to be angry at bad epistemology: “That’s interesting. What makes you say that?”

3)      *5 minutes later*. “Holy shit, my insecurity is being read as arrogance, and as a result people feel threatened by my intelligence which makes them defensive? I never knew that!”

Seriously, apply this lesson. And get a good friend to slap you every time you don’t.

Maybe we can perform the "Mary's Room" thought experiment

5 DavidPlumpton 14 April 2015 09:19AM

It seems possible that soon there may be a cure for colourblindness. The Mary's Room thought experiment attempts to pin down something about the nature of qualia in a contrived but similar situation, but my feeling is that the actual result of such an experiment would not be obvious. Would we consider the experiment valid if it was performed on somebody familiar with blue and green, but not red?

Book Review: Discrete Mathematics and Its Applications

15 LawrenceC 14 April 2015 09:08AM

Following in the path of So8res and others, I’ve decided to work my way through the textbooks on the MIRI Research Guide. I’ve been working my way through the guide since last October, but this is my first review. I plan on following up this review with reviews of Enderton’s A Mathematical Introduction to Logic and Sipser’s Introduction to the Theory of Computation. Hopefully these reviews will be of some use to you.

Discrete Mathematics and Its Applications

Discrete Mathematics and Its Applications is a wonderful, gentle introduction to the math needed to understand most of the other books on the MIRI course list. It successfully pulls off a colloquial tone of voice. It spends a lot of time motivating concepts; it also contains a lot of interesting trivia and short biographies of famous mathematicians and computer scientists (which the textbook calls “links”). Additionally, the book provides a lot of examples for each of its theorems and topics. It also fleshes out the key subjects (counting, proofs, graphs, etc.) while also providing a high level overview of their applications. These combine to make it an excellent first textbook for learning discrete mathematics.

However, for much the same reasons, I would not recommend it nearly as much if you’ve taken a discrete math course. People who’ve participated in math competitions at the high school level probably won’t get much out of the textbook either. Even though I went in with only the discrete math I did in high school, I still got quite frustrated at times because of how long the book would take to get to the point. Discrete Mathematics is intended to be quite introductory and it succeeds in this goal, but it probably won’t be very suitable as anything other than review for readers beyond the introductory level. The sole exception is the last chapter (on models of computation), but I recommend picking up a more comprehensive overview from Sipser’s Theory of Computation instead.

I still highly recommend it for those not familiar with the topics covered in the book. I’ve summarized the contents of the textbook below:


1. The Foundations: Logic and Proofs

2. Basic Structures: Sets, Functions, Sequences, Sums, and Matrices

3. Algorithms

4. Number Theory and Cryptography

5. Induction and Recursion

6. Counting

7. Discrete Probability

8. Advanced Counting Techniques

9. Relations

10. Graphs

11. Trees

12. Boolean Algebra

13. Modeling Computation

The Foundations: Logic and Proofs

This chapter introduces propositional (sentential) logic, predicate logic, and proof theory at a very introductory level. It starts by introducing the propositions of propositional logic (!), then goes on to introduce applications of propositional logic, such as logic puzzles and logic circuits. It then goes on to introduce the idea of logical equivalence between sentences of propositional logic, before introducing quantifiers and predicate logic and its rules of inference. It then ends by talking about the different kinds of proofs one is likely to encounter – direct proofs via repeated modus ponens, proofs by contradiction, proof by cases, and constructive and non-constructive existence proofs.

This chapter illustrates exactly why this book is excellent as an introductory text. It doesn’t just introduce the terms, theorems, and definitions; it motivates them by giving applications. For example, it explains the need for predicate logic by pointing out that there are inferences that can’t be drawn using only propositional logic. Additionally, it also explains the common pitfalls for the different proof methods that it introduces.
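As a small illustration of what "logical equivalence between sentences of propositional logic" means in practice, here is a brute-force truth-table check. The specific formulas (contrapositive, converse, De Morgan) are my own choice of examples, not necessarily the ones the book uses.

```python
from itertools import product

def equivalent(f, g, n_vars):
    """Two formulas are logically equivalent if they agree on every
    assignment of truth values to their variables."""
    return all(f(*vals) == g(*vals)
               for vals in product([False, True], repeat=n_vars))

def implies(p, q):
    """p -> q, defined as (not p) or q."""
    return (not p) or q

def contrapositive(p, q):
    """(not q) -> (not p)"""
    return implies(not q, not p)

def converse(p, q):
    """q -> p"""
    return implies(q, p)

# An implication is equivalent to its contrapositive...
same_as_contrapositive = equivalent(implies, contrapositive, 2)
# ...but not to its converse, a classic proof pitfall.
same_as_converse = equivalent(implies, converse, 2)

# De Morgan's law: not (p and q) is equivalent to (not p) or (not q).
de_morgan = equivalent(lambda p, q: not (p and q),
                       lambda p, q: (not p) or (not q), 2)

print(same_as_contrapositive, same_as_converse, de_morgan)
```

This exhaustive check works for propositional logic because n variables have only 2^n assignments; it does not carry over to predicate logic, where quantifiers can range over infinite domains.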

Basic Structures: Sets, Functions, Sequences, Sums, and Matrices

This chapter introduces the different objects one is likely to encounter in discrete mathematics. Most of it seemed pretty standard, with the following exceptions: functions are introduced without reference to relations; the “cardinality of sets” section provides a high level overview of a lot of set theory; and the matrices section introduces zero-one matrices, which are used in the chapters on relations and graphs.


Algorithms

This chapter presents … surprise, surprise… algorithms! It starts by introducing the notion of algorithms, and gives a few examples of simple algorithms. It then spends a page introducing the halting problem and showing its undecidability. (!) Afterwards, it introduces big-O, big-Omega, and big-Theta notation and then gives a (very informal) treatment of a portion of computational complexity theory. It's quite unusual to see algorithms dealt with so early in a discrete math course, but it matters here because the author provides examples of algorithms in almost every chapter after this one.

Number Theory and Cryptography

This section goes from simple divisibility and modular arithmetic (3 divides 12!) to RSA, which I found extremely impressive. (Admittedly, I’ve only ever read one other discrete math textbook.) After introducing the notion of divisibility, the textbook takes the reader on a rapid tour through base-n notation, the fundamental theorem of arithmetic, the infinitude of primes, the Euclidean GCD algorithm, Bezout’s theorem, the Chinese remainder theorem, Fermat’s little theorem, and other key results of number theory. It then gives several applications of number theory: hash functions, pseudorandom numbers, check digits, and cryptography. The last of these gets its own section, and the book spends a large part of it introducing RSA and its applications.
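To make the link between the Euclidean algorithm, Bezout’s theorem, and RSA concrete, here is a minimal Python sketch. The function names and the tiny primes are assumptions chosen for illustration and are not taken from the book; real RSA uses primes hundreds of digits long plus padding.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == g (Bezout's identity)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def mod_inverse(a, m):
    """Return the inverse of a modulo m, which exists only when gcd(a, m) == 1."""
    g, x, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("a has no inverse modulo m")
    return x % m


# Toy RSA key with hypothetical, insecure parameters.
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)        # Euler's totient of n
e = 17                         # public exponent, coprime to phi
d = mod_inverse(e, phi)        # private exponent

message = 65
ciphertext = pow(message, e, n)    # encrypt: m^e mod n
recovered = pow(ciphertext, d, n)  # decrypt: c^d mod n
```

The private exponent is just a modular inverse, which is why the chapter's earlier results are enough to build the whole scheme.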

Induction and Recursion

This chapter introduces mathematical induction and recursion, two extremely important concepts in computer science. Proofs by mathematical induction, basically, are proofs that show that a property is true of the first natural number (positive integer in this book), and that if it is true of an integer k then it is also true of k+1. With these two results, we can conclude that the property is true of all natural numbers (positive integers). The book then introduces strong induction and recursively defined functions and sets. From there it moves on to structural induction, which generalizes induction to recursively defined sets. Then, the book introduces recursive algorithms, most notably merge sort, before giving a high level overview of program verification techniques.
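As an illustration of a recursive algorithm whose correctness follows by strong induction on the length of the input, here is a minimal merge sort sketch in Python. The function names are assumptions for illustration, and the book's own pseudocode may differ.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these two tails is non-empty.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Sort a list by recursively sorting each half and merging the results."""
    if len(items) <= 1:          # base case: already sorted
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))
```

The inductive argument mirrors the code: lists of length 0 or 1 are sorted, and if both shorter halves come back sorted then the merge step produces a sorted whole.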


Counting

The book now changes subjects to talk about basic counting techniques, such as the product rule and the sum rule, before (interestingly) moving on to the pigeonhole principle. It then moves on to permutations and combinations, while introducing the notion of combinatorial proof, which is when we show that both sides of an identity count the same things in different ways, or that there exists a bijection between the sets being counted on either side. The textbook then introduces binomial coefficients, Pascal’s triangle, and permutations/combinations with repetition. Finally, it gives algorithms that generate all the permutations and combinations of a set of n objects.
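One standard way to generate every permutation is to step through them in lexicographic order. The Python sketch below is an assumed illustration of that idea, with hypothetical function names, rather than a transcription of the book's algorithm.

```python
def next_permutation(perm):
    """Return the next permutation in lexicographic order, or None if perm is the last one."""
    a = list(perm)
    # Find the rightmost position j where a[j] < a[j + 1].
    j = len(a) - 2
    while j >= 0 and a[j] >= a[j + 1]:
        j -= 1
    if j < 0:
        return None  # the sequence is in decreasing order, so it is the last permutation
    # Find the rightmost element larger than a[j] and swap it into position j.
    k = len(a) - 1
    while a[k] <= a[j]:
        k -= 1
    a[j], a[k] = a[k], a[j]
    # Put the tail into increasing order (it was decreasing, so reversing suffices).
    a[j + 1:] = reversed(a[j + 1:])
    return a


def all_permutations(n):
    """List all permutations of 1..n in lexicographic order."""
    perm = list(range(1, n + 1))
    result = []
    while perm is not None:
        result.append(perm)
        perm = next_permutation(perm)
    return result
```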

Compared to other sections, I feel that a higher proportion of readers would be familiar with the results of this chapter and the one on discrete probability that follows it. Other than the last section, which I found quite interesting but not particularly useful, I felt like I barely got anything from the chapter.

Discrete Probability

In this section the book covers probability, a topic that most of LessWrong should be quite familiar with. Like most introductory textbooks, it begins by introducing the notion of sample spaces and events as sets, before defining the probability of an event E (when all outcomes are equally likely) as the ratio of the cardinality of E to the cardinality of the sample space S. We are then introduced to other key concepts in probability theory: conditional probabilities, independence, and random variables, for example. The textbook takes care to flesh out this section with a discussion about the Birthday Problem and Monte Carlo algorithms. Afterwards, we are treated to a section on Bayes’ theorem, with the canonical example of disease testing for rare diseases and a less-canonical-but-still-used-quite-a-lot example of Naïve Bayes spam filters. The chapter concludes by introducing the expected value and variance of random variables, as well as a lot of key results (linearity of expectations and Chebyshev’s Inequality, to list two). Again, aside from the applications, most of this stuff is quite basic.
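The rare-disease example is easy to check numerically. The sketch below applies Bayes’ theorem in Python; the prevalence, sensitivity, and false-positive rate are made-up numbers for illustration, not figures from the book.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' theorem."""
    true_positive = sensitivity * prior                  # P(positive and disease)
    false_positive = false_positive_rate * (1 - prior)   # P(positive and no disease)
    return true_positive / (true_positive + false_positive)


# Hypothetical numbers: 1 in 1,000 people has the disease, the test catches
# 99% of true cases, and 1% of healthy people test positive anyway.
p_disease_given_positive = posterior(prior=0.001, sensitivity=0.99, false_positive_rate=0.01)
```

Under these assumed numbers the posterior is about 9%, because the healthy population is so much larger that false positives outnumber true positives roughly ten to one.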

Advanced Counting Techniques

This chapter, though titled “advanced counting techniques”, is really just about recurrences and the principle of inclusion-exclusion. As you can tell by the length of this section, I found this chapter quite helpful nevertheless.  

We begin with three applications of recurrences: Fibonacci’s “rabbit problem”, the Tower of Hanoi, and dynamic programming. We’re then shown how to solve linear recurrence relations with constant coefficients, which are relations of the form

a_n = c_1 a_{n-1} + c_2 a_{n-2} + … + c_k a_{n-k} + F(n)

where c_1, c_2, …, c_k are constants, c_k ≠ 0, and F(n) is a function of n. The relation is homogeneous when F(n) = 0 and nonhomogeneous otherwise. The solutions are quite beautiful, and if you’re not familiar with them I recommend looking them up. Afterwards, we’re introduced to divide-and-conquer algorithms, which are recursive algorithms that solve smaller and smaller instances of the problem, as well as the master method for solving the recurrences associated with them, which tend to be of the form

f(n) = a f(n/b) + c n^d

After these algorithms, we’re introduced to generating functions, which are yet another way of solving recurrences.
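For readers who have not seen these methods, here is a short worked sketch in LaTeX. It uses the Fibonacci recurrence as the homogeneous example and the standard statement of the master theorem for the divide-and-conquer form; the choice of example and the symbols r and α are assumptions made for illustration.

```latex
% Homogeneous case: Fibonacci numbers, f_n = f_{n-1} + f_{n-2}, with f_0 = 0, f_1 = 1.
% Substituting f_n = r^n gives the characteristic equation:
r^2 - r - 1 = 0 \quad\Longrightarrow\quad r_{1,2} = \frac{1 \pm \sqrt{5}}{2}

% General solution, with constants fixed by the initial conditions:
f_n = \alpha_1 r_1^n + \alpha_2 r_2^n, \qquad
\alpha_1 = \frac{1}{\sqrt{5}}, \quad \alpha_2 = -\frac{1}{\sqrt{5}}

f_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^n
      - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right]

% Divide-and-conquer case: f(n) = a f(n/b) + c n^d,
% with a \ge 1, integer b > 1, c > 0, d \ge 0 (master theorem):
f(n) \text{ is }
\begin{cases}
O(n^d)            & \text{if } a < b^d \\
O(n^d \log n)     & \text{if } a = b^d \\
O(n^{\log_b a})   & \text{if } a > b^d
\end{cases}

% Merge sort has a = 2, b = 2, d = 1, so a = b^d and f(n) is O(n \log n).
```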

Finally, after a long trip through various recurrence-solving methods, the textbook introduces the principle of inclusion-exclusion, which lets us figure out how many elements are in the union of a finite number of finite sets.
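A small Python sketch shows the principle in action: counting the integers from 1 to a limit that are divisible by at least one of several divisors. The function names and the example divisors are assumptions for illustration.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def lcm_all(numbers):
    """Least common multiple of a non-empty collection of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), numbers, 1)


def count_multiples(limit, divisors):
    """Count integers in 1..limit divisible by at least one divisor, via inclusion-exclusion."""
    total = 0
    for size in range(1, len(divisors) + 1):
        # Add the sizes of odd-sized intersections, subtract the even-sized ones.
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(divisors, size):
            total += sign * (limit // lcm_all(subset))
    return total
```

For example, `count_multiples(100, [2, 3, 5])` works out to 50 + 33 + 20 - 16 - 10 - 6 + 3 = 74, which matches a brute-force count.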


Relations

Seven chapters after the textbook talks about functions, it finally gets to relations. Relations are defined as sets of n-tuples, but the book also gives alternative ways of representing relations, such as matrices and directed graphs for binary relations. We’re then introduced to transitive closures and Warshall’s algorithm for computing the transitive closure of a relation. We conclude with two special types of relations: equivalence relations, which are reflexive, symmetric, and transitive; and partial orderings, which are reflexive, antisymmetric, and transitive.
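Warshall’s algorithm is short enough to sketch directly on the zero-one matrix of a relation. The Python below is a minimal illustration with assumed names, not the book's pseudocode.

```python
def transitive_closure(matrix):
    """Warshall's algorithm on the zero-one matrix of a binary relation.

    matrix[i][j] == 1 means element i is related to element j.
    """
    n = len(matrix)
    closure = [row[:] for row in matrix]  # copy so the input is left unchanged
    for k in range(n):
        # After this pass, closure[i][j] == 1 exactly when j is reachable from i
        # using only elements 0..k as intermediate steps.
        for i in range(n):
            for j in range(n):
                closure[i][j] = closure[i][j] | (closure[i][k] & closure[k][j])
    return closure


# Hypothetical relation {(0, 1), (1, 2), (2, 3)} on the set {0, 1, 2, 3}.
relation = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
closure = transitive_closure(relation)
```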


Graphs

After being first introduced to directed graphs as a way of representing relations in the previous chapter, we’re given a much more fleshed out treatment in this chapter. A graph is defined as a set of vertices and a set of edges connecting them. Edges can be directed or undirected, and graphs can be simple graphs (with no two edges connecting the same pair of vertices) or multigraphs, which may contain multiple edges connecting the same pair of vertices. We’re then given a ton of terminology related to graphs, and a lot of theorems related to these terms. The treatment of graphs is quite advanced for an introductory textbook – it covers Dijkstra’s algorithm for shortest paths, for example, and ends with graph coloring and the four color theorem. I found this chapter to be a useful review of a lot of graph theory.
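As a sketch of Dijkstra’s algorithm, the Python below uses a binary heap as the priority queue, which is a common implementation choice rather than necessarily the book’s presentation. The graph representation (a dict of dicts) and the example weights are assumptions for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source in a graph with non-negative edge weights.

    graph maps each vertex to a dict of {neighbor: edge_weight}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, weight in graph.get(u, {}).items():
            candidate = d + weight
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Hypothetical weighted directed graph.
example_graph = {
    "a": {"b": 4, "c": 2},
    "b": {"d": 5},
    "c": {"b": 1, "d": 8},
    "d": {},
}
distances = dijkstra(example_graph, "a")
```

In this example the direct edge from a to b costs 4, but the route through c costs only 3, so the algorithm settles on the detour.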


Trees

After dealing with graphs, we move on to trees, which are connected undirected graphs without cycles. The textbook gives a lot of examples of applications of trees, such as binary search trees, decision trees, and Huffman coding. We’re then presented with the three ways of traversing a tree – in-order, pre-order, and post-order. Afterwards, we get to the topic of spanning trees of graphs, which are subgraphs that are trees and contain every vertex in the graph. Two algorithms are presented for finding spanning trees – depth first search and breadth first search. The chapter ends with a section on minimum spanning trees, which are spanning trees with the least total edge weight. Once again we’re presented with two algorithms for finding minimum spanning trees: Prim’s algorithm and Kruskal’s algorithm. Having never seen either of these algorithms before, I found this section to be quite interesting, though they are given a more comprehensive treatment in most introductory algorithms textbooks.
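Kruskal’s algorithm fits in a few lines once you have a way to track connected components. The Python sketch below uses a simple union-find structure for that, which is an implementation choice assumed here rather than something the review attributes to the book; the example edges are also made up.

```python
def kruskal(num_vertices, edges):
    """Return the edges of a minimum spanning tree.

    edges is a list of (weight, u, v) tuples over vertices 0..num_vertices-1.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent pointers to the component representative, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):  # consider the cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # the edge joins two components, so it creates no cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Hypothetical weighted undirected graph on 4 vertices.
example_edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = kruskal(4, example_edges)
```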

Boolean Algebra

This section introduces Boolean algebra, which is basically a set of rules for manipulating elements of the set {0,1}. Why is this useful? Because, as it turns out, Boolean algebra is directly related to circuit design! The textbook first introduces the terminology and rules of Boolean algebra, and then moves on to circuits of logic gates and their relationship with Boolean functions. We conclude with two ways to minimize the complexity of Boolean functions (and thus circuits) – Karnaugh Maps and the Quine-McCluskey Method, which are both quite interesting. 
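Because Boolean functions have finitely many inputs, any proposed simplification can be checked by brute force over the truth table. The Python sketch below does that for two small identities; the helper name and the chosen identities are assumptions for illustration, and this is a check of equivalence, not an implementation of Karnaugh maps or Quine-McCluskey.

```python
from itertools import product


def equivalent(f, g, num_vars):
    """Return True if Boolean functions f and g agree on every input in {0, 1}^num_vars."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=num_vars))


# xy + xy' simplifies to x (complement written as 1 - y).
original = lambda x, y: (x & y) | (x & (1 - y))
simplified = lambda x, y: x

# De Morgan's law: (xy)' = x' + y'.
lhs = lambda x, y: 1 - (x & y)
rhs = lambda x, y: (1 - x) | (1 - y)

simplification_holds = equivalent(original, simplified, 2)
de_morgan_holds = equivalent(lhs, rhs, 2)
```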

Modeling Computation

This is the chapter of Rosen that I’m pretty sure isn’t covered by most introductory textbooks. In many ways, it’s an extremely condensed version of the first couple chapters of a theory of computation textbook. It covers phrase-structure grammars, finite state machines, and closes with Turing machines. However, I found this chapter a lot more poorly motivated than the rest of the book, and I think Sipser’s Introduction to the Theory of Computation offers a much better introduction to these topics.

Who should read this?

If you’re not familiar with discrete mathematics, this is a great book that will get you up to speed on the key concepts, at least to the level where you’ll be able to understand the other textbooks on MIRI’s course list. Of the three textbooks I’m familiar with that cover discrete mathematics, I think that Rosen is hands down the best. I also think it’s quite a “fun” textbook to skim through, even if you’re familiar with some of the topics already.

However, I think that people familiar with the topics probably should look for other books, especially if they are looking for textbooks that are more concise. It might also not be suitable if you’re already really motivated to learn the subject, and just want to jump right in. There are a few topics not normally covered in other discrete math textbooks, but I feel that it’s better to pick up those topics in other textbooks.

What should I read?

In general, the rule for the textbook is: read the sections you’re not familiar with, and skim the sections you are familiar with, just to keep an eye out for cool examples or theorems.

In terms of chapter-by-chapter, chapters 1 and 2 seem like they’ll help if you’re new to mathematics or proofs, but probably can be skipped otherwise. Chapter 3 is pretty good to know in general, though I suspect most people here would find it too easy. Chapters 4 through 12 are what most courses on discrete mathematics seem to cover, and form the bulk of the book – I would recommend skimming them once just to make sure you know them, as they’re also quite important for understanding any serious CS textbook. Chapter 13, on the other hand, seems kind of tacked on, and probably should be picked up in other textbooks.

Final Notes

Of all the books on the MIRI research guide, this is probably the most accessible, but it is by no means a bad book. I’d highly recommend it to anyone who hasn’t had any exposure to discrete mathematics, and I think it’s an important prerequisite for the rest of the books on the MIRI research guide.

How I changed my exercise habits

16 Normal_Anomaly 13 April 2015 10:19PM

In June 2013, I didn’t do any exercise beyond biking the 15 minutes to work and back. Now, I have a robust habit of hitting the gym every day, doing cardio and strength training. Here are the techniques I used to get from not having the habit to having it, some of them common wisdom and some of them my own ideas. Consider this post a case study/anecdata in what worked for me. Note: I wrote these ideas down around August 2013 but didn’t post them, so my memory was fresh at the time of writing.

1. Have a specific goal. Ideally this goal should be reasonably achievable and something you can see progress toward over medium timescales. I initially started exercising because I wanted more upper body strength to be better at climbing. My goal is “become able to do at least one pull up, or more if possible”.

Why it works: if you have a specific goal instead of a vague feeling that you ought to do something or that it’s what a virtuous person would do, it’s harder to make excuses. Skipping a workout with an excuse will let you continue to think of yourself as virtuous, but it won’t help with your goal. For this to work, your goal needs to be something you actually want, rather than a stand-in for “I want to be virtuous.” If you can’t think of a consequence of your intended habit that you actually want, the habit may not be worth your time.

2. Have a no-excuses minimum. This is probably the best technique I’ve discovered. Every day, with no excuses, I went to the gym and did fifty pull-downs on one of the machines. After that’s done, I can do as much or as little else as I want. Some days I would do equivalent amounts of three other exercises, some days I would do an extra five reps and that’s it.

Why it works: this one has a host of benefits.

* It provides a sense of freedom: once I’m done with my minimum, I have a lot of choice about what and how much to do. That way it feels less like something I’m being forced into.

* If I’m feeling especially tired or feel like I deserve a day off, instead of skipping a day and breaking the habit I tell myself I’ll just do the minimum instead. Often once I get there I end up doing more than the minimum anyway, because the real thing I wanted to skip was the inconvenience of biking to the gym.

3. If you raise the minimum, do it slowly. I have sometimes raised the bar on what’s the minimum amount of exercise I have to do, but never to as much or more than I was already doing routinely. If you start suddenly forcing yourself to do more than you were already doing, the change will be much harder and less likely to stick than gradually ratcheting up your commitment.

4. Don’t fall into a guilt trap. Avoid associating guilt with doing the minimum, or even with missing a day.

Why it works: feeling guilty will make thinking of the habit unpleasant, and you’ll downplay how much you care about it to avoid the cognitive dissonance. In particular, if you only do the minimum, tell yourself “I did everything I committed to do.” Then when you do more than the minimum, feel good about it! You went above and beyond. This way, doing what you committed to will sometimes include positive reinforcement, but never negative reinforcement.

5. Use Timeless Decision Theory and consistency pressure. Credit for this one goes to this post by user Zvi. When I contemplate skipping a day at the gym, I remember that I’ll be facing the same choice under nearly the same conditions many times in the future. If I skip my workout today, what reason do I have to believe that I won’t skip it tomorrow?

Why it works: Even when the benefits of one day’s worth of exercise don’t seem like enough motivation, I know my entire habit that I’ve worked to cultivate is at stake. I know that the more days I go to the gym the more I will see myself as a person who goes to the gym, and the more it will become my default action.

6. Evaluate your excuses. If I have what I think is a reasonable excuse, I consider how often I’ll skip the gym if I let myself skip it whenever I have that good of an excuse. If letting the excuse hold would make me use it often, I ignore it.

Why it works: I based this technique on this LW post.

7. Tell people about it. The first thing I did when I made my resolution to start hitting the gym was to tell a friend whose opinion I cared about. I also made a comment on LW saying I would make a post about my attempt at forming a habit, whether it succeeded or failed. (I wrote the post and forgot to post it for over a year, but so it goes.)

Why it works: Telling people about your commitment invests your reputation in it. If you risk being embarrassed if you fail, you have an extra motivation to succeed.

I expect these techniques can be generalized to work for many desirable habits: eating healthy, spending time on social interaction; writing, coding, or working on a long-term project; being outside getting fresh air, etc.

Are there really no ghosts in the machine?

0 kingmaker 13 April 2015 07:54PM

My previous post on this article went down like a server running on PHP (quite deservedly I might add). You can all rest assured that I won't be attempting any clickbait titles again for the foreseeable future. I also believe that the whole H+ article is written in a very poor and aggressive manner, but that some of the arguments raised cannot be ignored.


On my original article, many people raised this post by Eliezer Yudkowsky as a counterargument to the idea that an FAI could have goals contrary to what we programmed. In summary, he argues that a program doesn't necessarily do as the programmer wishes, but rather as they have programmed. In this sense, there is no ghost in the machine that interprets your commands and acts accordingly; it can act only as you have designed. From this, he argues, an FAI can only act as we have programmed it.


I personally think this argument completely ignores what has made AI research so successful in recent years: machine learning. We are no longer designing an AI from scratch and then implementing it; we are creating a seed program which learns from the situation and alters its own code with no human intervention, i.e. the machines are starting to write themselves, e.g. with Google's DeepMind. They are effectively evolving, and we are starting to find ourselves in the rather concerning position where we do not fully understand our own creations.


You could simply say, as someone said in the comments of my previous post, that if X represents the goal of having a positive effect on humanity, then the FAI should be programmed directly to have X as its primary directive. My answer is that the most promising developments have come from imitating the human brain, and we have no reason to believe that the human brain (or any other brain for that matter) can be guaranteed to have a primary directive. One could argue that evolution has given us our prime directives: to ensure our own continued existence, to reproduce and to cooperate with each other; but there are many people who are suicidal, who have no interest in reproducing and who violently rebel against society (for example psychopaths). We are instructed by society and our programming to desire X, but far too many of us desire, say, Y for this to be considered a reliable way of achieving X.

Evolution’s direction has not ensured that we do “what we are supposed to do”, so we could well face similar disobedience from our own creation. The most effective way we have seen of developing AI is creating it in our image, and since there are ghosts in us, there could well be ghosts in the machine.
