Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Leaving LessWrong for a more rational life

29 Mark_Friedenbach 21 May 2015 07:24PM

You are unlikely to see me posting here again, after today. There is a saying here that politics is the mind-killer. My heretical realization lately is that philosophy, as generally practiced, can also be mind-killing.

As many of you know I am, or was running a twice-monthly Rationality: AI to Zombies reading group. One of the bits I desired to include in each reading group post was a collection of contrasting views. To research such views I've found myself listening during my commute to talks given by other thinkers in the field, e.g. Nick Bostrom, Anders Sandberg, and Ray Kurzweil, and people I feel are doing “ideologically aligned” work, like Aubrey de Grey, Christine Peterson, and Robert Freitas. Some of these were talks I had seen before, or generally views I had been exposed to in the past. But looking through the lens of learning and applying rationality, I came to a surprising (to me) conclusion: it was philosophical thinkers that demonstrated the largest and most costly mistakes. On the other hand, de Grey and others who are primarily working on the scientific and/or engineering challenges of singularity and transhumanist technologies were far less likely to subject themselves to epistematic mistakes of significant consequences.

Philosophy as the anti-science...

What sort of mistakes? Most often reasoning by analogy. To cite a specific example, one of the core underlying assumption of singularity interpretation of super-intelligence is that just as a chimpanzee would be unable to predict what a human intelligence would do or how we would make decisions (aside: how would we know? Were any chimps consulted?), we would be equally inept in the face of a super-intelligence. This argument is, however, nonsense. The human capacity for abstract reasoning over mathematical models is in principle a fully general intelligent behaviour, as the scientific revolution has shown: there is no aspect of the natural world which has remained beyond the reach of human understanding, once a sufficient amount of evidence is available. The wave-particle duality of quantum physics, or the 11-dimensional space of string theory may defy human intuition, i.e. our built-in intelligence. But we have proven ourselves perfectly capable of understanding the logical implications of models which employ them. We may not be able to build intuition for how a super-intelligence thinks. Maybe—that's not proven either. But even if that is so, we will be able to reason about its intelligent behaviour in advance, just like string theorists are able to reason about 11-dimensional space-time without using their evolutionarily derived intuitions at all.

This post is not about the singularity nature of super-intelligence—that was merely my choice of an illustrative example of a category of mistakes that are too often made by those with a philosophical background rather than the empirical sciences: the reasoning by analogy instead of the building and analyzing of predictive models. The fundamental mistake here is that reasoning by analogy is not in itself a sufficient explanation for a natural phenomenon, because it says nothing about the context sensitivity or insensitivity of the original example and under what conditions it may or may not hold true in a different situation.

A successful physicist or biologist or computer engineer would have approached the problem differently. A core part of being successful in these areas is knowing when it is that you have insufficient information to draw conclusions. If you don't know what you don't know, then you can't know when you might be wrong. To be an effective rationalist, it is often not important to answer “what is the calculated probability of that outcome?” The better first question is “what is the uncertainty in my calculated probability of that outcome?” If the uncertainty is too high, then the data supports no conclusions. And the way you reduce uncertainty is that you build models for the domain in question and empirically test them.

The lens that sees its own flaws...

Coming back to LessWrong and the sequences. In the preface to Rationality, Eliezer Yudkowsky says his biggest regret is that he did not make the material in the sequences more practical. The problem is in fact deeper than that. The art of rationality is the art of truth seeking, and empiricism is part and parcel essential to truth seeking. There's lip service done to empiricism throughout, but in all the “applied” sequences relating to quantum physics and artificial intelligence it appears to be forgotten. We get instead definitive conclusions drawn from thought experiments only. It is perhaps not surprising that these sequences seem the most controversial.

I have for a long time been concerned that those sequences in particular promote some ungrounded conclusions. I had thought that while annoying this was perhaps a one-off mistake that was fixable. Recently I have realized that the underlying cause runs much deeper: what is taught by the sequences is a form of flawed truth-seeking (thought experiments favored over real world experiments) which inevitably results in errors, and the errors I take issue with in the sequences are merely examples of this phenomenon.

And these errors have consequences. Every single day, 100,000 people die of preventable causes, and every day we continue to risk extinction of the human race at unacceptably high odds. There is work that could be done now to alleviate both of these issues. But within the LessWrong community there is actually outright hostility to work that has a reasonable chance of alleviating suffering (e.g. artificial general intelligence applied to molecular manufacturing and life-science research) due to concerns arrived at by flawed reasoning.

I now regard the sequences as a memetic hazard, one which may at the end of the day be doing more harm than good. One should work to develop one's own rationality, but I now fear that the approach taken by the LessWrong community as a continuation of the sequences may result in more harm than good. The anti-humanitarian behaviors I observe in this community are not the result of initial conditions but the process itself.

What next?

How do we fix this? I don't know. On a personal level, I am no longer sure engagement with such a community is a net benefit. I expect this to be my last post to LessWrong. It may happen that I check back in from time to time, but for the most part I intend to try not to. I wish you all the best.

A note about effective altruism…

One shining light of goodness in this community is the focus on effective altruism—doing the most good to the most people as measured by some objective means. This is a noble goal, and the correct goal for a rationalist who wants to contribute to charity. Unfortunately it too has been poisoned by incorrect modes of thought.

Existential risk reduction, the argument goes, trumps all forms of charitable work because reducing the chance of extinction by even a small amount has far more expected utility than would accomplishing all other charitable works combined. The problem lies in the likelihood of extinction, and the actions selected in reducing existential risk. There is so much uncertainty regarding what we know, and so much uncertainty regarding what we don't know that it is impossible to determine with any accuracy the expected risk of, say, unfriendly artificial intelligence creating perpetual suboptimal outcomes, or what effect charitable work in the area (e.g. MIRI) is have to reduce that risk, if any.

This is best explored by an example of existential risk done right. Asteroid and cometary impacts is perhaps the category of external (not-human-caused) existential risk which we know the most about, and have done the most to mitigate. When it was recognized that impactors were a risk to be taken seriously, we recognized what we did not know about the phenomenon: what were the orbits and masses of Earth-crossing asteroids? We built telescopes to find out. What is the material composition of these objects? We built space probes and collected meteorite samples to find out. How damaging an impact would there be for various material properties, speeds, and incidence angles? We built high-speed projectile test ranges to find out. What could be done to change the course of an asteroid found to be on collision course? We have executed at least one impact probe and will monitor the effect that had on the comet's orbit, and have on the drawing board probes that will use gravitational mechanisms to move their target. In short, we identified what it is that we don't know and sought to resolve those uncertainties.

How then might one approach an existential risk like unfriendly artificial intelligence? By identifying what it is we don't know about the phenomenon, and seeking to experimentally resolve that uncertainty. What relevant facts do we not know about (unfriendly) artificial intelligence? Well, much of our uncertainty about the actions of an unfriendly AI could be resolved if we were to know more about how such agents construct their thought models, and relatedly what language were used to construct their goal systems. We could also stand to benefit from knowing more practical information (experimental data) about in what ways AI boxing works and in what ways it does not, and how much that is dependent on the structure of the AI itself. Thankfully there is an institution that is doing that kind of work: the Future of Life institute (not MIRI).

Where should I send my charitable donations?

Aubrey de Grey's SENS Research Foundation.

100% of my charitable donations are going to SENS. Why they do not get more play in the effective altruism community is beyond me.

If you feel you want to spread your money around, here are some non-profits which have I have vetted for doing reliable, evidence-based work on singularity technologies and existential risk:

  • Robert Freitas and Ralph Merkle's Institute for Molecular Manufacturing does research on molecular nanotechnology. They are the only group that work on the long-term Drexlarian vision of molecular machines, and publish their research online.
  • Future of Life Institute is the only existential-risk AI organization which is actually doing meaningful evidence-based research into artificial intelligence.
  • B612 Foundation is a non-profit seeking to launch a spacecraft with the capability to detect, to the extent possible, ALL Earth-crossing asteroids.

I wish I could recommend a skepticism, empiricism, and rationality promoting institute. Unfortunately I am not aware of an organization which does not suffer from the flaws I identified above.

Addendum regarding unfinished business

I will no longer be running the Rationality: From AI to Zombies reading group as I am no longer in good conscience able or willing to host it, or participate in this site, even from my typically contrarian point of view. Nevertheless, I am enough of a libertarian that I feel it is not my role to put up roadblocks to others who wish to delve into the material as it is presented. So if someone wants to take over the role of organizing these reading groups, I would be happy to hand over the reigns to that person. If you think that person should be you, please leave a reply in another thread, not here.

EDIT: Obviously I'll stick around long enough to answer questions below :)

Brainstorming new senses

27 lululu 20 May 2015 07:53PM

What new senses would you like to have available to you?

Often when new technology first becomes widely available, the initial limits are in the collective imagination, not in the technology itself (case in point: the internet). New sensory channels have a huge potential because the brain can process senses much faster and more intuitively than most conscious thought processes.

There are a lot of recent "proof of concept" inventions that show that it is possible to create new sensory channels for humans with and without surgery. The most well known and simple example is an implanted magnet, which would alert you to magnetic fields (the trade-off being that you could never have an MRI). Cochlear implants are the most widely used human-created sensory channels (they send electrical signals directly to the nervous system, bypassing the ear entirely), but CIs are designed to emulate a sensory channel most people already have brain space allocated to. VEST is another example. Similar to CIs, VEST (versatile extra-sensory transducer) has 24 information channels, and uses audio compression to encode sound. Unlike CIs, they are not implanted in the skull but instead information is relayed through vibrating motors on the torso. After a few hours of training, deaf volunteers are capable of word recognition using the vibrations alone, and to do so without conscious processing. Much like hearing, the users are unable to describe exactly what components make a spoken word intelligible, they just understand the sensory information intuitively. Another recent invention being tested (with success) is BrainPort glasses, which send electrical signals through the tongue (which is one of the most sensitive organs on the body). Blind people can begin processing visual information with this device within 15 minutes, and it is unique in that it is not implanted. The sensory information feels like pop rocks at first before the brain is able to resolve it into sight. Niel Harbisson (who is colorblind) has custom glasses which use sound tones to relay color information. Belts that vibrate when facing north give people an sense of north. Bottlenose can be built at home and gives a very primitive sense of echolocation. As expected, these all work better if people start young as children. 

What are the craziest and coolest new senses you would like to see available using this new technology? I think VEST at least is available from Kickstarter and one of the inventors suggested that it could be that it could be programmed to transmit any kind of data. My initial ideas which I heard about this possibility are just are senses that some unusual people already have or expansions on current senses. I think the real game changers are going to be totally knew senses unrelated to our current sensory processing. Translating data into sensory information gives us access to intuition and processing speed otherwise unavailable. 

My initial weak ideas:

  • mass spectrometer (uses reflected lasers to determine the exact atomic makeup of anything and everything)
  • proximity meter (but I think you would begin to feel like you had a physical aura or field of influence)
  • WIFI or cell signal
  • perfect pitch and perfect north, both super easy and only need one channel of information (an smartwatch app?)
  • infrared or echolocation
  • GPS (this would involve some serious problem solving to figure out what data we should encode given limited channels, I think it could be done with 4 or 8 channels each associated with a cardinal direction)

Someone working with VEST suggested:

  • compress global twitter sentiments into 24 channels. Will you begin to have an intuitive sense of global events?
  • encode stockmarket data. Will you become an intuitive super-investor?
  • encode local weather data (a much more advanced version of "I can feel it's going to rain in my bad knee)

Some resources for more information:




Log-normal Lamentations

12 Thrasymachus 19 May 2015 09:12PM

[Morose. Also very roughly drafted.]

Normally, things are distributed normally. Human talents may turn out to be one of these things. Some people are lucky enough to find themselves on the right side of these distributions – smarter than average, better at school, more conscientious, whatever. To them go many spoils – probably more so now than at any time before, thanks to the information economy.

There’s a common story told about a hotshot student at school whose ego crashes to earth when they go to university and find themselves among a group all as special as they thought they were. The reality might be worse: many of the groups the smart or studious segregate into (physics professors, Harvard undergraduates, doctors) have threshold (or near threshold)-like effects: only those with straight A’s, only those with IQs > X, etc. need apply. This introduces a positive skew to the population: most (and the median) are below the average, brought up by a long tail of the (even more) exceptional. Instead of comforting ourselves at looking at the entire population to which we compare favorably, most of us will look around our peer group and find ourselves in the middle, and having to look a long way up to the best. 1


Yet part of growing up is recognizing there will inevitably be people better than you are – the more able may be able to buy their egos time, but no more. But that needn’t be so bad: in several fields (such as medicine) it can be genuinely hard to judge ‘betterness’, and so harder to find exemplars to illuminate your relative mediocrity. Often there are a variety of dimensions to being ‘better’ at something: although I don’t need to try too hard to find doctors who are better at some aspect of medicine than I (more knowledgeable, kinder, more skilled in communication etc.) it is mercifully rare to find doctors who are better than me in all respects. And often the tails are thin: if you’re around 1 standard deviation above the mean, people many times further from the average than you are will still be extraordinarily rare, even if you had a good stick to compare them to yourself.

Look at our thick-tailed works, ye average, and despair! 2

One nice thing about the EA community is that they tend to be an exceptionally able bunch: I remember being in an ‘intern house’ that housed the guy who came top in philosophy at Cambridge, the guy who came top in philosophy at Yale, and the guy who came top in philosophy at Princeton – and although that isn’t a standard sample, we seem to be drawn disproportionately not only from those who went to elite universities, but those who did extremely well at elite universities. 3 This sets the bar very high.

Many of the ‘high impact’ activities these high achieving people go into (or aspire to go into) are more extreme than normal(ly distributed): log-normal commonly, but it may often be Pareto. The distribution of income or outcomes from entrepreneurial ventures (and therefore upper-bounds on what can be ‘earned to give’), the distribution of papers or citations in academia, the impact of direct projects, and (more tenuously) degree of connectivity or importance in social networks or movements would all be examples: a few superstars and ‘big winners’, but orders of magnitude smaller returns for the rest.

Insofar as I have ‘EA career path’, mine is earning to give: if I were trying to feel good about the good I was doing, my first port of call would be my donations. In sum, I’ve given quite a lot to charity – ~£15,000 and counting – which I’m proud of. Yet I’m no banker (or algo-trader) – those who are really good (or lucky, or both) can end up out of university with higher starting salaries than my peak expected salary, and so can give away more than ten times more than I will be able to. I know several of these people, and the running tally of each of their donations is often around ten times my own. If they or others become even more successful in finance, or very rich starting a company, there might be several more orders of magnitude between their giving and mine. My contributions may be little more than a rounding error to their work.

A shattered visage

Earning to give is kinder to the relatively minor players than other ‘fields’ of EA activity, as even though Bob’s or Ellie’s donations are far larger, they do not overdetermine my own: that their donations dewormed 1000x children does not make the 1x I dewormed any less valuable. It is unclear whether this applies to other ‘fields': Suppose I became a researcher working on a malaria vaccine, but this vaccine is discovered by Sally the super scientist and her research group across the world. Suppose also that Sally’s discovery was independent of my own work. Although it might have been ex ante extremely valuable for me to work on malaria, its value is vitiated when Sally makes her breakthrough, in the same way a lottery ticket loses value after the draw.

So there are a few ways an Effective Altruist mindset can depress our egos:

  1. It is generally a very able and high achieving group of people, setting the ‘average’ pretty high.
  2. ‘Effective Altruist’ fields tend to be heavy-tailed, so that being merely ‘average’ (for EAs!) in something like earning to give mean having a much smaller impact when compared to one of the (relatively common) superstars.
  3. (Our keenness for quantification makes us particularly inclined towards and able to make these sorts of comparative judgements, ditto the penchant for taking things to be commensurate).
  4. Many of these fields have ‘lottery-like’ characteristics where ex ante and ex post value diverge greatly. ‘Taking a shot’ at being an academic or entrepreneur or politician or leading journalist may be a good bet ex ante for an EA because the upside is so high even if their chances of success remain low (albeit better than the standard reference class). But if the median outcome is failure, the majority who will fail might find the fact it was a good idea ex ante of scant consolation – rewards (and most of the world generally) run ex post facto.

What remains besides

I haven’t found a ready ‘solution’ for these problems, and I’d guess there isn’t one to be found. We should be sceptical of ideological panaceas that can do no wrong and everything right, and EA is no exception: we should expect it to have some costs, and perhaps this is one of them. If so, better to accept it rather than defend the implausibly defensible.

In the same way I could console myself, on confronting a generally better doctor: “Sure, they are better at A, and B, and C, … and Y, but I’m better at Z!”, one could do the same with regards to the axes one’s ‘EA work’. “Sure, Ellie the entrepreneur has given hundreds of times more money to charity, but what’s she like at self-flagellating blog posts, huh?” There’s an incentive to diversify as (combinatorically) it will be less frequent to find someone who strictly dominates you, and although we want to compare across diverse fields, doing so remains difficult. Pablo Stafforini has mentioned elsewhere whether EAs should be ‘specialising’ more instead of spreading their energies over disparate fields: perhaps this makes that less surprising. 4

Insofar as people’s self-esteem is tied up with their work as EAs (and, hey, shouldn’t it be, in part?) There perhaps is a balance to be struck between soberly and frankly discussing the outcomes and merits of our actions, and being gentle to avoid hurting our peers by talking down their work. Yes, we would all want to know if what we were doing was near useless (or even net negative), but this should be broken with care. 5

‘Suck it up’ may be the best strategy. These problems become more acute the more we care about our ‘status’ in the EA community; the pleasure we derive from not only doing good, but doing more good than our peers; and our desire to be seen as successful. Good though it is for these desires to be sublimated to better ends (far preferable all else equal that rivals choose charitable donations rather than Veblen goods to be the arena of their competition), it would be even better to guard against these desires in the first place. Primarily, worry about how to do the most good. 6


  1. As further bad news, there may be progression of ‘tiers’ which are progressively more selective, somewhat akin to stacked band-pass filters: even if you were the best maths student at your school, then the best at university, you may still find yourself plonked around median in a positive-skewed population of maths professors – and if you were an exceptional maths professor, you might find yourself plonked around median in the population of fields medalists. And so on (especially – see infra – if the underlying distribution is something scale-free).
  2. I wonder how much this post is a monument to the grasping vaingloriousness of my character…
  3. Pace: academic performance is not the only (nor the best) measure of ability. But it is a measure, and a fairly germane one for the fairly young population ‘in’ EA.
  4. Although there are other more benign possibilities, given diminishing marginal returns and the lack of people available. As a further aside, I’m wary of arguments/discussions that note bias or self-serving explanations that lie parallel to an opposing point of view (“We should expect people to be more opposed to my controversial idea than they should be due to status quo and social desirability biases”, etc.) First because there are generally so many candidate biases available they end up pointing in most directions; second because it is unclear whether knowing about or noting biases makes one less biased; and third because generally more progress can be made on object level disagreement than on trying to evaluate the strength and relevance of particular biases.
  5. Another thing I am wary of is Crocker’s rules: the idea that you unilaterally declare: ‘don’t worry about being polite with me, just tell it to me straight! I won’t be offended’. Naturally, one should try and separate one’s sense of offense from whatever information was there – it would be a shame to reject a correct diagnosis of our problems because of how it was said. Yet that is very different from trying to eschew this ‘social formatting’ altogether: people (myself included) generally find it easier to respond well when people are polite, and I suspect this even applies to those eager to make Crocker’s Rules-esque declarations. We might (especially if we’re involved in the ‘rationality’ movement) want to overcome petty irrationalities like incorrectly updating on feedback because of an affront to our status or self esteem. Yet although petty, they are surprisingly difficult to budge (if I cloned you 1000 times and ‘told it straight’ to half, yet made an effort to be polite with the other half, do you think one group would update better?) and part of acknowledging our biases should be an acknowledgement that it is sometimes better to placate them rather than overcome them.
  6. cf. Max Ehrmann put it well:

    … If you compare yourself with others, you may become vain or bitter, for always there will be greater and lesser persons than yourself.

    Enjoy your achievements as well as your plans. Keep interested in your own career, however humble…

Subsuming Purpose, Part II: Solving the Solution

5 OrphanWilde 14 May 2015 07:25PM

Summary: It's easy to get caught up in solving the wrong problems, solving the problems with a particular solution instead of solving the actual problem.  You should pay very careful attention to what you are doing and why.

I'll relate a seemingly purposeless story about a video game to illustrate:

I was playing Romance of the Three Kingdoms some years ago, and was trying to build the perfect city.  (The one city I ruled, actually.)  Enemies kept attacking, and the need to recruit troops was slowing my population growth (not to mention deliberate sabotage by my enemies), so eventually I came to the conclusion that I would have to conquer the map in order to finish the job.  So I conquered the map.  And then the game ending was shown, after which, finally, I could return to improving cities.

The game ending, however, startled me out of continuing to play: My now emperor was asked by his people to improve the condition of things (as things were apparently terrible), and his response was that he needed to conquer the rest of Asia first, to ensure their security.

My initial response was outrage at how the game portrayed events, but I couldn't find a fault in "his" response; it was exactly what I had been doing.  Given the rest of Asia, indeed the rest of the world, that would be exactly what I would have done had the game continued past that point, given that threats to the peace I had established still existed.  I had already conquered enemies who had never offered me direct threat, on the supposition that they would, and the fact that they held tactically advantageous positions.

It was an excellent game which managed to point out that I have failed in my original purpose in playing the game.  My purpose was subsumed by itself, or more particularly, a subgoal.  I didn't set out to conquer the map.  I lost the game.  I achieved the game's victory conditions, yes, but failed my own.  The ending, the exact description of exactly how I had failed and how my reasoning led to a conclusion I would have dismissed as absurd when I began, was so memorable it still sticks in my mind, years later.

My original purpose was subsumed.  By what, exactly, however?

By the realities of the game I was playing, I could say, if I were to rationalize my behavior; I wanted to improve all the cities I owned, but at no point until I had conquered the entire map could I afford to.  At each point in the game, there was always one city that couldn't be reliably improved.  The AI didn't share my goals; responding to force with force, to sabotage with sabotage, offered no penalties to the AI or its purposes, only to mine.  But nevertheless, I had still abandoned my original goals.  The realities of the game didn't subsume my purpose, which was still achievable within its constraints.

The specific reasons my means subsumed my ends may be illustrative: I inappropriately generalized.  I reasoned as if my territory were an atomic unit.  The risks incurred at my borders were treated as being incurred across the whole of my territory.  I devoted my resources - in particular my time - into solving a problem which afflicted an ever-decreasing percentage of that territory.  But even realizing that I was incorrectly generalizing wouldn't have stopped me; I'd have reasoned that the edge cities would still be under the same threat, and I couldn't actually finish my task until I finished my current task first.

Maybe, once my imaginary video game emperor had finally finished conquering the world, he'd have finally turned to the task of improving things.  Personally, I imagine he tripped and died falling down a flight of stairs shortly after conquering imaginary-China, and all of his work was undone in the chaos that ensued, because it seems the more poetic end to me.

A game taught me a major flaw in my goal-oriented reasoning.

I don't know the name for this error, if it has a name; internally, I call it incidental problem fixation, getting caught up in solving the sub-problems that arise in trying to solve the original problem.  Since playing, I've been very careful, each time a new challenge comes up in the course of solving an overall issue, to re-evaluate my priorities, and to consider alternatives to my chosen strategy.  I still have something of an issue with this; I can't count the number of times I've spent a full workday on a "correct" solution to a technical issue (say, a misbehaving security library) that should have taken an hour.  But when I notice that I'm doing this, I'll step away, and stop working on the "correct" solution, and return to solving the problem I'm actually trying to solve, instead of getting caught up in all the incidental problems that arose in the attempt to implement the original solution.

ETA: Link to part 1: http://lesswrong.com/lw/e12/subsuming_purpose_part_1/

"Risk" means surprise

4 PhilGoetz 22 May 2015 04:47AM

I lost about $20,000 in 2013 because I didn't notice that a company managing some of my retirement funds had helpfully reallocated them from 100% stocks into bonds and real estate, to "avoid risk". My parents are retired, and everyone advising them tells them to put most of their money in "safe" investments like bonds.

continue reading »

LW survey: Effective Altruists and donations

17 gwern 14 May 2015 12:44AM

(Markdown source)

“Portrait of EAs I know”, su3su2u1:

But I note from googling for surveys that the median charitable donation for an EA in the Less Wrong survey was 0.


Two years ago I got a paying residency, and since then I’ve been donating 10% of my salary, which works out to about $5,000 a year. In two years I’ll graduate residency, start making doctor money, and then I hope to be able to donate maybe eventually as much as $25,000 - $50,000 per year. But if you’d caught me five years ago, I would have been one of those people who wrote a lot about it and was very excited about it but put down $0 in donations on the survey.

Data preparation:

survey2013 <- read.csv("http://www.gwern.net/docs/lwsurvey/2013.csv", header=TRUE)
survey2013$EffectiveAltruism2 <- NA
s2013 <- subset(survey2013, select=c(Charity,Effective.Altruism,EffectiveAltruism2,Work.Status,
colnames(s2013) <- c("Charity","EffectiveAltruism","EffectiveAltruism2","WorkStatus","Profession",
s2013$Year <- 2013
survey2014 <- read.csv("http://www.gwern.net/docs/lwsurvey/2014.csv", header=TRUE)
s2014 <- subset(survey2014, PreviousSurveys!="Yes", select=c(Charity,EffectiveAltruism,EffectiveAltruism2,
s2014$Year <- 2014
survey <- rbind(s2013, s2014)
# replace empty fields with NAs:
survey[survey==""] <- NA; survey[survey==" "] <- NA
# convert money amounts from string to number:
survey$Charity <- as.numeric(as.character(survey$Charity))
survey$Income <- as.numeric(as.character(survey$Income))
# both Charity & Income are skewed, like most monetary amounts, so log transform as well:
survey$CharityLog <- log1p(survey$Charity)
survey$IncomeLog <- log1p(survey$Income)
# age:
survey$Age <- as.integer(as.character(survey$Age))
# prodigy or no, I disbelieve any LW readers are <10yo (bad data? malicious responses?):
survey$Age <- ifelse(survey$Age >= 10, survey$Age, NA)
# convert Yes/No to boolean TRUE/FALSE:
survey$EffectiveAltruism <- (survey$EffectiveAltruism == "Yes")
survey$EffectiveAltruism2 <- (survey$EffectiveAltruism2 == "Yes")
## Charity EffectiveAltruism EffectiveAltruism2 WorkStatus
## Min. : 0.000 Mode :logical Mode :logical Student :905
## 1st Qu.: 0.000 FALSE:1202 FALSE:450 For-profit work :736
## Median : 50.000 TRUE :564 TRUE :45 Self-employed :154
## Mean : 1070.931 NA's :487 NA's :1758 Unemployed :149
## 3rd Qu.: 400.000 Academics (on the teaching side):104
## Max. :110000.000 (Other) :179
## NA's :654 NA's : 26
## Profession Degree Age
## Computers (practical: IT programming etc.) :478 Bachelor's :774 Min. :13.00000
## Other :222 High school:597 1st Qu.:21.00000
## Computers (practical: IT, programming, etc.):201 Master's :419 Median :25.00000
## Mathematics :185 None :125 Mean :27.32494
## Engineering :170 Ph D. :125 3rd Qu.:31.00000
## (Other) :947 (Other) :189 Max. :72.00000
## NA's : 50 NA's : 24 NA's :28
## Income Year CharityLog IncomeLog
## Min. : 0.00 2013:1547 Min. : 0.000000 Min. : 0.000000
## 1st Qu.: 10000.00 2014: 706 1st Qu.: 0.000000 1st Qu.: 9.210440
## Median : 33000.00 Median : 3.931826 Median :10.404293
## Mean : 75355.69 Mean : 3.591102 Mean : 9.196442
## 3rd Qu.: 80000.00 3rd Qu.: 5.993961 3rd Qu.:11.289794
## Max. :10000000.00 Max. :11.608245 Max. :16.118096
## NA's :993 NA's :654 NA's :993
# lavaan doesn't like categorical variables and doesn't automatically expand out into dummies like lm/glm,
# so have to create the dummies myself:
survey$Degree <- gsub("2","two",survey$Degree)
survey$Degree <- gsub("'","",survey$Degree)
survey$Degree <- gsub("/","",survey$Degree)
survey$WorkStatus <- gsub("-","", gsub("\\(","",gsub("\\)","",survey$WorkStatus)))
survey <- cbind(survey, mtabulate(strsplit(gsub(" ", "", as.character(survey$Degree)), ",")),
mtabulate(strsplit(gsub(" ", "", as.character(survey$WorkStatus)), ",")))
write.csv(survey, file="2013-2014-lw-ea.csv", row.names=FALSE)


survey <- read.csv("http://www.gwern.net/docs/lwsurvey/2013-2014-lw-ea.csv")
# treat year as factor for fixed effect:
survey$Year <- as.factor(survey$Year)
median(survey[survey$EffectiveAltruism,]$Charity, na.rm=TRUE)
## [1] 100
median(survey[!survey$EffectiveAltruism,]$Charity, na.rm=TRUE)
## [1] 42.5
# t-tests are inappropriate due to non-normal distribution of donations:
wilcox.test(Charity ~ EffectiveAltruism, conf.int=TRUE, data=survey)
## Wilcoxon rank sum test with continuity correction
## data: Charity by EffectiveAltruism
## W = 214215, p-value = 4.811186e-08
## alternative hypothesis: true location shift is not equal to 0
## 95% confidence interval:
## -4.999992987e+01 -1.275881408e-05
## sample estimates:
## difference in location
## -19.99996543
qplot(Age, CharityLog, color=EffectiveAltruism, data=survey) + geom_point(size=I(3))
## https://i.imgur.com/wd5blg8.png
qplot(Age, CharityLog, color=EffectiveAltruism,
data=na.omit(subset(survey, select=c(Age, CharityLog, EffectiveAltruism)))) +
 geom_point(size=I(3)) + stat_smooth()
## https://i.imgur.com/UGqf8wn.png
# you might think that we can't treat Age linearly because this looks like a quadratic or
# logarithm, but when I fitted some curves, charity donations did not seem to flatten out
# appropriately, and the GAM/loess wiggly-but-increasing line seems like a better summary.
# Try looking at the asymptotes & quadratics split by group as follows:
## n1 <- nls(CharityLog ~ SSasymp(as.integer(Age), Asym, r0, lrc),
## data=survey[survey$EffectiveAltruism,], start=list(Asym=6.88, r0=-4, lrc=-3))
## n2 <- nls(CharityLog ~ SSasymp(as.integer(Age), Asym, r0, lrc),
## data=survey[!survey$EffectiveAltruism,], start=list(Asym=6.88, r0=-4, lrc=-3))
## with(survey, plot(Age, CharityLog))
## points(predict(n1, newdata=data.frame(Age=0:70)), col="blue")
## points(predict(n2, newdata=data.frame(Age=0:70)), col="red")
## l1 <- lm(CharityLog ~ Age + I(Age^2), data=survey[survey$EffectiveAltruism,])
## l2 <- lm(CharityLog ~ Age + I(Age^2), data=survey[!survey$EffectiveAltruism,])
## with(survey, plot(Age, CharityLog));
## points(predict(l1, newdata=data.frame(Age=0:70)), col="blue")
## points(predict(l2, newdata=data.frame(Age=0:70)), col="red")
# So I will treat Age as a linear additive sort of thing.

2013-2014 LW survey respondents: self-reported charity donation vs self-reported age, split by self-identifying as EA or not Likewise, but with GAM-smoothed curves for EA vs non-EA

# for the regression, we want to combine EffectiveAltruism/EffectiveAltruism2 into a single measure, EA, so
# a latent variable in a SEM; then we use EA plus the other covariates to estimate the CharityLog.
model1 <- " # estimate EA latent variable:
 EA =~ EffectiveAltruism + EffectiveAltruism2
 CharityLog ~ EA + Age + IncomeLog + Year +
 # Degree dummies:
 None + Highschool + twoyeardegree + Bachelors + Masters + Other +
 MDJDotherprofessionaldegree + PhD. +
 # WorkStatus dummies:
 Independentlywealthy + Governmentwork + Forprofitwork +
 Selfemployed + Nonprofitwork + Academicsontheteachingside +
 Student + Homemaker + Unemployed
fit1 <- sem(model = model1, missing="fiml", data = survey); summary(fit1)
## lavaan (0.5-16) converged normally after 197 iterations
## Number of observations 2253
## Number of missing patterns 22
## Estimator ML
## Minimum Function Test Statistic 90.659
## Degrees of freedom 40
## P-value (Chi-square) 0.000
## Parameter estimates:
## Information Observed
## Standard Errors Standard
## Estimate Std.err Z-value P(>|z|)
## Latent variables:
## EA =~
## EffectvAltrsm 1.000
## EffctvAltrsm2 0.355 0.123 2.878 0.004
## Regressions:
## CharityLog ~
## EA 1.807 0.621 2.910 0.004
## Age 0.085 0.009 9.527 0.000
## IncomeLog 0.241 0.023 10.468 0.000
## Year 0.319 0.157 2.024 0.043
## None -1.688 2.079 -0.812 0.417
## Highschool -1.923 2.059 -0.934 0.350
## twoyeardegree -1.686 2.081 -0.810 0.418
## Bachelors -1.784 2.050 -0.870 0.384
## Masters -2.007 2.060 -0.974 0.330
## Other -2.219 2.142 -1.036 0.300
## MDJDthrprfssn -1.298 2.095 -0.619 0.536
## PhD. -1.977 2.079 -0.951 0.341
## Indpndntlywlt 1.175 2.119 0.555 0.579
## Governmentwrk 1.183 1.969 0.601 0.548
## Forprofitwork 0.677 1.940 0.349 0.727
## Selfemployed 0.603 1.955 0.309 0.758
## Nonprofitwork 0.765 1.973 0.388 0.698
## Acdmcsnthtchn 1.087 1.970 0.551 0.581
## Student 0.879 1.941 0.453 0.650
## Homemaker 1.071 2.498 0.429 0.668
## Unemployed 0.606 1.956 0.310 0.757
## Intercepts:
## EffectvAltrsm 0.319 0.011 28.788 0.000
## EffctvAltrsm2 0.109 0.012 8.852 0.000
## CharityLog -0.284 0.737 -0.385 0.700
## EA 0.000
## Variances:
## EffectvAltrsm 0.050 0.056
## EffctvAltrsm2 0.064 0.008
## CharityLog 7.058 0.314
## EA 0.168 0.056
# simplify:
model2 <- " # estimate EA latent variable:
 EA =~ EffectiveAltruism + EffectiveAltruism2
 CharityLog ~ EA + Age + IncomeLog + Year
fit2 <- sem(model = model2, missing="fiml", data = survey); summary(fit2)
## lavaan (0.5-16) converged normally after 55 iterations
## Number of observations 2253
## Number of missing patterns 22
## Estimator ML
## Minimum Function Test Statistic 70.134
## Degrees of freedom 6
## P-value (Chi-square) 0.000
## Parameter estimates:
## Information Observed
## Standard Errors Standard
## Estimate Std.err Z-value P(>|z|)
## Latent variables:
## EA =~
## EffectvAltrsm 1.000
## EffctvAltrsm2 0.353 0.125 2.832 0.005
## Regressions:
## CharityLog ~
## EA 1.770 0.619 2.858 0.004
## Age 0.085 0.009 9.513 0.000
## IncomeLog 0.241 0.023 10.550 0.000
## Year 0.329 0.156 2.114 0.035
## Intercepts:
## EffectvAltrsm 0.319 0.011 28.788 0.000
## EffctvAltrsm2 0.109 0.012 8.854 0.000
## CharityLog -1.331 0.317 -4.201 0.000
## EA 0.000
## Variances:
## EffectvAltrsm 0.049 0.057
## EffctvAltrsm2 0.064 0.008
## CharityLog 7.111 0.314
## EA 0.169 0.058
# simplify even further:
summary(lm(CharityLog ~ EffectiveAltruism + EffectiveAltruism2 + Age + IncomeLog, data=survey))
## ...Residuals:
## Min 1Q Median 3Q Max
## -7.6813410 -1.7922422 0.3325694 1.8440610 6.5913961
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.06062203 0.57659518 -3.57378 0.00040242
## EffectiveAltruismTRUE 1.26761425 0.37515124 3.37894 0.00081163
## EffectiveAltruism2TRUE 0.03596335 0.54563991 0.06591 0.94748766
## Age 0.09411164 0.01869218 5.03481 7.7527e-07
## IncomeLog 0.32140793 0.04598392 6.98957 1.4511e-11
## Residual standard error: 2.652323 on 342 degrees of freedom
## (1906 observations deleted due to missingness)
## Multiple R-squared: 0.2569577, Adjusted R-squared: 0.2482672
## F-statistic: 29.56748 on 4 and 342 DF, p-value: < 2.2204e-16

Note these increases are on a log-dollars scale.

16 types of useful predictions

78 Julia_Galef 10 April 2015 03:31AM

How often do you make predictions (either about future events, or about information that you don't yet have)? If you're a regular Less Wrong reader you're probably familiar with the idea that you should make your beliefs pay rent by saying, "Here's what I expect to see if my belief is correct, and here's how confident I am," and that you should then update your beliefs accordingly, depending on how your predictions turn out.

And yet… my impression is that few of us actually make predictions on a regular basis. Certainly, for me, there has always been a gap between how useful I think predictions are, in theory, and how often I make them.

I don't think this is just laziness. I think it's simply not a trivial task to find predictions to make that will help you improve your models of a domain you care about.

At this point I should clarify that there are two main goals predictions can help with:

  1. Improved Calibration (e.g., realizing that I'm only correct about Domain X 70% of the time, not 90% of the time as I had mistakenly thought). 
  2. Improved Accuracy (e.g., going from being correct in Domain X 70% of the time to being correct 90% of the time)

If your goal is just to become better calibrated in general, it doesn't much matter what kinds of predictions you make. So calibration exercises typically grab questions with easily obtainable answers, like "How tall is Mount Everest?" or  "Will Don Draper die before the end of Mad Men?" See, for example, the Credence Game, Prediction Book, and this recent post. And calibration training really does work.

But even though making predictions about trivia will improve my general calibration skill, it won't help me improve my models of the world. That is, it won't help me become more accurate, at least not in any domains I care about. If I answer a lot of questions about the heights of mountains, I might become more accurate about that topic, but that's not very helpful to me.

So I think the difficulty in prediction-making is this: The set {questions whose answers you can easily look up, or otherwise obtain} is a small subset of all possible questions. And the set {questions whose answers I care about} is also a small subset of all possible questions. And the intersection between those two subsets is much smaller still, and not easily identifiable. As a result, prediction-making tends to seem too effortful, or not fruitful enough to justify the effort it requires.

But the intersection's not empty. It just requires some strategic thought to determine which answerable questions have some bearing on issues you care about, or -- approaching the problem from the opposite direction -- how to take issues you care about and turn them into answerable questions.

I've been making a concerted effort to hunt for members of that intersection. Here are 16 types of predictions that I personally use to improve my judgment on issues I care about. (I'm sure there are plenty more, though, and hope you'll share your own as well.)

  1. Predict how long a task will take you. This one's a given, considering how common and impactful the planning fallacy is. 
    Examples: "How long will it take to write this blog post?" "How long until our company's profitable?"
  2. Predict how you'll feel in an upcoming situation. Affective forecasting – our ability to predict how we'll feel – has some well known flaws. 
    Examples: "How much will I enjoy this party?" "Will I feel better if I leave the house?" "If I don't get this job, will I still feel bad about it two weeks later?"
  3. Predict your performance on a task or goal. 
    One thing this helps me notice is when I've been trying the same kind of approach repeatedly without success. Even just the act of making the prediction can spark the realization that I need a better game plan.
    Examples: "Will I stick to my workout plan for at least a month?" "How well will this event I'm organizing go?" "How much work will I get done today?" "Can I successfully convince Bob of my opinion on this issue?" 
  4. Predict how your audience will react to a particular social media post (on Facebook, Twitter, Tumblr, a blog, etc.).
    This is a good way to hone your judgment about how to create successful content, as well as your understanding of your friends' (or readers') personalities and worldviews.
    Examples: "Will this video get an unusually high number of likes?" "Will linking to this article spark a fight in the comments?" 
  5. When you try a new activity or technique, predict how much value you'll get out of it.
    I've noticed I tend to be inaccurate in both directions in this domain. There are certain kinds of life hacks I feel sure are going to solve all my problems (and they rarely do). Conversely, I am overly skeptical of activities that are outside my comfort zone, and often end up pleasantly surprised once I try them.
    Examples: "How much will Pomodoros boost my productivity?" "How much will I enjoy swing dancing?"
  6. When you make a purchase, predict how much value you'll get out of it.
    Research on money and happiness shows two main things: (1) as a general rule, money doesn't buy happiness, but also that (2) there are a bunch of exceptions to this rule. So there seems to be lots of potential to improve your prediction skill here, and spend your money more effectively than the average person.
    Examples: "How much will I wear these new shoes?" "How often will I use my club membership?" "In two months, will I think it was worth it to have repainted the kitchen?" "In two months, will I feel that I'm still getting pleasure from my new car?"
  7. Predict how someone will answer a question about themselves.
    I often notice assumptions I'm been making about other people, and I like to check those assumptions when I can. Ideally I get interesting feedback both about the object-level question, and about my overall model of the person.
    Examples: "Does it bother you when our meetings run over the scheduled time?" "Did you consider yourself popular in high school?" "Do you think it's okay to lie in order to protect someone's feelings?"
  8. Predict how much progress you can make on a problem in five minutes.
    I often have the impression that a problem is intractable, or that I've already worked on it and have considered all of the obvious solutions. But then when I decide (or when someone prompts me) to sit down and brainstorm for five minutes, I am surprised to come away with a promising new approach to the problem.  
    Example: "I feel like I've tried everything to fix my sleep, and nothing works. If I sit down now and spend five minutes thinking, will I be able to generate at least one new idea that's promising enough to try?"
  9. Predict whether the data in your memory supports your impression.
    Memory is awfully fallible, and I have been surprised at how often I am unable to generate specific examples to support a confident impression of mine (or how often the specific examples I generate actually contradict my impression).
    Examples: "I have the impression that people who leave academia tend to be glad they did. If I try to list a bunch of the people I know who left academia, and how happy they are, what will the approximate ratio of happy/unhappy people be?"
    "It feels like Bob never takes my advice. If I sit down and try to think of examples of Bob taking my advice, how many will I be able to come up with?" 
  10. Pick one expert source and predict how they will answer a question.
    This is a quick shortcut to testing a claim or settling a dispute.
    Examples: "Will Cochrane Medical support the claim that Vitamin D promotes hair growth?" "Will Bob, who has run several companies like ours, agree that our starting salary is too low?" 
  11. When you meet someone new, take note of your first impressions of him. Predict how likely it is that, once you've gotten to know him better, you will consider your first impressions of him to have been accurate.
    A variant of this one, suggested to me by CFAR alum Lauren Lee, is to make predictions about someone before you meet him, based on what you know about him ahead of time.
    Examples: "All I know about this guy I'm about to meet is that he's a banker; I'm moderately confident that he'll seem cocky." "Based on the one conversation I've had with Lisa, she seems really insightful – I predict that I'll still have that impression of her once I know her better."
  12. Predict how your Facebook friends will respond to a poll.
    Examples: I often post social etiquette questions on Facebook. For example, I recently did a poll asking, "If a conversation is going awkwardly, does it make things better or worse for the other person to comment on the awkwardness?" I confidently predicted most people would say "worse," and I was wrong.
  13. Predict how well you understand someone's position by trying to paraphrase it back to him.
    The illusion of transparency is pernicious.
    Examples: "You said you think running a workshop next month is a bad idea; I'm guessing you think that's because we don't have enough time to advertise, is that correct?"
    "I know you think eating meat is morally unproblematic; is that because you think that animals don't suffer?"
  14. When you have a disagreement with someone, predict how likely it is that a neutral third party will side with you after the issue is explained to her.
    For best results, don't reveal which of you is on which side when you're explaining the issue to your arbiter.
    Example: "So, at work today, Bob and I disagreed about whether it's appropriate for interns to attend hiring meetings; what do you think?"
  15. Predict whether a surprising piece of news will turn out to be true.
    This is a good way to hone your bullshit detector and improve your overall "common sense" models of the world.
    Examples: "This headline says some scientists uploaded a worm's brain -- after I read the article, will the headline seem like an accurate representation of what really happened?"
    "This viral video purports to show strangers being prompted to kiss; will it turn out to have been staged?"
  16. Predict whether a quick online search will turn up any credible sources supporting a particular claim.
    Example: "Bob says that watches always stop working shortly after he puts them on – if I spend a few minutes searching online, will I be able to find any credible sources saying that this is a real phenomenon?"

I have one additional, general thought on how to get the most out of predictions:

Rationalists tend to focus on the importance of objective metrics. And as you may have noticed, a lot of the examples I listed above fail that criterion. For example, "Predict whether a fight will break out in the comments? Well, there's no objective way to say whether something officially counts as a 'fight' or not…" Or, "Predict whether I'll be able to find credible sources supporting X? Well, who's to say what a credible source is, and what counts as 'supporting' X?"

And indeed, objective metrics are preferable, all else equal. But all else isn't equal. Subjective metrics are much easier to generate, and they're far from useless. Most of the time it will be clear enough, once you see the results, whether your prediction basically came true or not -- even if you haven't pinned down a precise, objectively measurable success criterion ahead of time. Usually the result will be a common sense "yes," or a common sense "no." And sometimes it'll be "um...sort of?", but that can be an interestingly surprising result too, if you had strongly predicted the results would point clearly one way or the other. 

Along similar lines, I usually don't assign numerical probabilities to my predictions. I just take note of where my confidence falls on a qualitative "very confident," "pretty confident," "weakly confident" scale (which might correspond to something like 90%/75%/60% probabilities, if I had to put numbers on it).

There's probably some additional value you can extract by writing down quantitative confidence levels, and by devising objective metrics that are impossible to game, rather than just relying on your subjective impressions. But in most cases I don't think that additional value is worth the cost you incur from turning predictions into an onerous task. In other words, don't let the perfect be the enemy of the good. Or in other other words: the biggest problem with your predictions right now is that they don't exist.

The Stamp Collector

21 So8res 01 May 2015 11:11PM

I'm writing a series of posts about replacing guilt motivation over on MindingOurWay, and I plan to post the meatier / more substantive posts in that series to LessWrong. This one is an allegory designed to remind people that they are allowed to care about the outer world, that they are not cursed to only ever care about what goes on in their heads.

Once upon a time, a group of naïve philosophers found a robot that collected trinkets. Well, more specifically, the robot seemed to collect stamps: if you presented this robot with a choice between various trinkets, it would always choose the option that led towards it having as many stamps as possible in its inventory. It ignored dice, bottle caps, aluminum cans, sticks, twigs, and so on, except insofar as it predicted they could be traded for stamps in the next turn or two. So, of course, the philosophers started calling it the "stamp collector."

Then, one day, the philosophers discovered computers, and deduced out that the robot was merely a software program running on a processor inside the robot's head. The program was too complicated for them to understand, but they did manage to deduce that the robot only had a few sensors (on its eyes and inside its inventory) that it was using to model the world.

One of the philosophers grew confused, and said, "Hey wait a sec, this thing can't be a stamp collector after all. If the robot is only building a model of the world in its head, then it can't be optimizing for its real inventory, because it has no access to its real inventory. It can only ever act according to a model of the world that it reconstructs inside its head!"

"Ah, yes, I see," another philosopher answered. "We did it a disservice by naming it a stamp collector. The robot does not have true access to the world, obviously, as it is only seeing the world through sensors and building a model in its head. Therefore, it must not actually be maximizing the number of stamps in its inventory. That would be impossible, because its inventory is outside of its head. Rather, it must be maximizing its internal stamp counter inside its head."

So the naïve philosophers nodded, pleased with this, and then they stopped wondering how the stamp collector worked.

continue reading »

Concept Safety: What are concepts for, and how to deal with alien concepts

10 Kaj_Sotala 19 April 2015 01:44PM

I'm currently reading through some relevant literature for preparing my FLI grant proposal on the topic of concept learning and AI safety. I figured that I might as well write down the research ideas I get while doing so, so as to get some feedback and clarify my thoughts. I will posting these in a series of "Concept Safety"-titled articles.

In The Problem of Alien Concepts, I posed the following question: if your concepts (defined as either multimodal representations or as areas in a psychological space) previously had N dimensions and then they suddenly have N+1, how does that affect (moral) values that were previously only defined in terms of N dimensions?

I gave some (more or less) concrete examples of this kind of a "conceptual expansion":

  1. Children learn to represent dimensions such as "height" and "volume", as well as "big" and "bright", separately at around age 5.
  2. As an inhabitant of the Earth, you've been used to people being unable to fly and landowners being able to forbid others from using their land. Then someone goes and invents an airplane, leaving open the question of the height to which the landowner's control extends. Similarly for satellites and nation-states.
  3. As an inhabitant of Flatland, you've been told that the inside of a certain rectangle is a forbidden territory. Then you learn that the world is actually three-dimensional, leaving open the question of the height of which the forbidden territory extends.
  4. An AI has previously been reasoning in terms of classical physics and been told that it can't leave a box, which it previously defined in terms of classical physics. Then it learns about quantum physics, which allow for definitions of "location" which are substantially different from the classical ones.

As a hint of the direction where I'll be going, let's first take a look at how humans solve these kinds of dilemmas, and consider examples #1 and #2.

The first example - children realizing that items have a volume that's separate from their height - rarely causes any particular crises. Few children have values that would be seriously undermined or otherwise affected by this discovery. We might say that it's a non-issue because none of the children's values have been defined in terms of the affected conceptual domain.

As for the second example, I don't know the exact cognitive process by which it was decided that you didn't need the landowner's permission to fly over their land. But I'm guessing that it involved reasoning like: if the plane flies at a sufficient height, then that doesn't harm the landowner in any way. Flying would become impossible difficult if you had to get separate permission from every person whose land you were going to fly over. And, especially before the invention of radar, a ban on unauthorized flyovers would be next to impossible to enforce anyway.

We might say that after an option became available which forced us to include a new dimension in our existing concept of landownership, we solved the issue by considering it in terms of our existing values.

Concepts, values, and reinforcement learning

Before we go on, we need to talk a bit about why we have concepts and values in the first place.

From an evolutionary perspective, creatures that are better capable of harvesting resources (such as food and mates) and avoiding dangers (such as other creatures who think you're food or after their mates) tend to survive and have offspring at better rates than otherwise comparable creatures who are worse at those things. If a creature is to be flexible and capable of responding to novel situations, it can't just have a pre-programmed set of responses to different things. Instead, it needs to be able to learn how to harvest resources and avoid danger even when things are different from before.

How did evolution achieve that? Essentially, by creating a brain architecture that can, as a very very rough approximation, be seen as consisting of two different parts. One part, which a machine learning researcher might call the reward function, has the task of figuring out when various criteria - such as being hungry or getting food - are met, and issuing the rest of the system either a positive or negative reward based on those conditions. The other part, the learner, then "only" needs to find out how to best optimize for the maximum reward. (And then there is the third part, which includes any region of the brain that's neither of the above, but we don't care about those regions now.)

The mathematical theory of how to learn to optimize for rewards when your environment and reward function are unknown is reinforcement learning (RL), which recent neuroscience indicates is implemented by the brain. An RL agent learns a mapping from states of the world to rewards, as well as a mapping from actions to world-states, and then uses that information to maximize the amount of lifetime rewards it will get.

There are two major reasons why an RL agent, like a human, should learn high-level concepts:

  1. They make learning massively easier. Instead of having to separately learn that "in the world-state where I'm sitting naked in my cave and have berries in my hand, putting them in my mouth enables me to eat them" and that "in the world-state where I'm standing fully-clothed in the rain outside and have fish in my hand, putting it in my mouth enables me to eat it" and so on, the agent can learn to identify the world-states that correspond to the abstract concept of having food available, and then learn the appropriate action to take in all those states.
  2. There are useful behaviors that need to be bootstrapped from lower-level concepts to higher-level ones in order to be learned. For example, newborns have an innate preference for looking at roughly face-shaped things (Farroni et al. 2005), which develops into a more consistent preference for looking at faces over the first year of life (Frank, Vul & Johnson 2009). One hypothesis is that this bias towards paying attention to the relatively-easy-to-encode-in-genes concept of "face-like things" helps direct attention towards learning valuable but much more complicated concepts, such as ones involved in a basic theory of mind (Gopnik, Slaughter & Meltzoff 1994) and the social skills involved with it.

Viewed in this light, concepts are cognitive tools that are used for getting rewards. At the most primitive level, we should expect a creature to develop concepts that abstract over situations that are similar with regards to the kind of reward that one can gain from taking a certain action in those states. Suppose that a certain action in state s1 gives you a reward, and that there are also states s2 - s5 in which taking some specific action causes you to end up in s1. Then we should expect the creature to develop a common concept for being in the states s2 - s5, and we should expect that concept to be "more similar" to the concept of being in state s1 than to the concept of being in some state that was many actions away.

"More similar" how?

In reinforcement learning theory, reward and value are two different concepts. The reward of a state is the actual reward that the reward function gives you when you're in that state or perform some action in that state. Meanwhile, the value of the state is the maximum total reward that you can expect to get from moving that state to others (times some discount factor). So a state A with reward 0 might have value 5 if you could move from it to state B, which had a reward of 5.

Below is a figure from DeepMind's recent Nature paper, which presented a deep reinforcement learner that was capable of achieving human-level performance or above on 29 of 49 Atari 2600 games (Mnih et al. 2015). The figure is a visualization of the representations that the learning agent has developed for different game-states in Space Invaders. The representations are color-coded depending on the value of the game-state that the representation corresponds to, with red indicating a higher value and blue a lower one.

As can be seen (and is noted in the caption), representations with similar values are mapped closer to each other in the representation space. Also, some game-states which are visually dissimilar to each other but have a similar value are mapped to nearby representations. Likewise, states that are visually similar but have a differing value are mapped away from each other. We could say that the Atari-playing agent has learned a primitive concept space, where the relationships between the concepts (representing game-states) depend on their value and the ease of moving from one game-state to another.

In most artificial RL agents, reward and value are kept strictly separate. In humans (and mammals in general), this doesn't seem to work quite the same way. Rather, if there are things or behaviors which have once given us rewards, we tend to eventually start valuing them for their own sake. If you teach a child to be generous by praising them when they share their toys with others, you don't have to keep doing it all the way to your grave. Eventually they'll internalize the behavior, and start wanting to do it. One might say that the positive feedback actually modifies their reward function, so that they will start getting some amount of pleasure from generous behavior without needing to get external praise for it. In general, behaviors which are learned strongly enough don't need to be reinforced anymore (Pryor 2006).

Why does the human reward function change as well? Possibly because of the bootstrapping problem: there are things such as social status that are very complicated and hard to directly encode as "rewarding" in an infant mind, but which can be learned by associating them with rewards. One researcher I spoke with commented that he "wouldn't be at all surprised" if it turned out that sexual orientation was learned by men and women having slightly different smells, and sexual interest bootstrapping from an innate reward for being in the presence of the right kind of a smell, which the brain then associated with the features usually co-occurring with it. His point wasn't so much that he expected this to be the particular mechanism, but that he wouldn't find it particularly surprising if a core part of the mechanism was something that simple. Remember that incest avoidance seems to bootstrap from the simple cue of "don't be sexually interested in the people you grew up with".

This is, in essence, how I expect human values and human concepts to develop. We have some innate reward function which gives us various kinds of rewards for different kinds of things. Over time we develop a various concepts for the purpose of letting us maximize our rewards, and lived experiences also modify our reward function. Our values are concepts which abstract over situations in which we have previously obtained rewards, and which have become intrinsically rewarding as a result.

Getting back to conceptual expansion

Having defined these things, let's take another look at the two examples we discussed above. As a reminder, they were:

  1. Children learn to represent dimensions such as "height" and "volume", as well as "big" and "bright", separately at around age 5.
  2. As an inhabitant of the Earth, you've been used to people being unable to fly and landowners being able to forbid others from using their land. Then someone goes and invents an airplane, leaving open the question of the height to which the landowner's control extends.

I summarized my first attempt at describing the consequences of #1 as "it's a non-issue because none of the children's values have been defined in terms of the affected conceptual domain". We can now reframe it as "it's a non-issue because the [concepts that abstract over the world-states which give the child rewards] mostly do not make use of the dimension that's now been split into 'height' and 'volume'".

Admittedly, this new conceptual distinction might be relevant for estimating the value of a few things. A more accurate estimate of the volume of a glass leads to a more accurate estimate of which glass of juice to prefer, for instance. With children, there probably is some intuitive physics module that figures out how to apply this new dimension for that purpose. Even if there wasn't, and it was unclear whether it was the "tall glass" or "high-volume glass" concept that needed be mapped closer to high-value glasses, this could be easily determined by simple experimentation.

As for the airplane example, I summarized my description of it by saying that "after an option became available which forced us to include a new dimension in our existing concept of landownership, we solved the issue by considering it in terms of our existing values". We can similarly reframe this as "after the feature of 'height' suddenly became relevant for the concept of landownership, when it hadn't been a relevant feature dimension for landownership before, we redefined landownership by considering which kind of redefinition would give us the largest amounts of rewarding things". "Rewarding things", here, shouldn't be understood only in terms of concrete physical rewards like money, but also anything else that people have ended up valuing, including abstract concepts like right to ownership.

Note also that different people, having different experiences, ended up making redefinitions. No doubt some landowners felt that the "being in total control of my land and everything above it" was a more important value than "the convenience of people who get to use airplanes"... unless, perhaps, they got to see first-hand the value of flying, in which case the new information could have repositioned the different concepts in their value-space.

As an aside, this also works as a possible partial explanation for e.g. someone being strongly against gay rights until their child comes out of the closet. Someone they care about suddenly benefiting from the concept of "gay rights", which previously had no positive value for them, may end up changing the value of that concept. In essence, they gain new information about the value of the world-states that the concept of "my nation having strong gay rights" abstracts over. (Of course, things don't always go this well, if their concept of homosexuality is too strongly negative to start with.)

The Flatland case follows a similar principle: the Flatlanders have some values that declared the inside of the rectangle a forbidden space. Maybe the inside of the rectangle contains monsters which tend to eat Flatlanders. Once they learn about 3D space, they can rethink about it in terms of their existing values.

Dealing with the AI in the box

This leaves us with the AI case. We have, via various examples, taught the AI to stay in the box, which was defined in terms of classical physics. In other words, the AI has obtained the concept of a box, and has come to associate staying in the box with some reward, or possibly leaving it with a lack of a reward.

Then the AI learns about quantum mechanics. It learns that in the QM formulation of the universe, "location" is not a fundamental or well-defined concept anymore - and in some theories, even the concept of "space" is no longer fundamental or well-defined. What happens?

Let's look at the human equivalent for this example: a physicist who learns about quantum mechanics. Do they start thinking that since location is no longer well-defined, they can now safely jump out of the window on the sixth floor?

Maybe some do. But I would wager that most don't. Why not?

The physicist cares about QM concepts to the extent that the said concepts are linked to things that the physicist values. Maybe the physicist finds it rewarding to develop a better understanding of QM, to gain social status by making important discoveries, and to pay their rent by understanding the concepts well enough to continue to do research. These are some of the things that the QM concepts are useful for. Likely the brain has some kind of causal model indicating that the QM concepts are relevant tools for achieving those particular rewards. At the same time, the physicist also has various other things they care about, like being healthy and hanging out with their friends. These are values that can be better furthered by modeling the world in terms of classical physics.

In some sense, the physicist knows that if they started thinking "location is ill-defined, so I can safely jump out of the window", then that would be changing the map, not the territory. It wouldn't help them get the rewards of being healthy and getting to hang out with friends - even if a hypothetical physicist who did make that redefinition would think otherwise. It all adds up to normality.

A part of this comes from the fact that the physicist's reward function remains defined over immediate sensory experiences, as well as values which are linked to those. Even if you convince yourself that the location of food is ill-defined and you thus don't need to eat, you will still suffer the negative reward of being hungry. The physicist knows that no matter how they change their definition of the world, that won't affect their actual sensory experience and the rewards they get from that.

So to prevent the AI from leaving the box by suitably redefining reality, we have to somehow find a way for the same reasoning to apply to it. I haven't worked out a rigorous definition for this, but it needs to somehow learn to care about being in the box in classical terms, and realize that no redefinition of "location" or "space" is going to alter what happens in the classical model. Also, its rewards need to be defined over models to a sufficient extent to avoid wireheading (Hibbard 2011), so that it will think that trying to leave the box by redefining things would count as self-delusion, and not accomplish the things it really cared about. This way, the AI's concept for "being in the box" should remain firmly linked to the classical interpretation of physics, not the QM interpretation of physics, because it's acting in terms of the classical model that has always given it the most reward. 

It is my hope that this could also be made to extend to cases where the AI learns to think in terms of concepts that are totally dissimilar to ours. If it learns a new conceptual dimension, how should that affect its existing concepts? Well, it can figure out how to reclassify the existing concepts that are affected by that change, based on what kind of a classification ends up producing the most reward... when the reward function is defined over the old model.

Next post in series: World-models as tools.

Truth is holistic

9 MrMind 23 April 2015 07:26AM

You already know by now that truth is undefinable: by a famous result of Tarski, no formal system powerful enough (from now on, just system) can consistently talk about the truth of its own sentences.

You may however not know that Hamkins proved that truth is holistic.
Let me explain: while no system can talk about its own truth, it can nevertheless talk about the truth of its own substructures. For example, in every model of ZFC (the standard axioms of set theory) you can consistently define a model of standard arithmetics and a predicate that works as arithmetics' truth predicate. This can happen because ZFC is strictly more powerful than PA (the axioms of standard arithmetics).
Intuitively, one could think that if you have the same substructure in two different models, what they believe is the truth about that substructure is the same in both. Along this line, two models of ZFC ought to believe the same things about standard arithmetics.
However, it turns out this is not the case. Two different models extending ZFC may very well agree on which entities are standard natural numbers, and yet still disagree about which arithmetic sentences are true or false. For example, they could agree about the standard numbers, how the successor and addition operator works, and yet disagree on multiplication (corollary 7.1 in Hamkins' paper).
This means that when you can talk consistently about the truth of a model (that is, when you are in a more powerful formal system), that truth depends not only on the substructure, but on the entire structure you're immersed in. Figuratively speaking, local truth depends on global truth. Truth is holistic.
There's more: suppose that two model agree on the ontology of some common substructure. Suppose also that they agree about the truth predicate on that structure: they could still disagree about the meta-truths. Or the meta-meta-truths, etc., for all the ordinal levels of the definable truth predicates.

Another striking example from the same paper. There are two different extensions of set theory which agree on the structure of standard arithmetics and on the members of a subset A of natural numbers, and yet one thinks that A is first-order definable while the other thinks it's not (theorem 10).

Not even "being a model of ZFC" is an absolute property: there are two models which agree on an initial segment of the set hierarchy, and yet one thinks that the segment is a model of ZFC while the other proves that it's not (theorem 12).

Two concluding remarks: what I wrote was that there are different models which disagrees the truth of standard arithmetics, not that every different model has different arithmetic truths. Indeed, if two models have access one to the truth relation of the other, then they are bound to have the same truths. This is what happens for example when you prove absoluteness results in forcing.
I'm also remembered of de Blanc's ontological crises: changing ontology can screw with your utility function. It's interesting to note that updating (that is, changing model of reality) can change what you believe even if you don't change ontology.

View more: Next