Filter This month

Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Even better cryonics – because who needs nanites anyway?

47 maxikov 07 April 2015 08:10PM

Abstract: in this post I propose a protocol for cryonic preservation (with the central idea of using high pressure to prevent water from expanding rather than highly toxic cryoprotectants), which I think has a chance of being non-destructive enough for us to be able to preserve and then resuscitate an organism with modern technologies. In addition, I propose a simplified experimental protocol for a shrimp (or other small model organism (building a large pressure chamber is hard) capable of surviving in very deep and cold waters; shrimp is a nice trade-off between the depth of habitat and the ease of obtaining them on market), which is simple enough to be doable in a small lab or well-equipped garage setting.

Are there obvious problems with this, and how can they be addressed?

Is there a chance to pitch this experiment to a proper academic institution, or garage it is?

Originally posted here.

I do think that the odds of ever developing advanced nanomachines and/or brain scanning on molecular level plus algorithms for reversing information distortion - everything you need to undo the damage from conventional cryonic preservation and even to some extent that of brain death, according to its modern definition, if wasn't too late when the brain was preserved - for currently existing cryonics to be a bet worth taking. This is dead serious, and it's an actionable item.

Less of an action item: what if the future generations actually build quantum Bayesian superintelligence, close enough in its capabilities to Solomonoff induction, at which point even a mummified brain or the one preserved in formalin would be enough evidence to restore its original state? Or what if they invent read-only time travel, and make backups of everyone's mind right before they died (at which point it becomes indistinguishable from the belief in afterlife existing right now)? Even without time travel, they can just use a Universe-sized supercomputer to simulate every singe human physically possible, and naturally of of them is gonna be you. But aside from the obvious identity issues (and screw the timeless identity), that relies on unknown unknowns with uncomputable probabilities, and I'd like to have as few leaps of faith and quantum suicides in my life as possible.

So although vitrification right after diagnosed brain death relies on far smaller assumptions, and if totally worth doing - let me reiterate that: go sign up for cryonics - it'd be much better if we had preservation protocols so non-destructive that we could actually freeze a living human, and then bring them back alive. If nothing else, that would hugely increase the public outreach, grant the patient (rather than cadaver) status to the preserved, along with the human rights, get it recognized as a medical procedure covered by insurance or single payer, allow doctors to initiate the preservation of a dying patient before the brain death (again: I think everything short of information-theoretic death should potentially be reversible, but why take chances?), allow suffering patient opt for preservation rather than euthanasia (actually, I think it should be done right now: why on earth would anyone allow a person to do something that's guaranteed to kill them, but not allowed to do something that maybe will kill, or maybe will give the cure?), or even allow patients suffering from degrading brain conditions (e.g. Alzheimer's) to opt for preservation before their memory and personality are permanently destroyed.

Let's fix cryonics! First of all, why can't we do it on living organisms? Because of heparin poisoning - every cryoprotectant efficient enough to prevent the formation of ice crystals is a strong enough poison to kill the organism (leave alone that we can't even saturate the whole body with it - current technologies only allow to do it for the brain alone). But without cryoprotectants the water will expand upon freezing, and break the cells. But there's another way to prevent this. Under pressure above 350 MPa water slightly shrinks upon freezing rather than expanding:


So that's basically that: the key idea is to freeze (and keep) everything under pressure. Now, there are some tricks to that too.

It's not easy to put basically any animal, especially a mammal, under 350 MPa (which is 3.5x higher than in Mariana Trench). At this point even Trimix becomes toxic. Basically the only remaining solution is total liquid ventilation, which has one problem: it has never been applied successfully to a human. There's one fix to that I see: as far as I can tell, no one has ever attempted to do perform it under high pressure, and the attempts were basically failing because of the insufficient solubility of oxygen and carbon dioxide in perfluorocarbons. Well then, let's increase the pressure! Namely, go to 3 MPa on Trimix, which is doable, and only then switch to TLV, whose efficiency is improved by the higher gas solubility under high pressure. But there's another solution too. If you just connect a cardiopulmonary bypass (10 hours should be enough for the whole procedure), you don't need the surrounding liquid to even be breathable - it can just be saline. CPB also solves the problem of surviving the period after the cardiac arrest (which will occur at around 30 centigrade) but before the freezing happens - you can just keep the blood circulating and delivering oxygen.

Speaking of hypoxia, even with the CPB it's still a problem. You positively don't want the blood to circulate when freezing starts, lest it act like an abrasive water cutter. It's not that much of a problem under near-freezing temperatures, but still. Fortunately, this effect can be mitigated by administering insulin first (yay, it's the first proper academic citation in this post! Also yay, I thought about this before I even discovered that it's actually true). This makes sense: if oxygen is primarily used to metabolize glucose, less glucose means less oxygen consumed, and less damage done by hypoxia. Then there's another thing: on the phase diagram you can see that before going into the area of high temperature ice at 632 MPa, freezing temperature actually dips down to roughly -30 centigrade at 209~350 MPa. That would allow to really shut down metabolism for good when water is still liquid, and blood can be pumped by the CPB. From this point we have two ways. First, we can do the normal thing, and start freezing very slowly, so minimize the formation of ice crystals (even though they're smaller than the original water volume, they may still be sharp). Second, we can increase the pressure. That would lead to near-instantaneous freezing everywhere, thus completely eliminating the problem of hypoxia - before the freezing, blood still circulated, and freezing is very quick - way faster than can ever be achieved even by throwing a body into liquid helium under normal pressure. Video evidence suggests that quick freezing of water leads to the formation of a huge number of crystals, which is bad, but I don't know near-instantaneous freezing from supercooled state and near-instantaneous freezing upon raising the pressure will lead to the same effect. More experiments are needed, preferably not on humans.

So here is my preservation protocol:

  1. Anesthetize a probably terminally ill, but still conscious person.
  2. Connect them to a cardiopulmonary bypass.
  3. Replacing their blood with perfluorohexane is not necessary, since we seem to be already doing a decent job at having medium-term (several days) cardiopulmonary bypasses, but that could still help.
  4. Submerge them in perfluorohexane, making sure that no air bubbles are left.
  5. Slowly raise the ambient pressure to 350 MPa (~3.5kBar) without stopping the bypass.
  6. Apply a huge dose of insulin to reduce all their metabolic processes.
  7. Slowly cool them to -30 centigrade (at which point, given such pressure, water is still liquid), while increasing the dose of insulin, and raising the oxygen supply to the barely subtoxic level.
  8. Slowly raise the pressure to 1 GPa (~10kBar), at which point the water solidifies, but does so with shrinking rather than expanding. Don't cutoff the blood circulation until the moment when ice crystals starts forming in the blood/perfluorohexane flow.
  9. Slowly lower the temperature to -173 centigrade or lower, as you wish.


And then back:

  1. Raise the temperature to -20 centigrade.
  2. Slowly lower the pressure to 350 MPa, at which point ice melts.
  3. Start artificial blood circulation with a barely subtoxic oxygen level.
  4. Slowly raise the temperature to +4 centigrade.
  5. Slowly lower the pressure to 1 Bar.
  6. Drain the ambient perfluorohexane and replace it with pure oxygen. Attach and start a medical ventilator.
  7. Slowly raise the temperature to +32 centigrade.
  8. Apply a huge dose of epinephrine and sugar, while transfusing the actual blood (preferably autotransfusion), to restart the heart.
  9. Rejoice.


I claim that this protocol allows you freeze a living human to an arbitrarily low temperature, and then bring them back alive without brain damage, thus being the first true victory over death.

But let's start with something easy and small, like a shrimp. They already live in water, so there's no need to figure out the protocol for putting them into liquid. And they're already adapted to live under high pressure (no swim bladders or other cavities). And they're already adapted to live in cold water, so they should be expected to survive further cooling.

Small ones can be about 1 inch big, so let's be safe and use a 5cm-wide cylinder. To form ice III we need about 350MPa, which gives us 350e6 * 3.14 * 0.025^2 / 9.8 = 70 tons or roughly 690kN of force. Applying it directly or with a lever is unreasonable, since 70 tons of bending force is a lot even for steel, given the 5cm target. Block and tackle system is probably a good solution - actually, two of them, on each side of a beam used for compression, so we have 345 kN per system. And it looks like you can buy 40~50 ton manual hoists from alibaba, though I have no idea about their quality.


I'm not sure to which extent Pascal's law applies to solids, but if it does, the whole setup can be vastly optimized by creating a bottle neck for the pistol. One problem is that we can no longer assume that water in completely incompressible - it had to be compressed to about 87% its original volume - but aside from that, 350MPa per a millimeter thick rod is just 28kg. To compress a 0.05m by 0.1m cylinder to 87% its original volume we need to pump extra 1e-4 m^3 of water there, which amounts to 148 meters of movement, which isn't terribly good. 1cm thick rod, on the other hand, would require almost 3 tons of force, but will move only 1.5 meters. Or the problem of applying the constant pressure can be solved by enclosing the water in a plastic bag, and filling the rest of chamber with a liquid with a lower freezing point, but the same density. Thus, it is guaranteed that all the time it takes the water to freeze, it is under uniform external pressure, and then it just had nowhere to go.

Alternatively, one can just buy a 90'000 psi pump and 100'000 psi tubes and vessels, but let's face it: it they don't even list the price on their website, you probably don't even wanna know it. And since no institutions that can afford this thing seem to be interested in cryonics research, we'll have to stick to makeshift solutions (until at least the shrimp thing works, which would probably give in a publication in Nature, and enough academic recognition for proper research to start).

Political topics attract participants inclined to use the norms of mainstream political debate, risking a tipping point to lower quality discussion

36 emr 26 March 2015 12:14AM

(I hope that is the least click-baity title ever.)

Political topics elicit lower quality participation, holding the set of participants fixed. This is the thesis of "politics is the mind-killer".

Here's a separate effect: Political topics attract mind-killed participants. This can happen even when the initial participants are not mind-killed by the topic. 

Since outreach is important, this could be a good thing. Raise the sanity water line! But the sea of people eager to enter political discussions is vast, and the epistemic problems can run deep. Of course not everyone needs to come perfectly prealigned with community norms, but any community will be limited in how robustly it can handle an influx of participants expecting a different set of norms. If you look at other forums, it seems to take very little overt contemporary political discussion before the whole place is swamped, and politics becomes endemic. As appealing as "LW, but with slightly more contemporary politics" sounds, it's probably not even an option. You have "LW, with politics in every thread", and "LW, with as little politics as we can manage".  

That said, most of the problems are avoided by just not saying anything that patterns matches too easily to current political issues. From what I can tell, LW has always had tons of meta-political content, which doesn't seem to cause problems, as well as standard political points presented in unusual ways, and contrarian political opinions that are too marginal to raise concern. Frankly, if you have a "no politics" norm, people will still talk about politics, but to a limited degree. But if you don't even half-heartedly (or even hypocritically) discourage politics, then a open-entry site that accepts general topics will risk spiraling too far in a political direction. 

As an aside, I'm not apolitical. Although some people advance a more sweeping dismissal of the importance or utility of political debate, this isn't required to justify restricting politics in certain contexts. The sort of the argument I've sketched (I don't want LW to be swamped by the worse sorts of people who can be attracted to political debate) is enough. There's no hypocrisy in not wanting politics on LW, but accepting political talk (and the warts it entails) elsewhere. Of the top of my head, Yvain is one LW affiliate who now largely writes about more politically charged topics on their own blog (SlateStarCodex), and there are some other progressive blogs in that direction. There are libertarians and right-leaning (reactionary? NRx-lbgt?) connections. I would love a grand unification as much as anyone, (of course, provided we all realize that I've been right all along), but please let's not tell the generals to bring their armies here for the negotiations.

Rationality: From AI to Zombies online reading group

32 Mark_Friedenbach 21 March 2015 09:54AM

Update: When I posted this announcement I remarkably failed to make the connection that the April 15th is tax day here in the US, and as a prime example of the planning fallacy (a topic of the first sequence!), I failed to anticipate just how complicated my taxes would be this year. The first post of the reading group is basically done but a little rushed, and I want to take an extra day to get it right. Expect it to post on the next day, the 16th


On Thursday, 16 April 2015, just under a month out from this posting, I will hold the first session of an online reading group for the ebook Rationality: From AI to Zombies, a compilation of the LessWrong sequences by our own Eliezer Yudkowsky. I would like to model this on the very successful Superintelligence reading group led by KatjaGrace. This is an advanced warning, so that you can have a chance to get the ebook, make a donation to MIRI, and read the first sequence.

The point of this online reading group is to join with others to ask questions, discuss ideas, and probe the arguments more deeply. It is intended to add to the experience of reading the sequences in their new format or for the first time. It is intended to supplement discussion that has already occurred the original postings and the sequence reruns.

The reading group will 'meet' on a semi-monthly post on the LessWrong discussion forum. For each 'meeting' we will read one sequence from the the Rationality book, which contains a total of 26 lettered sequences. A few of the sequences are unusually long, and these might be split into two sessions. If so, advance warning will be given.

In each posting I will briefly summarize the salient points of the essays comprising the sequence, link to the original articles and discussion when possible, attempt to find, link to, and quote one or more related materials or opposing viewpoints from outside the text, and present a half-dozen or so question prompts to get the conversation rolling. Discussion will take place in the comments. Others are encouraged to provide their own question prompts or unprompted commentary as well.

We welcome both newcomers and veterans on the topic. If you've never read the sequences, this is a great opportunity to do so. If you are an old timer from the Overcoming Bias days then this is a chance to share your wisdom and perhaps revisit the material with fresh eyes. All levels of time commitment are welcome.

If this sounds like something you want to participate in, then please grab a copy of the book and get started reading the preface, introduction, and the 10 essays / 42 pages which comprise Part A: Predictably Wrong. The first virtual meeting (forum post) covering this material will go live before 6pm Thursday PDT (1am Friday UTC), 16 April 2015. Successive meetings will start no later than 6pm PDT on the first and third Wednesdays of a month.

Following this schedule it is expected that it will take just over a year to complete the entire book. If you prefer flexibility, come by any time! And if you are coming upon this post from the future, please feel free leave your opinions as well. The discussion period never closes.

Topic for the first week is the preface by Eliezer Yudkowsky, the introduction by Rob Bensinger, and Part A: Predictably Wrong, a sequence covering rationality, the search for truth, and a handful of biases.

Defeating the Villain

29 Zubon 26 March 2015 09:43PM

We have a recurring theme in the greater Less Wrong community that life should be more like a high fantasy novel. Maybe that is to be expected when a quarter of the community came here via Harry Potter fanfiction, and we also have rationalist group houses named after fantasy locations, descriptions of community members in terms of character archetypes and PCs versus NPCs, semi-serious development of the new atheist gods, and feel free to contribute your favorites in the comments.

A failure mode common to high fantasy novels as well as politics is solving all our problems by defeating the villain. Actually, this is a common narrative structure for our entire storytelling species, and it works well as a narrative structure. The story needs conflict, so we pit a sympathetic protagonist against a compelling antagonist, and we reach a satisfying climax when the two come into direct conflict, good conquers evil, and we live happily ever after.

This isn't an article about whether your opponent really is a villain. Let's make the (large) assumption that you have legitimately identified a villain who is doing evil things. They certainly exist in the world. Defeating this villain is a legitimate goal.

And then what?

Defeating the villain is rarely enough. Building is harder than destroying, and it is very unlikely that something good will spontaneously fill the void when something evil is taken away. It is also insufficient to speak in vague generalities about the ideals to which the post-[whatever] society will adhere. How are you going to avoid the problems caused by whatever you are eliminating, and how are you going to successfully transition from evil to good?

In fantasy novels, this is rarely an issue. The story ends shortly after the climax, either with good ascending or time-skipping to a society made perfect off-camera. Sauron has been vanquished, the rightful king has been restored, cue epilogue(s). And then what? Has the Chosen One shown skill in diplomacy and economics, solving problems not involving swords? What was Aragorn's tax policy? Sauron managed to feed his armies from a wasteland; what kind of agricultural techniques do you have? And indeed, if the book/series needs a sequel, we find that a problem at least as bad as the original fills in the void.

Reality often follows that pattern. Marx explicitly had no plan for what happened after you smashed capitalism. Destroy the oppressors and then ... as it turns out, slightly different oppressors come in and generally kill a fair percentage of the population. It works on the other direction as well; the fall of Soviet communism led not to spontaneous capitalism but rather kleptocracy and Vladmir Putin. For most of my lifetime, a major pillar of American foreign policy has seemed to be the overthrow of hostile dictators (end of plan). For example, Muammar Gaddafi was killed in 2011, and Libya has been in some state of unrest or civil war ever since. Maybe this is one where it would not be best to contribute our favorites in the comments.

This is not to say that you never get improvements that way. Aragorn can hardly be worse than Sauron. Regression to the mean perhaps suggests that you will get something less bad just by luck, as Putin seems clearly less bad than Stalin, although Stalin seems clearly worse than almost any other regime change in history. Some would say that causing civil wars in hostile countries is the goal rather than a failure of American foreign policy, which seems a darker sort of instrumental rationality.

Human flourishing is not the default state of affairs, temporarily suppressed by villainy. Spontaneous order is real, but it still needs institutions and social technology to support it.

Defeating the villain is a (possibly) necessary but (almost certainly) insufficient condition for bringing about good.

One thing I really like about this community is that projects tend to be conceived in the positive rather than the negative. Please keep developing your plans not only in terms of "this is a bad thing to be eliminated" but also "this is a better thing to be created" and "this is how I plan to get there."

Thinking well

28 Vaniver 01 April 2015 10:03PM

Many people want to know how to live well. Part of living well is thinking well, because if one thinks the wrong thoughts it is hard to do the right things to get the best ends.

We think a lot about how to think well, and one of the first things we thought about was how to not think well. Bad ways of thinking repeat in ways we can see coming, because we have looked at how people think and know more now about that than we used to.

But even if we know how other people think bad thoughts, that is not enough. We need to both accept that we can have bad ways of thinking and figure out how to have good ways of thinking instead.

The first is very hard on the heart, but is why we call this place "Less Wrong." If we had called it something like more right, it could have been about how we're more right than other people instead of more right than our past selves.

The second is very hard on the head. It is not just enough to study the bad ways of thinking and turn them around. There are many ways to be wrong, but only a few ways to be right. If you turn left all the way around, it will point right, but we want it to point up.

The heart of our approach has a few parts:


  1. We are okay with not knowing. Only once we know we don't know can we look. 
  2. We are okay with having been wrong. If we have wrong thoughts, the only way to have right thoughts is to let the wrong ones go. 
  3. We are quick to change our minds. We look at what is when we get the chance. 
  4. We are okay with the truth. Instead of trying to force it to be what we thought it was, we let it be what it is. 
  5. We talk with each other about the truth of everything. If one of us is wrong, we want the others to help them become less wrong. 
  6. We look at the world. We look at both the time before now and the time after now, because many ideas are only true if they agree with the time after now, and we can make changes to check those ideas. 
  7. We like when ideas are as simple as possible. 
  8. We make plans around being wrong. We look into the dark and ask what the world would look like if we were wrong, instead of just what the world would look like if we were right. 
  9. We understand that as we become less wrong, we see more things wrong. We try to fix all the wrong things, because as soon as we accept that something will always be wrong we can not move past that thing. 
  10. We try to be as close to the truth as possible. 
  11. We study as many things as we can. There is only one world, and to look at a part tells you a little about all the other parts. 
  12. We have a reason to do what we do. We do these things only because they help us, not because they are their own reason.


Concept Safety: Producing similar AI-human concept spaces

27 Kaj_Sotala 14 April 2015 08:39PM

I'm currently reading through some relevant literature for preparing my FLI grant proposal on the topic of concept learning and AI safety. I figured that I might as well write down the research ideas I get while doing so, so as to get some feedback and clarify my thoughts. I will posting these in a series of "Concept Safety"-titled articles.

A frequently-raised worry about AI is that it may reason in ways which are very different from us, and understand the world in a very alien manner. For example, Armstrong, Sandberg & Bostrom (2012) consider the possibility of restricting an AI via "rule-based motivational control" and programming it to follow restrictions like "stay within this lead box here", but they raise worries about the difficulty of rigorously defining "this lead box here". To address this, they go on to consider the possibility of making an AI internalize human concepts via feedback, with the AI being told whether or not some behavior is good or bad and then constructing a corresponding world-model based on that. The authors are however worried that this may fail, because

Humans seem quite adept at constructing the correct generalisations – most of us have correctly deduced what we should/should not be doing in general situations (whether or not we follow those rules). But humans share a common of genetic design, which the OAI would likely not have. Sharing, for instance, derives partially from genetic predisposition to reciprocal altruism: the OAI may not integrate the same concept as a human child would. Though reinforcement learning has a good track record, it is neither a panacea nor a guarantee that the OAIs generalisations agree with ours.

Addressing this, a possibility that I raised in Sotala (2015) was that possibly the concept-learning mechanisms in the human brain are actually relatively simple, and that we could replicate the human concept learning process by replicating those rules. I'll start this post by discussing a closely related hypothesis: that given a specific learning or reasoning task and a certain kind of data, there is an optimal way to organize the data that will naturally emerge. If this were the case, then AI and human reasoning might naturally tend to learn the same kinds of concepts, even if they were using very different mechanisms. Later on the post, I will discuss how one might try to verify that similar representations had in fact been learned, and how to set up a system to make them even more similar.

Word embedding

"Left panel shows vector offsets for three word pairs illustrating the gender relation. Right panel shows a different projection, and the singular/plural relation for two words. In high-dimensional space, multiple relations can be embedded for a single word." (Mikolov et al. 2013)A particularly fascinating branch of recent research relates to the learning of word embeddings, which are mappings of words to very high-dimensional vectors. It turns out that if you train a system on one of several kinds of tasks, such as being able to classify sentences as valid or invalid, this builds up a space of word vectors that reflects the relationships between the words. For example, there seems to be a male/female dimension to words, so that there's a "female vector" that we can add to the word "man" to get "woman" - or, equivalently, which we can subtract from "woman" to get "man". And it so happens (Mikolov, Yih & Zweig 2013) that we can also get from the word "king" to the word "queen" by adding the same vector to "king". In general, we can (roughly) get to the male/female version of any word vector by adding or subtracting this one difference vector!

Why would this happen? Well, a learner that needs to classify sentences as valid or invalid needs to classify the sentence "the king sat on his throne" as valid while classifying the sentence "the king sat on her throne" as invalid. So including a gender dimension on the built-up representation makes sense.

But gender isn't the only kind of relationship that gets reflected in the geometry of the word space. Here are a few more:

It turns out (Mikolov et al. 2013) that with the right kind of training mechanism, a lot of relationships that we're intuitively aware of become automatically learned and represented in the concept geometry. And like Olah (2014) comments:

It’s important to appreciate that all of these properties of W are side effects. We didn’t try to have similar words be close together. We didn’t try to have analogies encoded with difference vectors. All we tried to do was perform a simple task, like predicting whether a sentence was valid. These properties more or less popped out of the optimization process.

This seems to be a great strength of neural networks: they learn better ways to represent data, automatically. Representing data well, in turn, seems to be essential to success at many machine learning problems. Word embeddings are just a particularly striking example of learning a representation.

It gets even more interesting, for we can use these for translation. Since Olah has already written an excellent exposition of this, I'll just quote him:

We can learn to embed words from two different languages in a single, shared space. In this case, we learn to embed English and Mandarin Chinese words in the same space.

We train two word embeddings, Wen and Wzh in a manner similar to how we did above. However, we know that certain English words and Chinese words have similar meanings. So, we optimize for an additional property: words that we know are close translations should be close together.

Of course, we observe that the words we knew had similar meanings end up close together. Since we optimized for that, it’s not surprising. More interesting is that words we didn’t know were translations end up close together.

In light of our previous experiences with word embeddings, this may not seem too surprising. Word embeddings pull similar words together, so if an English and Chinese word we know to mean similar things are near each other, their synonyms will also end up near each other. We also know that things like gender differences tend to end up being represented with a constant difference vector. It seems like forcing enough points to line up should force these difference vectors to be the same in both the English and Chinese embeddings. A result of this would be that if we know that two male versions of words translate to each other, we should also get the female words to translate to each other.

Intuitively, it feels a bit like the two languages have a similar ‘shape’ and that by forcing them to line up at different points, they overlap and other points get pulled into the right positions.

After this, it gets even more interesting. Suppose you had this space of word vectors, and then you also had a system which translated images into vectors in the same space. If you have images of dogs, you put them near the word vector for dog. If you have images of Clippy you put them near word vector for "paperclip". And so on.

You do that, and then you take some class of images the image-classifier was never trained on, like images of cats. You ask it to place the cat-image somewhere in the vector space. Where does it end up? 

You guessed it: in the rough region of the "cat" words. Olah once more:

This was done by members of the Stanford group with only 8 known classes (and 2 unknown classes). The results are already quite impressive. But with so few known classes, there are very few points to interpolate the relationship between images and semantic space off of.

The Google group did a much larger version – instead of 8 categories, they used 1,000 – around the same time (Frome et al. (2013)) and has followed up with a new variation (Norouzi et al. (2014)). Both are based on a very powerful image classification model (from Krizehvsky et al. (2012)), but embed images into the word embedding space in different ways.

The results are impressive. While they may not get images of unknown classes to the precise vector representing that class, they are able to get to the right neighborhood. So, if you ask it to classify images of unknown classes and the classes are fairly different, it can distinguish between the different classes.

Even though I’ve never seen a Aesculapian snake or an Armadillo before, if you show me a picture of one and a picture of the other, I can tell you which is which because I have a general idea of what sort of animal is associated with each word. These networks can accomplish the same thing.

These algorithms made no attempt of being biologically realistic in any way. They didn't try classifying data the way the brain does it: they just tried classifying data using whatever worked. And it turned out that this was enough to start constructing a multimodal representation space where a lot of the relationships between entities were similar to the way humans understand the world.

How useful is this?

"Well, that's cool", you might now say. "But those word spaces were constructed from human linguistic data, for the purpose of predicting human sentences. Of course they're going to classify the world in the same way as humans do: they're basically learning the human representation of the world. That doesn't mean that an autonomously learning AI, with its own learning faculties and systems, is necessarily going to learn a similar internal representation, or to have similar concepts."

This is a fair criticism. But it is mildly suggestive of the possibility that an AI that was trained to understand the world via feedback from human operators would end up building a similar conceptual space. At least assuming that we chose the right learning algorithms.

When we train a language model to classify sentences by labeling some of them as valid and others as invalid, there's a hidden structure implicit in our answers: the structure of how we understand the world, and of how we think of the meaning of words. The language model extracts that hidden structure and begins to classify previously unseen things in terms of those implicit reasoning patterns. Similarly, if we gave an AI feedback about what kinds of actions counted as "leaving the box" and which ones didn't, there would be a certain way of viewing and conceptualizing the world implied by that feedback, one which the AI could learn.

Comparing representations

"Hmm, maaaaaaaaybe", is your skeptical answer. "But how would you ever know? Like, you can test the AI in your training situation, but how do you know that it's actually acquired a similar-enough representation and not something wildly off? And it's one thing to look at those vector spaces and claim that there are human-like relationships among the different items, but that's still a little hand-wavy. We don't actually know that the human brain does anything remotely similar to represent concepts."

Here we turn, for a moment, to neuroscience.

From Kaplan, Man & Greening (2015): "In this example, subjects either see or touch two classes of objects, apples and bananas. (A) First, a classifier is trained on the labeled patterns of neural activity evoked by seeing the two objects. (B) Next, the same classifier is given unlabeled data from when the subject touches the same objects and makes a prediction. If the classifier, which was trained on data from vision, can correctly identify the patterns evoked by touch, then we conclude that the representation is modality invariant."Multivariate Cross-Classification (MVCC) is a clever neuroscience methodology used for figuring out whether different neural representations of the same thing have something in common. For example, we may be interested in whether the visual and tactile representation of a banana have something in common.

We can test this by having several test subjects look at pictures of objects such as apples and bananas while sitting in a brain scanner. We then feed the scans of their brains into a machine learning classifier and teach it to distinguish between the neural activity of looking at an apple, versus the neural activity of looking at a banana. Next we have our test subjects (still sitting in the brain scanners) touch some bananas and apples, and ask our machine learning classifier to guess whether the resulting neural activity is the result of touching a banana or an apple. If the classifier - which has not been trained on the "touch" representations, only on the "sight" representations - manages to achieve a better-than-chance performance on this latter task, then we can conclude that the neural representation for e.g. "the sight of a banana" has something in common with the neural representation for "the touch of a banana".

A particularly fascinating experiment of this type is that of Shinkareva et al. (2011), who showed their test subjects both the written words for different tools and dwellings, and, separately, line-drawing images of the same tools and dwellings. A machine-learning classifier was both trained on image-evoked activity and made to predict word-evoked activity and vice versa, and achieved a high accuracy on category classification for both tasks. Even more interestingly, the representations seemed to be similar between subjects. Training the classifier on the word representations of all but one participant, and then having it classify the image representation of the left-out participant, also achieved a reliable (p<0.05) category classification for 8 out of 12 participants. This suggests a relatively similar concept space between humans of a similar background.

We can now hypothesize some ways of testing the similarity of the AI's concept space with that of humans. Possibly the most interesting one might be to develop a translation between a human's and an AI's internal representations of concepts. Take a human's neural activation when they're thinking of some concept, and then take the AI's internal activation when it is thinking of the same concept, and plot them in a shared space similar to the English-Mandarin translation. To what extent do the two concept geometries have similar shapes, allowing one to take a human's neural activation of the word "cat" to find the AI's internal representation of the word "cat"? To the extent that this is possible, one could probably establish that the two share highly similar concept systems.

One could also try to more explicitly optimize for such a similarity. For instance, one could train the AI to make predictions of different concepts, with the additional constraint that its internal representation must be such that a machine-learning classifier trained on a human's neural representations will correctly identify concept-clusters within the AI. This might force internal similarities on the representation beyond the ones that would already be formed from similarities in the data.

Next post in series: The problem of alien concepts.

Slate Star Codex: alternative comment threads on LessWrong?

27 tog 27 March 2015 09:05PM

Like many Less Wrong readers, I greatly enjoy Slate Star Codex; there's a large overlap in readership. However, the comments there are far worse, not worth reading for me. I think this is in part due to the lack of LW-style up and downvotes. Have there ever been discussion threads about SSC posts here on LW? What do people think of the idea occasionally having them? Does Scott himself have any views on this, and would he be OK with it?


The latest from Scott:

I'm fine with anyone who wants reposting things for comments on LW, except for posts where I specifically say otherwise or tag them with "things i will regret writing"

In this thread some have also argued for not posting the most hot-button political writings.

Would anyone be up for doing this? Ataxerxes started with "Extremism in Thought Experiments is No Vice"

Cooperative conversational threading

24 philh 15 April 2015 06:40PM

(Cross-posted from my blog.)

Sometimes at LW meetups, I'll want to raise a topic for discussion. But we're currently already talking about something, so I'll wait for a lull in the current conversation. But it feels like the duration of lull needed before I can bring up something totally unrelated, is longer than the duration of lull before someone else will bring up something marginally related. And so we can go for a long time, with the topic frequently changing incidentally, but without me ever having a chance to change it deliberately.

Which is fine. I shouldn't expect people to want to talk about something just because I want to talk about it, and it's not as if I find the actual conversation boring. But it's not necessarily optimal. People might in fact want to talk about the same thing as me, and following the path of least resistance in a conversation is unlikely to result in the best possible conversation.

At the last meetup I had two topics that I wanted to raise, and realized that I had no way of raising them, which was a third topic worth raising. So when an interruption occured in the middle of someone's thought - a new person arrived, and we did the "hi, welcome, join us" thing - I jumped in. "Before you start again, I have three things I'd like to talk about at some point, but not now. Carry on." Then he started again, and when that topic was reasonably well-trodden, he prompted me to transition.

Then someone else said that he also had two things he wanted to talk about, and could I just list my topics and then he'd list his? (It turns out that no I couldn't. You can't dangle an interesting train of thought in front of the London LW group and expect them not to follow it. But we did manage to initially discuss them only briefly.)

This worked pretty well. Someone more conversationally assertive than me might have been able to take advantage of a less solid interruption than the one I used. Someone less assertive might not have been able to use that one.

What else could we do to solve this problem?

Someone suggested a hand signal: if you think of something that you'd like to raise for discussion later, make the signal. I don't think this is ideal, because it's not continuous. You make it once, and then it would be easy for people to forget, or just to not notice.

I think what I'm going to do is bring some poker chips to the next meetup. I'll put a bunch in the middle, and if you have a topic that you want to raise at some future point, you take one and put it in front of you. Then if a topic seems to be dying out, someone can say "<person>, what did you want to talk about?"

I guess this still needs at least one person assertive enough to do that. I imagine it would be difficult for me. But the person who wants to raise the topic doesn't need to be assertive, they just need to grab a poker chip. It's a fairly obvious gesture, so probably people will notice, and it's easy to just look and see for a reminder of whether anyone wants to raise anything. (Assuming the table isn't too messy, which might be a problem.)

I don't know how well this will work, but it seems worth experimenting.

(I'll also take a moment to advocate another conversation-signal that we adopted, via CFAR. If someone says something and you want to tell people that you agree with them, instead of saying that out loud, you can just raise your hands a little and wiggle your fingers. Reduces interruptions, gives positive feedback to the speaker, and it's kind of fun.)

Future of Life Institute existential risk news site

21 Vika 19 March 2015 02:33PM

I'm excited to announce that the Future of Life Institute has just launched an existential risk news site!

The site will have regular articles on topics related to existential risk, written by journalists, and a community blog written by existential risk researchers from around the world as well as FLI volunteers. Enjoy!

Status - is it what we think it is?

20 Kaj_Sotala 30 March 2015 09:37PM

I was re-reading the chapter on status in Impro (excerpt), and I noticed that Johnstone seemed to be implying that different people are comfortable at different levels of status: some prefer being high status and others prefer being low status. I found this peculiar, because the prevailing notion in the rationalistsphere seems to be that everyone's constantly engaged in status games aiming to achieve higher status. I've even seen arguments to the effect that a true post-scarcity society is impossible, because status is zero-sum and there will always be people at the bottom of the status hierarchy.

But if some people preferred to have low status, this whole dilemma might be avoided, if a mix of statuses could be find that left everyone happy.

First question - is Johnstone's "status" talking about the same thing as our "status"? He famously claimed that "status is something you do, not something that you are", and that

I should really talk about dominance and submission, but I'd create a resistance. Students who will agree readily to raising or lowering their status may object if asked to 'dominate' or 'submit'.

Viewed via this lens, it makes sense that some people would prefer being in a low status role: if you try to take control of the group, you become subject to various status challenges, and may be held responsible for the decisions you make. It's often easier to remain low status and let others make the decisions.

But there's still something odd about saying that one would "prefer to be low status", at least in the sense in which we usually use the term. Intuitively, a person may be happy being low status in the sense of not being dominant, but most people are still likely to desire something that feels kind of like status in order to be happy. Something like respect, and the feeling that others like them. And a lot of the classical "status-seeking behaviors" seem to be about securing the respect of others. In that sense, there seems to be something intuitive true in the "everyone is engaged in status games and wants to be higher-status" claim.

So I think that there are two different things that we call "status" which are related, but worth distinguishing.

1) General respect and liking. This is "something you have", and is not inherently zero-sum. You can achieve it by doing things that are zero-sum, like being the best fan fiction writer in the country, but you can also do it by things like being considered generally friendly and pleasant to be around. One of the lessons that I picked up from The Charisma Myth was that you can be likable by just being interested in the other person and displaying body language that signals your interest in the other person.

Basically, this is "do other people get warm fuzzies from being around you / hearing about you / consuming your work", and is not zero-sum because e.g. two people who both have great social skills and show interest in you can both produce the same amount of warm fuzzies, independent of each other's existence.

But again, specific sources of this can be zero-sum: if you respect someone a lot for their art, but then run across into even better art and realize that the person you previously admired is pretty poor in comparison, that can reduce the respect you feel for them. It's just that there are also other sources of liking which aren't necessarily zero-sum.

2) Dominance and control of the group. It's inherently zero-sum because at most one person can have absolute say on the decisions of the group. This is "something you do": having the respect and liking of the people in the group (see above) makes it easier for you to assert dominance and makes the others more willing to let you do so, but you can also voluntarily abstain from using that power and leave the decisions to others. (Interestingly, in some cases this can even increase the extent to which you are liked, which translates to a further boost in the ability to control the group, if you so desired.)


Morendil and I previously suggested a definition of status as "the general purpose ability to influence a group", but I think that definition was somewhat off in conflating the two senses above.

I've always had the vague feeling that the "everyone can't always be happy because status is zero-sum" claim felt off in some sense that I was unable to properly articulate, but this seems to resolve the issue. If this model were true, it would also make me happy, because it would imply that we can avoid zero-sum status fights while still making everybody content.

What have we learned from meetups?

17 sixes_and_sevens 30 March 2015 01:27PM

We've been running regular, well-attended Less Wrong meetups in London for a few years now, (and irregular, badly-attended ones for even longer than that). In this time, I'd like to think we've learned a few things about having good conversations, but there are probably plenty of areas where we could make gains. Given the number of Less Wrong meetups around the world, it's worth attempting some sort of meetup cross-pollination. It's possible that we've all been solving each other's problems. It's also good to have a central location to make observations and queries about topics of interest, and it's likely people have such observations and queries on this topic.

So, what have you learned from attending or running Less Wrong meetups? Here are a few questions to get the ball rolling:


  • What do you suppose are the dominant positive outcomes of your meetups?
  • What problems do you encounter with discussions involving [x] people? How have you attempted to remedy them?
  • Do you have any systems or procedures in place for making sure people are having the sorts of conversations they want to have?
  • Have you developed or consciously adopted any non-mainstream social norms, taboos or rituals? How are those working out?
  • How do Less Wrong meetups differ from other similar gatherings you've been involved with? Are there any special needs idiosyncratic to this demographic?
  • Are there any activities that you've found work particularly well or particularly poorly for meetups? Do you have examples of runaway successes or spectacular failures?
  • Are there any activities you'd like to try, but haven't managed to pull off yet? What's stopping you?


If you have other specific questions you'd like answered, you're encouraged to ask them in comments. Any other observations, anecdotes or suggestions on this general topic are also welcome and encouraged.

Postdoctoral research positions at CSER (Cambridge, UK)

17 Sean_o_h 26 March 2015 05:59PM

[To be cross-posted at Effective Altruism Forum, FLI news page]

I'm delighted to announce that the Centre for the Study of Existential Risk has had considerable recent success in grantwriting and fundraising, among other activities (full update coming shortly). As a result, we are now in a position to advance to CSER's next stage of development: full research operations. Over the course of this year, we will be recruiting for a full team of postdoctoral researchers to work on a combination of general methodologies for extreme technological (and existential) risk analysis and mitigation, alongside specific technology/risk-specific projects.

Our first round of recruitment has just opened - we will be aiming to hire up to 4 postdoctoral researchers; details below. A second recruitment round will take place in the Autumn. We have a slightly unusual opportunity in that we get to cast our net reasonably wide. We have a number of planned research projects (listed below) that we hope to recruit for. However, we also have the flexibility to hire one or more postdoctoral researchers to work on additional projects relevant to CSER's aims. Information about CSER's aims and core research areas is available on our website. We request that as part of the application process potential postholders send us a research proposal of no more than 1500 words, explaining what your research skills could contribute to CSER. At this point in time, we are looking for people who will have obtained a doctorate in a relevant discipline by their start date.

We would also humbly ask that the LessWrong community aid us in spreading the word far and wide about these positions. There are many brilliant people working within the existential risk community. However, there are academic disciplines and communities that have had less exposure to existential risk as a research priority than others (due to founder effect and other factors), but where there may be people with very relevant skills and great insights. With new centres and new positions becoming available, we have a wonderful opportunity to grow the field, and to embed existential risk as a crucial consideration in all relevant fields and disciplines.

Thanks very much,

Seán Ó hÉigeartaigh (Executive Director, CSER)


"The Centre for the Study of Existential Risk (University of Cambridge, UK) is recruiting for to four full-time postdoctoral research associates to work on the project Towards a Science of Extreme Technological Risk.

We are looking for outstanding and highly-committed researchers, interested in working as part of growing research community, with research projects relevant to any aspect of the project. We invite applicants to explain their project to us, and to demonstrate their commitment to the study of extreme technological risks.

We have several shovel-ready projects for which we are looking for suitable postdoctoral researchers. These include:

  • Ethics and evaluation of extreme technological risk (ETR) (with Sir Partha Dasgupta;
  • Horizon-scanning and foresight for extreme technological risks (with Professor William Sutherland);
  • Responsible innovation and extreme technological risk (with Dr Robert Doubleday and the Centre for Science and Policy).

However, recruitment will not necessarily be limited to these subprojects, and our main selection criterion is suitability of candidates and their proposed research projects to CSER’s broad aims.

Details are available here. Closing date: April 24th."

Summary and Lessons from "On Combat"

17 Gunnar_Zarncke 22 March 2015 01:48AM

On Combat - The Psychology and hysiology of Deadly Conflict in War and in Peace by Lt. Col. Dave Grossman and Loren W. Christensen (third edition from 2007) is a well-written, evidence-based book about the reality of human behaviour in life-threatening situations. It is comprehensive (400 pages), provides detailed descriptions, (some) statistics as well as first-person recounts, historical context and other relevant information. But my main focus in this post is in the advice it gives and what lessons the LessWrong community may take from it.


In deadly force encounters you will experience and remember the most unusual physiological and psychological things. Innoculate yourself against extreme stress with repeated authentic training; play win-only paintball, train 911-dialing and -reporting. Train combat breathing. Talk to people after traumatic events.

continue reading »

How I changed my exercise habits

16 Normal_Anomaly 13 April 2015 10:19PM

In June 2013, I didn’t do any exercise beyond biking the 15 minutes to work and back. Now, I have a robust habit of hitting the gym every day, doing cardio and strength training. Here are the techniques I used to do get from not having the habit to having it, some of them common wisdom and some of them my own ideas. Consider this post a case study/anecdata in what worked for me. Note: I wrote these ideas down around August 2013 but didn’t post them, so my memory was fresh at the time of writing.

1. Have a specific goal. Ideally this goal should be reasonably achievable and something you can see progress toward over medium timescales. I initially started exercising because I wanted more upper body strength to be better at climbing. My goal is “become able to do at least one pull up, or more if possible”.

Why it works: if you have a specific goal instead of a vague feeling that you ought to do something or that it’s what a virtuous person would do, it’s harder to make excuses. Skipping work with an excuse will let you continue to think of yourself as virtuous, but it won’t help with your goal. For this to work, your goal needs to be something you actually want, rather than a stand-in for “I want to be virtuous.” If you can’t think of a consequence of your intended habit that you actually want, the habit may not be worth your time.

2. Have a no-excuses minimum. This is probably the best technique I’ve discovered. Every day, with no excuses, I went to the gym and did fifty pull-downs on one of the machines. After that’s done, I can do as much or as little else as I want. Some days I would do equivalent amounts of three other exercises, some days I would do an extra five reps and that’s it.

Why it works: this one has a host of benefits.

* It provides a sense of freedom: once I’m done with my minimum, I have a lot of choice about what and how much to do. That way it feels less like something I’m being forced into.

* If I’m feeling especially tired or feel like I deserve a day off, instead of skipping a day and breaking the habit I tell myself I’ll just do the minimum instead. Often once I get there I end up doing more than the minimum anyway, because the real thing I wanted to skip was the inconvenience of biking to the gym.

3. If you raise the minimum, do it slowly. I have sometimes raised the bar on what’s the minimum amount of exercise I have to do, but never to as much or more than I was already doing routinely. If you start suddenly forcing yourself to do more than you were already doing, the change will be much harder and less likely to stick than gradually ratcheting up your commitment.

3. Don’t fall into a guilt trap. Avoid associating guilt with doing the minimum, or even with missing a day.

Why it works: feeling guilty will make thinking of the habit unpleasant, and you’ll downplay how much you care about it to avoid the cognitive dissonance. Especially, if you only do the minimum, tell yourself “I did everything I committed to do.” Then when you do more than the minimum, feel good about it! You went above and beyond. This way, doing what you committed to will sometimes include positive reinforcement, but never negative reinforcement.

4. Use Timeless Decision Theory and consistency pressure. Credit for this one goes to this post by user zvi. When I contemplate skipping a day at the gym, I remember that I’ll be facing the same choice under nearly the same conditions many times in the future. If I skip my workout today, what reason do I have to believe that I won’t skip it tomorrow?

Why it works: Even when the benefits of one day’s worth of exercise don’t seem like enough motivation, I know my entire habit that I’ve worked to cultivate is at stake. I know that the more days I go to the gym the more I will see myself as a person who goes to the gym, and the more it will become my default action.

5. Evaluate your excuses. If I have what I think is a reasonable excuse, I consider how often I’ll skip the gym if I let myself skip it whenever I have that good of an excuse. If letting the excuse hold would make me use it often, I ignore it.

Why it works: I based this technique on this LW post

6. Tell people about it. The first thing I did when I made my resolution to start hitting the gym was telling a friend whose opinion I cared about. I also made a comment on LW saying I would make a post about my attempt at forming a habit, whether it succeeded or failed. (I wrote the post and forgot to post it for over a year, but so it goes.)

Why it works: Telling people about your commitment invests your reputation in it. If you risk being embarrassed if you fail, you have an extra motivation to succeed.

I expect these techniques can be generalized to work for many desirable habits: eating healthy, spending time on social interaction; writing, coding, or working on a long-term project; being outside getting fresh air, etc.

Negative visualization, radical acceptance and stoicism

16 Vika 27 March 2015 03:51AM

In anxious, frustrating or aversive situations, I find it helpful to visualize the worst case that I fear might happen, and try to accept it. I call this “radical acceptance”, since the imagined worst case is usually an unrealistic scenario that would be extremely unlikely to happen, e.g. “suppose I get absolutely nothing done in the next month”. This is essentially the negative visualization component of stoicism. There are many benefits to visualizing the worst case:

  • Feeling better about the present situation by contrast.
  • Turning attention to the good things that would still be in my life even if everything went wrong in one particular domain.
  • Weakening anxiety using humor (by imagining an exaggerated “doomsday” scenario).
  • Being more prepared for failure, and making contingency plans (pre-hindsight).
  • Helping make more accurate predictions about the future by reducing the “X isn’t allowed to happen” effect (or, as Anna Salamon once put it, “putting X into the realm of the thinkable”).
  • Reducing the effect of ugh fields / aversions, which thrive on the “X isn’t allowed to happen” flinch.
  • Weakening unhelpful identities like “person who is always productive” or “person who doesn’t make stupid mistakes”.

Let’s say I have an aversion around meetings with my advisor, because I expect him to be disappointed with my research progress. When I notice myself worrying about the next meeting or finding excuses to postpone it so that I have more time to make progress, I can imagine the worst imaginable outcome a meeting with my advisor could have - perhaps he might yell at me or even decide to expel me from grad school (neither of these have actually happened so far). If the scenario is starting to sound silly, that’s a good sign. I can then imagine how this plays out in great detail, from the disappointed faces and words of the rest of the department to the official letter of dismissal in my hands, and consider what I might do in that case, like applying for industry jobs. While building up these layers of detail in my mind, I breathe deeply, which I associate with meditative acceptance of reality. (I use the word “acceptance” to mean “acknowledgement” rather than “resignation”.)

I am trying to use this technique more often, both in the regular and situational sense. A good default time is my daily meditation practice. I might also set up a trigger-action habit of the form “if I notice myself repeatedly worrying about something, visualize that thing (or an exaggerated version of it) happening, and try to accept it”. Some issues have more natural triggers than others - while worrying tends to call attention to itself, aversions often manifest as a quick flinch away from a thought, so it’s better to find a trigger among the actions that are often caused by an aversion, e.g. procrastination. A trigger for a potentially unhelpful identity could be a thought like “I’m not good at X, but I should be”. A particular issue can simultaneously have associated worries (e.g. “will I be productive enough?”), aversions (e.g. towards working on the project) and identities (“productive person”), so there is likely to be something there that makes a good trigger. Visualizing myself getting nothing done for a month can help with all of these to some degree.

System 1 is good at imagining scary things - why not use this as a tool?


Book Review: Discrete Mathematics and Its Applications

15 LawrenceC 14 April 2015 09:08AM

Following in the path of So8res and others, I’ve decided to work my way through the textbooks on the MIRI Research Guide. I’ve been working my way through the guide since last October, but this is my first review. I plan on following up this review with reviews of Enderton’s A Mathematical Introduction to Logic and Sipser’s Introduction to the Theory of Computation. Hopefully these reviews will be of some use to you.

Discrete Mathematics and Its Applications

Discrete Mathematics and Its Applications is wonderful, gentle introduction to the math needed to understand most of the other books on the MIRI course list. It successfully pulls off a colloquial tone of voice. It spends a lot of time motivating concepts; it also contains a lot of interesting trivia and short biographies of famous mathematicians and computer scientists (which the textbook calls “links”). Additionally, the book provides a lot of examples for each of its theorems and topics. It also fleshes out the key subjects (counting, proofs, graphs, etc.) while also providing a high level overview of their applications. These combine to make it an excellent first textbook for learning discrete mathematics.

However, for much the same reasons, I would not recommend it nearly as much if you’ve taken a discrete math course. People who’ve participated in math competitions at the high school level probably won’t get much out of the textbooks either. Even though I went in with only the discrete math I did in high school, I still got quite frustrated at times because of how long the book would take to get to the point. Discrete Mathematics is intended to be quite introductory and it succeeds in this goal, but it probably won’t be very suitable as anything other than review for readers beyond the introductory level. The sole exception is the last chapter (on models of computation), but I recommend picking up a more comprehensive overview from Sipser’s Theory of Computation instead.

I still highly recommend it for those not familiar with the topics covered in the book. I’ve summarized the contents of the textbook below:


1.     The Foundations: Logic and Proofs

2.   Basic Structures: Sets, Functions, Sequences, Sums, and Matrices

3.     Algorithms

4.     Number Theory and Cryptography

5.     Induction and Recursion

6.     Counting

7.     Discrete Probability

8.     Advanced Counting Techniques

9.     Relations

10.  Graphs

11.  Trees

12.  Boolean Algebra

13.  Modeling Computation

The Foundations: Logic and Proofs

This chapter introduces propositional (sentential) logic, predicate logic, and proof theory at a very introductory level. It starts by introducing the propositions of propositional logic (!), then goes on to introduce applications of propositional logic, such as logic puzzles and logic circuits. It then goes on to introduce the idea of logical equivalence between sentences of propositional logic, before introducing quantifiers and predicate logic and its rules of inference. It then ends by talking about the different kinds of proofs one is likely to encounter – direct proofs via repeated modus ponens, proofs by contradiction, proof by cases, and constructive and non-constructive existence proofs.

This chapter illustrates exactly why this book is excellent as an introductory text. It doesn’t just introduce the terms, theorems, and definitions; it motivates them by giving applications. For example, it explains the need for predicate logic by pointing out that there are inferences that can’t be drawn using only propositional logic. Additionally, it also explains the common pitfalls for the different proof methods that it introduces.

Basic Structures: Sets, Functions, Sequences, Sums, and Matrices

This chapter introduces the different objects one is likely to encounter in discrete mathematics. Most of it seemed pretty standard, with the following exceptions: functions are introduced without reference to relations; the “cardinality of sets” section provides a high level overview of a lot of set theory; and the matrices section introduces zero-one matrices, which are used in the chapters on relations and graphs.


This chapter presents … surprise, surprise… algorithms! It starts by introducing the notion of algorithms, and gives a few examples of simple algorithms. It then spends a page introducing the halting problem and showing its undecidability. (!) Afterwards, it introduces big-o, big-omega, and big-theta notation and then gives a (very informal) treatment of a portion of computation complexity theory. It's quite unusual to see algorithms being dealt with so early into a discrete math course, but it's quite important because the author starts providing examples of algorithms in almost every chapter after this one.

Number Theory and Cryptography

This section goes from simple modular arithmetic (3 divides 12!) to RSA, which I found extremely impressive. (Admittedly, I’ve only ever read one other discrete math textbook.) After introducing the notion of divisibility, the textbook takes the reader on a rapid tour through base-n notation, the fundamental theorem of arithmetic, the infinitude of primes, the Euclidean GCD algorithm, Bezout’s theorem, the Chinese remainder theorem, Fermat’s little theorem, and other key results of number theory. It then gives several applications of number theory: hash functions, pseudorandom numbers, check digits, and cryptography. The last of these gets its own section, and the book spends a large amount of it introducing RSA and its applications.

Induction and Recursion

This chapter introduces mathematical induction and recursion, two extremely important concepts in computer science. Proofs by mathematical induction, basically, are proofs that show that a property is true of the first natural number (positive integer in this book), and if it is true of an integer k it is true of k+1. With these two results, we can conclude that the property is true of all natural numbers (positive integers). The book then goes on to introduce strong induction and recursively defined functions and sets. From this, the book then goes on to introduce the concept of structural induction, which is a generalization of induction to work on recursively-defined sets. Then, the book introduces recursive algorithms, most notably the merge sort, before giving a high level overview of program verification techniques.


The book now changes subjects to talk about basic counting techniques, such as the product rule and the sum rule, before (interestingly) moving on to the pigeonhole principle. It then moves on to permutations and combinations, while introducing the notion of combinatorial proof, which is when we show that two sides of the identity count the same things but in different ways, or that there exists a bijection between the sets being counted on either side. The textbook then introduces binomial coefficients, Pascal’s triangle, and permutations/combinations with repetition. Finally, it gives algorithms that generate all the permutations and combinations of a set of n objects.

Compared to other sections, I feel that a higher proportion of readers would be familiar with the results of this chapter and the one on discrete probability that follows it. Other than the last section, which I found quite interesting but not particularly useful, I felt like I barely got anything from the chapter.

Discrete Probability

In this section the book covers probability, a topic that most of LessWrong should be quite familiar with. Like most introductory textbooks, it begins by introducing the notion of sample spaces and events as sets, before defining probability of an event E as the ratio of the cardinality of E to the cardinality of S. We are then introduced to other key concepts in probability theory: conditional probabilities, independence, and random variables, for example. The textbook takes care to flesh out this section with a discussion about the Birthday Problem and Monte Carlo algorithms. Afterwards, we are treated to a section on Bayes theorem, with the canonical example of disease testing for rare diseases and a less-canonical-but-still-used-quite-a-lot example of Naïve Bayes spam filters. The chapter concludes by introducing the expected value and variances of random variables, as well as a lot of key results (linearity of expectations and Chebyshev’s Inequality, to list two). Again, aside from the applications, most of this stuff is quite basic.

Advanced Counting Techniques

This chapter, though titled “advanced counting techniques”, is really just about recurrences and the principle of inclusion-exclusion. As you can tell by the length of this section, I found this chapter quite helpful nevertheless.  

We begin by giving three applications of recurrences: Fibonacci’s “rabbit problem”, the Tower of Hanoi, and dynamic programming. We’re then shown how to solve linear homogenous relations, which are relations of the form

an = c1 an-1 + c2 an-2 + … + ck an-k+ F(n)

Where c1, c2, …, ck are constants, ck =/= 0, and F(n) is a function of n. The solutions are quite beautiful, and if you’re not familiar with them I recommend looking them up. Afterwards, we’re introduced to divide-and-conquer algorithms, which are recursive algorithms that solve smaller and smaller instances of the problem, as well as the master method for solving the recurrences associated with them, which tend to be of the form

f(n) = a f(n/b) + cnd

After these algorithms, we’re introduced to generating functions, which are yet another way of solving recurrences.

Finally, after a long trip through various recurrence-solving methods, the textbook introduces the principle of inclusion-exclusion, which lets us figure out how many elements are in the union of a finite number of finite sets.


Finally, 7 chapters after the textbook talks about functions, it finally gets to relations. Relations are defined as sets of n-tuples, but the book also gives alternative ways of representing relations, such as matrices and directed graphs for binary relations. We’re then introduced to transitive closures and Warshall’s algorithm for computing the transitive closure of a relation. We conclude with two special types of relations: equivalence relations, which are reflexive, symmetric, and transitive; and partial orderings, which are reflexive, anti-symmetric, and transitive.


After being first introduced to directed graphs as a way of representing relations in the previous chapter, we’re given a much more fleshed out treatment in this chapter. A graph is defined as a set of vertices and a set of edges connecting them. Edges can be directed or undirected, and graphs can be simple graphs (with no two edges connecting the same pair of vertices) or multigraphs, which contain multiple edges connecting the same pair of vertices. We’re then given a ton of terminology related to graphs, and a lot of theorems related to these terms. The treatment of graphs is quite advanced for an introductory textbook – it covers Dijkstra’s algorithm for shortest paths, for example, and ends with four coloring. I found this chapter to be a useful review of a lot of graph theory.


After dealing with graphs, we move on to trees, or connected graphs that don’t have cycles. The textbook gives a lot of examples of applications of trees, such as binary search trees, decision trees, and Huffman coding. We’re then presented with the three ways of traversing a tree – in-order, pre-order, and post-order. Afterwards, we get to the topic of spanning trees of graphs, which are trees that contain every vertex in the graph. Two algorithms are presented for finding spanning trees – depth first search and breadth first search. The chapter ends with a section on minimum spanning trees, which are spanning trees with the least weight. Once again we’re presented with two algorithms for finding minimum spanning trees: Prim’s Algorithm and Kruskal’s algorithm. Having never seen either of these algorithms before, I found this section to be quite interesting, though they are given a more comprehensive treatment in most introductory algorithms textbooks.  

Boolean Algebra

This section introduces Boolean algebra, which is basically a set of rules for manipulating elements of the set {0,1}. Why is this useful? Because, as it turns out, Boolean algebra is directly related to circuit design! The textbook first introduces the terminology and rules of Boolean algebra, and then moves on to circuits of logic gates and their relationship with Boolean functions. We conclude with two ways to minimize the complexity of Boolean functions (and thus circuits) – Karnaugh Maps and the Quine-McCluskey Method, which are both quite interesting. 

Modeling Computation

This is the chapter of Rosen that I’m pretty sure isn’t covered by most introductory textbooks. In many ways, it’s an extremely condensed version of the first couple chapters of a theory of computation textbook. It covers phase structure grammars, finite state machines, and closes with Turing machines. However, I found this chapter a lot more poorly motivated than the rest of the book, and also that Sipser’s Introduction to the Theory of Computation offers a lot better introduction to these topics.

Who should read this?

If you’re not familiar with discrete mathematics, this is a great book that will get you up to speed on the key concepts, at least to the level where you’ll be able to understand the other textbooks on MIRI’s course list. Of the three textbooks I’m familiar with that cover discrete mathematics, I think that Rosen is hands down the best. I also think it’s quite a “fun” textbook to skim through, even if you’re familiar with some of the topics already.

However, I think that people familiar with the topics probably should look for other books, especially if they are looking for textbooks that are more concise. It might also not be suitable if you’re already really motivated to learn the subject, and just want to jump right in. There are a few topics not normally covered in other discrete math textbooks, but I feel that it’s better to pick up those topics in other textbooks.

What should I read?

In general, the rule for the textbook is: read the sections you’re not familiar with, and skim the sections you are familiar with, just to keep an eye out for cool examples or theorems.

In terms of chapter-by-chapter, chapters 1 and 2 seem like they’ll help if you’re new to mathematics or proofs, but probably can be skipped otherwise. Chapter 3 is pretty good to know in general, though I suspect most people here would find it too easy. Chapters 4 through 12 are what most courses on discrete mathematics seem to cover, and form the bulk of the book – I would recommend skimming them once just to make sure you know them, as they’re also quite important for understanding any serious CS textbook. Chapter 13, on the other hand, seems kind of tacked on, and probably should be picked up in other textbooks.

Final Notes

Of all the books on the MIRI research guide, this is probably the most accessible, but it is by no means a bad book. I’d highly recommend it to anyone who hasn’t had any exposure to discrete mathematics, and I think it’s an important prerequisite for the rest of the books on the MIRI research guide.

How has lesswrong changed your life?

15 mstevens 31 March 2015 10:12PM

I've been wondering what effect joining lesswrong and reading the sequences has on people.

How has lesswrong changed your life?

What have you done differently?

What have you done?

Discussion of Slate Star Codex: "Extremism in Thought Experiments is No Vice"

14 Artaxerxes 28 March 2015 09:17AM

Link to Blog Post: "Extremism in Thought Experiments is No Vice"


Phil Robertson is being criticized for a thought experiment in which an atheist’s family is raped and murdered. On a talk show, he accused atheists of believing that there was no such thing as objective right or wrong, then continued:

I’ll make a bet with you. Two guys break into an atheist’s home. He has a little atheist wife and two little atheist daughters. Two guys break into his home and tie him up in a chair and gag him.

Then they take his two daughters in front of him and rape both of them and then shoot them, and they take his wife and then decapitate her head off in front of him, and then they can look at him and say, ‘Isn’t it great that I don’t have to worry about being judged? Isn’t it great that there’s nothing wrong with this? There’s no right or wrong, now, is it dude?’

Then you take a sharp knife and take his manhood and hold it in front of him and say, ‘Wouldn’t it be something if [there] was something wrong with this? But you’re the one who says there is no God, there’s no right, there’s no wrong, so we’re just having fun. We’re sick in the head, have a nice day.’

If it happened to them, they probably would say, ‘Something about this just ain’t right’.

The media has completely proportionally described this as Robinson “fantasizing about” raping atheists, and there are the usual calls for him to apologize/get fired/be beheaded.

So let me use whatever credibility I have as a guy with a philosophy degree to confirm that Phil Robertson is doing moral philosophy exactly right.


This is a LW discussion post for Yvain's blog posts at Slate Star Codex, as per tog's suggestion:

Like many Less Wrong readers, I greatly enjoy Slate Star Codex; there's a large overlap in readership. However, the comments there are far worse, not worth reading for me. I think this is in part due to the lack of LW-style up and downvotes. Have there ever been discussion threads about SSC posts here on LW? What do people think of the idea occasionally having them? Does Scott himself have any views on this, and would he be OK with it?

Scott/Yvain's permission to repost on LW was granted (from facebook):

I'm fine with anyone who wants reposting things for comments on LW, except for posts where I specifically say otherwise or tag them with "things i will regret writing"

Translating bad advice

13 Sophronius 14 April 2015 09:20AM

While writing my Magnum Opus I came across this piece of writing advice by Neil Gaiman:

“When people tell you something’s wrong or doesn’t work for them, they are almost always right. When they tell you exactly what they think is wrong and how to fix it, they are almost always wrong.”

And it struck me how true it was, even in other areas of life. People are terrible at giving advice on how to improve yourself, or on how to improve anything really. To illustrate this, here is what you would expect advice from a good rationalist friend to look like:

1)      “Hey, I’ve noticed you tend to do X.”

2)      “It’s been bugging me for a while, though I’m not really sure why. It’s possible other people think X is bad as well, you should ask them about it.”

3)      Paragon option: “Maybe you could do Y instead? I dunno, just think about it.”  

4)      Renegade option: “From now on I will slap you every time you do X, in order to help you stop being retarded about X.”

I wish I had more friends who gave advice like that, especially the renegade option. Instead, here is what I get in practice:

1)      Thinking: Argh, he is doing X again. That annoys me, but I don’t want to be rude.

2)      Thinking: Okay, he is doing Z now, which is kind of like X and a good enough excuse to vent my anger about X

3)      *Complains about Z in an irritated manner, and immediately forgets that there’s even a difference between X and Z*

4)      Thinking: Oh shit, that was rude. I better give some arbitrary advice on how to fix Z so I sound more productive.

As you can see, social rules and poor epistemology really get in the way of good advice, which is incredibly frustrating if you genuinely want to improve yourself! (Needless to say, ignoring badly phrased advice is incredibly stupid and you should never do this. See HPMOR for a fictional example of what happens if you try to survive on your wits alone.) A naïve solution is to tell everybody that you are the sort of person who loves to hear criticism in the hope that they will tell you what they really think. This never works because A) Nobody will believe you since everyone says this and it’s always a lie, and B) It’s a lie, you hate hearing real criticism just like everybody else.

The best solution I have found is to make it a habit to translate bad advice into good advice, in the spirit of what Neil Gaiman said above: Always be on the lookout for people giving subtle clues that you are doing something wrong and ask them about it (preferably without making yourself sound insecure in the process, or they’ll just tell you that you need to be more confident). When they give you some bullshit response that is designed to sound nice, keep at it and convince them to give you their real reasons for bringing it up in the first place. Once you have recovered the original information that lead them to give the poor advice, you can rewrite it as good advice in the format used above. Here is an example from my own work experience:

1)      Bad advice person: “You know, you may have your truth, but someone else may have their own truth.”

2)      Me, confused and trying not to be angry at bad epistemology: “That’s interesting. What makes you say that?”

3)      *5 minutes later*. “Holy shit, my insecurity is being read as arrogance, and as a result people feel threatened by my intelligence which makes them defensive? I never knew that!”

Seriously, apply this lesson. And get a good friend to slap you every time you don’t.

Request for Steelman: Non-correspondence concepts of truth

13 PeerGynt 24 March 2015 03:11AM

A couple of days ago, Buybuydandavis wrote the following on Less Wrong:

I'm increasingly of the opinion that truth as correspondence to reality is a minority orientation.

I've spent a lot of energy over the last couple of days trying to come to terms with the implications of this sentence.  While it certainly corresponds with my own observations about many people, the thought that most humans simply reject correspondence to reality as the criterion for truth seems almost too outrageous to take seriously.  If upon further reflection I end up truly believing this, it seems  that it would be impossible for me to have a discussion about the nature of reality with the great majority of the human race.  In other words, if I truly believed this, I would label most people as being too stupid to have a real discussion with. 

However, this reaction seems like an instance of a failure mode described by Megan McArdle:

I’m always fascinated by the number of people who proudly build columns, tweets, blog posts or Facebook posts around the same core statement: “I don’t understand how anyone could (oppose legal abortion/support a carbon tax/sympathize with the Palestinians over the Israelis/want to privatize Social Security/insert your pet issue here)." It’s such an interesting statement, because it has three layers of meaning.

The first layer is the literal meaning of the words: I lack the knowledge and understanding to figure this out. But the second, intended meaning is the opposite: I am such a superior moral being that I cannot even imagine the cognitive errors or moral turpitude that could lead someone to such obviously wrong conclusions. And yet, the third, true meaning is actually more like the first: I lack the empathy, moral imagination or analytical skills to attempt even a basic understanding of the people who disagree with me

 In short, “I’m stupid.” Something that few people would ever post so starkly on their Facebook feeds.

With this background, it seems important to improve my model of people who reject correspondence as the criterion for truth.  The obvious first place to look is in academic philosophy.  The primary challenger to correspondence theory is called “coherence theory”. If I understand correctly, coherence theory says that a statement is true iff it is logically consistent with “some specified set of sentences”

Coherence is obviously an important concept, which has valuable uses for example in formal systems. It does not capture my idea of what the word “truth” means, but that is purely a semantics issue. I would be willing to cede the word “truth” to the coherence camp if we agreed on a separate word we could use to mean “correspondence to reality”.   However, my intuition is that they wouldn't let us to get away with this. I sense that there are people out there who genuinely object to the very idea of discussing whether a sentences correspond to reality. 


So it seems I have a couple of options:

1. I can look for empirical evidence that buybuydandavis is wrong, ie that most people accept correspondence to reality as the criterion for truth

2. I can try to convince people to use some other word for correspondence to reality, so they have the necessary semantic machinery to have a real discussion about what reality is like

3. I can accept that most people are unable to have a discussion about the nature of reality

4. I can attempt to steelman the position that truth is something other than correspondence


Option 1 appears unlikely to be true. Option 2 seems unlikely to work.  Option 3 seems very unattractive, because it would be very uncomfortable to have discussions that on the surface appear to be about the nature of reality, but which really are about something else, where the precise value of "something else" is unknown to me. 

I would therefore be very interested in a steelman of non-correspondence concepts of truth. I think it would be important not only for me, but also for the rationalist community as a group, to get a more accurate model of how non-rationalists think about "truth"



Even when contrarians win, they lose: Jeff Hawkins

12 endoself 08 April 2015 04:54AM

Related: Even When Contrarians Win, They Lose

I had long thought that Jeff Hawkins (and the Redwood Center, and Numentia) were pursuing an idea that didn't work, and were continuing to fail to give up for a prolonged period of time. I formed this belief because I had not heard of any impressive results or endorsements of their research. However, I recently read an interview with Andrew Ng, a leading machine learning researcher, in which he credits Jeff Hawkins with publicizing the "one learning algorithm" hypothesis - the idea that most of the cognitive work of the brain is done by one algorithm. Ng says that, as a young researcher, this pushed him into areas that could lead to general AI. He still believes that AGI is far though.

I found out about Hawkins' influence on Ng after reading an old SL4 post by Eliezer and looking for further information about Jeff Hawkins. It seems that the "one learning algorithm" hypothesis was widely known in neuroscience, but not within AI until Hawkins' work. Based on Eliezer's citation of Mountcastle and his known familiarity with cognitive science, it seems that he learned of this hypothesis independently of Hawkins. The "one learning algorithm" hypothesis is important in the context of intelligence explosion forecasting, since hard takeoff is vastly more likely if it is true. I have been told that further evidence for this hypothesis has been found recently, but I don't know the details.

This all fits well with Robin Hanson's model. Hawkins had good evidence that better machine learning should be possible, but the particular approaches that he took didn't perform as well as less biologically-inspired ones, so he's not really recognized today. Deep learning would definitely have happened without him; there were already many people working in the field, and they started to attract attention because of improved performance due to a few tricks and better hardware. At least Ng's career though can be credited to Hawkins.

I've been thinking about Robin's hypothesis a lot recently, since many researchers in AI are starting to think about the impacts of their work (most still only think about the near-term societal impacts rather than thinking about superintelligence though). They recognize that this shift towards thinking about societal impacts is recent, but they have no idea why it is occurring. They know that many people, such as Elon Musk, have been outspoken about AI safety in the media recently, but few have heard of Superintelligence, or attribute the recent change to FHI or MIRI.

The great decline in Wikipedia pageviews (condensed version)

12 VipulNaik 27 March 2015 02:02PM

To keep this post manageable in length, I have only included a small subset of the illustrative examples and discussion. I have published a longer version of this post, with more examples (but the same intro and concluding section), on my personal site.

Last year, during the months of June and July, as my work for MIRI was wrapping up and I hadn't started my full-time job, I worked on the Wikipedia Views website, aimed at easier tabulation of the pageviews for multiple Wikipedia pages over several months and years. It relies on a statistics tool called, created by Doms Mituzas, and maintained by Henrik.

One of the interesting things I noted as I tabulated pageviews for many different pages was that the pageview counts for many already popular pages were in decline. Pages of various kinds peaked at different historical points. For instance, colors have been in decline since early 2013. The world's most populous countries have been in decline since as far back as 2010!

Defining the problem

The first thing to be clear about is what these pageviews count and what they don't. The pageview measures are taken from, which in turn uses the pagecounts-raw dump provided hourly by the Wikimedia Foundation's Analytics team, which in turn is obtained by processing raw user activity logs. The pagecounts-raw measure is flawed in two ways:

  • It only counts pageviews on the main Wikipedia website and not pageviews on the mobile Wikipedia website or through Wikipedia Zero (a pared down version of the mobile site that some carriers offer at zero bandwidth costs to their customers, particularly in developing countries). To remedy these problems, a new dump called pagecounts-all-sites was introduced in September 2014. We simply don't have data for views of mobile domains or of Wikipedia Zero at the level of individual pages for before then. Moreover, still uses pagecounts-raw (this was pointed to me in a mailing list message after I circulated an early version of the post).
  • The pageview count includes views by bots. The official estimate is that about 15% of pageviews are due to bots. However, the percentage is likely higher for pages with fewer overall pageviews, because bots have a minimum crawling frequency. So every page might have at least 3 bot crawls a day, resulting in a minimum of 90 bot pageviews even if there are only a handful of human pageviews.

Therefore, the trends I discuss will refer to trends in total pageviews for the main Wikipedia website, including page requests by bots, but excluding visits to mobile domains. Note that visits from mobile devices to the main site will be included, but mobile devices are by default redirected to the mobile site.

How reliable are the metrics?

As noted above, the metrics are unreliable because of the bot problem and the issue of counting only non-mobile traffic. German Wikipedia user Atlasowa left a message on my talk page pointing me to an email thread suggesting that about 40% of pageviews may be bot-related, and discussing some interesting examples.

Relationship with the overall numbers

I'll show that for many pages of interest, the number of pageviews as measured above (non-mobile) has declined recently, with a clear decline from 2013 to 2014. What about the total?

We have overall numbers for non-mobile, mobile, and combined. The combined number has largely held steady, whereas the non-mobile number has declined and the mobile number has risen.

What we'll find is that the decline for most pages that have been around for a while is even sharper than the overall decline. One reason overall pageviews haven't declined so fast is the creation of new pages. To give an idea, non-mobile traffic dropped by about 1/3 from January 2013 to December 2014, but for many leading categories of pages, traffic dropped by about 1/2-2/3.

Why is this important? First reason: better context for understanding trends for individual pages

People's behavior on Wikipedia is a barometer of what they're interested in learning about. An analysis of trends in the views of pages can provide an important window into how people's curiosity, and the way they satisfy this curiosity, is evolving. To take an example, some people have proposed using Wikipedia pageview trends to predict flu outbreaks. I myself have tried to use relative Wikipedia pageview counts to gauge changing interests in many topics, ranging from visa categories to technology companies.

My initial interest in pageview numbers arose because I wanted to track my own influence as a Wikipedia content creator. In fact, that was my original motivation with creating Wikipedia Views. (You can see more information about my Wikipedia content contributions on my site page about Wikipedia).

Now, when doing this sort of analysis for individual pages, one needs to account for, and control for, overall trends in the views of Wikipedia pages that are occurring for reasons other than a change in people's intrinsic interest in the subject. Otherwise, we might falsely conclude from a pageview count decline that a topic is falling in popularity, whereas what's really happening is an overall decline in the use of (the non-mobile version of) Wikipedia to satisfy one's curiosity about the topic.

Why is this important? Second reason: a better understanding of the overall size and growth of the Internet.

Wikipedia has been relatively mature and has had the top spot as an information source for at least the last six years. Moreover, unlike almost all other top websites, Wikipedia doesn't try hard to market or optimize itself, so trends in it reflect a relatively untarnished view of how the Internet and the World Wide Web as a whole are growing, independent of deliberate efforts to manipulate and doctor metrics.

The case of colors

Let's look at Wikipedia pages on some of the most viewed colors (I've removed the 2015 and 2007 columns because we don't have the entirety of these years). Colors are interesting because the degree of human interest in colors in general, and in individual colors, is unlikely to change much in response to news or current events. So one would at least a priori expect colors to offer a perspective into Wikipedia trends with fewer external complicating factors. If we see a clear decline here, then that's strong evidence in favor of a genuine decline.

I've restricted attention to a small subset of the colors, that includes the most common ones but isn't comprehensive. But it should be enough to get a sense of the trends. And you can add in your own colors and check that the trends hold up.

Page namePageviews in year 2014Pageviews in year 2013Pageviews in year 2012Pageviews in year 2011Pageviews in year 2010Pageviews in year 2009Pageviews in year 2008TotalPercentageTags
Black 431K 1.5M 1.3M 778K 900K 1M 958K 6.9M 16.1 Colors
Blue 710K 1.3M 1M 987K 1.2M 1.2M 1.1M 7.6M 17.8 Colors
Brown 192K 284K 318K 292K 308K 300K 277K 2M 4.6 Colors
Green 422K 844K 779K 707K 882K 885K 733K 5.3M 12.3 Colors
Orange 133K 181K 251K 259K 275K 313K 318K 1.7M 4 Colors
Purple 524K 906K 847K 895K 865K 841K 592K 5.5M 12.8 Colors
Red 568K 797K 912K 1M 1.1M 873K 938K 6.2M 14.6 Colors
Violet 56K 96K 75K 77K 69K 71K 65K 509K 1.2 Colors
White 301K 795K 615K 545K 788K 575K 581K 4.2M 9.8 Colors
Yellow 304K 424K 453K 433K 452K 427K 398K 2.9M 6.8 Colors
Total 3.6M 7.1M 6.6M 6M 6.9M 6.5M 6M 43M 100 --
Percentage 8.5 16.7 15.4 14 16 15.3 14 100 -- --

Since the decline appears to have happened between 2013 and 2014, let's examine the 24 months from January 2013 to December 2014:


MonthViews of page BlackViews of page BlueViews of page BrownViews of page GreenViews of page OrangeViews of page PurpleViews of page RedViews of page VioletViews of page WhiteViews of page YellowTotal Percentage
201412 30K 41K 14K 27K 9.6K 28K 67K 3.1K 21K 19K 260K 2.4
201411 36K 46K 15K 31K 10K 35K 50K 3.7K 23K 22K 273K 2.5
201410 37K 52K 16K 34K 10K 34K 51K 4.5K 25K 26K 289K 2.7
201409 37K 57K 16K 35K 9.9K 37K 45K 4.8K 27K 29K 298K 2.8
201408 33K 47K 14K 34K 8.5K 31K 38K 3.9K 21K 22K 253K 2.4
201407 33K 47K 14K 30K 9.3K 31K 37K 4.2K 22K 22K 250K 2.3
201406 32K 49K 14K 31K 10K 34K 39K 4.9K 23K 22K 259K 2.4
201405 44K 55K 17K 37K 10K 51K 42K 5.2K 26K 26K 314K 2.9
201404 34K 60K 17K 36K 14K 38K 47K 5.8K 27K 28K 306K 2.8
201403 37K 136K 19K 51K 14K 123K 52K 5.5K 30K 31K 497K 4.6
201402 38K 58K 19K 39K 13K 41K 49K 5.6K 29K 29K 321K 3
201401 40K 60K 19K 36K 14K 40K 50K 4.4K 27K 28K 319K 3
201312 62K 67K 17K 44K 12K 48K 48K 4.4K 42K 26K 372K 3.5
201311 141K 96K 20K 65K 11K 68K 55K 5.3K 71K 34K 566K 5.3
201310 145K 102K 21K 69K 11K 77K 59K 5.7K 71K 36K 598K 5.6
201309 98K 80K 17K 60K 11K 53K 51K 4.9K 45K 30K 450K 4.2
201308 109K 87K 20K 57K 20K 57K 60K 4.6K 53K 28K 497K 4.6
201307 107K 92K 21K 61K 11K 66K 65K 4.6K 61K 30K 520K 4.8
201306 115K 106K 22K 69K 13K 73K 64K 5.5K 70K 33K 571K 5.3
201305 158K 122K 24K 79K 14K 83K 69K 11K 77K 39K 677K 6.3
201304 151K 127K 28K 83K 14K 86K 74K 12K 78K 40K 694K 6.4
201303 155K 135K 31K 92K 15K 99K 84K 12K 80K 43K 746K 6.9
201302 152K 131K 31K 84K 28K 95K 84K 17K 77K 41K 740K 6.9
201301 129K 126K 32K 81K 19K 99K 84K 9.6K 70K 42K 691K 6.4
Total 2M 2M 476K 1.3M 314K 1.4M 1.4M 152K 1.1M 728K 11M 100
Percentage 18.1 18.4 4.4 11.8 2.9 13.3 12.7 1.4 10.2 6.8 100 --
Tags Colors Colors Colors Colors Colors Colors Colors Colors Colors Colors -- --


As we can see, the decline appears to have begun around March 2013 and then continued steadily till about June 2014, at which numbers stabilized to their lower levels.

A few sanity checks on these numbers:

  • The trends appear to be similar for different colors, with the notable difference that the proportional drop was higher for the more viewed color pages. Thus, for instance, black and blue saw declines from 129K and 126K to 30K and 41K respectively (factors of four and three respectively) from January 2013 to December 2014. Orange and yellow, on the other hand, dropped by factors of close to two. The only color that didn't drop significantly was red (it dropped from 84K to 67K, as opposed to factors of two or more for other colors), but this seems to have been partly due to an unusually large amount of traffic in the end of 2014. The trend even for red seems to suggest a drop similar to that for orange.
  • The overall proportion of views for different colors comports with our overall knowledge of people's color preferences: blue is overall a favorite color, and this is reflected in its getting the top spot with respect to pageviews.
  • The pageview decline followed a relatively steady trend, with the exception of some unusual seasonal fluctuation (including an increase in October and November 2013).

One might imagine that this is due to people shifting attention from the English-language Wikipedia to other language Wikipedias, but most of the other major language Wikipedias saw a similar decline at a similar time. More details are in my longer version of this post on my personal site.

Geography: continents and subcontinents, countries, and cities

Here are the views of some of the world's most populated countries between 2008 and 2014, showing that the peak happened as far back as 2010:

Page namePageviews in year 2014Pageviews in year 2013Pageviews in year 2012Pageviews in year 2011Pageviews in year 2010Pageviews in year 2009Pageviews in year 2008TotalPercentageTags
China 5.7M 6.8M 7.8M 6.1M 6.9M 5.7M 6.1M 45M 9 Countries
India 8.8M 12M 12M 11M 14M 8.8M 7.6M 73M 14.5 Countries
United States 13M 15M 18M 18M 34M 16M 15M 129M 25.7 Countries
Indonesia 5.3M 5.2M 3.7M 3.6M 4.2M 3.1M 2.5M 28M 5.5 Countries
Brazil 4.8M 4.9M 5.3M 5.5M 7.5M 4.9M 4.3M 37M 7.4 Countries
Pakistan 2.9M 4.5M 4.4M 4.3M 5.2M 4M 3.2M 28M 5.7 Countries
Bangladesh 2.2M 2.9M 3M 2.8M 2.9M 2.2M 1.7M 18M 3.5 Countries
Russia 5.6M 5.6M 6.5M 6.8M 8.6M 5.4M 5.8M 44M 8.8 Countries
Nigeria 2.6M 2.6M 2.9M 3M 3.5M 2.6M 2M 19M 3.8 Countries
Japan 4.8M 6.4M 6.5M 8.3M 10M 7.3M 6.6M 50M 10 Countries
Mexico 3.1M 3.9M 4.3M 4.3M 5.9M 4.7M 4.5M 31M 6.1 Countries
Total 59M 69M 74M 74M 103M 65M 59M 502M 100 --
Percentage 11.7 13.8 14.7 14.7 20.4 12.9 11.8 100 -- --

Of these countries, China, India and the United States are the most notable. China is the world's most populous. India has the largest population with some minimal English knowledge and legally (largely) unfettered Internet access to Wikipedia, while the United States has the largest population with quality Internet connectivity and good English knowledge. Moreover, in China and India, Internet use and access have been growing considerably in the last few years, whereas it has been relatively stable in the United States.

It is interesting that the year with the maximum total pageview count was as far back as 2010. In fact, 2010 was so significantly better than the other years that the numbers beg for an explanation. I don't have one, but even excluding 2010, we see a declining trend: gradual growth from 2008 to 2011, and then a symmetrically gradual decline. Both the growth trend and the decline trend are quite similar across countries.

We see a similar trend for continents and subcontinents, with the peak occurring in 2010. In contrast, the smaller counterparts, such as cities, peaked in 2013, similarly to colors, and the drop, though somewhat less steep than with colors, has been quite significant. For instance, a list for Indian cities shows that the total pageviews for these Indian cities declined from about 20 million in 2013 (after steady growth in the preceding years) to about 13 million in 2014.

Some niche topics where pageviews haven't declined

So far, we've looked at topics where pageviews have been declining since at least 2013, and some that peaked as far back as 2010. There are, however, many relatively niche topics where the number of pageviews has stayed roughly constant. But this stability itself is a sign of decay, because other metrics suggest that the topics have experienced tremendous growth in interest. In fact, the stability is even less impressive when we notice that it's a result of a cancellation between slight declines in views of established pages in the genre, and traffic going to new pages.

For instance, consider some charity-related pages:

Page namePageviews in year 2014Pageviews in year 2013Pageviews in year 2012Pageviews in year 2011Pageviews in year 2010Pageviews in year 2009Pageviews in year 2008TotalPercentageTags
Against Malaria Foundation 5.9K 6.3K 4.3K 1.4K 2 0 0 18K 15.6 Charities
Development Media International 757 0 0 0 0 0 0 757 0.7 Pages created by Vipul Naik Charities
Deworm the World Initiative 2.3K 277 0 0 0 0 0 2.6K 2.3 Charities Pages created by Vipul Naik
GiveDirectly 11K 8.3K 2.6K 442 0 0 0 22K 19.2 Charities Pages created by Vipul Naik
International Council for the Control of Iodine Deficiency Disorders 1.2K 1 2 2 0 1 2 1.2K 1.1 Charities Pages created by Vipul Naik
Nothing But Nets 5.9K 6.6K 6.6K 5.1K 4.4K 4.7K 6.1K 39K 34.2 Charities
Nurse-Family Partnership 2.9K 2.8K 909 30 8 72 63 6.8K 5.9 Pages created by Vipul Naik Charities
Root Capital 3K 2.5K 414 155 51 1.2K 21 7.3K 6.3 Charities Pages created by Vipul Naik
Schistosomiasis Control Initiative 4K 2.7K 1.6K 191 0 0 0 8.5K 7.4 Charities Pages created by Vipul Naik
VillageReach 1.7K 1.9K 2.2K 2.6K 97 3 15 8.4K 7.3 Charities Pages created by Vipul Naik
Total 38K 31K 19K 9.9K 4.6K 5.9K 6.2K 115K 100 --
Percentage 33.4 27.3 16.3 8.6 4 5.1 5.4 100 -- --

For this particular cluster of pages, we see the totals growing robustly year-on-year. But a closer look shows that the growth isn't that impressive. Whereas earlier, views were doubling every year from 2010 to 2013 (this was the take-off period for GiveWell and effective altruism), the growth from 2013 to 2014 was relatively small. And about half the growth from 2013 to 2014 was powered by the creation of new pages (including some pages created after the beginning of 2013, so they had more months in a mature state in 2014 than in 2013), while the other half was powered by growth in traffic to existing pages.

The data for philanthropic foundations demonstrates a fairly slow and steady growth (about 5% a year), partly due to the creation of new pages. This 5% hides a lot of variation between individual pages:

Page namePageviews in year 2014Pageviews in year 2013Pageviews in year 2012Pageviews in year 2011Pageviews in year 2010Pageviews in year 2009Pageviews in year 2008TotalPercentageTags
Atlantic Philanthropies 11K 11K 12K 10K 9.8K 8K 5.8K 67K 2.1 Philanthropic foundations
Bill & Melinda Gates Foundation 336K 353K 335K 315K 266K 240K 237K 2.1M 64.9 Philanthropic foundations
Draper Richards Kaplan Foundation 1.2K 25 9 0 0 0 0 1.2K 0 Philanthropic foundations Pages created by Vipul Naik
Ford Foundation 110K 91K 100K 90K 100K 73K 61K 625K 19.5 Philanthropic foundations
Good Ventures 9.9K 8.6K 3K 0 0 0 0 21K 0.7 Philanthropic foundations Pages created by Vipul Naik
Jasmine Social Investments 2.3K 1.8K 846 0 0 0 0 5K 0.2 Philanthropic foundations Pages created by Vipul Naik
Laura and John Arnold Foundation 3.7K 13 0 1 0 0 0 3.7K 0.1 Philanthropic foundations Pages created by Vipul Naik
Mulago Foundation 2.4K 2.3K 921 0 1 1 10 5.6K 0.2 Philanthropic foundations Pages created by Vipul Naik
Omidyar Network 26K 23K 19K 17K 19K 13K 11K 129K 4 Philanthropic foundations
Peery Foundation 1.8K 1.6K 436 0 0 0 0 3.9K 0.1 Philanthropic foundations Pages created by Vipul Naik
Robert Wood Johnson Foundation 26K 26K 26K 22K 27K 22K 17K 167K 5.2 Philanthropic foundations
Skoll Foundation 13K 11K 9.2K 7.8K 9.6K 5.8K 4.3K 60K 1.9 Philanthropic foundations
Smith Richardson Foundation 8.7K 3.5K 3.8K 3.6K 3.7K 3.5K 2.9K 30K 0.9 Philanthropic foundations
Thiel Foundation 3.6K 1.5K 1.1K 47 26 1 0 6.3K 0.2 Philanthropic foundations Pages created by Vipul Naik
Total 556K 533K 511K 466K 435K 365K 340K 3.2M 100 --
Percentage 17.3 16.6 15.9 14.5 13.6 11.4 10.6 100 -- --


The dominant hypothesis: shift from non-mobile to mobile Wikipedia use

The dominant hypothesis is that pageviews have simply migrated from non-mobile to mobile. This is most closely borne by the overall data: total pageviews have remained roughly constant, and the decline in total non-mobile pageviews has been roughly canceled by growth in mobile pageviews. However, the evidence for this substitution doesn't exist at the level of individual pages because we don't have pageview data for the mobile domain before September 2014, and much of the decline occurred between March 2013 and June 2014.

What would it mean if there were an approximate one-on-one substitution from non-mobile to mobile for the page types discussed above? For instance, non-mobile traffic to colors dropped to somewhere between 1/3 and 1/2 of their original traffic level between January 2013 and December 2014. This would mean that somewhere between 1/2 and 2/3 of the original non-mobile traffic to colors has shifted to mobile devices. This theory should be at least partly falsifiable: if the sum of traffic to non-mobile and mobile platforms today for colors is less than non-mobile-only traffic in January 2013, then clearly substitution is only part of the story.

Although the data is available, it's not currently in an easily computable form, and I don't currently have the time and energy to extract it. I'll update this once the data on all pageviews since September 2014 is available on or a similar platform.

Other hypotheses

The following are some other hypotheses for the pageview decline:

  1. Google's Knowledge Graph: This is the hypothesis raised in Wikipediocracy, the Daily Dot, and the Register. The Knowledge Graph was introduced in 2012. Through 2013, Google rolled out snippets (called Knowledge Cards and Knowledge Panels) based on the Knowledge Graph in its search results. So if, for instance, you only wanted the birth date and nationality of a musician, Googling would show you that information right in the search results and you wouldn't need to click through to the Wikipedia page. I suspect that the Knowledge Graph played some role in the decline for colors seen between March 2013 and June 2014. On the other hand, many of the pages that saw a decline don't have any search snippets based on the Knowledge Graph, and therefore the decline for those pages cannot be explained this way.
  2. Other means of accessing Wikipedia's knowledge that don't involve viewing it directly: For instance, Apple's Siri tool uses data from Wikipedia, and people making queries to this tool may get information from Wikipedia without hitting the encyclopedia. The usage of such tools has increased greatly starting in late 2012. Siri itself was released with the third generation iPad in September 2012 and became part of the iPhone released the next month. Since then, it has shipped with all of Apple's mobile devices and tablets.
  3. Substitution away from Wikipedia to other pages that are becoming more search-optimized and growing in number: For many topics, Wikipedia may have been clearly the best information source a few years back (as judged by Google), but the growth of niche information sources, as well as better search methods, have displaced it from its undisputed leadership position. I think there's a lot of truth to this, but it's hard to quantify.
  4. Substitution away from coarser, broader pages to finer, narrower pages within Wikipedia: While this cannot directly explain an overall decline in pageviews, it can explain a decline in pageviews for particular kinds of pages. Indeed, I suspect that this is partly what's going on with the early decline of pageviews (e.g., the decline in pageviews of countries and continents starting around 2010, as people go directly to specialized articles related to the particular aspects of those countries or continents they are interested in).
  5. Substitution to Internet use in other languages: This hypothesis doesn't seem borne out by the simultaneous decline in pageviews for the English, French, and Spanish Wikipedia, as documented for the color pages.

It's still a mystery

I'd like to close by noting that the pageview decline is still very much a mystery as far as I am concerned. I hope I've convinced you that (a) the mystery is genuine, (b) it's important, and (c) although the shift to mobile is probably the most likely explanation, we don't yet have clear evidence. I'm interested in hearing whether people have alternative explanations, and/or whether they have more compelling arguments for some of the explanations proffered here.

Just a casual question regarding MIRI

12 Faustus2 22 March 2015 08:16PM

Currently I am planning to start a mathematics degree when I enter university, however my interest has shifted largely to computational neuroscience and related fields, so I'm now planning to switch to an AI degree when I go to study. Having said that, MIRI has always posed interesting problems to me, and I have entertained the thought of trying to do some work for MIRI before. And so my question boils down to this: Would there be any problem with taking the AI degree if I ever wanted to try my hand at doing some math for MIRI? Is a maths degree essential or would an AI degree with a good grasp on mathematics related to MIRI work just as well? Any thoughts or musings would be appreciated :)

Lessons from each HPMOR chapter in one line [link]

11 adamzerner 09 April 2015 02:51PM

Is arrogance a symptom of bad intellectual hygeine?

11 enfascination 21 March 2015 07:59PM

I have this belief that humility is a part of good critical thinking, and that egoism undermines it.  I imagine arrogance as a kind of mind-death.  But I have no evidence, and no good mechanism by which it might be true.  In fact, I know the belief is suspect because I know that I want it to be true — I want to be able to assure myself that this or that intolerable academic will be magically punished with a decreased capacity to do good work. The truth could be the opposite: maybe hubris breeds confidence, and confidence results? After all, some of the most important thinkers in history were insufferable.

Is any link, positive or negative, between arrogance and reasoning too tenuous to be worth entertaining? Is humility a pretty word or a valuable habit? I don't know what I think yet.   Do you?

Concept Safety: The problem of alien concepts

10 Kaj_Sotala 17 April 2015 02:09PM

I'm currently reading through some relevant literature for preparing my FLI grant proposal on the topic of concept learning and AI safety. I figured that I might as well write down the research ideas I get while doing so, so as to get some feedback and clarify my thoughts. I will posting these in a series of "Concept Safety"-titled articles.

In the previous post in this series, I talked about how one might get an AI to have similar concepts as humans. However, one would intuitively assume that a superintelligent AI might eventually develop the capability to entertain far more sophisticated concepts than humans would ever be capable of having. Is that a problem?

Just what are concepts, anyway?

To answer the question, we first need to define what exactly it is that we mean by a "concept", and why exactly more sophisticated concepts would be a problem.

Unfortunately, there isn't really any standard definition of this in the literature, with different theorists having different definitions. Machery even argues that the term "concept" doesn't refer to a natural kind, and that we should just get rid of the whole term. If nothing else, this definition from Kruschke (2008) is at least amusing:

Models of categorization are usually designed to address data from laboratory experiments, so “categorization” might be best defined as the class of behavioral data generated by experiments that ostensibly study categorization.

Because I don't really have the time to survey the whole literature and try to come up with one grand theory of the subject, I will for now limit my scope and only consider two compatible definitions of the term.

Definition 1: Concepts as multimodal neural representations. I touched upon this definition in the last post, where I mentioned studies indicating that the brain seems to have shared neural representations for e.g. the touch and sight of a banana. Current neuroscience seems to indicate the existence of brain areas where representations from several different senses are combined together into higher-level representations, and where the activation of any such higher-level representation will also end up activating the lower sense modalities in turn. As summarized by Man et al. (2013):

Briefly, the Damasio framework proposes an architecture of convergence-divergence zones (CDZ) and a mechanism of time-locked retroactivation. Convergence-divergence zones are arranged in a multi-level hierarchy, with higher-level CDZs being both sensitive to, and capable of reinstating, specific patterns of activity in lower-level CDZs. Successive levels of CDZs are tuned to detect increasingly complex features. Each more-complex feature is defined by the conjunction and configuration of multiple less-complex features detected by the preceding level. CDZs at the highest levels of the hierarchy achieve the highest level of semantic and contextual integration, across all sensory modalities. At the foundations of the hierarchy lie the early sensory cortices, each containing a mapped (i.e., retinotopic, tonotopic, or somatotopic) representation of sensory space. When a CDZ is activated by an input pattern that resembles the template for which it has been tuned, it retro-activates the template pattern of lower-level CDZs. This continues down the hierarchy of CDZs, resulting in an ensemble of well-specified and time-locked activity extending to the early sensory cortices.

On this account, my mental concept for "dog" consists of a neural activation pattern making up the sight, sound, etc. of some dog - either a generic prototypical dog or some more specific dog. Likely the pattern is not just limited to sensory information, either, but may be associated with e.g. motor programs related to dogs. For example, the program for throwing a ball for the dog to fetch. One version of this hypothesis, the Perceptual Symbol Systems account, calls such multimodal representations simulators, and describes them as follows (Niedenthal et al. 2005):

A simulator integrates the modality-specific content of a category across instances and provides the ability to identify items encountered subsequently as instances of the same category. Consider a simulator for the social category, politician. Following exposure to different politicians, visual information about how typical politicians look (i.e., based on their typical age, sex, and role constraints on their dress and their facial expressions) becomes integrated in the simulator, along with auditory information for how they typically sound when they talk (or scream or grovel), motor programs for interacting with them, typical emotional responses induced in interactions or exposures to them, and so forth. The consequence is a system distributed throughout the brain’s feature and association areas that essentially represents knowledge of the social category, politician.

The inclusion of such "extra-sensory" features helps understand how even abstract concepts could fit this framework: for example, one's understanding of the concept of a derivative might be partially linked to the procedural programs one has developed while solving derivatives. For a more detailed hypothesis of how abstract mathematics may emerge from basic sensory and motor programs and concepts, I recommend Lakoff & Nuñez (2001).

Definition 2: Concepts as areas in a psychological space. This definition, while being compatible with the previous one, looks at concepts more "from the inside". Gärdenfors (2000) defines the basic building blocks of a psychological conceptual space to be various quality dimensions, such as temperature, weight, brightness, pitch, and the spatial dimensions of height, width, and depth. These are psychological in the sense of being derived from our phenomenal experience of certain kinds of properties, rather than the way in which they might exist in some objective reality.

For example, one way of modeling the psychological sense of color is via a color space defined by the quality dimensions of hue (represented by the familiar color circle), chromaticness (saturation), and brightness.

The second phenomenal dimension of color is chromaticness (saturation), which ranges from grey (zero color intensity) to increasingly greater intensities. This dimension is isomorphic to an interval of the real line. The third dimension is brightness which varies from white to black and is thus a linear dimension with two end points. The two latter dimensions are not totally independent, since the possible variation of the chromaticness dimension decreases as the values of the brightness dimension approaches the extreme points of black and white, respectively. In other words, for an almost white or almost black color, there can be very little variation in its chromaticness. This is modeled by letting that chromaticness and brightness dimension together generate a triangular representation ... Together these three dimensions, one with circular structure and two with linear, make up the color space. This space is often illustrated by the so called color spindle

This kind of a representation is different from the physical wavelength representation of color, where e.g. the hue is mostly related to the wavelength of the color. The wavelength representation of hue would be linear, but due to the properties of the human visual system, the psychological representation of hue is circular.

Gärdenfors defines two quality dimensions to be integral if a value cannot be given for an object on one dimension without also giving it a value for the other dimension: for example, an object cannot be given a hue value without also giving it a brightness value. Dimensions that are not integral with each other are separable. A conceptual domain is a set of integral dimensions that are separable from all other dimensions: for example, the three color-dimensions form the domain of color.

From these definitions, Gärdenfors develops a theory of concepts where more complicated conceptual spaces can be formed by combining lower-level domains. Concepts, then, are particular regions in these conceptual spaces: for example, the concept of "blue" can be defined as a particular region in the domain of color. Notice that the notion of various combinations of basic perceptual domains making more complicated conceptual spaces possible fits well together with the models discussed in our previous definition. There more complicated concepts were made possible by combining basic neural representations for e.g. different sensory modalities.

The origin of the different quality dimensions could also emerge from the specific properties of the different simulators, as in PSS theory.

Thus definition #1 allows us to talk about what a concept might "look like from the outside", with definition #2 talking about what the same concept might "look like from the inside".

Interestingly, Gärdenfors hypothesizes that much of the work involved with learning new concepts has to do with learning new quality dimensions to fit into one's conceptual space, and that once this is done, all that remains is the comparatively much simpler task of just dividing up the new domain to match seen examples.

For example, consider the (phenomenal) dimension of volume. The experiments on "conservation" performed by Piaget and his followers indicate that small children have no separate representation of volume; they confuse the volume of a liquid with the height of the liquid in its container. It is only at about the age of five years that they learn to represent the two dimensions separately. Similarly, three- and four-year-olds confuse high with tall, big with bright, and so forth (Carey 1978).

The problem of alien concepts

With these definitions for concepts, we can now consider what problems would follow if we started off with a very human-like AI that had the same concepts as we did, but then expanded its conceptual space to allow for entirely new kinds of concepts. This could happen if it self-modified to have new kinds of sensory or thought modalities that it could associate its existing concepts with, thus developing new kinds of quality dimensions.

An analogy helps demonstrate this problem: suppose that you're operating in a two-dimensional space, where a rectangle has been drawn to mark a certain area as "forbidden" or "allowed". Say that you're an inhabitant of Flatland. But then you suddenly become aware that actually, the world is three-dimensional, and has a height dimension as well! That raises the question of, how should the "forbidden" or "allowed" area be understood in this new three-dimensional world? Do the walls of the rectangle extend infinitely in the height dimension, or perhaps just some certain distance in it? If just a certain distance, does the rectangle have a "roof" or "floor", or can you just enter (or leave) the rectangle from the top or the bottom? There doesn't seem to be any clear way to tell.

As a historical curiosity, this dilemma actually kind of really happened when airplanes were invented: could landowners forbid airplanes from flying over their land, or was the ownership of the land limited to some specific height, above which the landowners had no control? Courts and legislation eventually settled on the latter answer. A more AI-relevant example might be if one was trying to limit the AI with rules such as "stay within this box here", and the AI then gained an intuitive understanding of quantum mechanics, which might allow it to escape from the box without violating the rule in terms of its new concept space.

More generally, if previously your concepts had N dimensions and now they have N+1, you might find something that fulfilled all the previous criteria while still being different from what we'd prefer if we knew about the N+1th dimension.

In the next post, I will present some (very preliminary and probably wrong) ideas for solving this problem.

Crude measures

10 Stuart_Armstrong 27 March 2015 03:44PM

A putative new idea for AI control; index here.

Partially inspired by as conversation with Daniel Dewey.

People often come up with a single great idea for AI, like "complexity" or "respect", that will supposedly solve the whole control problem in one swoop. Once you've done it a few times, it's generally trivially easy to start taking these ideas apart (first step: find a bad situation with high complexity/respect and a good situation with lower complexity/respect, make the bad very bad, and challenge on that). The general responses to these kinds of idea are listed here.

However, it seems to me that rather than constructing counterexamples each time, we should have a general category and slot these ideas into them. And not only have a general category with "why this can't work" attached to it, but "these are methods that can make it work better". Seeing the things needed to make their idea better can make people understand the problems, where simple counter-arguments cannot. And, possibly, if we improve the methods, one of these simple ideas may end up being implementable.


Crude measures

The category I'm proposing to define is that of "crude measures". Crude measures are methods that attempt to rely on non-fully-specified features of the world to ensure that an underdefined or underpowered solution does manage to solve the problem.

To illustrate, consider the problem of building an atomic bomb. The scientists that did it had a very detailed model of how nuclear physics worked, the properties of the various elements, and what would happen under certain circumstances. They ended up producing an atomic bomb.

The politicians who started the project knew none of that. They shovelled resources, money and administrators at scientists, and got the result they wanted - the Bomb - without ever understanding what really happened. Note that the politicians were successful, but it was a success that could only have been achieved at one particular point in history. Had they done exactly the same thing twenty years before, they would not have succeeded. Similarly, Nazi Germany tried a roughly similar approach to what the US did (on a smaller scale) and it went nowhere.

So I would define "shovel resources at atomic scientists to get a nuclear weapon" as a crude measure. It works, but it only works because there are other features of the environment that are making it work. In this case, the scientists themselves. However, certain social and human features about those scientists (which politicians are good at estimating) made it likely to work - or at least more likely to work than shovelling resources at peanut-farmers to build moon rockets.

In the case of AI, advocating for complexity is similarly a crude measure. If it works, it will work because of very contingent features about the environment, the AI design, the setup of the world etc..., not because "complexity" is intrinsically a solution to the FAI problem. And though we are confident that human politicians have some good enough idea about human motivations and culture that the Manhattan project had at least some chance of working... we don't have confidence that those suggesting crude measures for AI control have a good enough idea to make their idea works.

It should be evident that "crudeness" is on a sliding scale; I'd like to reserve the term for proposed solutions to the full FAI problem that do not in any way solve the deep questions about FAI.


More or less crude

The next question is, if we have a crude measure, how can we judge its chance of success? Or, if we can't even do that, can we at least improve the chances of it working?

The main problem is, of course, that of optimising. Either optimising in the sense of maximising the measure (maximum complexity!) or of choosing the measure that is most extreme fit to the definition (maximally narrow definition of complexity!). It seems we might be able to do something about this.

Let's start by having AI create sample a large class of utility functions. Require them to be around the same expected complexity as human values. Then we use our crude measure μ - for argument's sake, let's make it something like "approval by simulated (or hypothetical) humans, on a numerical scale". This is certainly a crude measure.

We can then rank all the utility functions u, using μ to measure the value of "create M(u), a u-maximising AI, with this utility function". Then, to avoid the problems with optimisation, we could select a certain threshold value and pick any u such that E(μ|M(u)) is just above the threshold.

How to pick this threshold? Well, we might have some principled arguments ("this is about as good a future as we'd expect, and this is about as good as we expect that these simulated humans would judge it, honestly, without being hacked").

One thing we might want to do is have multiple μ, and select things that score reasonably (but not excessively) on all of them. This is related to my idea that the best Turing test is one that the computer has not been trained or optimised on. Ideally, you'd want there to be some category of utilities "be genuinely friendly" that score higher than you'd expect on many diverse human-related μ (it may be better to randomly sample rather than fitting to precise criteria).

You could see this as saying that "programming an AI to preserve human happiness is insanely dangerous, but if you find an AI programmed to satisfice human preferences, and that other AI also happens to preserve human happiness (without knowing it would be tested on this preservation), then... it might be safer".

There are a few other thoughts we might have for trying to pick a safer u:

  • Properties of utilities under trade (are human-friendly functions more or less likely to be tradable with each other and with other utilities)?
  • If we change the definition of "human", this should have effects that seem reasonable for the change. Or some sort of "free will" approach: if we change human preferences, we want the outcome of u to change in ways comparable with that change.
  • Maybe also check whether there is a wide enough variety of future outcomes, that don't depend on the AI's choices (but on human choices - ideas from "detecting agents" may be relevant here).
  • Changing the observers from hypothetical to real (or making the creation of the AI contingent, or not, on the approval), should not change the expected outcome of u much.
  • Making sure that the utility u can be used to successfully model humans (therefore properly reflects the information inside humans).
  • Make sure that u is stable to general noise (hence not over-optimised). Stability can be measured as changes in E(μ|M(u)), E(u|M(u)), E(v|M(u)) for generic v, and other means.
  • Make sure that u is unstable to "nasty" noise (eg reversing human pain and pleasure).
  • All utilities in a certain class - the human-friendly class, hopefully - should score highly under each other (E(u|M(u)) not too far off from E(u|M(v))), while the over-optimised solutions - those scoring highly under some μ - must not score high under the class of human-friendly utilities.

This is just a first stab at it. It does seem to me that we should be able to abstractly characterise the properties we want from a friendly utility function, which, combined with crude measures, might actually allow us to select one without fully defining it. Any thoughts?

And with that, the various results of my AI retreat are available to all.

Rationality Reading Group - Introduction and A: Predictably Wrong

9 Mark_Friedenbach 17 April 2015 01:40AM

This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.

Welcome to the Rationality reading group. This week we discuss the Preface by primary author Eliezer Yudkowsky, Introduction by editor & co-author Rob Bensinger, and the first sequence: Predictably Wrong. This sequence introduces the methods of rationality, including its two major applications: the search for truth and the art of winning. The desire to seek truth is motivated, and a few obstacles to seeking truth--systematic errors, or biases--are discussed in detail.

This post summarizes each article of the sequence, linking to the original LessWrong posting where available, and offers a few relevant notes, thoughts, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.

Reading: Preface, Biases: An Introduction, and Sequence A: Predictably Wrong (pi-xxxv and p1-42)


Preface. Introduction to the ebook compilation by Eliezer Yudkowsky. Retrospectively identifies mistakes of the text as originally presented. Some have been corrected in the ebook, others stand as-is. Most notably the book focuses too much on belief, and too little on practical actions, especially with respect to our everyday lives. Establishes that the goal of the project is to teach rationality, those ways of thinking which are common among practicing scientists and the foundation of the Enlightenment, yet not systematically organized or taught in schools (yet).

Biases: An Introduction. Editor & co-author Rob Bensinger motivates the subject of rationality by explaining the dangers of systematic errors caused by *cognitive biases*, which the arts of rationality are intended to de-bias. Rationality is not about Spock-like stoicism -- it is about simply "doing the best you can with what you've got." The System 1 / System 2 dual process dichotomy is explained: if our errors are systematic and predictable, then we can instil behaviors and habits to correct them. A number of exemplar biases are presented. However a warning: it is difficult to recognize biases in your own thinking even after learning of them, and knowing about a bias may grant unjustified overconfidence that you yourself do not fall pray to such mistakes in your thinking. To develop as a rationalist actual experience is required, not just learned expertise / knowledge. Ends with an introduction of the editor and an overview of the organization of the book.

A. Predictably Wrong

1. What do I mean by "rationality"? Rationality is a systematic means of forming true beliefs and making winning decisions. Probability theory is the set of laws underlying rational belief, "epistemic rationality": it describes how to process evidence and observations to revise ("update") one's beliefs. Decision theory is the set of laws underlying rational action, "instrumental rationality", independent of what one's goals and available options are. (p7-11)

2. Feeling rational. Becoming more rational can diminish feelings or intensify them. If one cares about the state of the world, it is expected that he or she should have an emotional response to the acquisition of truth. "That which can be destroyed by the truth should be," but also "that which the truth nourishes should thrive." The commonly perceived dichotomy between emotions and "rationality" [sic] is more often about fast perceptual judgements (System 1, emotional) vs slow deliberative judgements (System 2, "rational" [sic]). But both systems can serve the goal of truth, or defeat it, depending on how they are used. (p12-14)

3. Why truth? and... Why seek the truth? Curiosity: to satisfy an emotional need to know. Pragmatism: to accomplish some specific real-world goal. Morality: to be virtuous, or fulfill a duty to truth. Curiosity motivates a search for the most intriguing truths, pragmatism the most useful, and morality the most important. But be wary of the moral justification: "To make rationality into a moral duty is to give it all the dreadful degrees of freedom of an arbitrary tribal custom. People arrive at the wrong answer, and then indignantly protest that they acted with propriety, rather than learning from their mistake." (p15-18)

4. ...what's a bias, again? A bias is an obstacle to truth, specifically those obstacles which are produced by our own thinking processes. We describe biases as failure modes which systematically prevent typical human beings from determining truth or selecting actions that would have best achieved their goals. Biases are distinguished from mistakes which originate from false beliefs or brain injury. Do better seek truth and achieve our goals we must identify our biases and do what we can to correct for or eliminate them. (p19-22)

5. Availability. The availability heuristic is judging the frequency or probability of an event by the ease with which examples of the event come to mind. If you think you've heard about murders twice as much as suicides then you might suppose that murder is twice as common as suicide, when in fact the opposite is true. Use of the availability heuristic gives rise to the absurdity bias: events that have never happened are not recalled, and hence deemed to have no probability of occurring. In general, memory is not always a good guide to probabilities in the past, let alone to the future. (p23-25)

6. Burdensome details. The conjunction fallacy is when humans rate the probability of two events has higher than the probability of either event alone: adding detail can make a scenario sound more plausible, even though the event as described necessarily becomes less probable. Possible fixes include training yourself to notice the addition of details and discount appropriately, thinking about other reasons why the central idea could be true other than the added detail, or training oneself to hold a preference for simpler explanations -- to feel every added detail as a burden. (p26-29)

7. Planning fallacy. The planning fallacy is the mistaken belief that human beings are capable of making accurate plans. The source of the error is that we tend to imagine how things will turn out if everything goes according to plan, and do not appropriately account for possible troubles or difficulties along the way. The typically adequate solution is to compare the new project to broadly similar previous projects undertaken in the past, and ask how long those took to complete. (p30-33)

8. Illusion of transparency: why no one understands you. The illusion of transparency is our bias to assume that others will understand the intent behind our attempts to communicate. The source of the error is that we do not sufficiently consider alternative frames of mind or personal histories, which might lead the recipient to alternative interpretations. Be not too quick to blame those who misunderstand your perfectly clear sentences, spoken or written. Chances are, your words are more ambiguous than you think. (p34-36)

9. Expecting short inferential distances. Human beings are generally capable of processing only one piece of new information at at time. Worse, someone who says something with no obvious support is a liar or an idiot, and if you say something blatantly obvious and the other person doesn't see it, they're the idiot. This is our bias towards explanations of short inferential distance. A clear argument has to lay out an inferential pathway, starting from what the audience already knows or accepts. If at any point you make a statement without obvious justification in arguments you've previously supported, the audience just thinks you're crazy. (p37-39)

10. The lens that sees its own flaws. We humans have the ability to introspect our own thinking processes, a seemingly unique skill among life on Earth. As consequence, a human brain is able to understand its own flaws--its systematic errors, its biases--and apply second-order corrections to them. (p40-42)

It is at this point that I would generally like to present an opposing viewpoint. However I must say that this first introductory sequence is not very controversial! Educational, yes, but not controversial. If anyone can provide a link or citation to one or more decent non-strawman arguments which oppose any of the ideas of this introduction and first sequence, please do so in the comments. I certainly encourage awarding karma to anyone that can do a reasonable job steel-manning an opposing viewpoint.

This has been a collection of notes on the assigned sequence for this week. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

The next reading will cover Sequence B: Fake Beliefs (p43-77). The discussion will go live on Wednesday, 6 May 2015 at or around 6pm PDT, right here on the discussion forum of LessWrong.

I'd like advice from LW regarding migraines

9 Algon 11 April 2015 05:52PM

So, I read a post a little while ago saying that asking the community for advice on personal problems was okay, and no one seemed to disagree strongly with this. Therefore, I'll just ask for some advice, and hope that I'm not accidentally going past some line. If I do, I apologise

 I have had migraines for quite a while now. They started when I was a child, but were infrequent in those days. They got progressively worse as time went on, and things started to get quite bad when I was about 12. A few years down the line, I would have headaches for months at a time, with migraines popping up for a few days a month. It got worse from there. Now, I have had migraine-like symptoms for 10 months now. I say migraine-like because part of the definition of a migraine is that it lasts from about 3 hours to a few days. According to a neurologist I recently went to, I have transformative migraines, or wording similar to that. So I have all the symptoms of migraines, except they last for inordinate amounts of time. I've had an MRI, and it showed nothing wrong with my brain. According to the World Health Organisation, this is more disabling than blindness, and as bad as acute psychosis: You can see why its rather important to me that I get rid of/deal with this.

Now, I've tried quite a lot of things over the years, especially in the last two or so. NSAIDs do very little, and thing like migraleve (paracetamol with codeine) are a little better. Sumatriptan provides some relief, but it doesn't get rid of the migraine. At best it will knock me down to a weak migraine. I've tried taking propanalol (160mg) for half a year, and it does little to help. I was prescribd Amitiptyline (10mg) a week ago, but it hasn't had much effect. I was told to increase by 10mg it every two weeks until I hit 30mg. I've also tried cutting things out like chocolates, and dairy for a month. It didn't have any effect. I also don't have any caffeine. So this eliminates some common causes of migraines. My migraines sometimes respond to heat/cold applied to my head, but this is only some of the time, due to my migraines shifting in nature. Further, it only takes the edge of them. I've also tried taking magnesium supplements, but they had a negative effect on me i.e. strange dreams and insomnia. That just made my problems worse. Also, I've ruled out medication overuse.

 So, does anyone have any recommendations? There should be a few people who have had experience with this level of migraines, and I expect they might be able to provide some advice. I'm not too optimistic, but I really need something that works.

Effective Sustainability - results from a meetup discussion

9 Gunnar_Zarncke 29 March 2015 10:15PM

Related-to Focus Areas of Effective Altruism

These are some small tidbits from our LW-like Meetup in Hamburg. The focus was on sustainability not on altruism as that was more in the spirit of our group. EA was mentioned but no comparison was made. Well-informed effective altruists will probably find little new in this writeup.

So we discussed effective sustainability. To this end we were primed to think rationally by my 11-year old who moderated a session on mind-mapping 'reason' (with contributions from the children). Then we set out to objectively compare concrete everyday things by their sustainability. And how to do this. 

Is it better to drink fruit juice or wine? Or wine or water? Or wine vs. nothing ( forego sth.)? Or wine vs. paper towels? (the latter intentionally different)

The idea was to arrive at simple rules of thumb to evaluate the sustainability of something. But we discovered that even simple comparisons are not that simple and intuition can run afoul (surpise!). One example was that apparently tote bags are not clearly better than plastic bags in terms of sustainability. But even the simple comparison of tap water vs. wine which seems like a trivial subset case is non-trivial when you consider where the water comes from and how it is extracted from the ground (we still think that water is better but we not as sure as before).

We discussed some ways to measure sustainability (in brackets to which we reduced it):

  • fresh water use -> energy
  • packaging material used -> energy, permanent ressources
  • transport -> energy
  • energy -> CO_2, permanent ressources
  • CO_2 production 
  • permanent consumption of ressources

Life-Cycle-Assessment (German: Ökobilanz) was mentioned in this context but it was unclear what that meant precisely. Only afterwards was it discovered that it's a blanket term for exactly this question (with lots of estabilished measurements for which it is unclear how to simplify them for everyday use).

We didn't try to break this down - a practical everyday approch doesn't allow for that and the time spent on analysing and comparing options is also equivalent to ressources possibly not spent efficiently.

One unanswered question was how much time to invest in comparing alternatives. Too little comparison means to take the nextbest option which is what most people apparently do and which also apparently doesn't lead to overall sustainable behavior. But too much analysis of simple decisions is also no option.

The idea was still to arrive at actionable criteria. One first approximation be settled on was

1) Forego consumption. 

A nobrainer really, but maybe even that has to be stated. Instead of comparing options that are hard to compare try to avoid consumption where you can. Water instead of wine or fruit juice or lemonde. This saves lots of cognitive ressources.

Shortly after we agreed on the second approximation:

2) Spend more time on optimizing ressources you consume large amounts of.

The example at hand was wine (which we consume only a few times a year) versus toilet paper... No need to feel remorse over a one-time present packaging.

Note that we mostly excluded personal well-being, happiness and hedons from our consideration. We were aware that our goals affect our choices and hedons have to factored into any real strategy, but we left this additional complication out of our analysis - at least for this time.

We did discuss signalling effects. Mostly in the context of how effective ressources can be saved by convincing others to act sustainably. One important aspect for the parents was to pass on the idea and to act as a role model (with the caveat that children need a simplified model to grasp the concept). It was also mentioned humorously that one approach to minimize personal ressource consumption is suicide and transitively to convice others of same. The ultimate solution having no humans on the planet (a solution my 8-year old son - a friend of nature - arrived at too). This apparently being the problem when utilons/hedons are expluded.

A short time we considered whether outreach comes for free (can be done in addition to abstinence) and should be the no-brainer number 3. But it was then realized that at least right now and for us most abstinence comes at a price. It was quoted that buying sustainable products is about 20% more expensive than normal products. Forgoing e.g. a car comes at reduced job options. Some jobs involve supporting less sustainable large-scale action. Having less money means less options to act sustaibale. Time being convertible to money and so on.

At this point the key insight mentioned was that it could be much more efficient from a sustainability point of view to e.g. buy CO_2 certificates than to buy organic products. Except that the CO_2 certificate market is oversupplied currently. But there seem to be organisations which promise to achieve effective CO_2 reduction in developing countries (e.g. solar cooking) at a much higher rate than be achieved here. Thus the thrid rule was

3) Spend money on sustainable organisations instead of on everyday products that only give you a good feeling.

And with this the meetup concluded. We will likely continue this.

A note for parents: Meetups with children can be productive (in the sense of results like the above). We were 7 adults and 7 children (aged 3 to 11). The children mostly entertained themselves and no parent had to leave the discussion for long. And the 11-year-old played a significant role in the meetup itself.

Where can I go to exploit social influence to fight akrasia?

9 Snorri 26 March 2015 03:39PM

Briefly: I'm looking for a person (or group) with whom I can mutually discuss self improvement and personal goals (and nothing else) on a regular basis.

Also, note, this post is an example of asking a personally important question on LW. The following idea is not meant as a general mindhack, but just as something I want to try out myself.

We are unconsciously motivated by those around us. The Milgram experiment and the Asch conformity experiment are the two best examples of social influence that come to my mind, though I'm sure there are plenty more (if you haven't heard of them, I really suggest spending a minute).

I've tended to see this drive to conform to the expectations of others as a weakness of the human mind, and yes, it can be destructive. However, as long as its there, I should exploit it. Specifically, I want to exploit it to fight akrasia.

Utilizing positive social influence is a pretty common tactic for fighting drug addictions (like in AA), but I haven't really heard of it being used to fight unproductivity. Sharing your personal work/improvement goals with someone in the same position as yourself, along with reflecting on previous attempts, could potentially be powerful. Humans simply feel more responsible for the things they tell other people about, and less responsible for the things they bottle up and don't tell anyone (like all of my productivity strategies).

The setup that I envision would be something like this:

  • On a chat room, or some system like skype.1
  • Meet weekly at a very specific time for a set amount of time.
  • Your partner has a list of the productivity goals you set during the previous session. They ask you about your performance, forcing you to explain either your success or your failure.
  • Your partner tries to articulate what went wrong or what went right from your explanation (giving you a second perspective).
  • Once both parties have shared and evaluated, you set your new goals in light of your new experience (and with your partner's input, hopefully being more effective).
  • The partnership continues as long as it is useful for all parties.

I've tried doing something similar to this with my friends, but it just didn't work. We already knew each other too well, and there wasn't that air of dispassionate professionality. We were friends, but not partners (in this sense of the word).

If something close to what I describe already exists, or at least serves the same purpose, I would love to hear about it (I already tried the LW study hall, but it wasn't really the structure or atmosphere I was going for). Otherwise, I'd be thrilled to find someone here to try doing this with. You can PM me if you don't want to post here.



1. After explaining this whole idea to someone IRL, they remarked that there would be little social influence because we would only be meeting online in a pseudo-anonymous way. However, I don't find this to be the case personally when I talk with people online, so a virtual environment would be no detriment (hopefully this isn't just unique to me).

Edit (29/3/2015): Just for the record, I wanted to say that I was able to make the connection I wanted, via a PM. Thanks LW!

Indifferent vs false-friendly AIs

9 Stuart_Armstrong 24 March 2015 12:13PM

A putative new idea for AI control; index here.

For anyone but an extreme total utilitarian, there is a great difference between AIs that would eliminate everyone as a side effect of focusing on their own goals (indifferent AIs) and AIs that would effectively eliminate everyone through a bad instantiation of human-friendly values (false-friendly AIs). Examples of indifferent AIs are things like paperclip maximisers, examples of false-friendly AIs are "keep humans safe" AIs who entomb everyone in bunkers, lobotomised and on medical drips.

The difference is apparent when you consider multiple AIs and negotiations between them. Imagine you have a large class of AIs, and that they are all indifferent (IAIs), except for one (which you can't identify) which is friendly (FAI). And you now let them negotiate a compromise between themselves. Then, for many possible compromises, we will end up with most of the universe getting optimised for whatever goals the AIs set themselves, while a small portion (maybe just a single galaxy's resources) would get dedicated to making human lives incredibly happy and meaningful.

But if there is a false-friendly AI (FFAI) in the mix, things can go very wrong. That is because those happy and meaningful lives are a net negative to the FFAI. These humans are running dangers - possibly physical, possibly psychological - that lobotomisation and bunkers (or their digital equivalents) could protect against. Unlike the IAIs, which would only complain about the loss of resources to the FAI, the FFAI finds the FAI's actions positively harmful (and possibly vice versa), making compromises much harder to reach.

And the compromises reached might be bad ones. For instance, what if the FAI and FFAI agree on "half-lobotomised humans" or something like that? You might ask why the FAI would agree to that, but there's a great difference to an AI that would be friendly on its own, and one that would choose only friendly compromises with a powerful other AI with human-relevant preferences.

Some designs of FFAIs might not lead to these bad outcomes - just like IAIs, they might be content to rule over a galaxy of lobotomised humans, while the FAI has its own galaxy off on its own, where its humans take all these dangers. But generally, FFAIs would not come about by someone designing a FFAI, let alone someone designing a FFAI that can safely trade with a FAI. Instead, they would be designing a FAI, and failing. And the closer that design got to being FAI, the more dangerous the failure could potentially be.

So, when designing an FAI, make sure to get it right. And, though you absolutely positively need to get it absolutely right, make sure that if you do fail, the failure results in a FFAI that can safely be compromised with, if someone else gets out a true FAI in time.

Why I Reject the Correspondence Theory of Truth

9 pragmatist 24 March 2015 11:00AM

This post began life as a comment responding to Peer Gynt's request for a steelman of non-correspondence views of truth. It ended up being far too long for a comment, so I've decided to make it a separate post. However, it might have the rambly quality of a long comment rather than a fully planned out post.

Evaluating Models

Let's say I'm presented with a model and I'm wondering whether I should incorporate it into my belief-set. There are several different ways I could go about evaluating the model, but for now let's focus on two. The first is pragmatic. I could ask how useful the model would be for achieving my goals. Of course, this criterion of evaluation depends crucially on what my goals actually are. It must also take into account several other factors, including my cognitive abilities (perhaps I am better at working with visual rather than verbal models) and the effectiveness of alternative models available to me. So if my job is designing cannons, perhaps Newtonian mechanics is a better model than relativity, since the calculations are easier and there is no significant difference in the efficacy of the technology I would create using either model correctly. On the other hand, if my job is designing GPS systems, relativity might be a better model, with the increased difficulty of calculations being compensated by a significant improvement in effectiveness. If I design both cannons and GPS systems, then which model is better will vary with context.

Another mode of evaluation is correspondence with reality, the extent to which the model accurately represents its domain. In this case, you don't have much of the context-sensitivity that's associated with pragmatic evaluation. Newtonian mechanics may be more effective than the theory of relativity at achieving certain goals, but (conventional wisdom says) relativity is nonetheless a more accurate representation of the world. If the cannon maker believes in Newtonian mechanics, his beliefs don't correspond with the world as well as they should. According to correspondence theorists, it is this mode of evaluation that is relevant when we're interested in truth. We want to know how well a model mimics reality, not how useful it is.

I'm sure most correspondence theorists would say that the usefulness of a model is linked to its truth. One major reason why certain models work better than others is that they are better representations of the territory. But these two motivations can come apart. It may be the case that in certain contexts a less accurate theory is more useful or effective for achieving certain goals than a more accurate theory. So, according to a correspondence theorist, figuring out which model is most effective in a given context is not the same thing as figuring out which model is true.

How do we go about these two modes of evaluation? Well, evaluation of the pragmatic success of a model is pretty easy. Say I want to figure out which of several models will best serve the purpose of keeping me alive for the next 30 days. I can randomly divide my army of graduate students into several groups, force each group to behave according to the dictates of a separate model, and then check which group has the highest number of survivors after 30 days. Something like that, at least.

But how do I evaluate whether a model corresponds with reality? The first step would presumable involve establishing correspondences between parts of my model and parts of the world. For example, I could say "Let mS in my model represent the mass of the Sun." Then I check to see if the structural relations between the bits of my model match the structural relations between the corresponding bits of the world. Sounds simple enough, right? Not so fast! The procedure described above relies on being able to establish (either by stipulation or discovery) relations between the model and reality. That presupposes that we have access to both the model and to reality, in order to correlate the two. In what sense do we have "access" to reality, though? How do I directly correlate a piece of reality with a piece of my model?

Models and Reality

Our access to the external world is entirely mediated by models, either models that we consciously construct (like quantum field theory) or models that our brains build unconsciously (like the model of my immediate environment produced in my visual cortex). There is no such thing as pure, unmediated, model-free access to reality. But we often do talk about comparing our models to reality. What's going on here? Wouldn't such a comparison require us to have access to reality independent of the models? Well, if you think about it, whenever we claim to be comparing a model to reality, we're really comparing one model to another model. It's just that we're treating the second model as transparent, as an uncontroversial proxy for reality in that context. Those last three words matter: A model that is used as a criterion for reality in one investigative context might be regarded as controversial -- as explicitly a model of reality rather than reality itself -- in another context.

Let's say I'm comparing a drawing of a person to the actual person. When I say things like "The drawing has a scar on the left side of the face, but in reality the scar is on the right side", I'm using the deliverances of visual perception as my criterion for "reality". But in another context, say if I'm talking about the psychology of perception, I'd talk about my perceptual model as compared (and, therefore, contrasted) to reality. In this case my criterion for reality will be something other than perception, say the readings from some sort of scientific instrument. So we could say things like, "Subjects perceive these two colors as the same, but in reality they are not." But by "reality" here we mean something like "the model of the system generated by instruments that measure surface reflectance properties, which in turn are built based on widely accepted scientific models of optical phenomena".

When we ordinarily talk about correspondence between models and reality, we're really talking about the correspondence between bits of one model and bits of another model. The correspondence theory of truth, however, describes truth as a correspondence relation between a model and the world itself. Not another model of the world, the world. And that, I contend, is impossible. We do not have direct access to the world. When I say "Let mS represent the mass of the Sun", what I'm really doing is correlating a mathematical model with a verbal model, not with immediate reality. Even if someone asks me "What's the Sun?", and I point at the big light in the sky, all I'm doing is correlating a verbal model with my visual model (a visual model which I'm fairly confident is extremely similar, though not exactly the same, as the visual model of my interlocutor). Describing correspondence as a relationship between models and the world, rather than a relationship between models and other models, is a category error.

So I can go about the procedure of establishing correspondences all I want, correlating one model with another. All this will ultimately get me is coherence. If all my models correspond with one another, then I know that there is no conflict between my different models. My theoretical model coheres with my visual model, which coheres with my auditory model, and so on. Some philosophers have been content to rest here, deciding that coherence is all there is to truth. If the deliverances of my scientific models match up with the deliverances of my perceptual models perfectly, I can say they are true. But there is something very unsatisfactory about this stance. The world has just disappeared. Truth, if it is anything at all, involves both our models and the world. However, the world doesn't feature in the coherence conception of truth. I could be floating in a void, hallucinating various models that happen to cohere with one another perfectly, and I would have attained the truth. That can't be right.

Correspondence Can't Be Causal

The correspondence theorist may object that I've stacked the deck by requiring that one consciously establish correlations between models and the world. The correspondence isn't a product of stipulation or discovery, it's a product of basic causal connections between the world and my brain. This seems to be Eliezer's view. Correspondence relations are causal relations. My model of the Sun corresponds with the behavior of the actual Sun, out there in the real world, because my model was produced by causal interactions between the actual Sun and my brain.

But I don't think this maneuver can save the correspondence theory. The correspondence theory bases truth on a representational relationship between models/beliefs and the world. A model is true if it accurately represents its domain. Representation is a normative relationship. Causation is not. What I mean by this is that representation has correctness conditions. You can meaningfully say "That's a good representation" or "That's a bad representation". There is no analog with causation. There's no sense in which some particular putatively causal relation ends up being a "bad" causal relation. Ptolemy's beliefs about the Sun's motion were causally entangled with the Sun, yet we don't want to say that those beliefs are accurate. It seems mere causal entanglement is insufficient. We need to distinguish between the right sort of causal entanglement (the sort that gets you an accurate picture of the world) and the wrong sort. But figuring out this distinction takes us back to the original problem. If we only have immediate access to models, on what basis can we decide whether our models are caused by the world in a manner that produces an accurate picture. To determine this, it seems we again need unmediated access to the world.

Back to Pragmatism

Ultimately, it seems to me the only clear criterion the correspondence theorist can establish for correlating the model with the world is actual empirical success. Use the model and see if it works for you, if it helps you attain your goals. But this is exactly the same as the pragmatic mode of evaluation which I described above. And the representational mode of evaluation is supposed to differ from this.

The correspondence theorist could say that pragmatic success is a proxy for representational success. Not a perfect proxy, but good enough. The response is, "How do you know?" If you have no independent means of determining representational success, if you have no means of calibration, how can you possibly determine whether or not pragmatic success is a good proxy for representational success? I mean, I guess you can just assert that a model that is extremely pragmatically successful for a wide range of goals also corresponds well with reality, but how does that assertion help your theory of truth? It seems otiose. Better to just associate truth with pragmatic success itself, rather than adding the unjustifiable assertion to rescue the correspondence theory.

So yeah, ultimately I think the second of the two means of evaluating models I described at the beginning (correspondence) can only really establish coherence between your various models, not coherence between your models and the world. Since that sort of evaluation is not world-involving, it is not the correct account of truth. Pragmatic evaluation, on the other hand, *is* world-involving. You're testing your models against the world, seeing how effective they are at helping you accomplish your goal. That is the appropriate normative relationship between your beliefs and the world, so if anything deserves to be called "truth", it's pragmatic success, not correspondence.

This has consequences for our conception of what "reality" is. If you're a correspondence theorist, you think reality must have some form of structural similarity to our beliefs. Without some similarity in structure (or at least potential similarity) it's hard to say how one meaningfully could talk about beliefs representing reality or corresponding to reality. Pragmatism, on the other hand, has a much thinner conception of reality. The real world, on the pragmatic conception is just an external constraint on the efficacy of our models. We try to achieve certain goals using our models and something pushes back, stymieing our efforts. Then we need to build improved models in order to counteract this resistance. Bare unconceptualized reality, on this view, is not a highly structured field whose structure we are trying to grasp. It is a brute, basic constraint on effective action.

It turns out that working around this constraint requires us to build complex models -- scientific models, perceptual models, and more. These models become proxies for reality, and we treat various models as "transparent", as giving us a direct view of reality, in various contexts. This is a useful tool for dealing with the constraints offered by reality. The models are highly structured, so in many contexts it makes sense to talk about reality as highly structured, and to talk about our other models matching reality. But it is also important to realize that when we say "reality" in those contexts, we are really talking about some model, and in other contexts that model need not be treated as transparent. Not realizing this is an instance of the mind projection fallacy. If you want a context-independent, model-independent notion of reality, I think you can say no more about it than "a constraint on our models' efficacy".

That sort of reality is not something you represent (since representation assumes structural similarity), it's something you work around. Our models don't mimic that reality, they are tools we use to facilitate effective action under the constraints posed by reality. All of this, as I said at the beginning, is goal and context dependent, unlike the purported correspondence theory mode of evaluating models. That may not be satisfactory, but I think it's the best we have. Pragmatist theory of truth for the win.


Resolving the Fermi Paradox: New Directions

8 jacob_cannell 18 April 2015 06:00AM

Our sun appears to be a typical star: unremarkable in age, composition, galactic orbit, or even in its possession of many planets.  Billions of other stars in the milky way have similar general parameters and orbits that place them in the galactic habitable zone.  Extrapolations of recent expolanet surveys reveal that most stars have planets, removing yet another potential unique dimension for a great filter in the past.  

According to Google, there are 20 billion earth like planets in the Galaxy.

A paradox indicates a flaw in our reasoning or our knowledge, which upon resolution, may cause some large update in our beliefs.

Ideally we could resolve this through massive multiscale monte carlo computer simulations to approximate Solonomoff Induction on our current observational data.  If we survive and create superintelligence, we will probably do just that.

In the meantime, we are limited to constrained simulations, fermi estimates, and other shortcuts to approximate the ideal bayesian inference.

The Past

While there is still obvious uncertainty concerning the likelihood of the series of transitions along the path from the formation of an earth-like planet around a sol-like star up to an early tech civilization, the general direction of the recent evidence flow favours a strong Mediocrity Principle.

Here are a few highlight developments from the last few decades relating to an early filter:

  1. The time window between formation of earth and earliest life has been narrowed to a brief interval.  Panspermia has also gained ground, with some recent complexity arguments favoring a common origin of life at 9 billion yrs ago.[1]
  2. Discovery of various extremophiles indicate life is robust to a wider range of environments than the norm on earth today.
  3. Advances in neuroscience and studies of animal intelligence lead to the conclusion that the human brain is not nearly as unique as once thought.  It is just an ordinary scaled up primate brain, with a cortex enlarged to 4x the size of a chimpanzee.  Elephants and some cetaceans have similar cortical neuron counts to the chimpanzee, and demonstrate similar or greater levels of intelligence in terms of rituals, problem solving, tool use, communication, and even understanding rudimentary human language.  Elephants, cetaceans, and primates are widely separated lineages, indicating robustness and inevitability in the evolution of intelligence.

So, if there is a filter, it probably lies in the future (or at least the new evidence tilts us in that direction).

The Future(s)

When modelling the future development of civilization, we must recognize that the future is a vast cloud of uncertainty compared to the past.  The best approach is to focus on the most key general features of future postbiological civilizations, categorize the full space of models, and then update on our observations to determine what ranges of the parameter space are excluded and which regions remain open.

An abridged taxonomy of future civilization trajectories :


Civilization is wiped out due to an existential catastrophe that sterilizes the planet sufficient enough to kill most large multicellular organisms, essentially resetting the evolutionary clock by a billion years.  Given the potential dangers of nanotech/AI/nuclear weapons - and then aliens, I believe this possibility is significant - ie in the 1% to 50% range.

Biological/Mixed Civilization:

This is the old-skool sci-fi scenario.  Humans or our biological descendants expand into space.  AI is developed but limited to human intelligence, like CP30.  No or limited uploading.

This leads eventually to slow colonization, terraforming, perhaps eventually dyson spheres etc.

This scenario is almost not worth mentioning: prior < 1%.  Unfortunately SETI in current form is till predicated on a world model that assigns a high prior to these futures.

PostBiological Warm-tech AI Civilization:

This is Kurzweil/Moravec's sci-fi scenario.  Humans become postbiological, merging with AI through uploading.  We become a computational civilization that then spreads out some fraction of the speed of light to turn the galaxy into computronium.  This particular scenario is based on the assumption that energy is a key constraint, and that civilizations are essentially stellavores which harvest the energy of stars.

One of the very few reasonable assumptions we can make about any superintelligent postbiological civilization is that higher intelligence involves increased computational efficiency.  Advanced civs will upgrade into physical configurations that maximize computation capabilities given the local resources.

Thus to understand the physical form of future civs, we need to understand the physical limits of computation.

One key constraint is the Landauer Limit, which states that the erasure (or cloning) of one bit of information requires a minimum of kTln2 joules.  At room temperature (293 K), this corresponds to a minimum of 0.017 eV to erase one bit.  Minimum is however the keyword here, as according to the principle, the probability of the erasure succeeding is only 50% at the limit.  Reliable erasure requires some multiple of the minimal expenditure - a reasonable estimate being about 100kT or 1eV as the minimum for bit erasures at today's levels of reliability.

Now, the second key consideration is that Landauer's Limit does not include the cost of interconnect, which is already now dominating the energy cost in modern computing.  Just moving bits around dissipates energy.

Moore's Law is approaching its asymptotic end in a decade or so due to these hard physical energy constraints and the related miniaturization limits.

I assign a prior to the warm-tech scenario that is about the same as my estimate of the probability that the more advanced cold-tech (reversible quantum computing, described next) is impossible: < 10%.

From Warm-tech to Cold-tech

There is a way forward to vastly increased energy efficiency, but it requires reversible computing (to increase the ratio of computations per bit erasures), and full superconducting to reduce the interconnect loss down to near zero.

The path to enormously more powerful computational systems necessarily involves transitioning to very low temperatures, and the lower the better, for several key reasons:

  1. There is the obvious immediate gain that one gets from lowering the cost of bit erasures: a bit erasure at room temperature costs 100 times more than a bit erasure at the cosmic background temperature, and a hundred thousand times more than an erasure at 0.01K (the current achievable limit for large objects)
  2. Low temperatures are required for most superconducting materials regardless.
  3. The delicate coherence required for practical quantum computation requires or works best at ultra low temperatures.
At a more abstract level, the essence of computation is precise control over the physical configurations of a device as it undergoes complex state transitions.  Noise/entropy is the enemy of control, and temperature is a form of noise.  

Assuming large scale quantum computing is possible, then the ultimate computer is thus a reversible massively entangled quantum device operating at absolute zero.  Unfortunately, such a device would be delicate to a degree that is hard to imagine - even a single misplaced high energy particle could cause enormous damage.

In this model, advanced computational civilization would take the form of a compact body (anywhere from asteroid to planet size) that employs layers of sophisticated shielding to deflect as much of the incoming particle flux as possible.  The ideal environment for such a device is as far away from hot stars as one can possibly go, and the farther the better.  The extreme energy efficiency of advanced low temperature reversible/quantum computing implies that energy is not a constraint.  These advanced civilizations could probably power themselves using fusion reactors for millions, if not billions, of years.

Stellar Escape Trajectories

For a cold-tech civilization, one interesting long term strategy involves escaping the local star's orbit to reach the colder interstellar medium, and eventually the intergalactic medium.

If we assume that these future civs have long planning horizons (reasonable), we can consider this an investment that has an initial cost in terms of the energy required to achieve escape velocity and a return measured in the future integral of computation gained over the trajectory due to increased energy efficiency.  Expendable boost mass in the system can be used, and domino chains of complex chaotic gravitational assist maneuvers computed by deep simulations may offer a route to expel large objects using reasonable amounts of energy.[3]

The Great Game 

Given the constraints of known physics (ie no FTL), it appears that the computational brains housing more advanced cold-tech civs will be incredibly vulnerable to hostile aliens.  A relativistic kill vehicle is a simple technology that permits little avenue for direct defense.  The only strong defense is stealth.

Although the utility functions and ethics of future civs are highly speculative, we can observe that a very large space of utility functions lead to similar convergent instrumental goals involving control over one's immediate future light cone.  If we assume that some civs are essentially selfish, then the dynamics suggest successful strategies will involve stealth and deception to avoid detection combined with deep simulation sleuthing to discover potential alien civs and their locations.

If two civs both discover each other's locations around the same time, then MAD (mutually assured destruction) dynamics takeover and cooperation has stronger benefits.  The vast distances involve suggest that one sided discoveries are more likely.

Spheres of Influence

A new civ, upon achieving the early postbiological stage of development (earth in say 2050?), should be able to resolve the general answer to the fermi paradox using advanced deep simulation alone - long before any probes would reach distant stars.  Assuming that the answer is "lots of aliens", then further simulations could be used to estimate the relative likelihood of elder civs interacting with the past lightcone.  

The first few civilizations would presumably realize that the galaxy is more likely to be mostly colonized, in which case the ideal strategy probably involves expansion of actuator type devices (probes, construction machines) into nearby systems combined with construction and expulsion of advanced stealthed coldtech brains out into the void.  On the other hand, the very nature of the stealth strategy suggests that it may be hard to confidently determine how colonized the galaxy is. 

For civilizations appearing later, the situation is more complex.  The younger a civ estimates itself to be in the cosmic order, the more likely it becomes that it's local system has already come under an alien influence.

From the perspective of an elder civ, an alien planet at a pre-singularity level of development has no immediate value.  Raw materials are plentiful - and most of the baryonic mass appears to be interstellar and free floating.  The tiny relative value of any raw materials on a biological world are probably outweighed - in the long run - by the potential future value of information trade with the resulting mature civ.

Each biological world - or seed of a future elder civ - although perhaps similar in abstract, is unique in details.  Each such world is valuable in the potential unique knowledge/insights it may eventually generate - directly or indirectly.  From a pure instrumental rational standpoint, there is some value in preserving biological worlds to increase general knowledge of civ development trajectories.

However, there could exist cases where the elder civ may wish to intervene.  For example, if deep simulations predict that the younger world will probably develop into something unfriendly - like an aggressive selfish/unfriendly replicator - then small pertubations in the natural trajectory could be called for.  In short the elder civ may have reasons to occasionally 'play god'.

On the other hand, any intervention itself would leave a detectable signature or trace in the historical trajectory which in turn could be detected by another rival or enemy civ!  In the best case these clues would only reveal the presence of an alien influence.  In the worst case they could reveal information concerning the intervening elder civ's home system and the likely locations of its key assets.

Around 70,000 years ago, we had a close encounter with Scholz's star, which passed with 0.8 light years of the sun (within the oort cloud).  If the galaxy is well colonized, flybys such as this have potentially interesting implications  (that particular flyby corresponds to the estimated time of the Toba super-eruption, for example).

Conditioning on our Observational Data

Over the last few decades SETI has searched a small portion of the parameter space covering potential alien civs.  

SETI's original main focus concerned the detection of large permanent alien radio beacons.  We can reasonably rule out models that predict advanced civs constructing high energy omnidirectional radio beacons.

At this point we can also mostly rule out large hot-tech civilizations (energy constrained civilizations) that harvest most of the energy from stars.

Obviously detecting cold-tech civilizations is considerably more difficult, and perhaps close to impossible if advanced stealth is a convergent strategy.

However, determining whether the galaxy as a whole is colonized by advanced stealth civs is a much easier problem.  In fact, one way or another the evidence is already right in front of us.  We now know that most of the mass in the galaxy is dark rather than light.  I have assumed that coldtech still involves baryonic matter and normal physics, but of course there is also the possibility that non-baryonic matter could be used for computation.  Either way, the dark matter situation is favorable.  Focusing on normal baryonic matter, the ratio of dark/cold to light/hot is still large - very favorable for colonization.

Observational Selection Effects

All advanced civs will have strong instrumental reasons to employ deep simulations to understand and model developmental trajectories for the galaxy as a whole and for civilizations in particular.  A very likely consequence is the production of large numbers of simulated conscious observers, ala the Simulation Argument.  Universes with the more advanced low temperature reversible/quantum computing civilizations will tend to produce many more simulated observer moments and are thus intrinsically more likely than one would otherwise expect - perhaps massively so.


Rogue Planets

If the galaxy is already colonized by stealthed coldtech civs, then one prediction is that some fraction of the stellar mass has been artificially ejected.  Some recent observations actually point - at least weakly - in this direction.

From "Nomads of The Galaxy"[4]

We estimate that there may be up to ∼ 10^5 compact objects in the mass range 10^−8 to 10^−2M⊙
per main sequence star that are unbound to a host star in the Galaxy. We refer to these objects as
nomads; in the literature a subset of these are sometimes called free-floating or rogue planets.

Although the error range is still large, it appears that free floating planets outnumber planets bound to stars, and perhaps by a rather large margin.

Assuming the galaxy is colonized:  It could be that rogue planets form naturally outside of stars and then are colonized.  It could be they form around stars and then are ejected naturally (and colonized).  Artificial ejection - even if true - may be a rare event.  Or not.  But at least a few of these options could potentially be differentiated with future observations - for example if we find an interesting discrepancy in the rogue planet distribution predicted by simulations (which obviously do not yet include aliens!) and actual observations.

Also: if rogue planets outnumber stars by a large margin, then it follows that rogue planet flybys are more common in proportion.



SETI to date allows us to exclude some regions of the parameter space for alien civs, but the regions excluded correspond to low prior probability models anyway, based on the postbiological perspective on the future of life.  The most interesting regions of the parameter space probably involve advanced stealthy aliens in the form of small compact cold objects floating in the interstellar medium.

The upcoming WFIST telescope should shed more light on dark matter and enhance our microlensing detection abilities significantly.  Sadly, it's planned launch date isn't until 2024.  Space development is slow.


Effective effective altruism: Get $400 off your next charity donation

8 Baisius 17 April 2015 05:45AM

For those of you unfamiliar with Churning, it's the practice of signing up for a rewards credit card, spending enough with your everyday purchases to get the (usually significant) reward and then cancelling it. Many of these cards are cards with annual fees (which is commonly waived and/or the one-time reward will pay for). For a nominal amount of work, you can churn cards for significant bonuses.

Ordinarily I wouldn't come close to spending enough money to qualify for many of these rewards, but I recently made the Giving What You Can pledge. I now have a steady stream of predictable expenses, and conveniently, GiveWell allows donations via most any credit card. I've started using new rewards cards to pay for these expenses each time, resulting in free flights (this is how I'm paying to fly to NYC this summer), Amazon gift cards, or sometimes just straight cash.

Since the first of the year (total expenses $4000, including some personal expenses) I've churned $700 worth of bonuses (from a Delta Amazon Express Gold and a Capital One Venture Card). This money can be redonated, saved, spent, or whatever.


1. I hope it goes without saying that you should pay off your balance in full each month, just like you should with any other card.

2. This has some negative impact on your credit, in the short term.

3. It should be noted that credit card companies make at least some money (I think 3%) off of your transactions, so if you're trying to hit a target of X% to charity, you would need to donate X/0.97, or 10.31% for 10% to account for that 3%. The reward should more than cover it.

4. Read more about this, including the pros and cons, from multiple sources before you try it. It's not something that should be done lightly, but does synergize very nicely with charity donations.

What level of compassion do you consider normal, expected, mandatory etc. ?

8 DeVliegendeHollander 10 April 2015 12:57PM

My hidden secret goal is to understand the sentiments behind social justice better, however  I will refrain from asking questions that directly relate to it, as they can be mind-killers, instead, I have constructed an entirely apolitical, and probably safe thought experiment involving a common everyday problem that shouldn't be incisive.

Alice is living in an apartment, she is listening to music. The volume of her music is well within what is allowed by the regulations or social norms. Yet the neighbor is still complaining and wants her to turn it down, claiming that she (the neighbor) is unusually sensitive to noise due to some kind of ear or mental condition. 

Bob, Alice's friend is also present, and he makes a case that while she can turn it down basically out of niceness or neighborliness, this level of kindness is going far beyond the requirements of duty, and should be considered a favor, because she has no ethical duty to turn it down, for the following reasons.

1) Her volume level of music is usual, it is the sensitivity level of the neighbor that is unusual, and we are under no duty to cater to every special need of others.

2) In other words, it is okay to cause suffering to others as long as it is a usual, common, accepted thing to do that would not cause suffering to a typical person.

The reasons for this are

A) It would be too hard to do otherwise, to cater to every special need, in this case it is easy, but not in all cases, so this is no general principle.

B/1) It would not help the other person much, if the other person is unusually sensitive, the problem would not be fixed by one person catering to them. A hundred people should cater to it, after all there are many sources of noise in the neighborhood.

B/2) In other words, if you are unusually rude, reducing it to usual levels of rudeness is efficient, because by that one move you made a lot of people content. But if you are already on the usual levels of rudeness and an unusually sensitive person is still suffering, further reduction is less efficient because you are only one of the many sources of their suffering. And these people are few anyway.

C) Special needs are easy to fake.

D) People should really work on toughening up and growing a thicker skin, it is actually possible.

Polls in comments below


Please explain your view in the comments.


Language Learning and the Dark Arts.

8 Lemmih 06 April 2015 11:33AM

I've been wanting to learn Mandarin Chinese for years now and just recently I wrote a small website to help me practise.1 All of the exercises are gap sentences that require you to type the correct answer before you can move on. I chose this kind of exercise because of the convincing evidence for the spacing effect and the testing effect.

Knocking through a bunch of exercises every day feels efficient but it's not exactly fun and I put in less time than I should. I've found two things that help with this: setting small and achievable goals, and reading short stories once I'm proficient with the vocabulary. And if there are two ways to make practicing more fun, there gotta be a lot more that I haven't thought about. So, how do I make myself work harder? Are there are any of the so called Dark Arts that are more than hearsay and could work in my favor? How do you people out there learn foreign languages and how do you keep yourself from giving up or slowing down? Do you use the pomodoro technique?

Cheers, David.



Edit: more on -> move on.

Futarchy and Unfriendly AI

8 jkaufman 03 April 2015 09:45PM

We have a reasonably clear sense of what "good" is, but it's not perfect. Suffering is bad, pleasure is good, more people living enjoyable lives is good, yes, but tradeoffs are hard. How much worse is it to go blind than to lose your leg? [1] How do we compare the death of someone at eighty to the death of someone at twelve? If you wanted to build some automated system that would go from data about the world to a number representing how well it's doing, where you would prefer any world that scored higher to any world scoring lower, that would be very difficult.

Say, however, that you've built a metric that you think matches your values well and you put some powerful optimizer to work maximizing that metric. This optimizer might do many things you think are great, but it might be that the easiest ways to maximize the metric are the ones that pull it apart from your values. Perhaps after it's in place it turns out your metric included many things that only strongly correlated with what you cared about, where the correlation breaks down under maximization.

What confuses me is that the people who warn about this scenario with respect to AI are often the same people in favor of futarchy. They both involve trying to define your values and then setting an indifferent optimizer to work on them. If you think AI would be very dangerous but futarchy would be very good, why?

I also posted this on my blog.

[1] This is a question people working in public health try to answer with Disability Weights for DALYs.

Superintelligence 29: Crunch time

8 KatjaGrace 31 March 2015 04:24AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the twenty-ninth section in the reading guideCrunch time. This corresponds to the last chapter in the book, and the last discussion here (even though the reading guide shows a mysterious 30th section). 

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Chapter 15


  1. As we have seen, the future of AI is complicated and uncertain. So, what should we do? (p255)
  2. Intellectual discoveries can be thought of as moving the arrival of information earlier. For many questions in math and philosophy, getting answers earlier does not matter much. Also people or machines will likely be better equipped to answer these questions in the future. For other questions, e.g. about AI safety, getting the answers earlier matters a lot. This suggests working on the time-sensitive problems instead of the timeless problems. (p255-6)
  3. We should work on projects that are robustly positive value (good in many scenarios, and on many moral views)
  4. We should work on projects that are elastic to our efforts (i.e. cost-effective; high output per input)
  5. Two objectives that seem good on these grounds: strategic analysis and capacity building (p257)
  6. An important form of strategic analysis is the search for crucial considerations. (p257)
  7. Crucial consideration: idea with the potential to change our views substantially, e.g. reversing the sign of the desirability of important interventions. (p257)
  8. An important way of building capacity is assembling a capable support base who take the future seriously. These people can then respond to new information as it arises. One key instantiation of this might be an informed and discerning donor network. (p258)
  9. It is valuable to shape the culture of the field of AI risk as it grows. (p258)
  10. It is valuable to shape the social epistemology of the AI field. For instance, can people respond to new crucial considerations? Is information spread and aggregated effectively? (p258)
  11. Other interventions that might be cost-effective: (p258-9)
    1. Technical work on machine intelligence safety
    2. Promoting 'best practices' among AI researchers
    3. Miscellaneous opportunities that arise, not necessarily closely connected with AI, e.g. promoting cognitive enhancement
  12. We are like a large group of children holding triggers to a powerful bomb: the situation is very troubling, but calls for bitter determination to be as competent as we can, on what is the most important task facing our times. (p259-60)

Another view

Alexis Madrigal talks to Andrew Ng, chief scientist at Baidu Research, who does not think it is crunch time:

Andrew Ng builds artificial intelligence systems for a living. He taught AI at Stanford, built AI at Google, and then moved to the Chinese search engine giant, Baidu, to continue his work at the forefront of applying artificial intelligence to real-world problems.

So when he hears people like Elon Musk or Stephen Hawking—people who are not intimately familiar with today’s technologies—talking about the wild potential for artificial intelligence to, say, wipe out the human race, you can practically hear him facepalming.

“For those of us shipping AI technology, working to build these technologies now,” he told me, wearily, yesterday, “I don’t see any realistic path from the stuff we work on today—which is amazing and creating tons of value—but I don’t see any path for the software we write to turn evil.”

But isn’t there the potential for these technologies to begin to create mischief in society, if not, say, extinction?

“Computers are becoming more intelligent and that’s useful as in self-driving cars or speech recognition systems or search engines. That’s intelligence,” he said. “But sentience and consciousness is not something that most of the people I talk to think we’re on the path to.”

Not all AI practitioners are as sanguine about the possibilities of robots. Demis Hassabis, the founder of the AI startup DeepMind, which was acquired by Google, made the creation of an AI ethics board a requirement of its acquisition. “I think AI could be world changing, it’s an amazing technology,” he told journalist Steven Levy. “All technologies are inherently neutral but they can be used for good or bad so we have to make sure that it’s used responsibly. I and my cofounders have felt this for a long time.”

So, I said, simply project forward progress in AI and the continued advance of Moore’s Law and associated increases in computers speed, memory size, etc. What about in 40 years, does he foresee sentient AI?

“I think to get human-level AI, we need significantly different algorithms and ideas than we have now,” he said. English-to-Chinese machine translation systems, he noted, had “read” pretty much all of the parallel English-Chinese texts in the world, “way more language than any human could possibly read in their lifetime.” And yet they are far worse translators than humans who’ve seen a fraction of that data. “So that says the human’s learning algorithm is very different.”

Notice that he didn’t actually answer the question. But he did say why he personally is not working on mitigating the risks some other people foresee in superintelligent machines.

“I don’t work on preventing AI from turning evil for the same reason that I don’t work on combating overpopulation on the planet Mars,” he said. “Hundreds of years from now when hopefully we’ve colonized Mars, overpopulation might be a serious problem and we’ll have to deal with it. It’ll be a pressing issue. There’s tons of pollution and people are dying and so you might say, ‘How can you not care about all these people dying of pollution on Mars?’ Well, it’s just not productive to work on that right now.”

Current AI systems, Ng contends, are basic relative to human intelligence, even if there are things they can do that exceed the capabilities of any human. “Maybe hundreds of years from now, maybe thousands of years from now—I don’t know—maybe there will be some AI that turn evil,” he said, “but that’s just so far away that I don’t know how to productively work on that.”

The bigger worry, he noted, was the effect that increasingly smart machines might have on the job market, displacing workers in all kinds of fields much faster than even industrialization displaced agricultural workers or automation displaced factory workers.

Surely, creative industry people like myself would be immune from the effects of this kind of artificial intelligence, though, right?

“I feel like there is more mysticism around the notion of creativity than is really necessary,” Ng said. “Speaking as an educator, I’ve seen people learn to be more creative. And I think that some day, and this might be hundreds of years from now, I don’t think that the idea of creativity is something that will always be beyond the realm of computers.”

And the less we understand what a computer is doing, the more creative and intelligent it will seem. “When machines have so much muscle behind them that we no longer understand how they came up with a novel move or conclusion,” he concluded, “we will see more and more what look like sparks of brilliance emanating from machines.”

Andrew Ng commented:

Enough thoughtful AI researchers (including Yoshua Bengio​, Yann LeCun) have criticized the hype about evil killer robots or "superintelligence," that I hope we can finally lay that argument to rest. This article summarizes why I don't currently spend my time working on preventing AI from turning evil. 


1. Replaceability

'Replaceability' is the general issue of the work that you do producing some complicated counterfactual rearrangement of different people working on different things at different times. For instance, if you solve a math question, this means it gets solved somewhat earlier and also someone else in the future does something else instead, which someone else might have done, etc. For a much more extensive explanation of how to think about replaceability, see 80,000 Hours. They also link to some of the other discussion of the issue within Effective Altruism (a movement interested in efficiently improving the world, thus naturally interested in AI risk and the nuances of evaluating impact).

2. When should different AI safety work be done?

For more discussion of timing of work on AI risks, see Ord 2014. I've also written a bit about what should be prioritized early.

3. Review

If you'd like to quickly review the entire book at this point, Amanda House has a summary here, including this handy diagram among others: 

4. What to do?

If you are convinced that AI risk is an important priority, and want some more concrete ways to be involved, here are some people working on it: FHIFLICSERGCRIMIRIAI Impacts (note: I'm involved with the last two). You can also do independent research from many academic fields, some of which I have pointed out in earlier weeks. Here is my list of projects and of other lists of projects. You could also develop expertise in AI or AI safety (MIRI has a guide to aspects related to their research here; all of the aforementioned organizations have writings). You could also work on improving humanity's capacity to deal with such problems. Cognitive enhancement is one example. Among people I know, improving individual rationality and improving the effectiveness of the philanthropic sector are also popular. I think there are many other plausible directions. This has not been a comprehensive list of things you could do, and thinking more about what to do on your own is also probably a good option.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. What should be done about AI risk? Are there important things that none of the current organizations are working on?
  2. What work is important to do now, and what work should be deferred?
  3. What forms of capability improvement are most useful for navigating AI risk?

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

This is the last reading group, so how to proceed is up to you, even more than usually. Thanks for joining us! 

[LINK] Amanda Knox exonerated

8 fortyeridania 28 March 2015 06:15AM

Here are the New York Times, CNN, and NBC. Here is Wikipedia for background.

The case has made several appearances on LessWrong; examples include:

View more: Next