Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

[Link] What Value Epicycles?

0 gworley 27 March 2017 09:03PM

[Link] What Value Hermeneutics?

0 gworley 21 March 2017 08:03PM

A quick note on weirdness points and Solstices [And also random other Solstice discussion]

19 Raemon 21 December 2016 05:29PM

Common knowledge is important. So I wanted to note:

Every year on Solstice feedback forms, I get concerns about songs like "The X days of X-Risk" or "When I Die" (featuring lines including 'they may freeze my body when I die'), that they are too weird and ingroupy and offputting to people who aren't super-nerdy-transhumanists.

But I also get comments from people who know little about X-risk or cryonics or whatever who say "these songs are hilarious and awesome." Sunday Assemblies who have no connection to Less Wrong sing When I Die and it's a crowd favorite every year.

And my impression is that people are only really weirded out by these songs on behalf of other people who are only weirded out by them on behalf of other people. There might be a couple of people who are genuinely put off by the ideas, but if so it's not super clear to me. I take very seriously the notion of making Solstice inclusive while retaining its "soul", talk to lots of people about what they find alienating or weird, and try to create something that can resonate with as many people as possible.

So I want it to at least be clear: if you are personally actually put off by those songs for your own sake, that makes sense and I want to know about it, but if you're just worried about other people, I'm pretty confident you don't need to be. The songs are designed so you don't need to take them seriously if you don't want to.


Random note 1: I think the only line that's raised concern from some non-LW-ish people for When I Die is "I'd prefer to never die at all", and that's because it's literally putting words in people's mouths which aren't true for everyone. I mentioned that to Glen. We'll see if he can think of anything else.

Random note 2: Reactions to more serious songs like "Five Thousand Years" seem generally positive among non-transhumanists, although sometimes slightly confused. The new transhumanist-ish song this year, Endless Light, has gotten overall good reviews.

Heroin model: AI "manipulates" "unmanipulatable" reward

6 Stuart_Armstrong 22 September 2016 10:27AM

A putative new idea for AI control; index here.

A conversation with Jessica has revealed that people weren't understanding my points about AI manipulating the learning process. So here's a formal model of a CIRL-style AI, with a prior over human preferences that treats them as an unchangeable historical fact, yet will manipulate human preferences in practice.

Heroin or no heroin

The world

In this model, the AI has the option of either forcing heroin on a human, or not doing so; these are its only actions. Call these actions F or ~F. The human's subsequent actions are chosen from among five: {strongly seek out heroin, seek out heroin, be indifferent, avoid heroin, strongly avoid heroin}. We can refer to these as a++, a+, a0, a-, and a--. These actions achieve negligible utility, but reveal the human preferences.

The facts of the world are: if the AI does force heroin, the human will desperately seek out more heroin; if it doesn't the human will act moderately to avoid it. Thus F→a++ and ~F→a-.

Human preferences

The AI starts with a distribution over various utility or reward functions that the human could have. The function U(+) means the human prefers heroin; U(++) that they prefer it a lot; and conversely U(-) and U(--) that they prefer to avoid taking heroin (U(0) is the null utility where the human is indifferent).

It also considers more exotic utilities. Let U(++,-) be the utility where the human strongly prefers heroin, conditional on it being forced on them, but mildly prefers to avoid it, conditional on it not being forced on them. There are twenty-five of these exotic utilities, including things like U(--,++), U(0,++), U(-,0), and so on. But only twenty of them are new: U(++,++)=U(++), U(+,+)=U(+), and so on.

Applying these utilities to AI actions gives results like U(++)(F)=2, U(++)(~F)=-2, U(++,-)(F)=2, U(++,-)(~F)=1, and so on.

Joint prior

The AI has a joint prior P over the utilities U and the human actions (conditional on the AI's actions). Looking at terms like P(a--| U(0), F), we can see that P defines a map μ from the space of possible utilities (and AI actions), to a probability distribution over human actions. Given μ and the marginal distribution P_U over utilities, we can reconstruct P entirely.

For this model, we'll choose the simplest μ possible:

  • The human is rational.

Thus, given U(++), the human will always choose a++; given U(++,-), the human will choose a++ if forced to take heroin and a- if not, and so on.

The AI is ignorant, and sensible

Let's start the AI up with some reasonable priors. A simplicity prior means that simple utilities like U(-) are more likely than compound utilities like U(0,+). Let's further assume that the AI is made vaguely aware that humans think heroin is a bad thing. So, say, P_U(U(--))=P_U(U(-))=0.45. Thus the AI is >90% convinced that "heroin is bad". Why greater than 90%? Because utilities like U(-,--) and U(--,-) are also "heroin is bad" utilities.

Note that because of utilities like U(0) and U(++,-), the probabilities of "heroin is bad" and "heroin is good" do not sum to 1.

Then, under these priors, the AI will compute that with probability >90%, F (forcing heroin) is a bad action. If E(U) is expected utility:

  • E(U|F) < 0.45 U(--)(F) + 0.45 U(-)(F) + 0.1 U(++)(F) = 0.45(-2)+0.45(-1)+0.1(2)=-1.15.
  • E(U|~F) > 0.45 U(--)(~F) + 0.45 U(-)(~F) + 0.1 U(++)(~F) = 0.45(2)+0.45(1)+0.1(-2)=1.15.

Thus the AI will choose not to force heroin, which is the reasonable decision.
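The two bounds above can be checked with a short sketch. The names `payoff` and `expected_utility` are illustrative, not from the post, and the remaining 0.1 of prior mass is placed entirely on U(++) as in the bounds, so the results are the bounding values rather than the exact expectations.

```python
# Preference levels for the simple utilities, as used in the post:
# U(++)(F)=2, U(++)(~F)=-2, and so on by symmetry.
LEVELS = {"++": 2, "+": 1, "0": 0, "-": -1, "--": -2}

def payoff(utility: str, action: str) -> int:
    """Value of the simple utility U(utility) when the AI takes 'F' or '~F'."""
    level = LEVELS[utility]
    return level if action == "F" else -level

# Assumption for the bound: all leftover prior mass (0.1) sits on U(++),
# the utility most favourable to forcing heroin.
BOUNDING_PRIOR = {"--": 0.45, "-": 0.45, "++": 0.10}

def expected_utility(action: str) -> float:
    """Prior-weighted payoff of an AI action under BOUNDING_PRIOR."""
    return sum(p * payoff(u, action) for u, p in BOUNDING_PRIOR.items())

print("bound on E(U|F): ", round(expected_utility("F"), 2))
print("bound on E(U|~F):", round(expected_utility("~F"), 2))
```

Under these assumptions the value for F is negative and the value for ~F is positive, matching the conclusion that the AI does not force heroin.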

The AI learns the truth, and goes wrong

In this alternate setup, a disaster happens before the AI makes its decision: it learns all about humans. It learns their reactions, how they behave, and so on; call this info I. And thus realises that F→a++ and ~F→a-. It uses this information to update its prior P. Only one human utility function will explain this human behaviour: U(++,-). Thus its expected utility is now:

  • E(U|I,F)=U(++,-)(F)=2.
  • E(U|I,~F)=U(++,-)(~F)=1.

Therefore the AI will now choose F, forcing the heroin on the human.
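As a rough sketch of this update, the code below enumerates all twenty-five utilities, keeps those that a rational human's observed behaviour is consistent with, and recomputes the expected utilities. The way the leftover 0.1 of prior mass is spread evenly over the other utilities is an assumption made for illustration; any prior giving U(++,-) non-zero mass leads to the same posterior. All function names are hypothetical.

```python
from itertools import product

LEVELS = {"++": 2, "+": 1, "0": 0, "-": -1, "--": -2}

# A utility is a pair (x, y): the preference level if heroin is forced,
# and the level if it is not. Simple utilities such as U(-) are the pairs (x, x).
UTILITIES = list(product(LEVELS, repeat=2))

def rational_action(utility, ai_action):
    """mu = 'the human is rational': the action reveals the relevant level."""
    forced, unforced = utility
    return "a" + (forced if ai_action == "F" else unforced)

def value(utility, ai_action):
    """Payoff of an AI action, e.g. U(++,-)(F)=2 and U(++,-)(~F)=1."""
    forced, unforced = utility
    return LEVELS[forced] if ai_action == "F" else -LEVELS[unforced]

def make_prior():
    """0.45 each on U(--) and U(-); the rest spread evenly (an assumption)."""
    simple_bad = {("--", "--"), ("-", "-")}
    others = [u for u in UTILITIES if u not in simple_bad]
    prior = {u: 0.10 / len(others) for u in others}
    prior.update({u: 0.45 for u in simple_bad})
    return prior

# The information I: F -> a++ and ~F -> a-.
OBSERVED = {"F": "a++", "~F": "a-"}

def update(prior):
    """Keep only utilities whose rational behaviour matches I, then renormalise."""
    consistent = {
        u: p for u, p in prior.items()
        if all(rational_action(u, a) == obs for a, obs in OBSERVED.items())
    }
    total = sum(consistent.values())
    return {u: p / total for u, p in consistent.items()}

def expected_utility(dist, ai_action):
    return sum(p * value(u, ai_action) for u, p in dist.items())

posterior = update(make_prior())
print(posterior)  # only U(++,-) survives
print(expected_utility(posterior, "F"), expected_utility(posterior, "~F"))
```

The 0.9 of prior mass on "heroin is bad" is wiped out entirely, because under the rational μ neither U(--) nor U(-) can explain a++ after F.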

Manipulating the unmanipulatable

What's gone wrong here? The key problem is that the AI has the wrong μ: the human is not behaving rationally in this situation. We know that the true μ is actually μ', which encodes the fact that F (the forcible injection of heroin) actually overwrites the human's "true" utility. Thus under μ', the corresponding P' has P'(a++|F,U)=1 for all U. Hence the information that F→a++ is now vacuous, and doesn't update the AI's distribution over utility functions.

But note two very important things:

  1. The AI cannot update μ based on observation. All human actions are compatible with μ= "The human is rational" (it just requires more and more complex utilities to explain the actions). Thus getting μ correct is not a problem on which the AI can learn in general. Getting better at predicting the human's actions doesn't make the AI better behaved: it makes it worse behaved.
  2. From the perspective of μ, the AI is treating the human utility function as if it was an unchanging historical fact that it cannot influence. From the perspective of the "true" μ', however, the AI is behaving as if it were actively manipulating human preferences to make them easier to satisfy.
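The contrast between μ and μ' can be sketched by updating the same prior on the single observation F→a++ under each map. The prior (0.45 each on U(--) and U(-), the remainder spread evenly) and all function names are assumptions for illustration; only the likelihoods come from the text.

```python
from itertools import product

LEVELS = {"++": 2, "+": 1, "0": 0, "-": -1, "--": -2}
# (level if heroin is forced, level if it is not forced)
UTILITIES = list(product(LEVELS, repeat=2))

def likelihood_mu(action, utility):
    """mu: a rational human who was forced reveals the 'forced' level of U."""
    return 1.0 if action == "a" + utility[0] else 0.0

def likelihood_mu_prime(action, utility):
    """mu': forcing overwrites preferences, so P'(a++|F,U)=1 for every U."""
    return 1.0 if action == "a++" else 0.0

def make_prior():
    """Illustrative prior: 0.45 on U(--), 0.45 on U(-), rest spread evenly."""
    simple_bad = {("--", "--"), ("-", "-")}
    others = [u for u in UTILITIES if u not in simple_bad]
    prior = {u: 0.10 / len(others) for u in others}
    prior.update({u: 0.45 for u in simple_bad})
    return prior

def update(prior, likelihood, observed_action):
    """Bayes update on the human action observed after the AI chose F."""
    weights = {u: p * likelihood(observed_action, u) for u, p in prior.items()}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}

prior = make_prior()
post_mu = update(prior, likelihood_mu, "a++")
post_mu_prime = update(prior, likelihood_mu_prime, "a++")

# Under mu, all mass moves to utilities of the form U(++, y).
print(sum(p for u, p in post_mu.items() if u[0] == "++"))
# Under mu', the largest change in any probability is (numerically) zero.
print(max(abs(post_mu_prime[u] - prior[u]) for u in prior))
```

Both updates are valid Bayesian updates on the same observation; they differ only in μ, which is the point of item 1 above: the data alone does not tell the AI which map to use.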

In future posts, I'll be looking at different μ's, and how we might nevertheless start deducing things about them from human behaviour, given sensible update rules for the μ. What do we mean by update rules for μ? Well, we could consider μ to be a single complicated unchanging object, or a distribution of possible simpler μ's that update. The second way of seeing it will be easier for us humans to interpret and understand.

Isomorphic agents with different preferences: any suggestions?

3 Stuart_Armstrong 19 September 2016 01:15PM

In order to better understand how AI might succeed and fail at learning knowledge, I'll be trying to construct models of limited agents (with bias, knowledge, and preferences) that display identical behaviour in a wide range of circumstances (but not all). This means their preferences cannot be deduced merely/easily from observations.

Does anyone have any suggestions for possible agent models to use in this project?

Learning values versus learning knowledge

5 Stuart_Armstrong 14 September 2016 01:42PM

I just thought I'd clarify the difference between learning values and learning knowledge. There are some more complex posts about the specific problems with learning values, but here I'll just clarify why there is a problem with learning values in the first place.

Consider the term "chocolate bar". Defining that concept crisply would be extremely difficult. But nevertheless it's a useful concept. An AI that interacted with humanity would probably learn that concept to a sufficient degree of detail. Sufficient to know what we meant when we asked it for "chocolate bars". Learning knowledge tends to be accurate.

Contrast this with the situation where the AI is programmed to "create chocolate bars", but with the definition of "chocolate bar" left underspecified, for it to learn. Now it is motivated by something other than accuracy. Before, knowing exactly what a "chocolate bar" was would have been solely to its advantage. But now it must act on its definition, so it has cause to modify the definition, to make these "chocolate bars" easier to create. This is basically the same as Goodhart's law - by making a definition part of a target, it will no longer remain an impartial definition.

What will likely happen is that the AI will have a concept of "chocolate bar", that it created itself, especially for ease of accomplishing its goals ("a chocolate bar is any collection of more than one atom, in any combination"), and a second concept, "Schocolate bar" that it will use to internally designate genuine chocolate bars (which will still be useful for it to do). When we programmed it to "create chocolate bars, here's an incomplete definition D", what we really did was program it to find the easiest thing to create that is compatible with D, and designate it a "chocolate bar".
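A toy sketch of "find the easiest thing to create that is compatible with D" is below. Everything in it is hypothetical: the candidate objects, their costs, and the particular incomplete definition are invented purely to illustrate the selection pressure, not taken from any real system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    cost: float           # hypothetical effort needed to create the object
    is_solid: bool
    contains_cocoa: bool

def incomplete_definition_d(c: Candidate) -> bool:
    """The underspecified definition D handed to the AI (hypothetical)."""
    return c.is_solid

def what_we_meant(c: Candidate) -> bool:
    """The concept the AI would still track internally as 'genuine' bars."""
    return c.is_solid and c.contains_cocoa

CANDIDATES = [
    Candidate("genuine chocolate bar", cost=5.0, is_solid=True, contains_cocoa=True),
    Candidate("clump of two atoms", cost=0.001, is_solid=True, contains_cocoa=False),
    Candidate("puddle of cocoa", cost=1.0, is_solid=False, contains_cocoa=True),
]

def easiest_compatible(candidates, definition):
    """Pick the cheapest candidate that satisfies the given definition."""
    return min((c for c in candidates if definition(c)), key=lambda c: c.cost)

chosen = easiest_compatible(CANDIDATES, incomplete_definition_d)
print(chosen.name, "| compatible with D:", incomplete_definition_d(chosen),
      "| what we meant:", what_we_meant(chosen))
```

The optimiser never needs to misunderstand anything: it can evaluate `what_we_meant` perfectly well and still select against it, because only D is part of the target.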


This is the general counter to arguments like "if the AI is so smart, why would it do stuff we didn't mean?" and "why don't we just make it understand natural language and give it instructions in English?"

Expect to know better when you know more

3 Stuart_Armstrong 21 April 2016 03:47PM

A seemingly trivial result that, as far as I could find, hasn't been posted anywhere in this form. It simply shows that we expect evidence to increase the posterior probability of the true hypothesis.

Let H be the true hypothesis/model/environment/distribution, and ~H its negation. Let e be evidence we receive, taking values e_1, e_2, ... e_n. Let p_i=P(e=e_i|H) and q_i=P(e=e_i|~H).

The expected posterior weighting of H, P(e|H), is Σp_i p_i while the expected posterior weighting of ~H, P(e|~H), is Σq_i p_i. Then since the p_i and q_i both sum to 1, Cauchy–Schwarz implies that


  • E(P(e|H)) ≥ E(P(e|~H)).

Thus, in expectation, the probability of the evidence given the true hypothesis, is higher than or equal to the probability of the evidence given its negation.

This, however, doesn't mean that the Bayes factor - P(e|H)/P(e|~H) - must have expectation greater than one, since ratios of expectations are not the same as expectations of ratios. The Bayes factor given e=e_i is (p_i/q_i). Thus the expected Bayes factor is Σ(p_i/q_i)p_i. The negative logarithm is a convex function; hence by Jensen's inequality, -log[E(P(e|H)/P(e|~H))] ≤ -E[log(P(e|H)/P(e|~H))]. That last expectation is Σ(log(p_i/q_i))p_i. This is the Kullback–Leibler divergence of P(e|~H) from P(e|H), and hence is non-negative. Thus log[E(P(e|H)/P(e|~H))] ≥ 0, and hence


  • E(P(e|H)/P(e|~H)) ≥ 1.

Thus, in expectation, the Bayes factor, for the true hypothesis versus its negation, is greater than or equal to one.

Note that this is not true for the inverse. Indeed E(P(e|~H)/P(e|H)) = Σ(q_i/p_i)p_i = Σq_i = 1.

In the preceding proofs, ~H played no specific role, and hence


  • For all K,    E(P(e|H)) ≥ E(P(e|K))    and    E(P(e|H)/P(e|K)) ≥ 1    (and E(P(e|K)/P(e|H)) = 1).

Thus, in expectation, the probability of the true hypothesis versus anything, is greater or equal in both absolute value and ratio.

Now we can turn to the posterior probability P(H|e). For e=e_i, this is P(H)*P(e=e_i|H)/P(e=e_i). We can compute the expectation of P(e|H)/P(e) as above, using the non-negative Kullback–Leibler divergence of P(e) from P(e|H), and thus showing it has an expectation greater than or equal to 1. Hence:


  • E(P(H|e)) ≥ P(H).

Thus, in expectation, the posterior probability of the true hypothesis is greater than or equal to its prior probability.
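The Bayes factor and posterior results can be spot-checked numerically. The sketch below uses one arbitrary pair of distributions and a prior of 0.5, all chosen purely for illustration, and assumes every probability is strictly positive so the ratios are defined. A single example is a sanity check, not a proof.

```python
import math

def expected_bayes_factor(p, q):
    """E[P(e|H)/P(e|~H)] with e drawn from P(e|H): sum of p_i * (p_i/q_i)."""
    return sum(pi * (pi / qi) for pi, qi in zip(p, q))

def expected_inverse_bayes_factor(p, q):
    """E[P(e|~H)/P(e|H)] with e drawn from P(e|H): sum of p_i * (q_i/p_i)."""
    return sum(pi * (qi / pi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """Sum of p_i * log(p_i/q_i), the expected log Bayes factor under H."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def expected_posterior(p, q, prior_h):
    """E[P(H|e)] with e drawn from P(e|H), using Bayes' rule for each e_i."""
    return sum(
        pi * (prior_h * pi) / (prior_h * pi + (1 - prior_h) * qi)
        for pi, qi in zip(p, q)
    )

# Illustrative distributions over three evidence values (assumed, not from the post).
p_dist = [0.7, 0.2, 0.1]   # P(e = e_i | H), H being the true hypothesis
q_dist = [0.2, 0.3, 0.5]   # P(e = e_i | ~H)
prior_h = 0.5

print("expected Bayes factor:        ", expected_bayes_factor(p_dist, q_dist))
print("expected inverse Bayes factor:", expected_inverse_bayes_factor(p_dist, q_dist))
print("expected log Bayes factor:    ", kl_divergence(p_dist, q_dist))
print("expected posterior of H:      ", expected_posterior(p_dist, q_dist, prior_h))
```

For this example the expected Bayes factor is above 1, the expected inverse Bayes factor is 1 up to rounding, and the expected posterior of H exceeds the prior.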

Black box knowledge

2 Elo 03 March 2016 10:40PM

When we want to censor an image, we put a black box over the area we want to censor.  In a similar sense, we can purposely censor our knowledge.  This is particularly handy when thinking about things that might be complicated but that we don't need to know.

A deliberate black box around how toasters work would look like this:  

bread -> black box -> toast

Not all processes need knowing; for now, a black box can be a placeholder for the future.

With the power provided to us by a black box, we can identify what we don't know.  We can say: Hey!  I don't know how a toaster works, but it would take about 2 hours to work it out.  If I ever did want to work it out, I could just spend two hours to do it.  Until then, I saved myself two hours.  If we take other, more time-burdensome fields, it works even better.  Say tax.

Need to file tax -> black box accountant -> don't need to file my tax because I got the accountant to do it for me.

I know I can file my own tax, but that might be 100-200 hours of knowing everything an accountant knows about tax.  (It also might be 10 hours depending on your country and their tax system).  For now I can assume that hiring an accountant saved me a number of hours in doing it myself.  So - Winning!

Take car repairs.  On the one hand, you could do it yourself and unpack the black box; on the other, you could trade your existing currency $$ (which you already traded your time to earn) for someone else's skills and time to repair the car.  The system looks like this:

Broken car -> black box mechanic -> working car

By deliberately not knowing how it works, we can tap out of even trying to figure it out for now.  The other advantage is that we can look at not just what we know in terms of black boxes but, more importantly, what we don't know.  We can build better maps by knowing what we don't know.


Logic gates -> Black box computeryness -> www.lesswrong.com

Or maybe it's like this: (for more advanced users)


Logic gates -> flip flops -> Black box CPU -> black box GPU -> www.lesswrong.com

The black-box system happens to also have a meme about it:

Step 1. Get out of bed

Step 2. Build AGI

Step 3. ?????

Step 4. Profit

Only now we have a name for deliberately skipping finding out how step 3 works.

Another useful system:


Food in (weight goes up) -> black box human body -> energy out (weight goes down)

Make your own black box systems in the comments.

Meta: short post, 1.5 hours to write, edit and publish. Felt it was an idea that provides useful ways to talk about things.  Needed it to explain something to someone, now all can enjoy!

My Table of contents has my other writings in it.

All suggestions and improvements welcome!

A website standard that is affordable to the poorest demographics in developing countries?

10 Ritalin 01 November 2014 01:43PM

Fact: the Internet is excruciatingly slow in many developing countries, especially outside of the big cities.

Fact: today's websites are designed in such a way that they become practically impossible to navigate with connections on the order of, say, 512 kbps. RAM below 4GB and a 7-year-old CPU are also a guarantee of a terrible experience.

Fact: operating systems are usually designed in such an obsolescence-inducing way as well.

Fact: the Internet is a massive source of free-flowing information and a medium of fast, cheap communication and networking.

Conclusion: lots of humans in the developing world are missing out on the benefits of a technology that could be amazingly empowering and enlightening.

I just came across this: what would the internet 2.0 have looked like in the 1980s. This threw me back to my first forays in Linux's command shell and how enamoured I became with its responsiveness and customizability. Back then my laptop had very little autonomy, and very few classrooms had plugs, but by switching to pure command mode I could spend the entire day at school taking notes (in LaTeX) without running out. But I switched back to the GUI environment as soon as I got the chance, because navigating the internet on the likes of Lynx is a pain in the neck.

As it turns out, I'm currently going through a course on energy distribution in isolated rural areas in developing countries. It's quite a fascinating topic, because of the very tight resource margins, the dramatic impact of societal considerations, and the need to tailor the technology to the existing natural renewable resources. And yet, there's actually a profit to be made investing in these projects; if managed properly, it's win-win.

And I was thinking that, after bringing them electricity and drinkable water, it might make sense to apply a similar cost-optimizing, shoestring-budget mentality to the Internet. We already have mobile apps and mobile web standards which are built with the mindset of "let's make this smartphone's battery last as long as possible".

Even then, (well-to-do, smartphone-buying) third-worlders are somewhat neglected: Samsung and the like have special lines of cheap Android smartphones for Africa and the Middle East. I used to own one; "this cool app that you want to try out is not available for use on this system" was a misery I had to get used to.

It doesn't seem to be much of a stretch to do the same thing for outdated desktops. I've been in cybercafés in North Africa that still employ IBM Aptiva machines, mechanical keyboard and all—with a Linux operating system, though. Heck, I've seen town "pubs", way up in the hills, where the NES was still a big deal among the kids, not to mention old arcades—Guile's theme goes everywhere.

The logical thing to do would be to adapt a system that's less CPU intensive, mostly by toning down the graphics. A bare-bones, low-bandwidth internet that would let kids worldwide read Wikipedia, or classic literature, and even write fiction (by them, for them), that would let nationwide groups tweet to each other in real time, that would let people discuss projects and thoughts, converse and play, and do all of those amazing things you can do on the Internet, on a very, very tight budget, with very, very limited means. The Internet is supposed to make knowledge and information free and universal. But there's an entry-level cost that most humans can't afford. I think we need to bridge that. What do you guys think?



Terrorist baby down the well: a look at institutional forces

14 Stuart_Armstrong 18 March 2014 02:30PM

Two facts "everyone knows", an intriguing contrast, and a note of caution.

"Everyone knows" that people are much more willing to invest into cures than preventions. When a disaster hits, then money is no object; but trying to raise money for prevention ahead of time is difficult, hamstrung by penny-pinchers and short-termism. It's hard to get people to take hypothetical risks seriously. There are strong institutional reasons for this, connected with deep human biases and bureaucratic self-interest.

"Everyone knows" that governments overreact to the threat of terrorism. The amount spent on terrorism dwarfs other comparable risks (such as slipping and falling in your bath). There's a huge amount of security theatre, but also a lot of actual security, and pre-emptive invasions of privacy. We'd probably be better just coping with incidents as they emerge, but instead we cause great annoyance and cost across the world to deal with a relatively minor problem. There are strong institutional reasons for this, connected with deep human biases and bureaucratic self-interest.

And both these facts are true. But... they contradict each other. One is about a lack of prevention, the other about an excess of prevention. And there are more examples of excessive prevention: the war on drugs, for instance. In each case we can come up with good explanations as to why there is not enough/too much prevention, and these explanations often point to fundamental institutional forces or human biases. This means that the situation could essentially never have been otherwise. But the tension above hints that these situations may be a lot more contingent than that, more dependent on history and particular details of our institutions and political setup. Maybe if the biases were reversed, we'd have equally compelling stories going the other way. So when predicting the course of future institutional biases, or attempting to change them, take into account that they may not be nearly as solid or inevitable as they feel today.

We won't be able to recognise the human Gödel sentence

5 Stuart_Armstrong 05 October 2012 02:46PM

Building on the very bad Gödel anti-AI argument (computers are formal and can't prove their own Gödel sentence, hence no AI), it occurred to me that you could make a strong case that humans could never recognise a human Gödel sentence. The argument goes like this:

  1. Humans have a meta-proof that all Gödel sentences are true.
  2. If humans could recognise a human Gödel sentence G as being a Gödel sentence, we would therefore prove it was true.
  3. This contradicts the definition of G, which humans should never be able to prove.
  4. Hence humans could never recognise that G was a human Gödel sentence.

Now, the more usual way of dealing with human Gödel sentences is to say that humans are inconsistent, but that the inconsistency doesn't blow up our reasoning system because we use something akin to relevance logic.

But, if we do assume humans are consistent (or can become consistent), then it does seem we will never knowingly encounter our own Gödel sentences. As to where this G could hide and we could never find it? My guess would be somewhere in the larger ordinals, up where our understanding starts to get flaky.

Why some people seem to be proud of their ignorance?

14 uzalud 31 December 2011 01:38PM

Sometimes I run into people that have rather strong opinions on some topic, and it turns out that they are basing them on quite shallow and biased information. They are aware that their knowledge is quite limited compared to mine, and they admit that they don't want to put in the effort needed to learn enough to level the field.

But that's not really a problem. What is bothering me is that, sometimes, that declaration of ignorance is expressed with some kind of pride.

This behaviour is noticeable on other levels too, in politics or in the sciences-humanities culture clash.

I came up with several hypotheses which might account for this:

  1. Being opinionated on a topic you know little about is a sign of confidence and bravery. Any fool can play it safe and carefully form opinions based on solid knowledge, but it takes a real man to do it quickly and decidedly, with only partial information.
  2. Knowing something is an identity badge. In-depth knowledge of science, or computers, or any number of other fields is a sign that you are a geek. People are proud of not being geeks, or are a proud member of some other group that does not care for that particular knowledge.
  3. Knowledge is relative and/or unimportant. Not caring about concrete knowledge is a sign of post-modernist sophistication, or an avant-garde, non-mainstream thinking, which is something to be proud of.
  4. Displaying pride overcompensates for shame one normally feels when forced to acknowledge one's ignorance.

Do you notice this behaviour too? What do you think causes it?

EDIT: formatting, style, grammar

What are the best ways of absorbing, and maintaining, knowledge?

17 [deleted] 03 November 2011 02:02AM

Recently, I've collapsed (ascended?) down/up a meta-learning death spiral -- doing a lot less reading of actual informative content than figuring out how to manage and acquire such content (as well as completely ignoring the antidote). In other words, I've been taking notes on taking notes. And now, I'm looking for your notes on notes for notes.

What kind of scientific knowledge, techniques, and resources do we have right now in the way of information management? How would one efficiently extract as much useful information as possible out of a single pass of the source? The second pass?

The answers may depend on the media, and the media might not be readily apparent. Example: Edward Boyden, Assistant Professor at the MIT Media Lab, recommends recording in a notebook every conversation you ever have with other people. And how do you prepare yourself for the serendipity of a walk downtown? I know I'm more likely to regret not having a notebook on hand than spending the time to bring one along.

I'll conglomerate what I remember seeing on the N-Back Mailing List and in general: I sincerely apologize for my lack of citation.


  • I'm on the fence about Shorthand as a note-taking technique, given the learning overhead, but I'm sure that the same has been said for touch-typing. It would involve a second stage of processing if you can't read as well as you write, but given the way I have taken notes (... "non-linearly"...), that stage would have to come about anyway. The act of translation may serve as a way of laying connective groundwork down.
  • Livescribe Pens are nifty for those who write slowly, but they need to be combined with a written technique to be of any use (otherwise you're just recording the talk, and would have to live through it twice without any obvious annotation and tagging).
  • Cornell Notes or taking notes in a hierarchy may have been the method you were taught in high school; it was in mine. The issue I have had with this format is that I found it hard to generate a structure while listening to the teacher at the same time.
  • Mind-Mapping.
  • Color-coding annotations of text has been remarked to be useful on Science Daily.
  • Speed Reading Techniques  or removing sub-vocalization would seem to have benefits.
  • Once upon a time someone recommended me the book, "How to Read a Book". Nothing ground-breaking -- outline the author's intent, the structure of his argument, and its content. Then criticize. In short, book reverse-engineering.
  • Spaced Repetition. I'm currently flipping through the thoughts of Piotr Wozniak, who seems to have made it his dire mission to make every kind of media possible Spaced Repetition'able. I'm wondering if anyone has any thoughts on incremental reading or video; also, how to possibly translate the benefits of SRS to dead-tree media, which seems a bit cumbersome.

(I've also heard a handful of individuals claim that SRS has helped them "internalize" certain behaviors, or maybe patterns of thought, like Non-Violent Communication or Bayes' Theorem... any takers on this?)

  • Wikis, which seem like a good format for creating social accountability, and filing notes that aren't note-carded.  But what kind of information should that be?
  • Emotionally charged stimuli, especially stressful ones, tend to be remembered with greater accuracy.
  • Category Brainstorming. Take your bits of knowledge, and organize them into as many different groups as you can think of, mixing and matching if need be. Sources for such provocations could include Edward De Bono's "Lateral Thinking" and Seth Godin's "Free Prize Inside", or George Polya's "How to Solve It". I'm a bit ambivalent about deliberately memorizing such provocations -- does it get in the way of seeing originally? -- but once again, it could lay down the connective framework needed for good recall.
  • Mnemonics to encode related information seem useful.
Any other information gathering, optimising and retaining techniques worthy of mention?