In partially observable environments, stochastic policies can be optimal
I always had the informal impression that optimal policies were deterministic (choosing the best option, rather than some mix of options). Of course, this is not the case when facing other agents, but I had the impression it would hold when facing the environment rather than other players.
But stochastic policies can also be needed if the environment is partially observable, at least if the policy is Markov (memoryless). Consider the following POMDP (partially observable Markov decision process):

There are two states, 1a and 1b, and the agent cannot tell which one they're in. Taking action A in state 1a, or action B in state 1b, gives a reward of -R and keeps the agent in the same place. Taking action B in state 1a, or action A in state 1b, gives a reward of R and moves the agent to the other state.
The two deterministic memoryless policies - always A and always B - return -R every turn except possibly the first, while the stochastic policy of 0.5A + 0.5B returns 0 per turn in expectation.
Of course, if the agent can observe the reward, the environment is no longer partially observable (though we can imagine the reward being delayed until later). And the general policy of "alternate A and B" is more effective than the 0.5A + 0.5B policy. Still, that stochastic policy is the best of the memoryless policies available in this POMDP.
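To make the comparison concrete, here is a quick simulation sketch (Python; the state names, the starting state, and R = 1 are my own choices for illustration):

```python
import random

def average_return(policy, steps=10000, R=1.0):
    """Simulate the two-state POMDP. `policy` takes no input,
    since the agent cannot observe which state it is in."""
    state, total = '1a', 0.0
    for _ in range(steps):
        action = policy()
        if (state, action) in [('1a', 'B'), ('1b', 'A')]:
            total += R                              # rewarded, and moved to the other state
            state = '1b' if state == '1a' else '1a'
        else:
            total -= R                              # penalized, and stuck in place
    return total / steps

print(average_return(lambda: 'A'))                  # about -1.0: the deterministic policy
print(average_return(lambda: random.choice('AB')))  # about  0.0: the 0.5A + 0.5B policy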
Notes on Imagination and Suffering
Time: 22:56:47
I
This is going to be an exercise in speed writing a LW post.
Not writing posts at all seems to be worse than writing poorly edited posts.
It is currently hard for me to do anything that even resembles actual speed writing: even as I type this sentence, I feel a nearly irresistible urge to check it for grammar mistakes and make small corrections/improvements before I've even finished typing.
But to reduce the burden of writing, I predict it will be highly useful to develop the ability to write a post as fast as I can type, without going back.
If this proves to have acceptable results, you can expect more regular posts from me in the future.
And possibly, if I develop the habit of writing regularly, I'll finally get to describing some of the topics on which I have (what I believe are) original and sizable clusters of knowledge that are not easily available elsewhere.
But for now, just some thoughts on a very particular aspect of modelling how human brains think about a very particular thing.
This thing is immense suffering.
Time: 23:03:18
(Still slow!)
II
You might have heard this or similar from someone, possibly more than once in your life:
"you have no idea how I feel!"
or
"you can't even imagine how I feel!"
For me, this kind of phrase has always had the ring of a challenge. I have a potent imagination, and non-negligible experience in the affairs of humans. Therefore, I am certainly able to imagine how you feel, am I not?
Not so fast.
(Note added later: as Gram_Stone mentions, these kinds of statements tend to be used in epistemically unsound arguments, and as such can be presumed to be suspicious; however here, I am more concerned with the fact of the matter of how imagination works.)
Let's back up a little bit and recount some simple observations about imagining numbers.
You might be able to imagine and hold the image of five, six, nine, or even sixteen apples in your mind.
If I tell you to imagine something more complex, like pointed arrows arranged in a circle, you might be able to imagine four, or six, or maybe even eight of them.
If your brain is constructed differently from mine, you might easily go higher with the numbers.
But at some fairly small number, your mental machinery simply no longer has the capacity to imagine more shapes.
III
However, if I tell you that "you can't even imagine 35 apples!" it is obviously not an insult or a challenge, and what is more:
"imagining 35 apples" is NOT EQUAL to "comprehending in every detail what 35 apples are"
That is, depending on how good your knowledge of natural numbers is - say, whether you passed the first year of primary school - you can analyse the situation of "35 apples" in every possible way, and imagine it partially - but not all of it at the same time.
Directly imagining apples is very similar to actually experiencing apples in your life, but it has a severe limitation.
You can experience 35 apples in your life, but you can't imagine all of them at once even if you saw them 3 seconds ago.
Meta: I think I'm getting better at not stopping when I write.
Time: 23:13:00
IV
But, you ask, what is the point of writing all this obvious stuff about apples?
Well, if you move to more emotionally charged topics, like someone's emotions, it is much harder to think about the situation in a clear way.
And if you have a clear model of how your brain processes this information, you might be able to respond in a more effective way.
In particular, you might be saved from feeling guilty or inadequate about not being able to imagine someone's feelings or suffering.
It is a simple fact about your brain that it has a limited capability to imagine emotion.
And especially with suffering, the amount of suffering you are able to experience IS OF A COMPLETELY DIFFERENT ORDER OF MAGNITUDE than the amount you are able to imagine, even with the best intentions and knowledge.
However, can you comprehend it?
V
From this model, it is also immediately obvious that the same thing happens when you think about your own suffering in the past.
We know generally that humans can't remember their emotions very well, and their memories don't correlate very well with reported experience-in-the-moment.
Based on my personal experience, I'll tentatively make some bolder claims.
If you have suffered a tremendous amount, and then enough time has passed to "get over it", your brain is not only unable to imagine how much you have suffered in the past:
it is also unable to comprehend the amount of suffering.
Yes, even if it's your own suffering.
And what is more, I propose that the exact mechanism of "getting over something" is more or less EQUIVALENT to losing the ability to comprehend that suffering.
The same would (I expect) hold in case of getting better after severe PTSD etc.
VI
So in this sense, a person telling you "you cannot even imagine how I feel" is right even under a less literal interpretation of their statement.
If you are a mentally healthy individual, not suffering any major traumas etc., I suggest your brain literally has a defense mechanism (one that protects your precious mental health) that makes it impossible for you not only to imagine, but also to fully comprehend, the amounts of suffering you are being told about.
Time: 23:28:04
Publish!
The map of future models
TL;DR: Many models of the future exist. Several are relevant. Hyperbolic model is strongest, but too strange.
Our need: correct model of the future
Different people: different models = no communication.
Assumptions:
Model of the future = main driving force of the historical process + a graph of expected changes
Model of the future determines global risks
The map: lists all main future models.
Structure: from fast growth – to slow growth models.
PDF: http://immortality-roadmap.com/futuremodelseng.pdf
Link: The Economist on Paperclip Maximizers
I certainly was not expecting The Economist to publish a special report on paperclip maximizers (!).
As the title suggests, they are downplaying the risks of unfriendly AI, but just the fact that The Economist published this is significant.
Diaspora roundup thread, 23rd June 2016
Guidelines: Top-level comments here should be links to things written by members of the rationalist community, preferably that have some particular interest to this community. Self-promotion is totally fine. Including a brief summary or excerpt is great, but not required. Generally stick to one link per top-level comment, so they can be voted on individually. Recent links are preferred.
Rule: Do not link to anyone who does not want to be linked to. In particular, Scott Alexander has asked people to get his permission before linking to specific posts on his tumblr or in other out-of-the-way places.
Crazy Ideas Thread
This thread is intended to provide a space for 'crazy' ideas: ideas that spontaneously come to mind (and feel great), ideas you have long wanted to share but never found the place and time for, and ideas you think should be obvious and simple - yet nobody ever mentions them.
Rules for this thread:
- Each crazy idea goes into its own top level comment and may be commented there.
- Voting should be based primarily on how original the idea is.
- Meta discussion of the thread should go to the top level comment intended for that purpose.
How my something to protect just coalesced into being
Tl;dr: Different people will probably have different answers to the question of how to find the goal & nurture the 'something to protect' feeling, but mine is: your specific working experience is already doing it for you.
What values do other people expect of you?
I think that for many people, their jobs are the most meaningful ways of changing the world (including being a housewife). When you just enter a profession and start sharing your space and time with people who have been in it for a while, you let them shape you, for better or for worse. If the overwhelming majority of bankers are not EA (from the beneficiaries' point of view), it will be hard to be an EA banker. If the overwhelming majority of teachers view the lessons as basically slam dunks (from the students' point of view), it will be hard to be a teacher who revisits past insights with any purpose other than cramming.
So basically, if I want Something to protect, I find a compatible job, observe the people, like something good and hate something bad, and then try to give others like me the chance to do more of the first and less of the second.
I am generalizing from one example... or two...
I've been in a PhD program. I liked being expected to think, being given free advice about some of the possible failures, knowing other people who don't consider solo expeditions too dangerous. I hated being expected to fail, being denied changing my research topic, spending half a day home with a cranky kid and then running to meet someone who wasn't going to show up.
Then I became a lab technician & botany teacher in an out-of-school educational facility. I liked being able to show up later on some days, being treated kindly by a dozen unfamiliar people (even if they speak at classroom volume level), being someone who steps in for a chemistry instructor, finds umbrellas, and gives out books from her own library. I hated the condescending treatment of my subject by other teachers, sudden appointments, keys going missing, questions being recycled in highschool contests, and the feeling of intrusion upon others' well-structured lessons when I just had to add something (everyone took it in stride).
(...I am going to leave the job, because it doesn't pay well enough & I do want to see my kid on weekdays. It led me to identify my StP, though - a vision of what I want from botany education.)
Background and resolution.
When kids here in Ukraine start studying biology (6th-7th Form), they haven't yet had any physics or chemistry classes, and are at the very start of the algebra and geometry curriculum. (Which makes this a good place to introduce the notion of a phenomenon for the first time.) The main thing one can get out of a botany course is, I think, the notion of ordered, sequential, mathematically describable change. The kids have already observed seasonal changes in weather and vegetation, and they have words to describe their personal experiences - but this goes unused. Instead, they begin with the history of botany (!), proceed to cell structure (!!) and then to bacteria etc. The life cycle of mosses? Try asking them how long any particular stage takes! It all happens on one page, doesn't it?
There are almost no numbers.
There is, frankly, no need for numbers. Understanding the difference between the flowering and the non-flowering plants doesn't require any. There is almost no use for direct observation, either - even of the simplest things, like what will grow in the infusions of different vegetables after a week on the windowsill. There is no science.
And I don't like this.
I want there to be a book of simple, imperfectly posed problems containing as few words and as many pictures as possible. As in, 'Compare the areas of the leaves on Day 1 through Day 15. How do they change? What processes underlie the change?' etc. And there should be 10 or more leaves per day, so that the child would see that they don't grow equally fast, and that maybe sometimes, you can't really tell Day 7 from Day 10.
And there would be questions like 'Given such a gradient of stomatal densities on the poplar's leaves from Height 1 to Height 2, will there be any change in the stomatal densities of the mistletoe plants attached at Height 1 and Height 2? Explain your reasoning.' (Actually, I am unsure about this one. Leaf conductance depends on more than stomatal density...)
Conclusion
...Sorry for so many words. One day, my brain just told me [in the voice of Sponge Bob] that this was what I wanted. Subjectively, it didn't use virtue ethics or conscious decisions or anything, just saw a hole in the world order and squashed plugs into it until one kinda fit.
Has it been like this for you?
LINK: Quora brainstorms strategies for containing AI risk
In case you haven't seen it yet, Quora hosted an interesting discussion of different strategies for containing / mitigating AI risk, boosted by a $500 prize for the best answer. It attracted sci-fi author David Brin, U. Michigan professor Igor Markov, and several people with PhDs in machine learning, neuroscience, or artificial intelligence. Most people from LessWrong will disagree with most of the answers, but I think the article is useful as a quick overview of the variety of opinions that ordinary smart people have about AI risk.
Rationality Quotes May 2016
Another month, another rationality quotes thread. The rules are:
- Provide sufficient information (URL, title, date, page number, etc.) to enable a reader to find the place where you read the quote, or its original source if available. Do not quote with only a name.
- Post all quotes separately, so that they can be upvoted or downvoted separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
- Do not quote yourself.
- Do not quote from Less Wrong itself, HPMoR, Eliezer Yudkowsky, or Robin Hanson. If you'd like to revive an old quote from one of those sources, please do so here.
- No more than 5 quotes per person per monthly thread, please.
Open Thread May 2 - May 8, 2016
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.
The 'why does it even tell me this' moment
Edited based on the outline kindly provided by Gram_Stone, whom I thank.
There is a skill of reading and thinking which I haven't yet learned: looking for implications as one goes through a book, simply putting it back on the shelf whenever one's mind has run out of inferences, perhaps writing them down. I think it would be easier to do with books that:
- have pictures,
- invite an attitude (like cooking shows or Darwin's travel accounts or Feynman's biography: it doesn't have to be "personal"),
- are/have been regularly needed (ideally belong to you so you can make notes on the margins),
- are either outdated (so you "take it with a grain of salt" and have the option of looking for a current opinion) or very new,
- are not highly specialized,
- are well-structured, preferably into chapters one to a few pages long,
- allow reading those chapters out of order*,
- (make you) recognize that you do not need this knowledge for its own sake,
- can be shared, or at least shown to other people, and talked about, etc. (Although I keep imagining picture albums when I read the list, so maybe I missed something.)
These features are what attracts me to an amateur-level Russian plant identification text from 1948.** It was clearly written, and omitted many species of plants that the author considered easily grouped with others for practical purposes. It annoyed me when I expected the book to hold certain information that it didn't (a starting point - I have to notice something to want to think). This is merely speculation, but I suspect that the author omitted many of those species because the book was intended to convey agricultural knowledge of great economic importance to the Soviet population of the time (although some included details were clearly of less import - botanists know that random bits of trivia can help in recognizing a plant in the field - which established a feeling of kinship: the realisation that the author's goal was to teach how to use the book, and how to get by without it on hand). I found the book far more entertaining to read when I realized that I would have to evaluate it in this context, even though one might think that this would actually make it more difficult to read. I was surprised that something as simple as glancing at a note on beetroot production rates could make me do more cognitive work than any cheap trick that I'd ever seen a pedagogical author try to perform purposefully.
There may be other ways that books could be written to spontaneously cause independent thought in their audiences. Perhaps we can do this on purpose. Or perhaps the practice of making inferences beyond what is obviously stated in books can be trained.
* which might be less useful for people learning about math.
** Ф. Нейштадт. Определитель растений. - Учпедгиз, 1948. - 476 с. An identification key gives you an algorithm, a branching path which must end with a Latin name, which makes using it leisurely a kind of game. If you cannot find what you see, then either you've made a mistake or it isn't there.
Double Corrigibility: better Corrigibility
A putative new idea for AI control; index here.
Corrigibility was an attempt to allow an AI to safely change its values, without seeking to provoke or avoid the change. The idea is that, when the AI's utility changes from u to v at time t, it maximises a meta-utility U such that
- U≤t = u
- U>t = v + E(u|u→u) - E(v|u→v).
Here u→u designates the event that u remains unchanged, while u→v designates the change.
As has been shown, adding those expectation terms means the AI will not seek to provoke or resist such a utility change - it is indifferent to the change. Those expectation terms can be designated compensatory rewards.
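As a toy illustration of why those terms produce indifference (the specific numbers are my own assumptions, not from the corrigibility literature):

```python
# Toy numbers: the best expected utility the agent can achieve under each regime.
E_u_given_no_change = 5.0   # E(u | u -> u)
E_v_given_change    = 9.0   # E(v | u -> v)

def meta_utility(change_happens):
    """U equals u up to time t; afterwards it is v plus the
    compensatory term E(u|u->u) - E(v|u->v)."""
    if change_happens:
        return E_v_given_change + (E_u_given_no_change - E_v_given_change)
    return E_u_given_no_change

# Expected meta-utility is identical whether or not the change occurs,
# so the agent gains nothing by provoking or resisting it:
print(meta_utility(True), meta_utility(False))   # 5.0 5.0
```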
What it is not indifferent to, though, is changes to E. If the AI knows that change u→v is coming, then it has incentives to change E to E' with a high E'(u|u→u) and a low E'(v|u→v). Its ability to rewire is limited by the fact that E' will serve as its future expectation estimator, so it won't mangle it completely, but it certainly can twist it a bit.
Notice that the AI is now no longer indifferent to the change of utility. It can benefit from rewiring E this way only if a change is coming up. Notice that E' probably can't rewire itself any more, so the rewired AI will be indifferent, but not before rewiring.
Double corrigibility
To solve this problem, we need to turn to corrigibility a second time. Let ρ be the AI's current expectation/probability estimator. Let ρ' be one that it can conceivably change into. Let C be a random variable designating the sum total of all compensatory rewards. To guarantee that C is finite would in general require bounds on all the utility functions and some discount rate; in practice, it often will be finite. That's because E(C) = E(u0|u0→u0) - E(un|un-1→un, H), where u0 is the AI's first utility, un its last, and H the intermediate history. So some milder restrictions on the ui should suffice to bound C.
Then, whenever the AI takes an action that changes ρ to ρ', it gets the special meta-compensatory reward
- Eρ(C|ρ→ρ)-Eρ(C|ρ→ρ').
This prevents it from deriving any advantage to C via changes to ρ. Or, more precisely, it prevents it from deriving any expected advantage, that it can figure out using its current ρ.
Note that it doesn't prevent improvements to ρ in general, it just removes the incentives to change it to affect compensatory rewards.
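A toy version of how this cancels the rewiring incentive (all numbers here are illustrative assumptions of mine):

```python
# Expected compensatory rewards C, as estimated by the *current* rho,
# for each estimator the agent might end up with (made-up numbers):
E_C = {'keep': 3.0, 'rewire': 7.0}   # E_rho(C|rho->rho), E_rho(C|rho->rho')
E_u = 10.0                           # E_rho(u|a), the same for both actions here

def value(action):
    raw = E_u + E_C[action]          # utility plus the compensation the agent expects
    # The meta-compensatory reward fires only when the estimator changes:
    # E_rho(C|rho->rho) - E_rho(C|rho->rho')
    meta = (E_C['keep'] - E_C['rewire']) if action == 'rewire' else 0.0
    return raw + meta

print(value('keep'), value('rewire'))   # 13.0 13.0 - no incentive to rewire
```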
Since any action a might have an indirect effect on ρ, the "utility maximising" for a given u must be changed to:
- Eρ(u|a) + Σρ' Pρ(ρ→ρ'|a) (Eρ(C|ρ→ρ)-Eρ(C|ρ→ρ')),
where Pρ is the probability estimate corresponding to ρ; the probability term can be rewritten as Eρ(Iρ→ρ') for Iρ→ρ' the indicator function for ρ→ρ'. In fact the whole line above can be rewritten as
- Eρ(u|a) + Eρ(Eρ(C|ρ→ρ)-Eρ(C|ρ→ρ') | a).
For this to work, Eρ needs to be able to say sensible things about itself, and also about Eρ', which is used to estimate C if ρ→ρ'.
If we compare this with various ways of factoring out variables, we can see that it's a case where we have a clear default, ρ, and are estimating deviations from that.
Sleepwalk bias, self-defeating predictions and existential risk
Connected to: The Argument from Crisis and Pessimism Bias
When we predict the future, we often seem to underestimate the degree to which people will act to avoid adverse outcomes. Examples include Marx's prediction that the ruling classes would fail to act to avert a bloody revolution, predictions of environmental disasters and resource constraints, Y2K, etc. In most or all of these cases, there could have been a catastrophe if people had not acted with determination and ingenuity to prevent it. But when pressed, people often do act, and it seems that we often fail to take that into account when making predictions. In other words: too often we postulate that people will sleepwalk into a disaster. Call this sleepwalk bias.
What are the causes of sleepwalk bias? I think there are two primary causes:
Cognitive constraints. It is easier to just extrapolate existing trends than to engage in complicated reasoning about how people will act to prevent those trends from continuing.
Predictions as warnings. We often fail to distinguish between predictions in the pure sense (what I would bet will happen) and what we may term warnings (what we think will happen, unless appropriate action is taken). Some of these predictions could perhaps be interpreted as warnings - in which case, they were not as bad as they seemed.
However, you could also argue that they were actual predictions, and that they were more effective because they were predictions rather than warnings. For, more often than not, much work will of course be done to reduce the risk of disaster, and that work will reduce the risk. This means that a warning saying that "if no action is taken, there will be a disaster" is not necessarily very effective as a way to change behaviour - since we know for a fact that action will be taken. A prediction that there is a high probability of a disaster all things considered is much more effective. Indeed, the fact that predictions are more effective than warnings might be the reason why people predict disasters rather than warn about them. Such predictions are self-defeating - which, you may argue, is why people make them.
In practice, I think people often fail to distinguish between pure predictions and warnings. They slide between these interpretations. In any case, the effect of all this is for these "prediction-warnings" to seem too pessimistic qua pure predictions.
The upshot for existential risk is that those suffering from sleepwalk bias may be too pessimistic. They fail to appreciate the enormous efforts people will make to avoid an existential disaster.
Is sleepwalk bias common in the existential risk community? If so, that would be a pro tanto reason to be somewhat less worried about existential risk. Since it seems to be a common bias, it would be unsurprising if the existential risk community also suffered from it. On the other hand, they have thought about these issues a lot, and may have been able to overcome it (or even overcorrect for it).
Also, even if sleepwalk bias does indeed affect existential risk predictions, it would be dangerous to let this notion make us decrease our efforts to reduce existential risk, given the enormous stakes, and the present neglect of existential risk. If pessimistic predictions may be self-defeating, so may optimistic predictions.
[Added 24/4 2016] Under which circumstances can we expect actors to sleepwalk? And under what circumstances can we expect that people will expect them to sleepwalk, even though they won't? Here are some considerations, inspired by the comments below. Sleepwalking is presumably more likely if:
1. The catastrophe is arriving too fast for actors to react.
2. It is unclear whether the catastrophe will in fact occur, or it is at least not very observable to the relevant actors (the financial crisis, possibly AGI).
3. The possible disaster, though observable in some sense, is not sufficiently salient (especially to voters) to override more immediate concerns (climate change).
4. There are conflicts (World War I) and/or free-riding problems (climate change) which are hard to overcome.
5. The problem is technically harder than initially thought.
1, 2 and, in a way, 3, have to do with observing the disaster in time to act, whereas 4 and 5 have to do with ability to act once the problem is identified.
On the second question, my guess would be that people in general do not differentiate sufficiently between scenarios where sleepwalking is plausible and those where it is not (i.e. predicted sleepwalking has less variance than actual sleepwalking). This means that we sometimes probably underestimate the amount of sleepwalking, but more often, if my main argument is right, we overestimate it. An upshot of this is that it is important to try to carefully model the amount of sleepwalking that there will be regarding different existential risks.
How to provide a simple example of the requirement of falsifiability in the scientific method to a novice audience?
(I once posted this question on academia.stackexchange, but it was deemed off topic there. I hope it is more on-topic here.)
I would like to introduce the basics of the scientific method to an audience unfamiliar with the real meaning of it, without making it hard to understand.
The intended audience commonly thinks that to "prove something scientifically" means "use modern technological gadgets to measure something, then interpret the results however we wish", so my major topics would be the selection of an experimental method and the importance of falsifiability. Wikipedia lists "all swans are white" as an example of a falsifiable statement, but it is not practical enough: to prove that all swans are white would require observing all the swans in the world. I'm searching for a simple example which uses the scientific method to determine the workings of an unknown system, starting by forming a good hypothesis.
A good example I found is the 2-4-6 game, culminating in the very catchy phrase "if you are equally good at explaining any outcome, you have zero knowledge". This would be one of the best examples to illustrate the most important part of the scientific method, which a lot of people imagine incorrectly; it has just one flaw: for best effect it has to be interactive. And if I make it interactive, it has a non-negligible chance of failing, especially with a broader audience.
Is there any simple, non-interactive example to illustrate the problem underlying the 2-4-6 game? (for example, if we had taken this naive method to formulate our hypothesis, we would have failed)
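To make the 2-4-6 failure mode concrete, here is a minimal sketch (my own construction): a naive hypothesis like "x, 2x, 3x" is never falsified if we only test triples it predicts, because all of those happen to satisfy the true rule as well.

```python
def true_rule(triple):
    """The experimenter's hidden rule: any strictly increasing triple."""
    a, b, c = triple
    return a < b < c

# Tests generated by the naive hypothesis "x, 2x, 3x" - all confirming:
naive_tests = [(2, 4, 6), (4, 8, 12), (10, 20, 30)]
print(all(true_rule(t) for t in naive_tests))   # True - zero falsifying power

# A genuinely falsifying test tries a triple the hypothesis forbids:
print(true_rule((1, 2, 5)))   # True - so "x, 2x, 3x" was wrong all along
```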
I know the 2-4-6 game is mostly used in discussions of fallacies, like confirmation bias, but it nevertheless seems to me a good method for grasping the most important aspects of the scientific method.
I've seen several good posts about the importance of falsifiability, some of them in this very community, but I have not yet seen any example simple enough that people unfamiliar with how scientists work can also understand it. A good working example would be one where we want to study a familiar concept, but by forgetting to take falsifiability into account, we arrive at an obviously wrong (and preferably humorous) conclusion.
(How do I imagine such an example working? My favorite example, from a different topic, is the egg-laying dog. A dog enters a room where we placed ten sausages and ten eggs, and when it leaves the room, we observe that the proportion of eggs relative to sausages has increased, so we conclude that the dog must have laid eggs. It's easy to spot the mistake in this example, because the image of a dog laying eggs is absurd. But replace the dog with an effective medicine against heart disease: someone notices that the chance of dying of cancer in the next ten years increased for patients treated with it, and declares the medicine carcinogenic even though it isn't (people are not immortal, so those who don't die of one disease die later of another). In this case, many people will accept that it's carcinogenic without a second thought. That is why the egg-laying dog can be so useful in illustrating the problem. However, the egg-laying dog is not a good example for raising awareness of the importance of falsifiability; I presented it as an example of an effective style that any layman can understand.)
An update on Signal Data Science (an intensive data science training program)
In December 2015, Robert Cordwell and I cofounded Signal Data Science (website), which we announced on Less Wrong.
Our first cohort has just concluded, and overall went very well. We're planning another one in Berkeley from May 2nd to June 24th. The program is a good fit for people who are both excited to learn how to extract insights from data sets and looking to prepare for industry data science jobs. If you're interested in attending the next cohort, we would love to hear from you. You can apply here, or contact us at signaldatascience@gmail.com.
We offer inquiry-based learning and an unusually intellectually curious peer group. Unlike typical college classes, Signal Data Science focuses on learning by doing. You’ll learn from a combination of lectures, short knowledge-reinforcement problems, and longer, more open-ended assignments focusing on analyzing real datasets. (That’s your chance to discover something new!) Don’t worry if that sounds daunting: our instructors will be there to support you every step of the way.
You’ll learn both the theory and the application of a wide array of data science techniques. We offer a pair programming-focused curriculum, allowing students to learn from each other’s strengths. We cover everything from basic linear regression to advanced, industry-relevant methods like support vector machines and dimensionality reduction. You’ll do an advanced, self-directed project at the end of the course. Curious? Check out our showcase of past students’ final projects. Whatever your interests are—from doing something with real-world, industry-relevant applicability to applying cutting-edge neural nets—we’ll work with you to find a project to match your interests and help you showcase it to prospective employers.
Less Wrong readers might be especially interested in Olivia Schaefer's project, which describes the results of doing some natural language processing on the Less Wrong comment corpus, explaining how the words pictured in different colors below are at opposite ends of an axis.

Rationality Reading Group: Part X: Yudkowsky's Coming of Age
This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.
Welcome to the Rationality reading group. This fortnight we discuss Beginnings: An Introduction (pp. 1527-1530) and Part X: Yudkowsky's Coming of Age (pp. 1535-1601). This post summarizes each article of the sequence, linking to the original LessWrong post where available.
Beginnings: An Introduction
X. Yudkowsky's Coming of Age
292. My Childhood Death Spiral - Wherein Eliezer describes how a history of being rewarded for believing that 'intelligence is more important than experience or wisdom' initially led him to dismiss the possibility that most possible smarter-than-human artificial intelligences will cause valueless futures if constructed.
293. My Best and Worst Mistake - When Eliezer went into his death spiral around intelligence, he wound up making a lot of mistakes that later became very useful.
294. Raised in Technophilia - When Eliezer was quite young, it took him a very long time to get to the point where he was capable of considering that the dangers of technology might outweigh the benefits.
295. A Prodigy of Refutation - Eliezer's skills at defeating other people's ideas led him to believe that his own (mistaken) ideas must have been correct.
296. The Sheer Folly of Callow Youth - Eliezer's big mistake was when he took a mysterious view of morality.
297. That Tiny Note of Discord - Eliezer started to dig himself out of his philosophical hole when he noticed a tiny inconsistency.
298. Fighting a Rearguard Action Against the Truth - When Eliezer started to consider the possibility of Friendly AI as a contingency plan, he permitted himself a line of retreat. He was now able to slowly start to reconsider positions in his metaethics, and move gradually towards better ideas.
299. My Naturalistic Awakening - Eliezer actually looked back and realized his mistakes when he imagined the idea of an optimization process.
300. The Level Above Mine - There are people who have acquired more mastery over various fields than Eliezer has over his.
301. The Magnitude of His Own Folly - Eliezer considers his training as a rationalist to have started the day he realized just how awfully he had screwed up.
302. Beyond the Reach of God - Compare the world in which there is a God, who will intervene at some threshold, against a world in which everything happens as a result of physical laws. Which universe looks more like our own?
303. My Bayesian Enlightenment - The story of how Eliezer Yudkowsky became a Bayesian.
This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
The next reading will cover Part Y: Challenging the Difficult (pp. 1605-1647). The discussion will go live on Wednesday, 20 April 2016, right here on the discussion forum of LessWrong.
Open Thread April 4 - April 10, 2016
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.
How It Feels to Improve My Rationality
Note: this started as a comment reply, but I thought it got interesting (and long) enough to deserve its own post.
Important note: this post is likely to spark some extreme reactions, because of how human brains are built. I'm including warnings, so please read this post carefully and in the order written, or don't read it at all.
I'm going to attempt to describe my subjective experience of progress in rationality.
Important edit: I learned from the responses to this post that there's a group of people with whom this resonates pretty well, and there's also a substantial group with whom it does not resonate at all, to the degree that they don't know if what I'm saying even makes sense or is correlated with rationality in any meaningful way. If you find yourself in the second group, please notice that trying to verify whether I'm doing "real rationality" or not is not a way to resolve your doubts. There is no reason why you would need to feel the same. It's OK to have different experiences. How you experience things is not a test of your rationality. It's also not a test of my rationality. All in all, because of publishing this and reading the comments, I've found out some interesting stuff about how some clusters of people tend to think about this :)
Also, I need to mention that I am not an advanced rationalist, and my rationality background is mostly reading Eliezer's sequences and self-experimentation.
I'm still going to give this a shot, because I think it's going to be a useful reference for a certain level in rationality progress.
I even expect myself to find all that I write here silly and stupid some time later.
But that's the whole point, isn't it?
What I can say about how rationality feels to me now, is going to be pretty irrelevant pretty soon.
I also expect a significant part of readers to be outraged by it, one way or the other.
If you think this has no value, maybe try to imagine a rationality-beginner version of you that would find a description such as this useful. If only as a reference that says, yes, there is a difference. No, rationality does not feel like a lot of abstract knowledge that you remember from a book. Yes, it does change you deeply, probably deeper than you suspect.
In case you want to downvote this, please do me a favour and write a private message to me, suggesting how I could change this so that it stops offending you.
Please stop any feeling of wanting to compare yourself to me or anyone else, or to prove anyone's superiority or inferiority.
If you can't do this please bookmark this post and return to it some other time.
...
...
Ready?
So, here we go. If you are free from againstness and competitiveness, please be welcome to read on, and feel free to tell me how this resonates, and how different it feels inside your own head and on your own level.
Part 1. Pastures and fences
Let's imagine a vast landscape, full of vibrant greenery of various sorts.
Now, my visualization of object-level rationality is staking out territories, like small parcels of a pasture surrounded by fences.
Inside the fences, I tend to have more neat grass than anything else. It's never perfect, but when I keep working on an area, it's slowly improving. If neglected, weeds will start growing back sooner or later.
Let's also imagine that the ideas and concepts I generalize as I go about my work become seeds of grass, carried by the wind.
What the work feels like, is that I'm running back and forth between object level (my pastures) and meta-level (scattering seeds).
As a result of this running back and forth, I'm able to stake out new territories, or improve previous ones to have better coverage and fewer weeds.
The progress I make in my pastures feeds back into interesting meta-level insights (more seeds carried by the wind), which in turn tend to spread to new areas even when I'm not helping with this process on purpose.
My pastures tend to concentrate in clusters, in areas that I have worked on the most.
When I have lots of action in one area, the large amounts of seeds generated (meta techniques) are more often carried to other places, and at those times I experience the most change happening in other, especially new and unexplored, areas.
However, even if I can reuse some of my meta-ideas (seeds), to have a nice, clear territory I still need to go over there and put in the manual work of clearing it up.
As I'm getting better and more efficient at this, it becomes less work to gain new territories and improve old ones.
But there's always some amount of manual labor involved.
Part 2. Tells of epistemic high ground
Disclaimer: not using this for the Dark Side requires a considerable amount of self-honesty. I'm only posting this because I believe most of you folks reading this are advanced enough not to shoot yourself in the foot by e.g. using this in arguments.
Note: If you feel the slightest urge to flaunt your rationality level, pause and catch it. (You are welcome.) Please do not start any discussion motivated by this.
So, what clues do I tend to notice when my rationality level is going up, relative to other people?
Important note: This is not the same as "how do I notice if I'm mistaken" or "how do I know if I'm on the right path". These are things I notice after the fact, that I judge to be correlates, but they are not to be used to choose direction in learning or sorting out beliefs. I wrote the list below exactly because it is the less talked about part, and it's fun to notice things. Somehow everyone seems to have thought this is more than I meant it to be.
Edit: check Viliam's comment for some concrete examples that make this list better.
In a particular field:
- My language becomes more precise. Where others use one word, I now use two, or six.
- I see more confusion all around.
- Polarization in my evaluations increases. E.g. two sensible sounding ideas become one great idea and one stupid idea.
- I start getting strong impulses that tell me to educate people who I now see are clearly confused, and could be saved from their mistake in one minute if I could tell them what I know... (spoiler alert, this doesn't work).
Rationality level in general:
- I stop having problems in my life that seem to be common all around, and that I used to have in the past.
- I forget what it is like to have certain problems, and I need to remind myself constantly that what seems easy to me is not easy for everyone.
- Writings of other people move forward on the path from intimidating to insightful to sensible to confused to pitiful.
- I start to intuitively discriminate between rationality levels of more people above me.
- Intuitively judging someone's level requires less and less data, from reading a book to reading ten articles to reading one article.
Important note: although I am aware that my mind automatically estimates rationality levels of various people, I very strongly discourage anyone (including myself) from ever publishing such scores/lists/rankings. If you ever have an urge to do this, especially in public, think twice, and then think again, and then shut up. The same applies to ever telling your estimates to the people in question.
Note: Growth mindset!
Now let's briefly return to the post I started out replying to. Gram_Stone suggested that:
You might say that one possible statement of the problem of human rationality is obtaining a complete understanding of the algorithm implicit in the physical structure of our brains that allows us to generate such new and improved rules.
After everything I've seen so far, my intuition suggests Gram_Stone's idealized method wouldn't work from inside a human brain.
A generalized meta-technique could become one of the many seeds that help me in my work, or even a very important one that would spread very widely, but it still wouldn't magically turn raw territory into perfect grassland.
Part 3. OK or Cancel?
The closest I've come to Gram_Stone's ideal is when I witnessed a whole cycle of improving in a certain area being executed subconsciously.
It was only brought to my full attention when an already polished solution in verbal form popped into my head when I was taking a shower.
It felt like a popup on a computer screen that had "Cancel" and "OK" buttons, and after I chose OK the rest continued automatically.
After this single short moment, I found a subconscious habit already in place that changed my previous thought patterns, and it proved to work reliably long afterwards.
That's it! I hope I've left you better off reading this, than not reading this.
Meta-note about my writing agenda: I've developed a few useful (I hope) and unique techniques and ideas for applied rationality, which I don't (yet) know how to share with the community. To get that chunk of data birthed out of me, I need some continued engagement from readers who would give me feedback and generally show interest (this needs to be done slowly and in the right order, so I would have trouble persisting otherwise). So for now I'm writing separate posts noncommittally, to test reactions and (hopefully) gather some folks that could support me in the process of communicating my more developed ideas.
What is the future of nootropic drugs? Why can't there be ones more effective than ones that have existed for 15+ years?
So Scott Alexander's post at http://slatestarcodex.com/2016/03/01/2016-nootropics-survey-results/ shows that the most "effective" "nootropics" have still been the ones that have existed for a long time. What do these results really mean, though? Is it possible that people are just worse at noticing the subtler effects of the other drugs, or are just much worse at disciplining themselves enough to correctly use the racetams or noopept (as in, with choline)?
How much potential is there in innovation in nootropics? What is holding this innovation back, if anything? It feels like there hasn't been any real progress over the last 15 years (other than massively increased awareness), but could targeted drug discovery (along with people willing to be super-liberal with their experimentation) finally lead to some real breakthroughs?
Preference over preference
Each individual person has preferences. Some preferences are strong, others are weak. For many preferences it's more complicated than that: they aren't static, and we change them all the time. Some days we don't like certain foods; sometimes we may strongly dislike a certain song, and at another time we may not care so much. Our preferences can change in scope as well as intensity.
Sometimes people can have preferences over other people's preferences.
- Example 1: I prefer to be surrounded by people who enjoy exercise, that way I will be motivated to exercise more.
- Example 2: I prefer to be surrounded by people who don't care how they look, that way I look prettier than everyone else.
- Example 3: I prefer when other people like my clothes.
- Example 4: I prefer my partners to be polyamorous.
- Example 5: I prefer people around me to not smoke.
The interesting thing about example 3 is that there are multiple ways to satisfy that preference:
- Find out what clothes people like and acquire those clothes, then wear them regularly.
- Find people who already like the clothes that you have, then hang around those people regularly.
- Change the preference of the people around you so that they like your clothes.
Changing someone's preference over clothing seems pretty harmless, and that way you get to wear clothes you like, they get to like the clothes you wear, and you get to be around people who like the clothes you wear without finding new people. The scary and maybe uncomfortable thing is that the other preferences can also be satisfied through these means.
Example 4:
- Find out where poly people are, and hang out with them. (and ask to be their partners - etc)
- Find out which of the people you know are already poly and hang out with them (and ask to be their partners - etc)
- Change the preferences of your existing partner/s.
Example 1:
- Find out where people who enjoy exercise hang out, and join them.
- Find out which of your friends already enjoy exercise and hang out with them.
- Change the preferences of those around you to also enjoy exercise.
Example 5:
- Find out where people don't smoke, hang out in those places.
- Figure out who already doesn't smoke and hang out with them.
- Encourage people you know to not smoke.
(I think that's enough examples)
Is it wrong?
There is nothing inherently wrong with having a preference. Having a preference over another person’s preference is also not inherently wrong. Such is the nature of having a preference (usually a strong one by the time you are dictating it to your surroundings). What really matters is what you do about it.
In this day and age, no one would be discouraged from figuring out where people are not smoking and being in those places instead of the smoking places. Nor would you be criticised for finding out which of your friends don't smoke and only hanging out with them - though maybe it makes some people uncomfortable to do so, or to feel that the reverse might happen if someone strongly disliked their preferences. Encouraging those around you to not smoke, however, can come across as an action with questionable motives.
So let's look at some of the motives:
- I prefer it when people don't smoke around me because then I don't get second hand smoke.
- I prefer it when my friends don't smoke because I don't like chemical dependency in my environment.
- I prefer it when my friends don't smoke so that we look better than that other group of people who do smoke.
- I prefer it when my friends don't smoke because I don't want them to get cancer and die (and not be around to be my friends any more).
Motive 1 seems very much about self-preservation. We can't really fault an entity for trying to self-preserve.
Motive 2 is a broader example of self-preservation - the idea that having dependency in your environment might negatively impact you enough to warrant maintaining an environment without it. It's a stretch, but not an unreasonable self-preservation drive.
Motive 3 appears to be a superficial drive to be better than other people. We often don't like admitting that this is the reason we do things, but I don't mind it either. If it were me, I'd get pretty tired of being motivated by *keeping up with the Joneses* attitudes, but some people care greatly about that.
Motive 4 seems like a potentially altruistic desire to protect your friends, but it seems less so once you include the bracketed sub-motive.
Herein lies the problem. If a preference looks like it is designed to improve someone else's life like "others shouldn't smoke" (remember that "looks like to me" is equivalent to "I believe it looks like..."), and we believe that having a preference over their preference would improve their life - should we enforce that preference? Do we have a right or even a burden to encourage those around us to quit smoking? To take up exercise? To become poly? To like us (or our clothes)?
The idea of preference over preference is a big one. What if my preference is that people eat my birthday cake, and Bob's preference is that he sticks to his diet today? Who should win? It's My Birthday. On Bob's birthday he doesn't have to eat cake, but on My Birthday he does. Or does he?
The truth is that neither way is always the best way. Sometimes hypothetical Bob should eat the birthday cake, and sometimes the hypothetical birthday kid should respect other people's dietary choices. What we really have control over is our own preferences for ourselves. My only advice is to tread delicately when having preferences over other people's preferences.
If we think we know better (and we might but also might not) and are trying to uphold a preference over a preference (p/p), then what happens?
Either we are right, or we are wrong, or something else happens. The outcome also depends on whether the other party conformed or not (or did something else), and on what happens when things resolve.
Examples:
1. A is smoking.
2. B says not to, because it's bad for you.
3. A doesn't stop.
4. It turns out to be bad for you.
5. A gets sick.
B was right, tried to push a p/p, and lost (either by not pushing hard enough or by A being stubborn). Did the p/p serve any good here? Should it have happened? What if an alternative step 5 exists: "A keeps smoking, never gets sick and lives to 90"? Then was the p/p useful?
1. A is monogamous.
2. B says to be poly.
3. A does.
4. It goes badly.
5. A is hurt.
B was wrong, tried to push a p/p, and won. But B was wrong and perhaps shouldn't have pushed it. Or maybe A shouldn't have conformed.
This can be represented in a table:
| | B prefers to maintain P/P | B does not maintain P/P |
|---|---|---|
| A is susceptible to pressure | A gives in | A does not change (because there is no pressure) |
| A is not susceptible | A does not change (stubborn) | A does not change (because there is no pressure) |
And a second table of results:
| | change was negative (or caused a negative result) | change was positive (or caused a positive result) |
|---|---|---|
| A is susceptible | A loses. | A wins! |
| A is not susceptible | A wins! | A loses. |
Assuming also that if A loses, B takes a hit as well. Ideally we want everyone to win all the time. But just showing these things in a table is not enough; we should be assigning estimated probabilities to these choices as well.
For example (my made-up numbers for whether I think smoking will lead to a bad result):

| Outcome | Estimate |
|---|---|
| Smoking causes problems | 98% |
| Smoking does not cause problems | 2% |
If we edit the earlier table:
| Smoking | B prefers to maintain P/P | B does not maintain P/P |
|---|---|---|
| A is susceptible to pressure | A gives in (2% estimate that the change was pointless) | A does not change (because there is no pressure) (98% estimate that this is a bad outcome) |
| A is not susceptible to pressure | A does not change (stubborn) (98% estimate that this is a bad outcome) | A does not change (because there is no pressure) (98% estimate that this is a bad outcome) |
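For what it's worth, here is a sketch of how those estimates might be folded into an expected value for each cell of the table above (the ±1 payoffs and the decision logic are my own toy assumptions):

```python
p_harm = 0.98   # the made-up estimate above: smoking causes problems

def expected_value(b_pushes, a_susceptible):
    """A's expected outcome for one cell of the table: +1 if the
    change helps, -1 if A ends up worse off, 0 if nothing was at stake."""
    a_quits = b_pushes and a_susceptible   # A only changes under pressure
    if a_quits:
        # Quitting helps iff smoking really was harmful; else it was pointless.
        return p_harm * 1 + (1 - p_harm) * 0
    # A keeps smoking: bad iff smoking really is harmful.
    return p_harm * -1 + (1 - p_harm) * 0

for b in (True, False):
    for a in (True, False):
        print(f"B pushes: {b}, A susceptible: {a} -> EV {expected_value(b, a):+.2f}")
```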
To a rationalist, seeing your p/p table with estimates should help them decide whether or not to take you up on fulfilling your preference - assuming, of course, that rationalists never lie and can accurately estimate the confidence of their beliefs.
If you meet someone with a 98% belief, they should be able to produce evidence that will reasonably convince you of similar ideas and encourage you to update your beliefs. So maybe, in the smoking case, A should listen to B, or at least check the evidence very seriously.
What should you do when you hold a strong p/p that will be to your benefit and at the same time to someone else's detriment? (And part 2: what if you are unsure of the benefit or detriment?)
Examples:
B wants A to try a new street drug, "splice". B says it's lots of fun and encourages A to try it. B is unsure of the risks, but sure of the benefits (lots of fun). Should B encourage A? (What more do we need to know to make that sort of judgement call?)
B has a specific sexual interest, and A is indifferent. B could easily encourage A to "try this out". Should B?
B has an old, crappy car that B doesn't like very much. B prefers to make friends with shady A's who will steal the car; then B can claim on insurance that it was stolen and get a nicer car with the payout. Should B?
B wants A to pay for the two of them to go on a carnival ride. The cost is simple (several dollars); the benefit is not. Should B pressure A? (What more do we need to know in order to answer that question?)
A always crosses the street dangerously because they are often running late. B believes that A should be safer and walk to the nearest crossing before crossing the road; B knows that this will make A late. Should B pressure A? (Will more information help us answer?)
It was suggested that the Veil of Ignorance might help to create a rule in this situation. However, the bounds of the situation dictate that you know which party you are, and that you have a preference over a preference. So the Veil of Ignorance does not really apply to give us insight.
- It is possible to be a selfish entity, hold p/p's, and encourage others to fulfil your preferences.
- It is possible to be a non-influential entity, and never push a preference on others.
- It is possible to be a stubborn entity, and never conform to someone else's p/p.
- It is possible to be a conforming entity, and always conform.
It is also possible to be a mix of these four in different situations and/or for different preferences.
Partial Solution
Know your preferences, know your p/p's, and think very carefully about pushing your p/p's, hiding your p/p's, changing your preferences to conform, or being needlessly stubborn about your preferences. (Warning: this is hard; don't think it's easy just because it fits into one sentence.)
Knowing what your strong preferences are, knowing which of your preferences are potentially not beneficial for others, and understanding whether you have a tendency to push your p/p on other people will possibly help you to be more careful when handling p/p, and to avoid manipulating people (to their detriment). In addition, knowing what culture you come from and what culture others come from will help you see how a weak p/p might be misinterpreted as a strong p/p (see "ask culture", "guess culture" and "tell culture"). Some cultures aim to please when asked, and ask little of each other; some cultures are stubborn, vocal and demanding. In the middle of the two cultures is the crazy-confused zone. Of course, these are the obvious cases. Sometimes a cultural taboo will come up around some topics and not others; i.e. dinner etiquette might be something you never ask about - because asking would itself be bad etiquette - while expressing a strong preference over what you want to drink is expected.
In conclusion there are no rules to be drawn around p/p other than - Try to understand it; and how it can go wrong and be careful.
Meta: 4.5 hours to write, 30 minutes to take feedback and edit. Thanks to the slack for being patient while I asked tricky example questions.
My Table of contents - contains links to the other things I have written.
Further comments, adjustments and suggestions welcome.
Rationality Quotes Thread March 2016
Another month, another rationality quotes thread. The rules are:
- Provide sufficient information (URL, title, date, page number, etc.) to enable a reader to find the place where you read the quote, or its original source if available. Do not quote with only a name.
- Post all quotes separately, so that they can be upvoted or downvoted separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
- Do not quote yourself.
- Do not quote from Less Wrong itself, HPMoR, Eliezer Yudkowsky, or Robin Hanson. If you'd like to revive an old quote from one of those sources, please do so here.
- No more than 5 quotes per person per monthly thread, please.
[paper] [link] Defining human values for value learners
MIRI recently blogged about the workshop paper that I presented at AAAI.
My abstract:
Hypothetical “value learning” AIs learn human values and then try to act according to those values. The design of such AIs, however, is hampered by the fact that there exists no satisfactory definition of what exactly human values are. After arguing that the standard concept of preference is insufficient as a definition, I draw on reinforcement learning theory, emotion research, and moral psychology to offer an alternative definition. In this definition, human values are conceptualized as mental representations that encode the brain’s value function (in the reinforcement learning sense) by being imbued with a context-sensitive affective gloss. I finish with a discussion of the implications that this hypothesis has on the design of value learners.
Their summary:
Economic treatments of agency standardly assume that preferences encode some consistent ordering over world-states revealed in agents’ choices. Real-world preferences, however, have structure that is not always captured in economic models. A person can have conflicting preferences about whether to study for an exam, for example, and the choice they end up making may depend on complex, context-sensitive psychological dynamics, rather than on a simple comparison of two numbers representing how much one wants to study or not study.
Sotala argues that our preferences are better understood in terms of evolutionary theory and reinforcement learning. Humans evolved to pursue activities that are likely to lead to certain outcomes — outcomes that tended to improve our ancestors’ fitness. We prefer those outcomes, even if they no longer actually maximize fitness; and we also prefer events that we have learned tend to produce such outcomes.
Affect and emotion, on Sotala’s account, psychologically mediate our preferences. We enjoy and desire states that are highly rewarding in our evolved reward function. Over time, we also learn to enjoy and desire states that seem likely to lead to high-reward states. On this view, our preferences function to group together events that lead on expectation to similarly rewarding outcomes for similar reasons; and over our lifetimes we come to inherently value states that lead to high reward, instead of just valuing such states instrumentally. Rather than directly mapping onto our rewards, our preferences map onto our expectation of rewards.
Sotala proposes that value learning systems informed by this model of human psychology could more reliably reconstruct human values. On this model, for example, we can expect human preferences to change as we find new ways to move toward high-reward states. New experiences can change which states my emotions categorize as “likely to lead to reward,” and they can thereby modify which states I enjoy and desire. Value learning systems that take these facts about humans’ psychological dynamics into account may be better equipped to take our likely future preferences into account, rather than optimizing for our current preferences alone.
Would be curious to hear whether anyone here has any thoughts. This is basically a "putting rough ideas together and seeing if they make any sense" kind of paper, aimed at clarifying the hypothesis and seeing whether others can find any obvious holes in it, rather than being at the stage of a serious scientific theory yet.
The Philosophical Implications of Quantum Information Theory
I was asked to write up a pithy summary of the upshot of this paper. This is the best I could manage.
One of the most remarkable features of the world we live in is that we can make measurements that are consistent across space and time. By "consistent across space" I mean that you and I can look at the outcome of a measurement and agree on what that outcome was. By "consistent across time" I mean that you can make a measurement of a system at one time and then make the same measurement of that system at some later time and the results will agree.
It is tempting to think that the reason we can do these things is that there exists an objective reality that is "actually out there" in some metaphysical sense, and that our measurements are faithful reflections of that objective reality. This hypothesis works well (indeed, seems self-evidently true!) until we get to very small systems, where it seems to break down. We can still make measurements that are consistent across space and time, but as soon as we stop making measurements, then things start to behave very differently than they did before. The classical example of this is the two-slit experiment: whenever we look at a particle we only ever find it in one particular place. When we look continuously, we see the particle trace out an unambiguous and continuous trajectory. But when we don't look, the particle behaves as if it is in more than one place at once, a behavior that manifests itself as interference.
The problem of how to reconcile the seemingly incompatible behavior of physical systems depending on whether or not they are under observation has come to be called the measurement problem. The most common explanation of the measurement problem is the Copenhagen interpretation of quantum mechanics which postulates that the act of measurement changes a system via a process called wave function collapse. In the contemporary popular press you will often read about wave function collapse in conjunction with the phenomenon of quantum entanglement, which is usually referred to as "spooky action at a distance", a phrase coined by Einstein, and intended to be pejorative. For example, here's the headline and first sentence of the above piece:
More evidence to support quantum theory’s ‘spooky action at a distance’
"It's one of the strangest concepts in the already strange field of quantum physics: Measuring the condition or state of a quantum particle like an electron can instantly change the state of another electron—even if it's light-years away." (emphasis added)
This sort of language is endemic in the popular press as well as many physics textbooks, but it is demonstrably wrong. The truth is that measurement and entanglement are actually the same physical phenomenon. What we call "measurement" is really just entanglement on a large scale. If you want to see the demonstration of the truth of this statement, read the paper or watch the video or read the original paper on which my paper and video are based. Or go back and read about Von Neumann measurements or quantum decoherence or Everett's relative state theory (often mis-labeled "many-worlds") or relational quantum mechanics or the Ithaca interpretation of quantum mechanics, all of which turn out to be saying exactly the same thing.
Which is: the reason that measurements are consistent across space and time is not because these measurements are a faithful reflection of an underlying objective reality. The reason that measurements are consistent across space and time is because this is what quantum mechanics predicts when you consider only parts of the wave function and ignore other parts.
Specifically, it is possible to write down a mathematical description of a particle and two observers as a quantum mechanical system. If you ignore the particle (this is a formal mathematical operation called a partial trace of an operator matrix), what you are left with is a description of the observers. And if you then apply information-theoretic operations to that, what pops out is that the two observers are in classically correlated states. The exact same thing happens for observations made of the same particle at two different times.
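To make the partial-trace step concrete, here is a minimal numpy sketch (my own illustration, not from the paper): a particle and two observer records entangle into (|0,A0,B0> + |1,A1,B1>)/√2, and tracing out the particle leaves the observers in a classically correlated mixture.

```python
import numpy as np

# Three qubits: particle p, observer A's record a, observer B's record b.
# After both observers "measure" (i.e. entangle with) the particle, the
# joint state is (|0,A0,B0> + |1,A1,B1>)/sqrt(2).
psi = np.zeros(8)
psi[0b000] = psi[0b111] = 1 / np.sqrt(2)

rho = np.outer(psi, psi.conj())        # full 8x8 density matrix
rho6 = rho.reshape(2, 2, 2, 2, 2, 2)   # axes (p, a, b, p', a', b')

# Partial trace over the particle: sum the diagonal p = p'.
rho_ab = np.einsum('iabicd->abcd', rho6).reshape(4, 4)

print(np.round(rho_ab, 3))
# diag(0.5, 0, 0, 0.5): a classical 50/50 mixture of "both records read 0"
# and "both records read 1" -- the observers agree, and no coherence
# between the two possibilities survives the trace.
```

The off-diagonal terms, which encoded the superposition, vanish in the trace; what remains is exactly a pair of observers who will agree with each other.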
The upshot is that nothing special happens during a measurement. Measurements are not instantaneous (though they are very fast) and they are in principle reversible, though not in practice.
The final consequence of this, the one that grates most heavily on the intuition, is that your existence as a classical entity is an illusion. Because measurements are not a faithful reflection of an underlying objective reality, your own self-perception (which is a kind of measurement) is not a faithful reflection of an underlying objective reality either. You are not, in point of metaphysical fact, made of atoms. Atoms are a very (very!) good approximation to the truth, but they are not the truth. At the deepest level, you are a slice of the quantum wave function that behaves, to a very high degree of approximation, as if it were a classical system but is not in fact a classical system. You are in a very real sense living in the Matrix, except that the Matrix you are living in is running on a quantum computer, and so you -- the very close approximation to a classical entity that is reading these words -- can never "escape" the way Neo did.
As a corollary to this, time travel is impossible, because in point of metaphysical fact there is no time. Your perception of time is caused by the accumulation of entanglements in your slice of the wave function, resulting in the creation of information that you (and the rest of your classically-correlated slice of the wave function) "remember". It is those memories that define the past; you could even say they create the past. Going "back to the past" is not merely impossible, it is logically incoherent, no different from trying to construct a four-sided triangle. (And if you don't buy that argument, here's a more prosaic one: having a physical entity suddenly vanish from one time and reappear at a different time would violate conservation of energy.)
Open Thread Feb 22 - Feb 28, 2016
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.
[spoilers] EY's “A Girl Corrupted...?!” new story is an allegorical study of quantum immortality?
If you haven't read "A Girl Corrupted by the Internet is the Summoned Hero?!" yet, you should.
Spoilers ahead:
Continuing...
The Spell summons the hero with the best chance of defeating the Evil Emperor. This sounds like Quantum Immortality...
Specifically: Imagine the set of all possible versions of myself that are alive 50 years in the future, in the year 2066. My conscious observation at that point tends to summon the self most likely to be alive in 2066.
To elaborate: Computing all possible paths forward from the present moment to 2066 results in a HUGE set of possible future-selves that exist in 2066. But, some are more likely than others. For example, there will be a bunch of paths to a high-probability result, where I worked a generic middle-class job for years but don't clearly remember a lot of the individual days. There will also be a few paths where I do low-probability things. Thus, a random choice from that HUGE set will tend to pick a generic (high-probability) future self.
But, my conscious awareness observes one life path, not one discrete moment in the future. Computing all possible paths forward from the present moment to the end of the Universe results in a HUGE x HUGE set of possible life-paths, again with considerable overlap. My consciousness tends to pick a high-probability path.
In the story, a hero with a 100% probability of victory exists, so that hero is summoned. The hero observing their own probability of victory ensures they converge on a 100% probability of victory.
In real life, life paths with infinite survival time exist, so these life paths tend to be chosen. Observing one's own probability of infinite survival ensures convergence on 100% survival probability.
In the story, other characters set up conditions such that a desired outcome was the most likely one, by resolving to let a summoned hero with certain traits win easily.
In real life, an equivalent is the quantum suicide trick: resolving to kill oneself if certain conditions are not met ensures that the life path observed is one where those conditions are met.
In the story, a demon is summoned, and controlled when the demon refuses to fulfill its duty. Control of the demon was guaranteed by 100% probability of victory.
In real life, AI is like a demon, with the power to grant wishes but with perverse and unpredictable consequences that get worse the more powerful the demon summoned. But a guarantee of indefinite survival ensures that this demon will not end my consciousness. There are many ways this could go wrong. But I desire to create as many copies of my mind as possible, though only in conditions where those copies could have lives at least as good as my own; so, assuming I have some power to increase how quickly copies of my mind are created, and assuming I might be a mind-copy created in this way, the most likely Universe for me to find myself in (out of the set of all possible Universes) is one in which the AI and I cooperate to create a huge number of long-lived copies of my mind.
tl;dr: AI Safety is guaranteed by Quantum Immortality. P.S. God's promise to Abraham that his descendants will be "beyond number" is fulfilled.
Estimating the probability of human extinction
I'm looking for feedback on the following idea. The article from which it's been excerpted can be found here: http://ieet.org/index.php/IEET/more/torres20120213
"But not only has the number of scenarios increased in the past 71 years, many riskologists believe that the probability of a global disaster has also significantly risen. Whereas the likelihood of annihilation for most of our species’ history was extremely low, Nick Bostrom argues that “setting this probability lower than 25% [this century] would be misguided, and the best estimate may be considerably higher.” Similarly, Sir Martin Rees claims that a civilization-destroying event before the year 02100 is as likely as getting a “heads” after flipping a coin. These are only two opinions, of course, but to paraphrase the Russell-Einstein Manifesto, my experience confirms that those who know the < most tend to be the most gloomy.
"I [would] argue that Rees’ figure is plausible. To adapt a maxim from the philosopher David Hume, wise people always proportion their fears to the best available evidence, and when one honestly examines this evidence, one finds that there really is good reason for being alarmed. But I also offer a novel — to my knowledge — argument for why we may be systematically underestimating the overall likelihood of doom. In sum, just as a dog can’t possibly comprehend any of the natural and anthropogenic risks mentioned above, so too could there be risks that forever lie beyond our epistemic reach. All biological brains have intrinsic limitations that constrain the library of concepts to which one has access. And without concepts, one can’t mentally represent the external world. It follows that we could be “cognitively closed” to a potentially vast number of cosmic risks that threaten us with total annihilation. This being said, one might argue that such risks, if they exist at all, must be highly improbable, since Earth-originating life has existed for some 3.5 billion years without an existential catastrophe having happened. But this line of reasoning is deeply flawed: it fails to take into account that the only worlds in which observers like us could find ourselves are ones in which such a catastrophe has never occurred. It follows that a record of past survival on our planetary spaceship provides no useful information about the probability of certain existential disasters happening in the future. The facts of cognitive closure plus the observation selection effect suggest that our probability conjectures of total annihilation may be systematically underestimated, perhaps by a lot."
Thoughts?
Open Thread Feb 16 - Feb 23, 2016
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.
Cheerful one-liners and disjointed anecdotes
It would be good to have a way of telling people what they should expect from jobs - especially "intellectual" jobs - they consider taking. NOT how easy or lousy the work is going to turn out to be; just what might happen and approximately what they will have to do, so that they can decide whether they want it.
Rationality Reading Group: Part T: Science and Rationality
This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.
Welcome to the Rationality reading group. This fortnight we discuss Part T: Science and Rationality (pp. 1187-1265) and Interlude: A Technical Explanation of Technical Explanation (pp. 1267-1314). This post summarizes each article of the sequence, linking to the original LessWrong post where available.
T. Science and Rationality
243. The Failures of Eld Science - A short story set in the same world as "Initiation Ceremony". Future physics students look back on the cautionary tale of quantum physics.
244. The Dilemma: Science or Bayes? - The failure of first-half-of-20th-century-physics was not due to straying from the scientific method. Science and rationality - that is, Science and Bayesianism - aren't the same thing, and sometimes they give different answers.
245. Science Doesn't Trust Your Rationality - The reason Science doesn't always agree with the exact, Bayesian, rational answer, is that Science doesn't trust you to be rational. It wants you to go out and gather overwhelming experimental evidence.
246. When Science Can't Help - If you have an idea, Science tells you to test it experimentally. If you spend 10 years testing the idea and the result comes out negative, Science slaps you on the back and says, "Better luck next time." If you want to spend 10 years testing a hypothesis that will actually turn out to be right, you'll have to try to do the thing that Science doesn't trust you to do: think rationally, and figure out the answer before you get clubbed over the head with it.
247. Science Isn't Strict Enough - Science lets you believe any damn stupid idea that hasn't been refuted by experiment. Bayesianism says there is always an exactly rational degree of belief given your current evidence, and this does not shift a nanometer to the left or to the right depending on your whims. Science is a social freedom - we let people test whatever hypotheses they like, because we don't trust the village elders to decide in advance - but you shouldn't confuse that with an individual standard of rationality.
248. Do Scientists Already Know This Stuff? - No. Maybe someday it will be part of standard scientific training, but for now, it's not, and the absence is visible.
249. No Safe Defense, Not Even Science - Why am I trying to break your trust in Science? Because you can't think and trust at the same time. The social rules of Science are verbal rather than quantitative; it is possible to believe you are following them. With Bayesianism, it is never possible to do an exact calculation and get the exact rational answer that you know exists. You are visibly less than perfect, and so you will not be tempted to trust yourself.
250. Changing the Definition of Science - Many of these ideas are surprisingly conventional, and being floated around by other thinkers. I'm a good deal less of a lonely iconoclast than I seem; maybe it's just the way I talk.
251. Faster Than Science - Is it really possible to arrive at the truth faster than Science does? Not only is it possible, but the social process of science relies on scientists doing so - when they choose which hypotheses to test. In many answer spaces it's not possible to find the true hypothesis by accident. Science leaves it up to experiment to socially declare who was right, but if there weren't some people who could get it right in the absence of overwhelming experimental proof, science would be stuck.
252. Einstein's Speed - Albert was unusually good at finding the right theory in the presence of only a small amount of experimental evidence. Even more unusually, he admitted it - he claimed to know the theory was right, even in advance of the public proof. It's possible to arrive at the truth by thinking great high-minded thoughts of the sort that Science does not trust you to think, but it's a lot harder than arriving at the truth in the presence of overwhelming evidence.
253. That Alien Message - Einstein used evidence more efficiently than other physicists, but he was still extremely inefficient in an absolute sense. If a huge team of cryptographers and physicists were examining an interstellar transmission, going over it bit by bit, we could deduce principles on the order of Galilean gravity just from seeing one or two frames of a picture. As if the very first human to see an apple fall, had, on the instant, realized that its position went as the square of the time and that this implied constant acceleration.
254. My Childhood Role Model - I looked up to the ideal of a Bayesian superintelligence, not Einstein.
255. Einstein's Superpowers - There's an unfortunate tendency to talk as if Einstein had superpowers - as if, even before Einstein was famous, he had an inherent disposition to be Einstein - a potential as rare as his fame and as magical as his deeds. Yet the way you acquire superpowers is not by being born with them, but by seeing, with a sudden shock, that they are perfectly normal.
256. Class Project - The students are given one month to develop a theory of quantum gravity.
Interlude: A Technical Explanation of Technical Explanation
This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
The next reading will cover Ends: An Introduction (pp. 1321-1325) and Part U: Fake Preferences (pp. 1329-1356). The discussion will go live on Wednesday, 24 February 2016, right here on the discussion forum of LessWrong.
[Link] How I Escaped The Darkness of Mental Illness
Value learners & wireheading
Dewey 2011 lays out the rules for one kind of agent with a mutable value system. The agent has some distribution over utility functions, which it has rules for updating based on its interaction history (where "interaction history" means the agent's observations and actions since its origin). To choose an action, it looks through every possible future interaction history, and picks the action that leads to the highest expected utility, weighted both by the probability of that future happening and by the utility function distribution that would hold if that future came to pass.
We might motivate this sort of update strategy by considering a sandwich-drone bringing you a sandwich. The drone can either go to your workplace, or go to your home. If we think about this drone as a value-learner, then the "correct utility function" depends on whether you're at work or at home - upon learning your location, the drone should update its utility function so that it wants to go to that place. (Value learning is unnecessarily indirect in this case, but that's because it's a simple example.)
Suppose the drone begins its delivery assigning equal measure to the home-utility-function and to the work-utility-function (i.e. ignorant of your location), and can learn your location for a small cost. If the drone evaluated this idea with its current utility function, it wouldn't see any benefit, even though it would in fact deliver the sandwich properly - because under its current utility function there's no point to going to one place rather than the other. To get sensible behavior, and properly deliver your sandwich, the drone must evaluate actions based on what utility function it will have in the future, after the action happens.
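Here's a toy numerical version of the drone (my own sketch, with made-up numbers: delivery is worth 1, checking the location costs 0.1). Evaluating plans with the frozen 50/50 mixture says checking is a pure loss; evaluating each future under the utility distribution that would hold then says checking is clearly worth it.

```python
COST = 0.1  # hypothetical cost of looking up your location

def u(dest, which):
    """Utility function U_which: 1 if the sandwich is delivered there."""
    return 1.0 if dest == which else 0.0

def eu_current_utility(check):
    # Evaluate with the *current* 50/50 mixture, never updated: every
    # destination scores 0.5, so paying to check is a pure loss.
    return 0.5 - (COST if check else 0.0)

def eu_value_learner(check):
    # Dewey-style: weight each future by the utility distribution that
    # would hold *then*, after the plan's observations.
    if not check:
        # Guess "home": right in the U_home worlds, wrong in the U_work ones.
        return 0.5 * u('home', 'home') + 0.5 * u('home', 'work')
    total = 0.0
    for which, p in (('home', 0.5), ('work', 0.5)):
        # Observing the location collapses the posterior onto U_which,
        # and the drone delivers there.
        total += p * (u(which, which) - COST)
    return total

print(eu_current_utility(True), eu_current_utility(False))  # 0.4 0.5
print(eu_value_learner(True), eu_value_learner(False))      # 0.9 0.5
```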
If you're familiar with how wireheading or quantum suicide look in terms of decision theory, this method of deciding based on future utility functions might seem risky. Fortunately, value learning doesn't permit wireheading in the traditional sense, because the updates to the utility function are an abstract process, not a physical one. The agent's probability distribution over utility functions, which is conditional on interaction histories, defines which actions and observations are allowed to change the utility function during the process of predicting expected utility.
Dewey also mentions that so long as the probability distribution over utility functions is well-behaved, you cannot deliberately take action to raise the probability of one of the utility functions being true. But I think this is only useful to safety when we understand and trust the overarching utility function that gets evaluated at the future time horizon. If instead we start at the present, and specify a starting utility function and rules for updating it based on observations, this complex system can evolve in surprising directions, including some wireheading-esque behavior.
The formalism of Dewey 2011 is, at bottom, extremely simple. I'm going to be a bad pedagogue here: I think this might only make sense if you go look at equations 2 and 3 in the paper, and figure out what all the terms do, and see how similar they are. The cheap summary is that if your utility is a function of the interaction history, trying to change utility functions based on interaction history just gives you back a utility function. If we try to think about what sort of process to use to change an agent's utility function, this formalism provides only one tool: look out to some future time horizon, and define an effective utility function in terms of what utility functions are possible at that future time horizon. This is different from the approximations or local utility functions we would like in practice.
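For readers without the paper open, my rough transcription of the shape of those equations: the value learner picks

    a* = argmax_a Σ_h P(h | a) · Σ_U P(U | h) · U(h)

where h ranges over possible future interaction histories out to the time horizon, and P(U | h) is the distribution over utility functions given that history. Defining Ũ(h) = Σ_U P(U | h) U(h) collapses this into the same form as the fixed-utility agent of equation 2, which is the "you just get back a utility function" point above.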
If we take this scheme and try to approximate it, for example by only looking N steps into the future, we run into problems; the agent will want to self-modify so that next timestep it only looks ahead N-1 steps, and then N-2 steps, and so on. Or more generally, many simple approximation schemes are "sticky" - from inside the approximation, an approximation that changes over time looks like undesirable value drift.
Common sense says this sort of self-sabotage should be eliminable. One should be able to really care about the underlying utility function, not just its approximation. However, this problem tends to crop up, for example, whenever the part of the future you look at does not depend on which action you are considering; modifying to keep looking at the same part of the future unsurprisingly improves the results you get in that part of the future. If we want to build a paperclip maximizer, it shouldn't be necessary to figure out every single way it could self-modify and penalize each one appropriately.
We might evade this particular problem using some other method of approximation that does something more like reasoning about actions than reasoning about futures. The reasoning doesn't have to be logically impeccable - we might imagine an agent that identifies a small number of salient consequences of each action, and chooses based on those. But it seems difficult to show how such an agent would have good properties. This is something I'm definitely interested in.
One way to try to make things concrete is to pick a local utility function and specify rules for changing it. For example, suppose we wanted an AI to flag all the 9s in the MNIST dataset. We define a single-time-step utility function by a neural network that takes in the image and the decision of whether to flag or not, and returns a number between -1 and 1. This neural network is deterministically trained for each time step on all previous examples, trying to assign 1 to correct flaggings and -1 to mistakes. Remember, this neural net is just a local utility function - we can make a variety of AI designs involving it. The goal of this exercise is to design an AI that seems liable to make good decisions in order to flag lots of 9s.
The simplest example is the greedy agent - it just does whatever has a high score right now. This is pretty straightforward, and doesn't wirehead (unless the scoring function somehow encodes wireheading), but it doesn't actually do any planning - 100% of the smarts have to be in the local evaluation, which is really difficult to make work well. This approach seems unlikely to extend well to messy environments.
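For concreteness, the greedy agent is something like this (a sketch; `score` stands for the trained local utility network described above, a hypothetical name):

```python
def greedy_act(image, score):
    # Do whatever the local utility function rates highest right now:
    # no lookahead, no planning, no model of how the scorer will change.
    return max(('flag', 'pass'), key=lambda action: score(image, action))
```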
Since Go-playing AI is topical right now, I shall digress. Successful Go programs can't get by with only smart evaluations of the current state of the board, they need to look ahead to future states. But they also can't look all the way until the ultimate time horizon, so they only look a moderate way into the future, and evaluate that future state of the board using a complicated method that tries to capture things important to planning. In sufficiently clever and self-aware agents, this approximation would cause self-sabotage to pop up. Even if the Go-playing AI couldn't modify itself to only care about the current way it computes values of actions, it might make suboptimal moves that limit its future options, because its future self will compute values of actions the 'wrong' way.
If we wanted to flag 9s using a Deweyan value learner, we might score actions according to how good they will be according to the projected utility function at some future time step. If this is done straightforwardly, there's a wireheading risk - the changes to its utility function are supplied by humans who might be influenced by actions. I find it useful to apply a sort of "magic button" test - if the AI had a magic button that could rewrite human brains, would pressing that button have positive expected utility for it? If yes, then this design has problems, even though in our current thought experiment it's just flagging pictures.
To eliminate wireheading, the value learner can use a model of the future inputs and outputs and the probability of different value updates given various inputs and outputs, which doesn't model ways that actions could influence the utility updates. This model doesn't have to be right, it just has to exist. On one hand, this seems like a sort of weird doublethink, to judge based on a counterfactual where your actions don't have impacts you could otherwise expect. On the other hand, it also bears some resemblance to how we actually reason about moral information. Regardless, this agent will now not wirehead, and will want to get good results by learning about the world, if only in the very narrow sense of wanting to play unscored rounds that update its value function. If its value function and value updating made better use of unlabeled data, it would also want to learn about the world in the broader sense.
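In pseudocode-ish Python, my reading of that evaluation looks something like the following (every name here is hypothetical):

```python
def expected_utility(action, env_model, update_model, prior, horizon):
    """Score an action using a *fixed* model of how value updates arrive."""
    total = 0.0
    for history, p_history in env_model.futures(action, horizon):
        # The posterior over utility functions comes from update_model,
        # which conditions only on inputs and outputs; it contains no
        # pathway by which the action rewrites the humans supplying the
        # updates, so the "magic button" has no expected benefit here.
        for u_fn, p_u in update_model.posterior(prior, history):
            total += p_history * p_u * u_fn(history)
    return total
```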
Overall I am somewhat frustrated, because value learners have these nice properties, but are computationally unrealistic and do not play well with approximation. One can try to get the nice properties elsewhere, such as relying on an action-suggester to not suggest wireheading, but it would be nice to be able to talk about this as an approximation to something fancier.
Learning Mathematics in Context
I have almost no direct knowledge of mathematics. I took various mathematics courses in school, but I put in the minimal amount of effort required to pass and immediately forgot everything afterwards.
When people learn foreign languages, they often learn vocabulary and grammar out of context. They drill vocabulary and grammar in terms of definitions and explanations written in their native language. I, however, have found this to be intolerably boring. I'm conversational in Japanese, but every ounce of my practice came in context: either hanging out with Japanese friends who speak limited English, or watching shows and adding to Anki new words or sentence structures I encounter.
I'm convinced that humans must spike their blood sugar and/or pump their body full of stimulants such as caffeine in order to get past the natural tendency to find it unbearably dull to memorize words and syntax by rote and lifeless connection with the structures in their native language.
I've tried to delve into some mathematics recently, but I get the impression that most of the expositions fall into one of two categories: Either (1) they assume that I'm a student powering my day with coffee and chips and that I won't find it unusual if I'm supposed to just trust that once I spend 300 hours pushing arbitrary symbols around I'll end up with some sort of insight. Or (2) they do enter the world of proper epistemological explanations and deep real-world relevance, but only because they expect that I'm already quite well-versed in various background information.
I don't want an introduction that assumes I'm the average unthinking student, and I don't want an exposition that expects me to understand five different mathematical fields before I can read it. What I want seems likely to be uncommon enough that I might as well simply say: I don't care what field it is; I just want to jump into something which assumes no specifically mathematical background knowledge but nevertheless delves into serious depths that assume a thinking mind and a strong desire for epistemological sophistication.
I bought Calculus by Michael Spivak quite a while ago because the Amazon reviews led me to believe it may fit these considerations. I don't know whether that's actually the case or not though, as I haven't tried reading it yet.
Any suggestions would be appreciated.
Study partner matching thread
Nate Soares recommends pairing up when studying, so I figured it would be useful to facilitate that.
If you are looking for a study partner, please post a top-level comment saying:
- What you want to study
- Your level of relevant background knowledge
- If you have sources in mind (MOOCs, textbooks, etc), what those are
- Your time zone
[Link] Video Presentation: Rationality 101 for Secular People
Secular people are a natural target group for pitching rationality, since they don't suffer from one of the most debilitating forms of irrationality and also because they have warm fuzzies toward the concept of reason. From reason, it's easy to transition toward what it would be reasonable to do, namely be reasonable about how our minds work and how we should improve them. I did a Rationality 101 for Secular People presentation that was pretty successful, with a number of people following up and showing an interest in gaining further rationality knowledge. Here's a video of the presentation, with the PowerPoint slides I made uploaded to SlideShare. Anyone who wishes to do so is free to use these materials for their own needs, whether sharing the video with secular friends or doing a version of this workshop for local secular groups.
Tackling the subagent problem: preliminary analysis
A putative new idea for AI control; index here.
Status: preliminary. This mainly to put down some of the ideas I've had, for later improvement or abandonment.
The subagent problem, in a nutshell, is that "create a powerful subagent with goal U that takes over the local universe" is a solution for many of the goals an AI could have - in a sense, the ultimate convergent instrumental goal. And it tends to evade many clever restrictions people try to program into the AI (eg "make use of only X amount of negentropy", "don't move out of this space").
So if the problem could be solved, many other control approaches could be potentially available.
The problem is very hard, because an imperfect definition of a subagent is simply an excuse to create a subagent that skirts the limits of that definition (hum, that style of problem sounds familiar). For instance, if we want to rule out subagents by preventing the AI from having much influence if the AI itself were to stop ("If you die, you fail, no other can continue your quest"), then it is motivated to create powerful subagents that carefully reverse their previous influence if the AI were to be destroyed.
Controlling subagents
Some of the methods I've developed seem suitable for controlling the existence or impact of subagents.
- Reduced impact methods can prevent subagents from being created, by requiring that the AI's interventions be non-disruptive ("Twenty million questions") or undetectable.
- Reducing the AI's output options to a specific set can prevent it from being able to create any subagents in the first place.
- Various methods around detecting importance can be used to ensure that, though subagents may exist, they won't be very influential.
- Pre-corriged methods can be used to ensure that any subagents remain value-aligned with the original agent. Then, if there is some well-defined "die" goal for the agent, this could take all the subagents with it.
These can be thought of as ruling out the subagents' existence, their creation, their influence (or importance), and their independence. The last two can be particularly tricky, as we want to make sure that our formal definition of importance matches up with our informal one, and we currently lack a well-defined "die" goal.
We could also think of defining identity by using some of the tricks and restrictions that have caused humans to develop one (such as our existing in a single body with no easy means of copying), but it's not clear that this definition would remain stable once the restrictions were lifted (and it's not clear that a sense of identity prevents the creation of subagents in the first place).
Subagents processing information
Here I want to look at one other aspect of the subagents, the fact that they are subagents, and, as such, do some of the stuff that agents do - such as processing information and making decisions. Can we use the information processing as a definition?
Consider the following model. Our lovely Clippy wants to own a paperclip. They know that it exists behind one of a hundred doors; opening one of them seals all the others forever. In a few minutes, Clippy will be put to sleep, but it has a simple robot that it can program to go and open one of the doors and retrieve the paperclip for it.
Clippy currently doesn't know where the paperclip is, but it knows that its location will be announced a few seconds after Clippy sleeps. The robot includes a sound recording system inside it.
It seems there are two clear levels of agency the robot could have: either it goes to a random door, or it processes the announcement to pick the correct door. In the first case, the robot and Clippy have a 1% chance of getting the paperclip; in the second, a 100% chance. The distributions of outcomes are clearly different.
But now suppose there is a human guard lounging around, trying to prevent the robot from getting to any door. If the robot has a neutron bomb, it can just kill the human without looking. If it has a giant cannon, it can simply shoot the guard where they stand. If it has neither of these, then it can attempt to bargain with the guard, negotiating, seducing and tricking them in the course of a conversation. Once the guard is removed, the situation reverts to the above.
Each of the three methods requires a different level of information processing (a different level of optimisation power, in essence). We can add more guards, make them more or less competent, add other agents with other goals, add more ways of getting the paperclip, and so on, to grade how much information processing the robot has.
To calibrate such a scale, we could use "Clippy moves into the robot and controls it fully" as the upper bound, representing the robot having maximal information processing power (and a very focused outcome space). And we could use some pre-prepared actions (such as the robot randomly choosing a door) to calibrate the lower end. The aim is to construct a definition of information processing that could be used to define the existence of (effective) subagents.
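One crude way to make that calibration quantitative (my suggestion, not something from the post) is to measure optimisation power in bits, following Yudkowsky's measure: -log2 of the probability that the random baseline does at least as well as the robot did.

```python
import math

def optimisation_power_bits(p_random_does_as_well):
    # Bits of optimisation: how surprising the robot's performance is
    # relative to the random-door baseline.
    return -math.log2(p_random_does_as_well)

print(optimisation_power_bits(1 / 100))  # announcement-processing robot:
                                         # random baseline only matches it
                                         # 1% of the time, so ~6.6 bits
print(optimisation_power_bits(1.0))      # the random robot itself: 0 bits
```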
Still feels likely that this will fail, though, without something more.
Open Thread, January 4-10, 2016
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.
January 2016 Media Thread
This is the monthly thread for posting media of various types that you've found that you enjoy. Post what you're reading, listening to, watching, and your opinion of it. Post recommendations to blogs. Post whatever media you feel like discussing! To see previous recommendations, check out the older threads.
Rules:
- Please avoid downvoting recommendations just because you don't personally like the recommended material; remember that liking is a two-place word. If you can point out a specific flaw in a person's recommendation, consider posting a comment to that effect.
- If you want to post something that (you know) has been recommended before, but have another recommendation to add, please link to the original, so that the reader has both recommendations.
- Please post only under one of the already created subthreads, and never directly under the parent media thread.
- Use the "Other Media" thread if you believe the piece of media you want to discuss doesn't fit under any of the established categories.
- Use the "Meta" thread if you want to discuss about the monthly media thread itself (e.g. to propose adding/removing/splitting/merging subthreads, or to discuss the type of content properly belonging to each subthread) or for any other question or issue you may have about the thread or the rules.
Survey: What's the most negative*plausible cryonics-works story that you know?
Warning: people will be trying to be pessimistic here. Don't read this if you don't want to be reminded of scary outcomes.
Request: if you get an idea that you think might be too scary to post publicly even under the above warning, but you are willing to send it to me in a private message to aid in my personal decision-making, then please do :)
Motivation:
I like cryonics. According to my parents and grandmother, I started talking about building an AI to help with medical research to revive frozen dead people when I was about 10 years old, and my memory agrees. I began experimenting with freezing and unfreezing insects, and figured based on some positive results that it was physically possible to preserve life in a frozen state. Cool!
But now that I'm in the middle of convincing some folks I know to sign up for cryonics, I want to do due diligence on some of the vague, hard-to-verbalize aversions they have to doing it. This way, I can help them plan contingencies for / hedges against those aversions if possible, thereby making cryonics more viable for them, and maybe avoid accidentally persuading people to do cryonics when it really isn't right for them (yes, I think that can actually happen).
There's already been a post on far negative outcomes, and another one on why cryonics maybe isn't worth it. But what I really want to do here is conduct an interactive survey to compute which disutilities should be taken most seriously when talking to a new person about cryonics, to avoid accidentally persuading them into making a wrong-for-them decision.
And for that, what I really want to ask is:
What's the most negative*plausible cryonics-works story that you know of?
Examples:
(1) A well-meaning but slightly-too-obsessed cryonics scientist wakes up some semblance of me in a semi-conscious virtual delirium for something like 1000 very unpleasant subjective years of tinkering to try recovering me. She eventually quits, and I never wake up again.
(2) A rich sadist finds it somehow legally or logistically easier to lay hands on the brains/minds of cryonics patients than of living people, and runs some virtual torture scenarios on me where I'm not allowed to die for thousands of subjective years or more.
I think on reflection I'd consider (1) to be around 10x and maybe 100x more likely than (2)*, but depending on your preferences, you might find (2) to be more than 100x worse than (1), enough to make it account for the biggest chunk of disutility that can be attributed to any particular simple story or story-feature where cryonics works.
[* I would have said (1) was definitely more than 100x more likely, except that over the years so many of my female friends have mentioned that they were subject to some pretty scary sexual violence at some point in their dating lives.]
(Note: There's a separate question of whether the outcome is positive enough to be worth the money, which I'd rather discuss in a different thread.)
How to participate:
- Top-level comments = stories. Post your most negative*plausible story or story-feature as a top-level comment.
- A top-level upvote shall mean "essentially in my top three". Upvote stories that you'd consider essentially the same as one of your top-three stories, ranked by negativity*probability. This means you can vote more than three times if your top stories get represented in a variety of ways, so don't be shy.
- Lower-level comments = discussion! Let's disagree about the relative probabilities and negativities of things and maybe change some of our minds!
Thanks for playing :)
PS I hope folks use these ideas to come up with ways to decrease the likelihood that cryonics leads to negative outcomes, and not to cause or experience premature fears that derail productive conversations. So, please don't share/post this in ways where you think it might have the latter effect, but rather, use it as a part of a sane and thorough evaluation of all the pros and cons that one should reasonably consider in deciding whether cryonics working is on-net a positive outcome.
ETA -- What not to post:
Some non-examples of what this survey should contain...
- Examples where you don't get revived in any way. These scenarios factor into the "will cryonics work for me" question, a question of probability that does not depend on your values, and which I'd prefer to discuss in a separate thread, because probabilities are easier to converge on without distracting ourselves with values questions.
Rationality Reading Group: Part P: Reductionism 101
This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.
Welcome to the Rationality reading group. This fortnight we discuss Part P: Reductionism (pp. 887-935). This post summarizes each article of the sequence, linking to the original LessWrong post where available.
P. Reductionism 101
189. Dissolving the Question - This is where the "free will" puzzle is explicitly posed, along with criteria for what does and does not constitute a satisfying answer.
190. Wrong Questions - Where the mind cuts against reality's grain, it generates wrong questions - questions that cannot possibly be answered on their own terms, but only dissolved by understanding the cognitive algorithm that generates the perception of a question.
191. Righting a Wrong Question - When you are faced with an unanswerable question - a question to which it seems impossible to even imagine an answer - there is a simple trick which can turn the question solvable. Instead of asking, "Why do I have free will?", try asking, "Why do I think I have free will?"
192. Mind Projection Fallacy - E. T. Jaynes used the term Mind Projection Fallacy to denote the error of projecting your own mind's properties into the external world. The Mind Projection Fallacy generalizes widely as an error: it is in the argument over the real meaning of the word sound, and in the magazine cover of the monster carrying off a woman in the torn dress, and Kant's declaration that space by its very nature is flat, and Hume's definition of a priori ideas as those "discoverable by the mere operation of thought, without dependence on what is anywhere existent in the universe"...
193. Probability is in the Mind - Probabilities express uncertainty, and it is only agents who can be uncertain. A blank map does not correspond to a blank territory. Ignorance is in the mind.
194. The Quotation is Not the Referent - It's very easy to derive extremely wrong conclusions if you don't make a clear enough distinction between your beliefs about the world, and the world itself.
195. Qualitatively Confused - Using qualitative, binary reasoning may make it easier to confuse belief and reality; if we use probability distributions, the distinction is much clearer.
196. Think Like Reality - "Quantum physics is not "weird". You are weird. You have the absolutely bizarre idea that reality ought to consist of little billiard balls bopping around, when in fact reality is a perfectly normal cloud of complex amplitude in configuration space. This is your problem, not reality's, and you are the one who needs to change."
197. Chaotic Inversion - If a problem that you're trying to solve seems unpredictable, then that is often a fact about your mind, not a fact about the world. Also, this feeling that a problem is unpredictable can stop you from trying to actually solve it.
198. Reductionism - We build models of the universe that have many different levels of description. But so far as anyone has been able to determine, the universe itself has only the single level of fundamental physics - reality doesn't explicitly compute protons, only quarks.
199. Explaining vs. Explaining Away - Apparently "the mere touch of cold philosophy", i.e., the truth, has destroyed haunts in the air, gnomes in the mine, and rainbows. This calls to mind a rather different bit of verse:
One of these things
Is not like the others
One of these things
Doesn't belong
The air has been emptied of its haunts, and the mine de-gnomed—but the rainbow is still there!
200. Fake Reductionism - There is a very great distinction between being able to see where the rainbow comes from, and playing around with prisms to confirm it, and maybe making a rainbow yourself by spraying water droplets, versus some dour-faced philosopher just telling you, "No, there's nothing special about the rainbow. Didn't you hear? Scientists have explained it away. Just something to do with raindrops or whatever. Nothing to be excited about." I think this distinction probably accounts for a hell of a lot of the deadly existential emptiness that supposedly accompanies scientific reductionism.
201. Savannah Poets - Equations of physics aren't about strong emotions. They can inspire those emotions in the mind of a scientist, but the emotions are not as raw as the stories told about Jupiter (the god). And so it might seem that reducing Jupiter to a spinning ball of methane and ammonia takes away some of the poetry in those stories. But ultimately, we don't have to keep telling stories about Jupiter. It's not necessary for Jupiter to think and feel in order for us to tell stories, because we can always write stories with humans as their protagonists.
This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
The next reading will cover Part Q: Joy in the Merely Real (pp. 939-979). The discussion will go live on Wednesday, 30 December 2015, right here on the discussion forum of LessWrong.
Agent-Simulates-Predictor Variant of the Prisoner's Dilemma
I don't know enough math and I don't know if this is important, but in the hopes that it helps someone figure something out that they otherwise might not, I'm posting it.
In Soares & Fallenstein (2015), the authors describe the following problem:
Consider a simple two-player game, described by Slepnev (2011), played by a human and an agent which is capable of fully simulating the human and which acts according to the prescriptions of UDT. The game works as follows: each player must write down an integer between 0 and 10. If both numbers sum to 10 or less, then each player is paid according to the number that they wrote down. Otherwise, they are paid nothing. For example, if one player writes down 4 and the other 3, then the former gets paid $4 while the latter gets paid $3. But if both players write down 6, then neither player gets paid. Say the human player reasons as follows:
"I don’t quite know how UDT works, but I remember hearing that it’s a very powerful predictor. So if I decide to write down 9, then it will predict this, and it will decide to write 1. Therefore, I can write down 9 without fear."
The human writes down 9, and UDT, predicting this, prescribes writing down 1. This result is uncomfortable, in that the agent with superior predictive power “loses” to the “dumber” agent. In this scenario, it is almost as if the human’s lack of ability to predict UDT (while using correct abstract reasoning about the UDT algorithm) gives the human an “epistemic high ground” or “first mover advantage.” It seems unsatisfactory that increased predictive power can harm an agent.
More precisely: two agents A and B must choose integers m and n with 0 ≤ m, n ≤ 10, and if m + n ≤ 10, then A receives a payoff of m dollars and B receives a payoff of n dollars, and if m + n > 10, then each agent receives a payoff of zero dollars. B has perfect predictive accuracy and A knows that B has perfect predictive accuracy.
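A quick sanity check of that game (my sketch, not from the paper): if B best-responds to a perfect prediction of m, then A can safely pick a large m.

```python
def b_best_response(m):
    # B gets n dollars if m + n <= 10, else nothing.
    return max(range(11), key=lambda n: n if m + n <= 10 else 0)

for m in (9, 10):
    n = b_best_response(m)
    payoff = (m, n) if m + n <= 10 else (0, 0)
    print(m, n, payoff)
# 9 1 (9, 1)    -- the outcome in the story
# 10 0 (10, 0)  -- A could even take the entire surplus
```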
Consider a variant of the aforementioned decision problem in which the same two agents A and B must choose integers m and n with 0 ≤ m, n ≤ 3; if m + n ≤ 3, then {A, B} receives a payoff of {m, n} dollars; if m + n > 3, then {A, B} receives a payoff of zero dollars. This variant is similar to a variant of the Prisoner's Dilemma with a slightly modified payoff matrix (identifying "cooperate" with writing 1 and "defect" with writing 2):

                     B cooperates (1)    B defects (2)
    A cooperates (1)     $1, $1              $1, $2
    A defects (2)        $2, $1              $0, $0
Likewise, A reasons as follows:
If I cooperate, then B will predict that I will cooperate, and B will defect. If I defect, then B will predict that I will defect, and B will cooperate. Therefore, I defect.
And B:
I predict that A will defect. Therefore, I cooperate.
I figure it's good to have multiple takes on a problem if possible, and that this particular take might be especially valuable, what with all of the attention that seems to get put on the Prisoner's Dilemma and its variants.