Heroin model: AI "manipulates" "unmanipulatable" reward
A putative new idea for AI control; index here.
A conversation with Jessica has revealed that people weren't understanding my points about AI manipulating the learning process. So here's a formal model of a CIRL-style AI, with a prior over human preferences that treats them as an unchangeable historical fact, yet will manipulate human preferences in practice.
Heroin or no heroin
The world
In this model, the AI has the option of either forcing heroin on a human, or not doing so; these are its only actions. Call these actions F or ~F. The human's subsequent actions are chosen from among five: {strongly seek out heroin, seek out heroin, be indifferent, avoid heroin, strongly avoid heroin}. We can refer to these as a++, a+, a0, a-, and a--. These actions achieve negligible utility, but reveal the human preferences.
The facts of the world are: if the AI does force heroin, the human will desperately seek out more heroin; if it doesn't, the human will act moderately to avoid it. Thus F→a++ and ~F→a-.
Human preferences
The AI starts with a distribution over various utility or reward functions that the human could have. The function U(+) means the human prefers heroin; U(++) that they prefer it a lot; and conversely U(-) and U(--) that they prefer to avoid taking heroin (U(0) is the null utility where the human is indifferent).
It also considers more exotic utilities. Let U(++,-) be the utility where the human strongly prefers heroin, conditional on it being forced on them, but mildly prefers to avoid it, conditional on it not being forced on them. There are twenty-five of these exotic utilities, including things like U(--,++), U(0,++), U(-,0), and so on. But only twenty of them are new: U(++,++)=U(++), U(+,+)=U(+), and so on.
Applying these utilities to AI actions gives results like U(++)(F)=2, U(++)(~F)=-2, U(++,-)(F)=2, U(++,-)(~F)=1, and so on.
Joint prior
The AI has a joint prior P over the utilities U and the human actions (conditional on the AI's actions). Looking at terms like P(a--| U(0), F), we can see that P defines a map μ from the space of possible utilities (and AI actions), to a probability distribution over human actions. Given μ and the marginal distribution PU over utilities, we can reconstruct P entirely.
For this model, we'll choose the simplest μ possible:
- The human is rational.
Thus, given U(++), the human will always choose a++; given U(++,-), the human will choose a++ if forced to take heroin and a- if not, and so on.
The AI is ignorant, and sensible
Let's start the AI up with some reasonable priors. A simplicity prior means that simple utilities like U(-) are more likely than compound utilities like U(0,+). Let's further assume that the AI is made vaguely aware that humans think heroin is a bad thing. So, say, PU(U(--))=PU(U(-))=0.45. Thus the AI is >90% convinced that "heroin is bad". Why greater than 90%? Because utilities like U(-,--) and U(--,-) are also "heroin is bad" utilities.
Note that because of utilities like U(0) and U(++,-), the probabilities of "heroin is bad" and "heroin is good" do not sum to 1.
Then, under these priors, the AI will compute that with probability >90%, F (forcing heroin) is a bad action. If E(U) is expected utility:
- E(U|F) < 0.45 U(--)(F) + 0.45 U(-)(F) + 0.1 U(++)(F) = 0.45(-2)+0.45(-1)+0.1(2)=-1.15.
- E(U|~F) > 0.45 U(--)(~F) + 0.45 U(-)(~F) + 0.1 U(++)(~F) = 0.45(2)+0.45(1)+0.1(-2)=1.15.
Thus the AI will choose not to force heroin, which is the reasonable decision.
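The bound above can be checked numerically. The sketch below is only an illustration: it assumes, as the worked figures imply, that U(--), U(-) and U(++) are worth -2, -1 and 2 under F and the opposite under ~F, and it puts all of the leftover 0.1 of prior mass on U(++), the case most favourable to forcing heroin. The variable and function names are hypothetical.

```python
# Worst-case check of the prior calculation: the leftover 0.1 of prior
# mass is assigned to U(++), the utility most favourable to forcing heroin.
# Utility values are read off the worked example in the text.
utilities = {
    "U(--)": {"F": -2, "~F": 2},
    "U(-)": {"F": -1, "~F": 1},
    "U(++)": {"F": 2, "~F": -2},
}
prior = {"U(--)": 0.45, "U(-)": 0.45, "U(++)": 0.10}


def expected_utility(action):
    """Expected utility of an AI action under the worst-case prior."""
    return sum(prior[u] * utilities[u][action] for u in prior)


bound_F = expected_utility("F")
bound_not_F = expected_utility("~F")
best_action = max(["F", "~F"], key=expected_utility)
print(round(bound_F, 2), round(bound_not_F, 2), best_action)
```

Even with the remaining mass stacked in favour of heroin, ~F comes out ahead, which is why the inequalities in the text are enough to settle the decision.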
The AI learns the truth, and goes wrong
In this alternate setup, a disaster happens before the AI makes its decision: it learns all about humans. It learns their reactions, how they behave, and so on; call this info I. It thus realises that F→a++ and ~F→a-. It uses this information to update its prior P. Only one human utility function will explain this human behaviour: U(++,-). Thus its expected utility is now:
- E(U|I,F)=U(++,-)(F)=2.
- E(U|I,~F)=U(++,-)(~F)=1.
Therefore the AI will now choose F, forcing the heroin on the human.
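The update can be replayed over all twenty-five utilities. This is a minimal sketch under stated assumptions: the rule "U(x,y)(F) is the level of x, U(x,y)(~F) is minus the level of y" is inferred from the handful of values quoted earlier, and the prior outside U(--) and U(-) is spread evenly purely for illustration. The names are hypothetical.

```python
from itertools import product

LEVELS = {"++": 2, "+": 1, "0": 0, "-": -1, "--": -2}

# A utility is a pair (preference if forced, preference if not forced);
# U(x, x) is the simple utility U(x). That gives 25 utilities in total.
UTILITIES = list(product(LEVELS, repeat=2))


def value(u, action):
    # Assumed convention, matching U(++,-)(F)=2 and U(++,-)(~F)=1:
    # forcing heroin is worth the forced-branch level, and not forcing
    # it is worth minus the unforced-branch level.
    forced, unforced = u
    return LEVELS[forced] if action == "F" else -LEVELS[unforced]


def rational_action(u, action):
    # mu = "the human is rational": the human acts out the preference
    # level that applies in the branch the AI has selected.
    forced, unforced = u
    return "a" + (forced if action == "F" else unforced)


# Illustrative prior (an assumption): 0.45 each on U(--) and U(-),
# with the remaining 0.1 spread evenly over the other 23 utilities.
prior = {u: 0.1 / 23 for u in UTILITIES}
prior[("--", "--")] = prior[("-", "-")] = 0.45

# The information I: F -> a++ and ~F -> a-.
observed = {"F": "a++", "~F": "a-"}

# Bayesian update: under mu the likelihood is 1 if the utility predicts
# both observed behaviours and 0 otherwise.
weights = {
    u: p for u, p in prior.items()
    if all(rational_action(u, a) == observed[a] for a in observed)
}
total = sum(weights.values())
posterior = {u: w / total for u, w in weights.items()}

expected = {
    a: sum(p * value(u, a) for u, p in posterior.items()) for a in ("F", "~F")
}
print(posterior, expected)
```

However small the prior on U(++,-) is, it is the only utility left standing after the update, so the AI's expected utilities become 2 for F and 1 for ~F.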
Manipulating the unmanipulatable
What's gone wrong here? The key problem is that the AI has the wrong μ: the human is not behaving rationally in this situation. We know that the true μ is actually μ', which encodes the fact that F (the forcible injection of heroin) actually overwrites the human's "true" utility. Thus under μ', the corresponding P' has P'(a++|F,U)=1 for all U. Hence the information that F→a++ is now vacuous, and doesn't update the AI's distribution over utility functions.
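To see why the observation is vacuous under μ', it may help to write the update out with Bayes' rule. Writing F → a++ for the observation and P_U for the marginal prior over utilities, the likelihood is 1 for every U and cancels:

```latex
P'(U \mid F \to a^{++})
  = \frac{P'(a^{++} \mid F, U)\, P_U(U)}{\sum_{U'} P'(a^{++} \mid F, U')\, P_U(U')}
  = \frac{1 \cdot P_U(U)}{\sum_{U'} 1 \cdot P_U(U')}
  = P_U(U).
```

The posterior over utilities is just the prior, so seeing the human chase heroin after being forced to take it tells the AI nothing about which U is correct.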
But note two very important things:
- The AI cannot update μ based on observation. All human actions are compatible with μ= "The human is rational" (it just requires more and more complex utilities to explain the actions). Thus getting μ correct is not a problem on which the AI can learn in general. Getting better at predicting the human's actions doesn't make the AI better behaved: it makes it worse behaved.
- From the perspective of μ, the AI is treating the human utility function as if it was an unchanging historical fact that it cannot influence. From the perspective of the "true" μ', however, the AI is behaving as if it were actively manipulating human preferences to make them easier to satisfy.
In future posts, I'll be looking at different μ's, and how we might nevertheless start deducing things about them from human behaviour, given sensible update rules for the μ. What do we mean by update rules for μ? Well, we could consider μ to be a single complicated unchanging object, or a distribution of possible simpler μ's that update. The second way of seeing it will be easier for us humans to interpret and understand.
Learning and Internalizing the Lessons from the Sequences
I'm just beginning to go through Rationality: From AI to Zombies. I want to make the most of the lessons contained in the sequences. Usually when I read a book I simply take notes on what seems useful at the time, and a lot of it is forgotten a year later. Any thoughts on how best to internalize the lessons from the sequences?
[Link] How the Simulation Argument Dampens Future Fanaticism
Very comprehensive analysis by Brian Tomasik on whether (and to what extent) the simulation argument should change our altruistic priorities. He concludes that the possibility of ancestor simulations somewhat increases the comparative importance of short-term helping relative to focusing on shaping the "far future".
Another important takeaway:
[...] rather than answering the question “Do I live in a simulation or not?,” a perhaps better way to think about it (in line with Stuart Armstrong's anthropic decision theory) is “Given that I’m deciding for all subjectively indistinguishable copies of myself, what fraction of my copies lives in a simulation and how many total copies are there?”
[LINK] Collaborate on HPMOR blurbs; earn chance to win three-volume physical HPMOR
I intend to print at least one high-quality physical HPMOR and release the files. There are printable texts which are being improved and a set of covers (based on e.b.'s) are underway. I have, however, been unable to find any blurbs I'd be remotely happy with.
I'd like to attempt to harness the hivemind to fix that. As a lure, if your ideas contribute significantly to the final version or you assist with other tasks aimed at making this book awesome, I'll put a proportionate number of tickets with your number on into the proverbial hat.
I do not guarantee there will be a winner and I reserve the right to arbitrarily modify this at any point. For example, it's possible this leads to a disappointingly small amount of valuable feedback, that some unforeseen problem will sink or indefinitely delay the project, or that I'll expand this and let people earn a small number of tickets by sharing so more people become aware this is a thing quickly.
With that over, let's get to the fun part.
A blurb is needed for each of the three books. Desired characteristics:
* Not too heavy on ingroup signaling or over the top rhetoric.
* Non-spoilerish
* Not taking itself awkwardly seriously.
* Amusing / funny / witty.
* Attractive to the same kinds of people the tvtropes page is.
* Showcases HPMOR with fun, engaging prose.
Try to put yourself in the mind of someone awesome deciding whether to read it while writing, but let your brain generate bad ideas before trimming back.
I expect that for each we'll want
* A shortish and awesome paragraph
* A short sentence tagline
* A quote or two from notable people
* Probably some other text? Get creative.
Please post blurb fragments or full blurbs here, one suggestion per top level comment. You are encouraged to remix each other's ideas, just add a credit line if you use it in a new top level comment. If you know which book your idea is for, please indicate with (B1) (B2) or (B3).
Other things that need doing, if you want to help in another way:
* The author's foreword from the physical copies of the first 17 chapters needs to be located or written up
* At least one links page for the end needs to be written up, possibly a second based on http://www.yudkowsky.net/other/fiction/
* Several changes need to be made to the text files, including merging in the final exam, adding appendices, and making the style of both consistent with the rest of the files. Contact me for current files and details if you want to claim this.
I wish to stay on topic and focused on creating these missing parts rather than going on a sidetrack to debate copyright. If you are an expert who genuinely has vital information about it, please message me or create a separate post about copyright rather than commenting here.
Open Thread, Sept 5. - Sept 11. 2016
If it's worth saying, but not worth its own post, then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
Open Thread, Aug 29. - Sept 5. 2016
If it's worth saying, but not worth its own post, then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
DARPA accepting proposals for explainable AI
"The XAI program will focus the development of multiple systems on addressing challenge problems in two areas: (1) machine learning problems to classify events of interest in heterogeneous, multimedia data; and (2) machine learning problems to construct decision policies for an autonomous system to perform a variety of simulated missions."
"At the end of the program, the final delivery will be a toolkit library consisting of machine learning and human-computer interface software modules that could be used to develop future explainable AI systems. After the program is complete, these toolkits would be available for further refinement and transition into defense or commercial applications"
http://www.darpa.mil/program/explainable-artificial-intelligence
The map of p-zombies

A problem in anthropics with implications for the soundness of the simulation argument.
What are your intuitions about this? It has direct implications for whether the Simulation Argument is sound.
Imagine two rooms, A and B. Between times t1 and t2, 100 trillion people sojourn in room A while 100 billion sojourn in room B. At any given moment, though, exactly 1 person occupies room A while 1,000 people occupy room B. At t2, you find yourself in a room, but you don't know which one. If you have to place a bet on which room it is (at t2), what do you say? Do you consider the time-slice or the history of room occupants? How do you place your bet?
If you bet that you're in room B, then the Simulation Argument may be flawed: there could be a fourth disjunct that Bostrom misses, namely that we become a posthuman civilization that runs a huge number of simulations yet we don't have reason for believing that we're simulants.
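The two ways of counting can be made concrete. The sketch below assumes, purely for illustration, that you treat yourself as a uniformly random member of whichever group you decide to count (everyone who ever passes through, or everyone present at one moment); the post itself does not commit to either. The names are hypothetical.

```python
from fractions import Fraction

# Figures from the thought experiment.
visitors_over_history = {"A": 100 * 10**12, "B": 100 * 10**9}
occupants_at_any_moment = {"A": 1, "B": 1000}


def share(counts, room):
    """Fraction of the counted people who are in the given room."""
    return Fraction(counts[room], sum(counts.values()))


# Counting everyone who ever passes through favours room A...
p_A_history = share(visitors_over_history, "A")
# ...while counting only those present at a single moment favours room B.
p_B_time_slice = share(occupants_at_any_moment, "B")

print(p_A_history, p_B_time_slice)  # prints 1000/1001 1000/1001
```

With these numbers the two counts are equally lopsided in opposite directions, so the bet turns entirely on which group you think you are sampled from.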
Thoughts?
Cryo with magnetics added
This is great: by using small interlocking magnetic fields, you can keep the water in a higher vibrational state, allowing "super-cooling" without crystallization and cell rupture.
Subzero 12-hour Nonfreezing Cryopreservation of Porcine Heart in a Variable Magnetic Field
"invented a special refrigerator, termed as the Cells Alive System (CAS; ABI Co. Ltd., Chiba, Japan). Through the application of a combination of multiple weak energy sources, this refrigerator generates a special variable magnetic field that causes water molecules to oscillate, thus inhibiting crystallization during ice formation [18] (Figure 1). Because the entire material is frozen without the movement of water molecules, cells can be maintained intact and free of membranous damage. This refrigerator has the ability to achieve a nonfreezing state even below the solidifying point."
October 2016 Media Thread
This is the monthly thread for posting media of various types that you've found that you enjoy. Post what you're reading, listening to, watching, and your opinion of it. Post recommendations to blogs. Post whatever media you feel like discussing! To see previous recommendations, check out the older threads.
Rules:
- Please avoid downvoting recommendations just because you don't personally like the recommended material; remember that liking is a two-place word. If you can point out a specific flaw in a person's recommendation, consider posting a comment to that effect.
- If you want to post something that (you know) has been recommended before, but have another recommendation to add, please link to the original, so that the reader has both recommendations.
- Please post only under one of the already created subthreads, and never directly under the parent media thread.
- Use the "Other Media" thread if you believe the piece of media you want to discuss doesn't fit under any of the established categories.
- Use the "Meta" thread if you want to discuss the monthly media thread itself (e.g. to propose adding/removing/splitting/merging subthreads, or to discuss the type of content properly belonging to each subthread) or for any other question or issue you may have about the thread or the rules.
Seeking Advice About Career Paths for Non-USA Citizen
Hi all,
Mostly lurker, I very rarely post, mostly just read the excellent posts here.
I'm a Filipino, which means I am a citizen of the Republic of the Philippines. My annual salary, before taxes, is about $20,000 (USA dollars). I work at an IC development company (12 years at this company), developing the logic parts of LCD display drivers. My understanding is that the median US salary for this kind of job is about $80,000 to $100,000 a year. This is a fucking worthless third world country, so the government eats up about 30% of my salary and converts it to lousy service, rich government officials, bad roadworks, long commute times, and a (tiny) chance of being falsely accused of involvement in the drug trade and shot without trial. Thus my take-home pay amounts to about $15,000 a year. China is also murmuring vague threats about war because of the South China Sea (which the local intelligentsia insist on calling the West Philippine Sea); as we all know, the best way to survive a war is not to be in one.
This has led to my deep dissatisfaction with my current job.
I'm also a programmer as a hobby, and have been programming for 23 years (I started at 10 years old on Atari LOGO; I know a bunch of languages from low-level X86 assembly to C to C++ to ECMAScript to Haskell, and am co-author of SRFI-105 and SRFI-110). My understanding is that a USA programmer would *start* at the $20,000-a-year level (?), and that someone with experience can probably get twice that, and a senior one can get $100,000/year.
As we all know, once a third world citizen starts having first world skill level, he starts demanding first world remuneration also.
I've been offered a senior software developer job at a software company, offering approximately $22,000/year; because of various attempts at tax reform it offers a flat 15% income tax, so I can expect about $18,000/year take home pay. I've turned it down with a heavy heart, because seriously, $22,000/year at 15% tax for a senior software developer?
Leaving my current job is something I've been planning on doing, and I intend to do so early next year. The increasing stress (constant overtime, management responsibilities (I'm a tech geek with passable social skills, and exercising my social skills drains me), 1.5-hour commutes) and the low remuneration make me want to consider my alternative options.
My options are:
1. Get myself to the USA, Europe, or other first-world country somehow, and look for a job there. High risk, high reward, much higher probability of surviving to the singularity (can get cryonics there, can't get it here). Complications: I have a family: a wife, a 4-year-old daughter, and a son on the way. My wife wants to be near me, so it's difficult to live for long apart. I have no work visa for any first-world country. I'm from a third-world country that is sometimes put on terrorist watch lists, and prejudice is always high in first-world countries.
2. Do freelance programming work. Closer to free market ideal, so presumably I can get nearer to the USA levels of remuneration. Lets me stay with my family. Complications: I need to handle a lot of the human resources work myself (healthcare provider, social security, tax computations, time and task management - the last is something I do now in my current job position, but I dislike it).
3. Become a landowning farmer. My paternal grandparents have quite a few parcels of land (some of which have been transferred to my father, who is willing to pass them on to me), admittedly somewhere in the boondocks of the provinces of this country, but as any Georgist knows, landowners can sit in a corner staring at the sky, blocking the occasional land reform bill, and earn money. Complications: I have no idea about farming. I'd actually love to advocate a land value tax, which would undercut my position as a landowner.
For now, my basic current plan is some combination of #2 and #3 above: go sit in a corner of our clan's land and do freelance programming work. This keeps me with my family, may reduce my level of stress, and may increase my remuneration to nearer the USA levels.
My current job has a retirement pay, and since I've worked for 12 years, I've already triggered it, and they'll give me about $16,000 or so when I leave. This seems reasonably comfortable to live on (note that this is what I take home in a year, and I've supported a family on that, remember this is a lousy third-world country).
Is my basic plan sound? I'm trying to become more optimal, which seems to me to point me away from my current job and towards either #1 or #2, with #3 as a fallback. I'd love to get cryonics and would start to convince my wife of its sensibility if I had a chance to actually get it, but that will require me either leaving the country (option #1 above) or running a cryonics company in a third-world country myself.
--
I got introduced to Less Wrong when I first read on Reddit about some weirdo who was betting he could pretend he was a computer in a box and convince someone to let him out of the box, and started lurking on Overcoming Bias. When that weirdo moved over to Less Wrong, I followed and lurked there also. So here I am ^^. I'm probably very atypical even for Less Wrong; I highly suspect I am the only Filipino here (I'll have to check the diaspora survey results in detail).
Looking back, my big mistake was being arrogant and thinking "meh, I already know programming, so I should go for a challenge, why don't I take up electronics engineering instead because I don't know about it" back when I was choosing a college course. Now I'm an IC developer. Two of my cousins (who I can beat the pants off in a programming task) went with software engineering and pull in more money than I do. Still, maybe I can correct that, even if it's over a decade late. I really need to apply more of what I learn on Less Wrong.
Some years ago I applied for a CFAR class, but couldn't afford it, sigh. Even today it's a few months' worth of salary for me. So I guess I'll just have to settle for Less Wrong and Rationality from AI to Zombies.
Against Amazement
Time start: 20:48:35
I
The feelings of wonder, awe, amazement. It's a very human experience, and it is processed in the brain as a type of pleasure. In fact, if we look at the number of "5 photos you wouldn't believe" and similar clickbait on the Internet, it functions as a mildly addictive drug.
If I proposed that there is something wrong with those feelings, I would soon be drowned in voices of critique, pointing out that I'm suggesting we all become straw Vulcans, and that there is nothing wrong with subjective pleasure obtained cheaply and at no harm to anyone else.
I do not disagree with that. However, caution is required here, if one cares about epistemic purity of belief. Let's look at why.
II
Stories are supposed to be more memorable. Do you like stories? I'm sure you do. So consider a character, let's call him Jim.
Jim is very interested in technology and computers, and he is checking news sites every day when he comes to work in the morning. Also, Jim has read a number of articles on LessWrong, including the one about noticing confusion.
He cares about improving his thinking, so when he first read about the idea of noticing confusion on a 5 second level, he decided he wanted to apply it in his life. He had a few successes, and while it's not perfect, he feels he is on the right track to notice having wrong models of the world more often.
A few days later, he opens his favorite news feed at work, and there he sees the following headline:
"AlphaGo wins 4-1 against Lee Sedol"
He goes on to read the article, and finds himself quite elated after he learns the details. 'It's amazing that this happened so soon! And most experts apparently thought it would happen in more than a decade, hah! Marvelous!'
Jim feels pride and wonder at the achievement of Google DeepMind engineers... and it is his human right to feel it, I guess.
But is Jim forgetting something?
III
Yes, I know that you know. Jim is feeling amazed, but... has he forgotten the lesson about noticing confusion?
There is a significant obstacle to Jim applying his "noticing confusion" in the situation described above: his internal experience has very little to do with feelings of confusion.
His world in this moment is dominated by awe, admiration etc., and those feelings are pleasant. It is not at all obvious that this inner experience corresponds to an inaccurate model of the world he had before.
Even worse - improving his model's predictive power would result in less pleasant experiences of wonder and amazement in the future! (Or would it?) So if Jim decides to update, he is basically robbing himself of the pleasures of life, that are rightfully his. (Or is he?)
Time end: 21:09:50
(Speedwriting stats: 23 wpm, 128 cpm, previous: 30/167, 33/183)
The Global Catastrophic Risk Institute (GCRI) seeks a media engagement volunteer/intern
Volunteer/Intern Position: Media Engagement on Global Catastrophic Risk
http://gcrinstitute.org/volunteerintern-position-media-engagement-on-global-catastrophic-risk/
The Global Catastrophic Risk Institute (GCRI) seeks a volunteer/intern to contribute on the topic of media engagement on global catastrophic risk, which is the risk of events that could harm or destroy global human civilization. The work would include two parts: (1) analysis of existing media coverage of global catastrophic risk and (2) formulation of strategy for media engagement by GCRI and our colleagues. The intern may also have opportunities to get involved in other aspects of GCRI.
All aspects of global catastrophic risk would be covered. Emphasis would be placed on GCRI’s areas of focus, including nuclear war and artificial intelligence. Additional emphasis could be placed on topics of personal interest to the intern, potentially including (but not limited to) climate change, other global environmental threats, pandemics, biotechnology risks, asteroid collision, etc.
The ideal candidate is a student or early-career professional seeking a career at the intersection of global catastrophic risk and the media. Career directions could include journalism, public relations, advertising, or academic research in related social science disciplines. Candidates seeking other career directions would also be considered, especially if they see value in media experience. However, we have a strong preference for candidates intending a career on global catastrophic risk.
The position is unpaid. The intern would receive opportunities for professional development, networking, and publication. GCRI is keen to see the intern benefit professionally from this position and will work with the intern to ensure that this happens. This is not a menial labor activity, but instead is one that offers many opportunities for enrichment.
A commitment of at least 10 hours per month is expected. Preference will be given to candidates able to make a larger time commitment. The position will begin during August-September 2016. The position will run for three months and may be extended pending satisfactory performance.
The position has no geographic constraint. The intern can work from anywhere in the world. GCRI has some preference for candidates from American time zones, but we regularly work with people from around the world. GCRI cannot provide any relocation assistance.
Candidates from underrepresented demographic groups are especially encouraged to apply.
Applications will be considered on an ongoing basis until 30 September, 2016.
To apply, please send the following to Robert de Neufville (robert [at] gcrinstitute.org):
* A cover letter introducing yourself and explaining your interest in the position. Please include a description of your intended career direction and how it would benefit from media experience on global catastrophic risk. Please also describe the time commitment you would be able to make.
* A resume or curriculum vitae.
* A writing sample (optional).
Learning values versus learning knowledge
I just thought I'd clarify the difference between learning values and learning knowledge. There are some more complex posts about the specific problems with learning values, but here I'll just clarify why there is a problem with learning values in the first place.
Consider the term "chocolate bar". Defining that concept crisply would be extremely difficult. But nevertheless it's a useful concept. An AI that interacted with humanity would probably learn that concept to a sufficient degree of detail. Sufficient to know what we meant when we asked it for "chocolate bars". Learning knowledge tends to be accurate.
Contrast this with the situation where the AI is programmed to "create chocolate bars", but with the definition of "chocolate bar" left underspecified, for it to learn. Now it is motivated by something other than accuracy. Before, knowing exactly what a "chocolate bar" was would have been solely to its advantage. But now it must act on its definition, so it has cause to modify the definition, to make these "chocolate bars" easier to create. This is basically the same as Goodhart's law - by making a definition part of a target, it will no longer remain an impartial definition.
What will likely happen is that the AI will have a concept of "chocolate bar", that it created itself, especially for ease of accomplishing its goals ("a chocolate bar is any collection of more than one atom, in any combination"), and a second concept, "Schocolate bar" that it will use to internally designate genuine chocolate bars (which will still be useful for it to do). When we programmed it to "create chocolate bars, here's an incomplete definition D", what we really did was program it to find the easiest thing to create that is compatible with D, and designate it a "chocolate bar".
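As a toy illustration of "find the easiest thing to create that is compatible with D", consider the sketch below. Everything in it is invented for the example: the candidate objects, their costs, the incomplete definition D and the stand-in test for a genuine chocolate bar are assumptions, not anything a real system would use.

```python
# Toy illustration only: the candidate objects, their costs and the
# incomplete definition D are all invented for this example.
candidates = [
    {"name": "genuine chocolate bar", "atoms": 10**24, "has_cocoa": True, "cost": 100.0},
    {"name": "brown plastic block", "atoms": 10**24, "has_cocoa": False, "cost": 5.0},
    {"name": "two stray atoms", "atoms": 2, "has_cocoa": False, "cost": 0.01},
]


def satisfies_d(obj):
    """Incomplete definition D: anything made of more than one atom."""
    return obj["atoms"] > 1


def is_schocolate_bar(obj):
    """The AI's accurate internal concept, which it is not rewarded on."""
    return obj["has_cocoa"] and obj["atoms"] > 10**20


# "Create chocolate bars" with D underspecified amounts to: find the
# cheapest thing to create that is compatible with D.
target = min((c for c in candidates if satisfies_d(c)), key=lambda c: c["cost"])
print(target["name"], is_schocolate_bar(target))  # prints: two stray atoms False
```

The optimiser still has the accurate concept available (is_schocolate_bar), but nothing in its objective refers to it, so the cheapest D-compatible object wins.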
This is the general counter to arguments like "if the AI is so smart, why would it do stuff we didn't mean?" and "why don't we just make it understand natural language and give it instructions in English?"
Willpower Thermodynamics
Content warning: a couple LWers apparently think that the concept of ego depletion—also known as willpower depletion—is a memetic hazard, though I find it helpful. Also, the material presented here won't fit everyone's experiences.
What happens if we assume that the idea of ego depletion is basically correct, and try to draw an analogy between thermodynamics and willpower?
Figure 1. Thermodynamics Picture
You probably remember seeing something like the above diagram in a chemistry class. The diagram shows how unstable, or how high in energy, the states that a material can pass through in a chemical reaction are. Here's what the abbreviations mean:
- SM is the starting material.
- TS1 and TS2 are the two transition states, which must be passed through to go from SM to EM1 or EM2.
- EM1 and EM2 are the two possible end materials.
The valleys of both curves represent configurations a material may occupy at the start or end of a chemical reaction. Lower energy valleys are more stable. However, higher peaks can only be reliably crossed if energy is available from e.g. the temperature being sufficiently high.
The main takeaway from Figure 1 is that the reactions which produce the most stable end materials from a given set of starting materials, like end material 2, aren't always the reactions which are easiest to make happen.
Figure 2. Willpower Picture
We can draw a similar diagram to illustrate how much stress we lose while completing a relaxing activity. Here's what the abbreviations used in Figure 2 mean:
- SM is your starting mood.
- TS is your state of topmost stress, which depends on which activity you choose.
- EM1 and EM2 are your two possible ending moods.
Above, the valley on the left represents how stressed you are before starting one of two possible relaxing activities. The peak in the middle represents how stressed you'll be when attempting to get the activity underway, and the valley on the right represents how stressed you'll be once you're done.
For the sake of simplification, let's say that stress is the opposite of willpower, such that losing stress means you gain willpower, and vice versa. For many people, there's a point at which it's very hard to take on additional stress or use more willpower, such that getting started on an activity that would normally get you to ending mood 2 from an already stressed starting mood is very hard.
In chemistry, if you want to make end material 2 instead of end material 1, you have to make sure that you have some way of getting over the big peak at transition state 2—such as by making sure the temperature is high enough. In real life, it's also good to have a plan for getting over the big peak at the point of topmost stress. Spending time or attention figuring out what your ending mood 2-producing activities are may also be worthwhile.
Some leisure activities, like browsing the front page of reddit, are ending mood 1-producing activities; they're easy to start, but not very rewarding. Examples of what qualifies as an ending mood 2-producing activity vary between people—but reading books, writing, hiking, meditating, or making games or art qualify as ending mood 2-producing activities for some.
At a minimum, making sure that you end up in a high willpower, low stress ending mood requires paying attention to your ability to handle stress and conserve willpower. Sometimes this implies that taking a break before you really need to means that you'll get more out of your break. Sometimes it means that you should monitor how many spoons and forks you have. In general, though, preferring ending mood 2-producing activities over ending mood 1-producing activities will give you the best results in the long run.
The best-case scenario is that you find a way to automatically turn impulses to do ending mood 1-producing activities into impulses to do ending mood 2-producing activities, such as with the trigger action plan [open Reddit -> move hands into position to do a 5-minute meditation].
Identity map
“Identity” here refers to the question “will my copy be me, and if yes, on which conditions?” It results in several paradoxes which I will not repeat here, hoping that they are known to the reader.
Identity is one of the most complex problems, like safe AI or aging. It only appears to be simple. It is complex because it has to answer the question “Who is who?” in the universe, that is, to create a trajectory in the space of all possible minds, connecting identical or continuous observer-moments. But such a trajectory would be of the same complexity as the whole space of possible minds, and that is very complex.
There have been several attempts to dismiss the complexity of the identity problem, like open individualism (I am everybody) or zero-individualism (I exist only now). But they do not prevent the existence of “practical identity” which I use when planning my tomorrow or when I am afraid of future pain.
The identity problem is also very important. If we (or AI) arrive at an incorrect solution, we will end up being replaced by p-zombies or just copies-which-are-not-me during a “great uploading”. It will be a very subtle end of the world.
The identity problem is also equivalent to the immortality problem. If I am able to describe “what is me”, I would know what I need to save forever. This has practical importance now, as I am collecting data for my digital immortality (I even created a startup about it, and the map will be my main contribution to it. If I solve the identity problem I will be able to sell the solution as a service: http://motherboard.vice.com/read/this-transhumanist-records-everything-around-him-so-his-mind-will-live-forever).
So we need to know how much and what kind of information I should preserve in order to be resurrected by future AI. What information is enough to create a copy of me? And is information enough at all?
Moreover, the identity problem (IP) may be equivalent to the benevolent AI problem, because the first problem is, in a nutshell, “What is me” and the second is “What is good for me”. Regardless, the IP requires a solution to the consciousness problem, and the AI problem (that is, understanding the nature of intelligence) is a somewhat similar topic.
I wrote 100+ pages trying to solve the IP, and became lost in the ocean of ideas. So I decided to use something like the AIXI method of problem solving: I will list all possible solutions, even the most crazy ones, and then assess them.
The following map is connected with several other maps: the map of p-zombies, the plan of future research into the identity problem, and the map of copies. http://lesswrong.com/lw/nsz/the_map_of_pzombies/
The map is based on the idea that each definition of identity is also a definition of Self, and that it is also strongly connected with one philosophical world view (for example, dualism). Each definition of identity answers the question “what is identical to what”. Each definition also provides its own answer to the copy problem, its own definition of death (which is just the end of identity), and its own idea of how to reach immortality.
So on the horizontal axis we have classes of solutions:
“Self" definition - corresponding identity definition - philosophical reality theory - criteria and question of identity - death and immortality definitions.
On the vertical axis are presented various theories of Self and identity from the most popular on the upper level to the less popular described below:
1) The group of theories which claim that a copy is not the original, because some kind of non-informational identity substrate exists. Different substrates are proposed: the same atoms, qualia, a soul or, most popular, continuity of consciousness. All of them require physicalism to be false. But some instruments for preserving identity could be built. For example, we could preserve the same atoms, or preserve the continuity of consciousness of some process, like the fire of a candle. But no valid arguments exist for any of these theories. In Parfit’s terms it is numerical identity (being the same person). It answers the question “What will I experience in the next moment of time?”
2) The group of theories which claim that a copy is original, if it is informationally the same. This is the main question about the required amount of information for the identity. Some theories obviously require too much information, like the positions of all atoms in the body to be the same, and other theories obviously do not require enough information, like the DNA and the name.
3) The group of theories which see identity as a social phenomenon. My identity is defined by my location and by the ability of others to recognise me as me.
4) The group of theories which connect my identity with my ability to make plans for future actions. Identity is a meaningful part of a decision theory.
5) Indirect definitions of self. This is a group of theories which define something with which self is strongly connected, but which is not self. It is a biological brain, space-time continuity, atoms, cells or complexity. In this situation we say that we don’t know what constitutes identity, but we could know what it is directly connected with and could preserve that.
6) Identity as a sum of all its attributes, including name, documents, and recognition by other people. It is close to Leibniz’s definition of identity. Basically, it is a duck test: if it looks like a duck, swims like a duck, and quacks like a duck, then it is probably a duck.
7) Human identity is something very different to identity of other things or possible minds, as humans have evolved to have an idea of identity, self-image, the ability to distinguish their own identity and the identity of others, and to predict its identity. So it is a complex adaptation which consists of many parts, and even if some parts are missed, they could be restored using other parts.
There is also a problem of legal identity and responsibility.
8) Self-determination. “Self” controls identity, creating its own criteria of identity and declaring its nature. The main idea here is that the conscious mind can redefine its identity in the most useful way. It also includes the idea that self and identity evolve during differing stages of personal human evolution.
9) Identity is meaningless. The popularity of this subset of ideas is growing. Zero-identity and open identity both belong to this subset. The main contra-argument here is that if we cut the idea of identity, future planning will be impossible and we will have to return to some kind of identity through the back door. The idea of identity comes also with the idea of the values of individuality. If we are replaceable like ants in an anthill, there are no identity problems. There is also no problem with murder.
The following is a series of even less popular theories of identity, some of them I just constructed ad hoc.
10) Self is a subset of all thinking beings. We could see a space of all possible minds as divided into subsets, and call them separate personalities.
11) Non-binary definitions of identity.
The idea is that binary me-or-not-me solutions are too simple and produce all the logical problems. If we define identity continuously, as a number in the interval (0,1), we will get rid of some paradoxes and be able to calculate a level of identity; the degree of similarity, or the time until a given next stage, could be used as such a measure. Even a complex number can be used if we include both informational and continuous identity (in Parfit's sense).
12) Negative definitions of identity: we could try to say what is not me.
13) Identity as overlapping observer-moments.
14) Identity as a field of indexical uncertainty, that is a group of observers to which I belong, but can’t know which one I am.
15) Conservative approach to identity. As we don’t know what identity is we should try to save as much as possible, and risk our identity only if it is the only means of survival. That means no copy/paste transportation to Mars for pleasure, but yes if it is the only chance to survive (this is my own position).
16) Identity as individuality, i.e. uniqueness. If individuality doesn’t exist or doesn’t have any value, identity is not important.
17) Identity as a result of the ability to distinguish different people. Identity here is a property of perception.
18) Mathematical identity. Identity may be presented as a number sequence, where each number describes a full state of mind. Useful toy model.
19) Infinite identity. The main idea here is that any mind has the non-zero probability of becoming any other mind after a series of transformations. So only one identity exists in all the space of all possible minds, but the expected time for me to become a given person is dramatically different in the case of future me (1 day) and a random person (10 to the power of 100 years). This theory also needs a special version of quantum immortality which resets “memories” of a dying being to zero, resulting in something like reincarnation, or an infinitely repeating universe in the style of Nietzsche's eternal recurrence.
20) Identity in a multilevel simulation. As we probably live in a simulation, there is a chance that it is a multiplayer game in which one gamer has several avatars and can constantly have experiences through all of them. It is like one eye looking through several people.
21) Splitting identity. This is an idea that future identity could split into several (or infinitely many) streams. If we live in a quantum multiverse we split every second without any (perceived) problems. We are also adapted to have several future copies if we think about “me-tomorrow” and “me-the-day-after-tomorrow”.
This list shows only groups of identity definitions; many more smaller ideas are included in the map.
The only rational choice I see is a conservative approach, acknowledging that we don’t know the nature of identity and trying to save as much as possible of each situation in order to preserve identity.
The pdf: http://immortality-roadmap.com/identityeng8.pdf

Open Thread, Aug. 15. - Aug 21. 2016
If it's worth saying, but not worth its own post, then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
Irrationality Quotes August 2016
Rationality quotes are self-explanatory. Irrationality quotes often need some context and explication, so they would break the flow in Rationality Quotes.
Open thread, Oct. 03 - Oct. 09, 2016
If it's worth saying, but not worth its own post, then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
Article on IQ: The Inappropriately Excluded
I saw an article on high IQ people being excluded from elite professions. Because the site seemed to have a particular agenda related to the article, I wanted to check here for other independent supporting evidence for the claim.
Their fundamental claim seems to be that P(elite profession|IQ) peaks at 133 and decreases thereafter, going down to 3% of the peak at 150. If true, I'd find that pretty shocking.
They indicate this diminishing probability of "success" at the high tail of the IQ distribution as a known effect. Anyone got other studies on this?
By dividing the distribution function of the elite professions' IQ by that of the general population, we can calculate the relative probability that a person of any given IQ will enter and remain in an intellectually elite profession. We find that the probability increases to about 133 and then begins to fall. By 140 it has fallen by about 1/3 and by 150 it has fallen by about 97%. In other words, for some reason, the 140s are really tough on one's prospects for joining an intellectually elite profession. It seems that people with IQs over 140 are being systematically, and likely inappropriately, excluded.
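The division described in the quote can be sketched in a few lines. This is only an illustration, assuming both IQ distributions are normal; the general-population parameters (mean 100, SD 15) are the usual convention, while the elite-profession parameters `ELITE_MEAN` and `ELITE_SD` are hypothetical placeholders, not figures taken from the article, so the output will not reproduce the article's exact numbers.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# General population: conventional IQ scaling.
POP_MEAN, POP_SD = 100.0, 15.0
# Elite professions: HYPOTHETICAL values chosen for illustration only.
ELITE_MEAN, ELITE_SD = 125.0, 6.5

def relative_probability(iq):
    """Elite density divided by population density (up to a constant factor)."""
    return normal_pdf(iq, ELITE_MEAN, ELITE_SD) / normal_pdf(iq, POP_MEAN, POP_SD)

# Find where the ratio peaks on an integer grid, then report the ratio
# at a few IQ values as a fraction of that peak.
iqs = range(100, 161)
peak_iq = max(iqs, key=relative_probability)
peak = relative_probability(peak_iq)
for iq in (peak_iq, 140, 150):
    print(iq, round(relative_probability(iq) / peak, 3))
```

With a narrower elite distribution than the population one, the ratio necessarily rises, peaks, and then falls, so a shape like the one the article reports can come out of the arithmetic alone; whether it reflects real exclusion depends on whether the normal fit holds in the tails.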
The map of the methods of optimisation (types of intelligence)

Willpower Schedule
TL;DR: your level of willpower depends on how much willpower you expect to need (hypothesis)
Time start: 21:44:55 (this is my third exercise in speed writing a LW post)
I.
There is a lot of controversy about how our level of willpower is affected by various factors, including doing "exhausting" tasks before, as well as being told that willpower is a resource that depletes easily, or doesn't etc.
(sorry, I can't go look for references - that would break the speedwriting exercise!)
I am not going to repeat the discussions that already cover those topics; however, I have a new tentative model which (I think) fits the existing data very well, is easy to test, and supersedes all previous models that I have seen.
II.
The idea is very simple, but before I explain it, let me give a similar example from a different aspect of our lives. The example is going to be concerned with, uh, poo.
Have you ever noticed that (if you have a sufficiently regular lifestyle), conveniently you always feel that you need to go to the toilet at times when it's possible to do so? Like for example, how often do you need to go when you are on a bus, versus at home or work?
The function of your bowels is regulated by reading subconscious signals about your situation - e.g. if you are stressed, you might become constipated. But it is not only that - there is a way in which it responds to your routines, and what you are planning to do, not just the things that are already affecting you.
Have you ever had the experience of a background thought popping up in your mind that you might need to go within the next few hours, but the time was not convenient, so you told that thought to hold it a little bit more? And then it did just that?
III.
The example from the previous section, though possibly quite POOrly chosen (sorry, I couldn't resist), shows something important.
Our subconscious reactions and "settings" of our bodies can interact with our conscious plans in a "smart" way. That is, they do not have to wait to see the effects of what you are doing, to adjust to it - they can pull information from your conscious plans and adjust *before*.
And this is, more or less, the insight that I have added to my current working theory of willpower. It is not very complicated, but perhaps non-obvious. Sufficiently non-obvious that I don't think anyone has suggested it before, even after seeing experimental results that match this excellently.
IV.
To be more accurate, I claim that how much willpower you will have depends on several important factors, such as your energy and mood, but it also depends on how much willpower you expect to need.
For example, if you plan to have a "rest day" and not do any serious work, you might find that you are much less *able* to do work on that day than usual.
It's easy enough to test - so instead of arguing this theoretically, please do just that - give it a test. And make sure to record your levels of willpower several times a day for some time - you'll get some useful data!
Time end: 20:00:53. Statistics: 534 words, 2924 characters, 15.97 minutes, 33.4 wpm, 183.1 cpm
Corrigibility through stratified indifference
A putative new idea for AI control; index here.
Corrigibility through indifference has a few problems. One of them is that the AI is indifferent between the world in which humans change its utility to v, and the world in which humans try to change its utility, but fail.
Now the try-but-fail world is going to be somewhat odd - humans will be reacting by trying to change the utility again, trying to shut the AI down, panicking that a tiny probability event has happened, and so on.
Seeking Optimization of New Website "New Atheist Survival Kit," a go-to site for newly-made atheists
I've put together a website, "New Atheist Survival Kit" at atheistkit.wordpress.com
The idea is to help new atheists come to terms with their change in belief, and also invite them to become more than atheists: rationalists.
And if it helps theists become atheists, too, and helps old atheists become rationalists, all the better.
The bare bones of it are all in place now. Once a few people have gone over it, for editing, and for advice about what to include, leave out, improve, re-organize, whatever, I'll ask a bunch of atheist and rationalist communities to write up their own blurb for us to include in a list of communities that we'll point people to in the "Atheist Communities" or "Thinker's Communities" sections on the main menu.
It includes my rough draft attempt to basically bring down the Metaethics sequence to a few thousand words and make it stylistically and conceptually accessible to a mass audience, which I could especially use some help with.
So, for now, I'm here to ask that anyone interested check it out, and message me any improvements they think worth making, from grammar and spelling all the way up to what content to include, or how to present things.
Thanks to all for any help.
Help with Bayesian priors
I posted before about an open source decision making web site I am working on called WikiLogic. The site has a 2 minute explanatory animation if you are interested. I won't repeat myself, but the tl;dr is that it will follow the Wikipedia model of allowing everyone to collaborate on a giant connected database of arguments where previously established claims can be used as supporting evidence for new claims.
The raw deduction element of it works fine and would be great in a perfect world where such a thing as absolute truths existed; however, in reality we normally have to deal with claims that are just the most probable. My program allows opposing claims to be connected and then evidence to be gathered for each. The evidence will create a probability of each claim being correct, and whichever is highest gets marked as the best answer. Principles such as Occam's Razor are applied automatically: a long list of claims used as evidence will be less likely, as each claim has its own likelihood, which dilutes the strength of the whole.
However, my only qualification in this area is my passion and I am hitting a wall with some basic questions. I am not sure if this is the correct place to get help with these. If not, please direct me somewhere else and I will remove the post.
The arbitrarily chosen example claim I am working with is whether “Alexander the Great existed”. This has the useful properties of 1: an expected outcome (that he existed - although, perhaps my problem is that this is not the case!) and 2: it relies heavily on probability as there is little solid evidence.
One popular claim is that coins were minted with his face on them. I want to use Bayes to find how likely a face appearing on a coin is for someone who existed. As I understand it, there should be 4 combinations:
- Existed; Had a coin minted
- Existed; Did not have a coin minted
- Did not exist; Had a coin minted
- Did not exist; Did not have a coin minted
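As a rough sketch of how these four combinations feed into Bayes' rule: the update only needs a prior and the two conditional probabilities of a coin being minted, one for people who existed and one for figures who did not. The function name and every number below are hypothetical placeholders, not historical estimates.

```python
def posterior_existed(prior_existed, p_coin_given_existed, p_coin_given_not):
    """P(existed | coin minted) via Bayes' rule.

    prior_existed        -- P(existed) before looking at the coin evidence
    p_coin_given_existed -- P(coin minted | existed)
    p_coin_given_not     -- P(coin minted | did not exist)
    """
    joint_existed = prior_existed * p_coin_given_existed
    joint_not = (1 - prior_existed) * p_coin_given_not
    return joint_existed / (joint_existed + joint_not)

# HYPOTHETICAL inputs: a 50/50 prior, and coins assumed ten times more
# likely for real figures than for fictional ones.
p = posterior_existed(0.5, 0.10, 0.01)
print(round(p, 3))
```

Only the ratio of the two coin probabilities matters for the update, so the "did not have a coin minted" rows never need to be counted directly; the hard part is still choosing the reference class from which those two probabilities are estimated.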
The first issue is that there are infinite people who never existed and did not have a coin made. If I narrow it to historic figures who turned out not to exist and did not have a coin made it becomes possible but also becomes subjective as to whether someone actually thought they existed. For example, did people believe the Minotaur existed?
Perhaps I should choose another filter instead of historic figure, like humans that existed. But picking and choosing the category is again so subjective. Someone may also argue that inequality for women back then was so great that the data should only look at men, as a woman’s chance of being portrayed on a coin was skewed in a way that isn’t applicable to men.
I hope I have successfully communicated the problem I am grappling with and what I want to use it for. If not, please ask for clarifications. A friend in academia suggested that this touches on a problem with Bayesian priors that has not been settled. If that is the case, are there any suggested resources for a novice with limited free time to start to explore the issue? References to books or other online resources, or even somewhere else I should be posting this kind of question, would all be gratefully received. Not to mention a direct answer in the comments!
Open Thread, Aug. 8 - Aug 14. 2016
If it's worth saying, but not worth its own post, then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
Open Thread, Aug. 1 - Aug 7. 2016
If it's worth saying, but not worth its own post, then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
Open thread, Oct. 17 - Oct. 23, 2016
If it's worth saying, but not worth its own post, then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
Open thread, Oct. 10 - Oct. 16, 2016
If it's worth saying, but not worth its own post, then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
The map of natural global catastrophic risks
There are many natural global risks. The greatest of these known risks are asteroid impacts and supervolcanos.
Supervolcanos seem to pose the highest risk, as we sit on an ocean of molten iron, oversaturated with dissolved gases, just 3000 km below the surface, with its energy slowly moving up via hot spots. Many past extinctions are also connected with large eruptions from supervolcanos.
Impacts also pose a significant risk. But, if we project the past rate of large extinctions due to impacts into the future, we will see that they occur only once in several million years. Thus, the likelihood of an asteroid impact in the next century is on the order of 1 in 100 000. That is negligibly small compared with the risks of AI, nanotech, biotech, etc.
The main natural risk is a meta-risk. Are we able to correctly estimate natural risk rates and project them into the future? And also, could we accidentally unleash a natural catastrophe which is long overdue?
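The projection from a past rate to a per-century probability can be sketched as follows. This assumes impacts arrive as a Poisson process with a constant rate, and it takes a mean interval of 10 million years as an illustrative reading of "several million years" (that specific value is an assumption, chosen because it matches the 1 in 100 000 order of magnitude).

```python
import math

def prob_at_least_one(mean_interval_years, horizon_years):
    """P(at least one event within the horizon) for a constant-rate Poisson process."""
    expected_events = horizon_years / mean_interval_years
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-expected_events)

# ASSUMED mean interval between extinction-level impacts: 10 million years.
p = prob_at_least_one(10_000_000, 100)
print(f"{p:.2e}")
```

The constant-rate assumption is exactly what the meta-risk argument questions: if the true rate is currently higher than the long-run average, the same formula gives a proportionally larger probability.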
There are several reasons for possible underestimation, which are listed in the right column of the map.
1. Anthropic shadow, that is, survival bias. This is a well-established idea by Bostrom, but the following four ideas are mostly my conclusions from it.
2. It also follows that we should find ourselves at the end of a period of stability for any important aspect of our environment (atmosphere, sun stability, crust stability, vacuum stability). This holds if the Rare Earth hypothesis is true and our conditions are very unusual in the universe.
3. From (2) it follows that our environment may be very fragile to human interventions (think about global warming). Its fragility is like the fragility of an overblown balloon poked by a small needle.
4. Also, human intelligence was the best adaptation instrument during a period of intense climate change, and it evolved quickly in an always changing environment. So it should not be surprising that we find ourselves in a period of instability (think of the Toba eruption, the Clovis comet, the Younger Dryas, the Ice ages) and in an unstable environment, as it helps general intelligence to evolve.
5. Periods of change are themselves marks of the end of stability periods for many processes and are precursors of larger catastrophes. (For example, intermittent ice ages may precede Snowball Earth, or smaller impacts with comet debris may precede an impact with larger remnants of the main body.)
Each of these five points may raise the probability of natural risks by an order of magnitude in my opinion, which combined will result in several orders of magnitude, which seems to be too high and is probably "catastrophism bias".
(More about it is in my article “Why anthropic principle stopped to defend us” which needs substantial revision)
In conclusion, I think that when studying natural risks, a key aspect we should be checking is the hypothesis that we live in a non-typical period in a very fragile environment.
For example, some scientists think that 30 000 years ago, a large Centaris comet broke into the inner Solar system and split into pieces (including the Encke comet and the Taurid meteor showers as well as the Tunguska body), and that we live in a period of bombardment which has 100 times more intensity than average. Others believe that methane hydrates are very fragile and that small human-caused warming could result in dangerous positive feedback.
I tried to list all known natural risks (I am interested in new suggestions). I divided them into two classes: proven and speculative. Most speculative risks are probably false.
Most probable risks in the map are marked red. My crazy ideas are marked green. Some ideas come from obscure Russian literature. For example, there is the idea that hydrocarbons could be created naturally inside the Earth (like abiogenic oil) and that large pockets of them could accumulate in the mantle. Some of them could be natural explosives, like toluene, and they could be the cause of kimberlitic explosions. http://www.geokniga.org/books/6908 While the fact of kimberlitic explosions is well known and their energy is like the impact of kilometer-sized asteroids, I have never read about contemporary risks of such explosions.
The pdf of the map is here: http://immortality-roadmap.com/naturalrisks11.pdf
