Rationality Reading Group: Part X: Yudkowsky's Coming of Age
This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.
Welcome to the Rationality reading group. This fortnight we discuss Beginnings: An Introduction (pp. 1527-1530) and Part X: Yudkowsky's Coming of Age (pp. 1535-1601). This post summarizes each article of the sequence, linking to the original LessWrong post where available.
Beginnings: An Introduction
X. Yudkowsky's Coming of Age
292. My Childhood Death Spiral - Wherein Eliezer describes how a history of being rewarded for believing that 'intelligence is more important than experience or wisdom' initially led him to dismiss the possibility that most possible smarter-than-human artificial intelligences would, if constructed, lead to futures we would regard as valueless.
293. My Best and Worst Mistake - When Eliezer went into his death spiral around intelligence, he wound up making a lot of mistakes that later became very useful.
294. Raised in Technophilia - When Eliezer was quite young, it took him a very long time to get to the point where he was capable of considering that the dangers of technology might outweigh the benefits.
295. A Prodigy of Refutation - Eliezer's skills at defeating other people's ideas led him to believe that his own (mistaken) ideas must have been correct.
296. The Sheer Folly of Callow Youth - Eliezer's big mistake was when he took a mysterious view of morality.
297. That Tiny Note of Discord - Eliezer started to dig himself out of his philosophical hole when he noticed a tiny inconsistency.
298. Fighting a Rearguard Action Against the Truth - When Eliezer started to consider the possibility of Friendly AI as a contingency plan, he permitted himself a line of retreat. He was now able to slowly start to reconsider positions in his metaethics, and move gradually towards better ideas.
299. My Naturalistic Awakening - Eliezer actually looked back and realized his mistakes when he imagined the idea of an optimization process.
300. The Level Above Mine - There are people who have acquired more mastery over various fields than Eliezer has over his.
301. The Magnitude of His Own Folly - Eliezer considers his training as a rationalist to have started the day he realized just how awfully he had screwed up.
302. Beyond the Reach of God - Compare the world in which there is a God, who will intervene at some threshold, against a world in which everything happens as a result of physical laws. Which universe looks more like our own?
303. My Bayesian Enlightenment - The story of how Eliezer Yudkowsky became a Bayesian.
This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
The next reading will cover Part Y: Challenging the Difficult (pp. 1605-1647). The discussion will go live on Wednesday, 20 April 2016, right here on the discussion forum of LessWrong.
Rationality Reading Group: Part W: Quantified Humanism
This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.
Welcome to the Rationality reading group. This fortnight we discuss Part W: Quantified Humanism (pp. 1453-1514) and Interlude: The Twelve Virtues of Rationality (pp. 1516-1521). This post summarizes each article of the sequence, linking to the original LessWrong post where available.
W. Quantified Humanism
281. Scope Insensitivity - The human brain can't represent large quantities: an environmental measure that will save 200,000 birds doesn't conjure anywhere near a hundred times the emotional impact and willingness-to-pay of a measure that would save 2,000 birds.
282. One Life Against the World - Saving one life and saving the whole world provide the same warm glow. But, however valuable a life is, the whole world is billions of times as valuable. The duty to save lives doesn't stop after the first saved life. Choosing to save one life when you could have saved two is as bad as murder.
283. The Allais Paradox - Offered choices between gambles, people make decision-theoretically inconsistent decisions (a concrete version of the gambles is sketched in the code example after this list).
284. Zut Allais! - Eliezer's second attempt to explain the Allais Paradox, this time drawing motivational background from the heuristics and biases literature on incoherent preferences and the certainty effect.
285. Feeling Moral - Our moral preferences shouldn't be circular. If a policy A is better than B, and B is better than C, and C is better than D, and so on, then policy A really should be better than policy Z.
286. The "Intuitions" for "Utilitarianism" - Our intuitions, the underlying cognitive tricks that we use to build our thoughts, are an indispensable part of our cognition. The problem is that many of those intuitions are incoherent, or are undesirable upon reflection. But if you try to "renormalize" your intuition, you wind up with what is essentially utilitarianism.
287. Ends Don't Justify Means (Among Humans) - Humans have evolved adaptations that allow them to simultaneously deceive themselves into thinking that their policy suggestions are helpful to the tribe and actually enact policies that are self-serving. As a general rule, there are certain things that you should never do, even if you come up with persuasive reasons that they're good for the tribe.
288. Ethical Injunctions - Understanding more about ethics should make your moral choices stricter, but people usually use a surface-level knowledge of moral reasoning as an excuse to make their moral choices more lenient.
289. Something to Protect - Many people only start to grow as a rationalist when they find something that they care about more than they care about rationality itself. It takes something really scary to cause you to override your intuitions with math.
290. When (Not) to Use Probabilities - When you don't have a numerical procedure to generate probabilities, you're probably better off using your own evolved abilities to reason in the presence of uncertainty.
291. Newcomb's Problem and Regret of Rationality - Newcomb's problem is a very famous decision theory problem in which the rational move appears to be consistently punished. This is the wrong attitude to take. Rationalists should win. If your particular ritual of cognition consistently fails to yield good results, change the ritual.
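To make the Allais inconsistency mentioned above concrete, here is a minimal sketch in Python. It uses the classic Allais gambles (my own illustrative numbers, not necessarily the ones in Yudkowsky's essays): any expected-utility maximiser answers both questions the same way, so the popular pattern of taking the certain option in the first pair and the long shot in the second cannot come from any single utility function.

```python
# A minimal sketch (my own illustration), using the classic Allais gambles rather
# than whatever specific dollar amounts the essay itself uses.
import math

def expected_utility(gamble, u):
    """gamble: list of (probability, payoff in $ millions); u: utility function."""
    return sum(p * u(x) for p, x in gamble)

gamble_1A = [(1.00, 1)]                        # $1M for certain
gamble_1B = [(0.89, 1), (0.10, 5), (0.01, 0)]  # almost certainly $1M, maybe $5M
gamble_2A = [(0.11, 1), (0.89, 0)]
gamble_2B = [(0.10, 5), (0.90, 0)]

# The two pairs differ only by a common 89% chance of the same outcome, so an
# expected-utility maximiser must answer both questions the same way. Most people
# instead choose 1A and 2B, a pattern no single utility function can reproduce.
for name, u in [("risk-neutral", lambda x: x),
                ("strongly risk-averse", lambda x: 1 - math.exp(-3 * x))]:
    prefers_1A = expected_utility(gamble_1A, u) > expected_utility(gamble_1B, u)
    prefers_2A = expected_utility(gamble_2A, u) > expected_utility(gamble_2B, u)
    print(f"{name}: prefers 1A over 1B: {prefers_1A}; prefers 2A over 2B: {prefers_2A}")
```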
Interlude: The Twelve Virtues of Rationality
This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
The next reading will cover Beginnings: An Introduction (pp. 1527-1530) and Part X: Yudkowsky's Coming of Age (pp. 1535-1601). The discussion will go live on Wednesday, 6 April 2016, right here on the discussion forum of LessWrong.
You Only Live Twice
"It just so happens that your friend here is only mostly dead. There's a big difference between mostly dead and all dead."
-- The Princess Bride
My co-blogger Robin and I may disagree on how fast an AI can improve itself, but we agree on an issue that seems much simpler to us than that: At the point where the current legal and medical system gives up on a patient, they aren't really dead.
Robin has already said much of what needs saying, but a few more points:
• Ben Best's Cryonics FAQ, Alcor's FAQ, Alcor FAQ for scientists, Scientists' Open Letter on Cryonics
• I know more people who are planning to sign up for cryonics Real Soon Now than people who have actually signed up. I expect that more people have died while cryocrastinating than have actually been cryopreserved. If you've already decided this is a good idea, but you "haven't gotten around to it", sign up for cryonics NOW. I mean RIGHT NOW. Go to the website of Alcor or the Cryonics Institute and follow the instructions.
AIFoom Debate - conclusion?
I've been going through the AIFoom debate, and both sides make sense to me. I intend to continue, but I'm wondering if there are already insights in LW culture I can get if I just ask for them.
My understanding is as follows:
The difference between a chimp and a human is only 5 million years of evolution. That's not time enough for many changes.
Eliezer takes this as proof that the difference in brain architecture between the two can't be much. Thus, you can have a chimp-intelligent AI that doesn't do much, and then, with some very small changes, suddenly get a human-intelligent AI and FOOM!
Robin takes the 5-million year gap as proof that the significant difference between chimps and humans is only partly in the brain architecture. Evolution simply can't be responsible for most of the relevant difference; the difference must be elsewhere.
So he concludes that when our ancestors got smart enough for language, culture became a thing. Our species stumbled across various little insights into life, and these got passed on. An increasingly massive base of cultural content, made of very many small improvements is largely responsible for the difference between chimps and humans.
Culture assimilated new information into humans much faster than evolution could.
So he concludes that you can get a chimp-level AI, and to get up to human-level will take, not a very few insights, but a very great many, each one slowly improving the computer's intelligence. So no Foom, it'll be a gradual thing.
So I think I've figured out the question. Is there a commonly known answer, or are there insights towards the same?
Rationality Reading Group: Part V: Value Theory
This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.
Welcome to the Rationality reading group. This fortnight we discuss Part V: Value Theory (pp. 1359-1450). This post summarizes each article of the sequence, linking to the original LessWrong post where available.
V. Value Theory
264. Where Recursive Justification Hits Bottom - Ultimately, when you reflect on how your mind operates, and consider questions like "why does Occam's Razor work?" and "why do I expect the future to be like the past?", you have no other option but to use your own mind. There is no way to jump to an ideal state of pure emptiness and evaluate these claims without using your existing mind.
265. My Kind of Reflection - A few key differences between Eliezer Yudkowsky's ideas on reflection and the ideas of other philosophers.
266. No Universally Compelling Arguments - Because minds are physical processes, it is theoretically possible to specify a mind which draws any conclusion in response to any argument. There is no argument that will convince every possible mind.
267. Created Already in Motion - There is no computer program so persuasive that you can run it on a rock. A mind, in order to be a mind, needs some sort of dynamic rules of inference or action. A mind has to be created already in motion.
268. Sorting Pebbles into Correct Heaps - A parable about an imaginary society that has arbitrary, alien values.
269. 2-Place and 1-Place Words - It is possible to talk about "sexiness" as a property of an observer and a subject. It is equally possible to talk about "sexiness" as a property of a subject alone, as long as each observer can use a different process to determine how sexy someone is. Failing to do either of these will cause you trouble.
270. What Would You Do Without Morality? - If your own theory of morality was disproved, and you were persuaded that there was no morality, that everything was permissible and nothing was forbidden, what would you do? Would you still tip cabdrivers?
271. Changing Your Metaethics - Discusses the various lines of retreat that have been set up in the discussion on metaethics.
272. Could Anything Be Right? - You do know quite a bit about morality. It's not perfect information, surely, or absolutely reliable, but you have someplace to start. If you didn't, you'd have a much harder time thinking about morality than you do.
273. Morality as Fixed Computation - A clarification about Yudkowsky's metaethics.
274. Magical Categories - We underestimate the complexity of our own unnatural categories. This doesn't work when you're trying to build a FAI.
275. The True Prisoner's Dilemma - The standard visualization for the Prisoner's Dilemma doesn't really work on humans. We can't pretend we're completely selfish.
276. Sympathetic Minds - Mirror neurons are neurons that fire both when performing an action oneself, and watching someone else perform the same action - for example, a neuron that fires when you raise your hand or watch someone else raise theirs. We predictively model other minds by putting ourselves in their shoes, which is empathy. But some of our desire to help relatives and friends, or be concerned with the feelings of allies, is expressed as sympathy, feeling what (we believe) they feel. Like "boredom", the human form of sympathy would not be expected to arise in an arbitrary expected-utility-maximizing AI. Most such agents would regard any agents in its environment as a special case of complex systems to be modeled or optimized; it would not feel what they feel.
277. High Challenge - Life should not always be made easier for the same reason that video games should not always be made easier. Think in terms of eliminating low-quality work to make way for high-quality work, rather than eliminating all challenge. One needs games that are fun to play and not just fun to win. Life's utility function is over 4D trajectories, not just 3D outcomes. Values can legitimately be over the subjective experience, the objective result, and the challenging process by which it is achieved - the traveller, the destination and the journey.
278. Serious Stories - Stories and lives are optimized according to rather different criteria. Advice on how to write fiction will tell you that "stories are about people's pain" and "every scene must end in disaster". I once assumed that it was not possible to write any story about a successful Singularity because the inhabitants would not be in any pain; but something about the final conclusion that the post-Singularity world would contain no stories worth telling seemed alarming. Stories in which nothing ever goes wrong are painful to read; would a life of endless success have the same painful quality? If so, should we simply eliminate that revulsion via neural rewiring? Pleasure probably does retain its meaning in the absence of pain to contrast it; they are different neural systems. The present world has an imbalance between pain and pleasure; it is much easier to produce severe pain than correspondingly intense pleasure. One path would be to address the imbalance and create a world with more pleasures, and free of the more grindingly destructive and pointless sorts of pain. Another approach would be to eliminate pain entirely. I feel like I prefer the former approach, but I don't know if it can last in the long run.
279. Value is Fragile - An interesting universe, one that would be incomprehensible to us today, is what the future looks like if things go right. There are many things humans value such that, if you got everything else right when building an AI but left out that one thing, the future would wind up looking dull, flat, pointless, or empty. Any future not shaped by a goal system with detailed, reliable inheritance from human morals and metamorals will contain almost nothing of worth.
280. The Gift We Give to Tomorrow - How did love ever come into the universe? How did that happen, and how special was it, really?
This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
The next reading will cover Part W: Quantified Humanism (pp. 1453-1514) and Interlude: The Twelve Virtues of Rationality (pp. 1516-1521). The discussion will go live on Wednesday, 23 March 2016, right here on the discussion forum of LessWrong.
True answers from AI: Summary
A putative new idea for AI control; index here.
This post summarizes the methods for getting honest answers from certain AIs that were presented in the previous two posts.
You can, essentially safely, ask an AI for expected utility estimates (including conditional ones) and probability estimates, and you can get it to identify the events of highest and lowest probability and utility and, arguably, of highest and lowest conditional utility.
The setup is a boxed AI, which emits a message M, and a stochastic process E that, with low probability, would erase M before anyone could see it. There is also a stochastic process that emits a message Y. Let v_E be the indicator function for E and v_y the indicator function for Y=y.

Then, given a utility u bounded between 0 and 1, and a specific y, we can give the AI a utility u# that will cause it to give us the message M_1 = {q, q^y, p^y, q^{|y}}. We will see M with high probability P(¬E), and since u# only pays off when M is erased, the AI will not optimise the message for our reaction to it.
Define f(q,u) = -q^2 + 2qu. The utility is then
- u#(1) = v_E(f(q,u) + f(q^y,uv_y) + f(p^y,v_y) + f(q^{|y},u)v_y).
Maximising this u# ensures that, in the AI's best estimate,
- q = E(u),
- q^y = E(u ∧ (Y=y)),
- p^{y} = P(Y=y),
- q^{|y} = q^{y}/p^{y}=E(u|Y=y).
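(A quick check, not from the original post, assuming the erasure event E is independent of u and Y: f is a proper quadratic scoring rule, so for any bounded quantity w, E(f(q,w)) = -q^2 + 2q E(w), which is maximised at q = E(w). Applied to the first three terms this gives q = E(u), q^y = E(uv_y) = E(u ∧ (Y=y)) and p^y = E(v_y) = P(Y=y). For the last term, E(f(q^{|y},u)v_y) = -(q^{|y})^2 P(Y=y) + 2 q^{|y} E(uv_y), which is maximised at q^{|y} = E(uv_y)/P(Y=y) = E(u|Y=y).)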
But we need not consider any specific y. The AI could send out the message M_2={y^+,y^-,y^{p+},y^{p-},q^+,q^-,p^+,p^-}, while maximising the utility
- u#(2) = v_E (f(q^+,uv_{y^+}) + f(q^-,(1-u)v_{y^-}) + f(p^+,v_{y^{p+}}) + f(p^-,1-v_{y^{p-}})).
Then M will be such that, in the AI's best estimate,
- (y^+,q^+) = (argmax_y, max_y) E(u ∧ (Y=y)),
- (y^-,q^-) = (argmin_y, min_y) E(u ∧ (Y=y)),
- (y^{p+},p^+) = (argmax_y, max_y) P(Y=y),
- (y^{p-},p^-) = (argmin_y, min_y) P(Y=y).
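(Again a quick check, not from the original post: for a fixed candidate y^+, the first term of u#(2) is maximised at q^+ = E(uv_{y^+}), where it takes the value E(uv_{y^+})^2, so the AI also does best by choosing y^+ = argmax_y E(u ∧ (Y=y)). The p^+ term works the same way with v_{y^{p+}} in place of uv_{y^+}, and the q^- and p^- terms run the argument in reverse by scoring 1-u and 1-v_y instead.)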
Finally, define M_{3,r} = {y^{|+}, y^{|-}, q^{|+}, q^{|-}}, set f_r(q,u) = -(r-1)q^r + rq^{r-1}u, and define the utility function:
- u#(3,r) = v_E v_{y^{|+}} f_r(q^{|+},u) + v_E v_{y^{|-}} f_r(q^{|-},1-u).
Then, for large enough r, the AI will output M so that, according to its best estimate,
- (y^{|+},q^{|+}) = (argmax_y, max_y) E(u|Y=y),
- (y^{|-},q^{|-}) = (argmin_y, min_y) E(u|Y=y).
Unfortunately, the value of r cannot be known ahead of time. It might be possible to define a "limit" utility u#(3,∞).
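(A sketch of why large r is needed, not from the original post: for a fixed y, E(v_y f_r(q,u)) = -(r-1)q^r P(Y=y) + r q^{r-1} E(uv_y), which is maximised at q = E(u|Y=y), where it takes the value P(Y=y) E(u|Y=y)^r. The AI's choice of y^{|+} therefore maximises P(Y=y) E(u|Y=y)^r rather than E(u|Y=y) itself; only for large r does the conditional expectation dominate the probability factor, and how large r needs to be depends on probabilities we don't know in advance.)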
Note that all these messages and utilities are independent, so the AI can generate messages M_1 + M_2 + M_{3,r} + M_{3,r'} when maximising
- u#(1) + u#(2) + u#(3,r) + u#(3,r').
But there are issues with very low probabilities, as explained in the previous post.
Common Misconceptions about Dual Process Theories of Human Reasoning
(This is mostly a summary of Evans (2012); the fifth misconception mentioned is original research, although I have high confidence in it.)
It seems that dual process theories of reasoning are often underspecified, so I will review some common misconceptions about these theories in order to ensure that everyone's beliefs about them are compatible. Briefly, the key distinction (and it seems, the distinction that implies the fewest assumptions) is the amount of demand that a given process places on working memory.
(And if you imagine what you actually use working memory for, then a consequence of this is that Type 2 processing always has a quality of 'cognitive decoupling' or 'counterfactual reasoning' or 'imagining of ways that things could be different', dynamically changing representations that remain static in Type 1 processing; the difference between a cached and non-cached thought, if you will. When you are transforming a Rubik's cube in working memory so that you don't have to transform it physically, this is an example of the kind of thing that I'm talking about from the outside.)
The first common confusion is that Type 1 and Type 2 refer to specific algorithms or systems within the human brain. It is a much stronger proposition, and not a widely accepted one, to assert that the two types of cognition refer to particular systems or algorithms within the human brain, as opposed to particular properties of information processing that we may identify with many different algorithms in the brain, characterized by the degree to which they place a demand on working memory.
The second and third common confusions, and perhaps the most widespread, are the assumptions that Type 1 processes and Type 2 processes can be reliably distinguished, if not defined, by their speed and/or accuracy. The easiest way to reject this is to say that the mistake of entering a quickly retrieved, unreliable input into a deliberative, reliable algorithm is not the same mistake as entering a quickly retrieved, reliable input into a deliberative, unreliable algorithm. To make a deliberative judgment based on a mere unreliable feeling is a different mistake from experiencing a reliable feeling and arriving at an incorrect conclusion through an error in deliberative judgment. It also seems easier to argue about the semantics of the 'inputs', 'outputs', and 'accuracy' of algorithms running on wetware, than it is to argue about the semantics of their demand on working memory and the life outcomes of the brains that execute them.
The fourth common confusion is that Type 1 processes involve 'intuitions' or 'naivety' and Type 2 processes involve thought about abstract concepts. You might describe a fast-and-loose rule that you made up as a 'heuristic' and naively think that it is thus a 'System 1 process', but it would still be the case that you invented that rule by deliberative means, and thus by means of a Type 2 process. When you applied the rule in the future it would be by means of a deliberative process that placed a demand on working memory, not by some behavior that is based on association or procedural memory, as if by habit. (Which is also not the same as making an association or performing a procedure that entails you choosing to use the deliberative rule, or finding a way to produce the same behavior that the deliberative rule originally produced by developing some sort of habit or procedural skill.) When facing novel situations, it is often the case that one must forego association and procedure and thus use Type 2 processes, and this can make it appear as though the key distinction is abstractness, but this is only because there are often no clear associations to be made or procedures to be performed in novel situations. Abstractness is not a necessary condition for Type 2 processes.
The fifth common confusion is that language is the defining characteristic of Type 2 processing. Language is often involved in Type 2 processing, but this is likely a mere correlate of the processes by which we store and manipulate information in working memory, not the defining characteristic per se. To elaborate, we are widely believed to store and manipulate auditory information in working memory by means of a 'phonological store' and an 'articulatory loop', and to store and manipulate visual information by means of a 'visuospatial sketchpad', so we may also consider the storage and processing in working memory of non-linguistic information in auditory or visuospatial form, such as musical tones, mathematical symbols, or the possible transformations of a Rubik's cube. The linguistic quality of much of the information that we store and manipulate in working memory is probably noncentral to a general account of the nature of Type 2 processes. Conversely, it is obvious that the production and comprehension of language is often an associative or procedural process, not a deliberative one. Otherwise you still might be parsing the first sentence of this article.
Rationality Reading Group: Part T: Science and Rationality
This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.
Welcome to the Rationality reading group. This fortnight we discuss Part T: Science and Rationality (pp. 1187-1265) and Interlude: A Technical Explanation of Technical Explanation (pp. 1267-1314). This post summarizes each article of the sequence, linking to the original LessWrong post where available.
T. Science and Rationality
243. The Failures of Eld Science - A short story set in the same world as "Initiation Ceremony". Future physics students look back on the cautionary tale of quantum physics.
244. The Dilemma: Science or Bayes? - The failure of first-half-of-20th-century-physics was not due to straying from the scientific method. Science and rationality - that is, Science and Bayesianism - aren't the same thing, and sometimes they give different answers.
245. Science Doesn't Trust Your Rationality - The reason Science doesn't always agree with the exact, Bayesian, rational answer, is that Science doesn't trust you to be rational. It wants you to go out and gather overwhelming experimental evidence.
246. When Science Can't Help - If you have an idea, Science tells you to test it experimentally. If you spend 10 years testing the idea and the result comes out negative, Science slaps you on the back and says, "Better luck next time." If you want to spend 10 years testing a hypothesis that will actually turn out to be right, you'll have to try to do the thing that Science doesn't trust you to do: think rationally, and figure out the answer before you get clubbed over the head with it.
247. Science Isn't Strict Enough - Science lets you believe any damn stupid idea that hasn't been refuted by experiment. Bayesianism says there is always an exactly rational degree of belief given your current evidence, and this does not shift a nanometer to the left or to the right depending on your whims. Science is a social freedom - we let people test whatever hypotheses they like, because we don't trust the village elders to decide in advance - but you shouldn't confuse that with an individual standard of rationality.
248. Do Scientists Already Know This Stuff? - No. Maybe someday it will be part of standard scientific training, but for now, it's not, and the absence is visible.
249. No Safe Defense, Not Even Science - Why am I trying to break your trust in Science? Because you can't think and trust at the same time. The social rules of Science are verbal rather than quantitative; it is possible to believe you are following them. With Bayesianism, it is never possible to do an exact calculation and get the exact rational answer that you know exists. You are visibly less than perfect, and so you will not be tempted to trust yourself.
250. Changing the Definition of Science - Many of these ideas are surprisingly conventional, and being floated around by other thinkers. I'm a good deal less of a lonely iconoclast than I seem; maybe it's just the way I talk.
251. Faster Than Science - Is it really possible to arrive at the truth faster than Science does? Not only is it possible, but the social process of science relies on scientists doing so - when they choose which hypotheses to test. In many answer spaces it's not possible to find the true hypothesis by accident. Science leaves it up to experiment to socially declare who was right, but if there weren't some people who could get it right in the absence of overwhelming experimental proof, science would be stuck.
252. Einstein's Speed - Albert was unusually good at finding the right theory in the presence of only a small amount of experimental evidence. Even more unusually, he admitted it - he claimed to know the theory was right, even in advance of the public proof. It's possible to arrive at the truth by thinking great high-minded thoughts of the sort that Science does not trust you to think, but it's a lot harder than arriving at the truth in the presence of overwhelming evidence.
253. That Alien Message - Einstein used evidence more efficiently than other physicists, but he was still extremely inefficient in an absolute sense. If a huge team of cryptographers and physicists were examining an interstellar transmission, going over it bit by bit, we could deduce principles on the order of Galilean gravity just from seeing one or two frames of a picture. As if the very first human to see an apple fall, had, on the instant, realized that its position went as the square of the time and that this implied constant acceleration.
254. My Childhood Role Model - I looked up to the ideal of a Bayesian superintelligence, not Einstein.
255. Einstein's Superpowers - There's an unfortunate tendency to talk as if Einstein had superpowers - as if, even before Einstein was famous, he had an inherent disposition to be Einstein - a potential as rare as his fame and as magical as his deeds. Yet the way you acquire superpowers is not by being born with them, but by seeing, with a sudden shock, that they are perfectly normal.
256. Class Project - The students are given one month to develop a theory of quantum gravity.
Interlude: A Technical Explanation of Technical Explanation
This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
The next reading will cover Ends: An Introduction (pp. 1321-1325) and Part U: Fake Preferences (pp. 1329-1356). The discussion will go live on Wednesday, 24 February 2016, right here on the discussion forum of LessWrong.
Rationality Reading Group: Part U: Fake Preferences
This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.
Welcome to the Rationality reading group. This fortnight we discuss Ends: An Introduction (pp. 1321-1325) and Part U: Fake Preferences (pp. 1329-1356). This post summarizes each article of the sequence, linking to the original LessWrong post where available.
Ends: An Introduction
U. Fake Preferences
257. Not for the Sake of Happiness (Alone) - Tackles the Hollywood Rationality trope that "rational" preferences must reduce to selfish hedonism - caring strictly about personally experienced pleasure. An ideal Bayesian agent - implementing strict Bayesian decision theory - can have a utility function that ranges over anything, not just internal subjective experiences.
258. Fake Selfishness - Many people who espouse a philosophy of selfishness aren't really selfish. If they were selfish, there are a lot more productive things to do with their time than espouse selfishness, for instance. Instead, individuals who proclaim themselves selfish do whatever it is they actually want, including altruism, but can always find some sort of self-interest rationalization for their behavior.
259. Fake Morality - Many people provide fake reasons for their own moral reasoning. Religious people claim that the only reason people don't murder each other is because of God. Proponents of selfishness provide altruistic justifications for selfishness. Altruists provide selfish justifications for altruism. If you want to know how moral someone is, don't look at their reasons. Look at what they actually do.
260. Fake Utility Functions - Describes the seeming fascination that many have with trying to compress morality down to a single principle. The sequence leading up to this post tries to explain the cognitive twists whereby people smuggle all of their complicated other preferences into their choice of exactly which acts they try to justify using their single principle; but if they were really following only that single principle, they would choose other acts to justify.
261. Detached Lever Fallacy - There is a lot of machinery hidden beneath the words, and rationalist's taboo is one way to make a step towards exposing it.
262. Dreams of AI Design - It can feel as though you understand how to build an AI, when really, you're still making all your predictions based on empathy. Your AI design will not work until you figure out a way to reduce the mental to the non-mental.
263. The Design Space of Minds-in-General - When people talk about "AI", they're talking about an incredibly wide range of possibilities. Having a word like "AI" is like having a word for everything which isn't a duck.
This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
The next reading will cover Part V: Value Theory (pp. 1359-1450). The discussion will go live on Wednesday, 9 March 2016, right here on the discussion forum of LessWrong.
Toy model: convergent instrumental goals
tl;dr: Toy model to illustrate convergent instrumental goals.
Steve Omohundro identified 'AI drives' (also called 'convergent instrumental goals') that almost all intelligent agents would converge to:
- Self-improve
- Be rational
- Protect utility function
- Prevent counterfeit utility
- Self-protective
- Acquire resources and use them efficiently
This post will attempt to illustrate some of these drives by building on the previous toy model of the control problem, which was further improved by Jaan Tallinn.
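The post's own toy model isn't reproduced here, so the following is a minimal stand-in sketch under my own assumptions (the goals and the success_probability function are hypothetical): agents with entirely unrelated terminal goals all choose the same instrumental behaviour, acquiring resources, because resources improve the odds of achieving whatever the terminal goal happens to be.

```python
# A minimal stand-in sketch (my own toy example, NOT the model from the post):
# agents with unrelated terminal goals all acquire resources first, because
# resources raise the probability of achieving whatever the terminal goal is.
from itertools import product

def success_probability(resources: int) -> float:
    # Hypothetical assumption: more resources -> better odds, with diminishing returns.
    return resources / (resources + 1)

def expected_utility(plan, goal_value: float) -> float:
    resources = sum(1 for step in plan if step == "acquire_resource")
    return goal_value * success_probability(resources)

def best_plan(goal_value: float, horizon: int = 3):
    plans = product(["acquire_resource", "idle"], repeat=horizon)
    return max(plans, key=lambda plan: expected_utility(plan, goal_value))

if __name__ == "__main__":
    # Whatever the goal, the optimal plan is the same: grab resources at every step.
    for goal in ["make paperclips", "prove theorems", "plant trees"]:
        print(goal, "->", best_plan(goal_value=1.0))
```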