Probabilities Small Enough To Ignore: An attack on Pascal's Mugging
Summary: the problem with Pascal's Mugging arguments is that, intuitively, some probabilities are just too small to care about. There might be a principled reason for ignoring some probabilities, namely that they violate an implicit assumption behind expected utility theory. This suggests a possible approach for formally defining a "probability small enough to ignore", though there's still a bit of arbitrariness in it.
A survey of the top posters on LessWrong
In a recent post someone mentioned that there was a list of the "Top 15" posters by karma. That inspired me to send all of them this note:
Hi,
I am messaging you (now) because you are one of the 15 top contributors of the past 30 days of LW.
I was wondering if you do any time tracking, or if you have any idea how much time you spend on LW (e.g. with RescueTime).
I have made the choice to spend more of my time engaging with LW and am wondering how much you (and your other top peers) spend. And also why?
Maybe you want to rate each of these out of 10; the reasons you partake in LW discussions:
- Make the world better (raising the sanity waterline etc)
- Fun (spend my spare time here)
- Friends (here because my Real-Life is here; and so I come to hang with my friends - or my internet friends hang out here)
- Gather rationality (maybe you still gather rationality from LW; maybe you have gathered most of what you can and now are creating your own craft)
- here for new ideas (LW being a good place to share new ideas)
- here to advertise an idea (promoting a line of thinking from elsewhere - could be anything from more Effective Altruism to this book)
- Here to create my own craft. (from the craft and the community)
- other? (as many others as you like)
In addition, do you think people should participate more or less in the ongoing conversation (or stay at about their current level)? And would you give any particular message to others?
Do you feel like your time spent is effective?
I wonder if this small sample, once gathered, will yield anything useful. With your permission I would like to publish your responses (anonymised for your protection), whether something interesting comes out or nothing does (publish the null result).
Please add as many comments as you can :).
I'd also like to thank you for being a part of keeping the community active. I find it a good garden with many friends.
Sincerely, E
(Disclaimer: I have no affiliation with RescueTime; I just like their tracking system.)
As of the time of this post I have received 10 replies. I waited an extra week or two, but no further replies came in after the first 2-3 days.
The funny thing about asking for something is that people don't always answer in the way that you want them to. (Mostly I blame myself and the way I asked, but I think it's quite funny that several replies did not include a rating out of 10.)
1. Make the world better
As one respondent pointed out: "Mostly this is low because of ambiguity over 'the world'." Responses were 0, 2, 6, y, y; I assume the other five were 0, 0, 0, 0, 0.
2. Fun
Several replies said this was the most productive time sink they could think of. Replies were y, y, y, 10, 10, 8. One person said they used LW as procrastination; another said, "it's a reasonably interesting way of killing time".
3. Friends
Answers: y, 0, "4 - more like acquaintances", 5. Some people mentioned local meetups, but also that they don't interact online with those people. I suppose if you are here for friends you are kind of doing it wrong; "here to not get yelled at and to understand things" is a more accurate description. "I treat LW like a social club and a general place to hang out."
4. Learn rationality
y, "y, but doubt it", "5 - a bit", 5. I expected most of the top posters to have already achieved a level of rationality where they would be searching elsewhere for it. I assume the others would be 0, or close to it.
5. New ideas
3, 4, 8 (assume 7*0). I guess the top posters don't think innovation happens here. Which is interesting, because I think it does.
6. Advertise ideas
1, "6 - generally" (assuming 8*0). I was concerned that the active members might be pushing an agenda or something. It's entirely possible, but it seems not to be the case.
7. Create craft
8, 7. I would have thought someone motivated to be increasing the craft of rationality would be here for that purpose. I guess not.
" I'm not sure to what extent I'm creating my own craft, but it's a good question. At the very least I'm acquiring a better ability to ask whether something makes sense. "
8. Other
Two people mentioned that this is a place of quality, or of high-level thinking; they are here for the reasonableness (or lack of unreasonableness) of the participants.
Open questions:
Effective time: most responses were in the range of "better than other rubbish on the internet" and "least bad time sink I can think of".
More or fewer posts: two people suggested more; one suggested less, but of higher quality. They all understand the predicament of the thing.
Time tracking: several people track their time; others estimate between 30 minutes and 3 hours a day.
Bear in mind that the top posting positions are selected on multiple factors, including whether people have the time, not just their effectiveness or their *most rational* status. I don't believe this selection of people have said anything much helpful, other than
some quotes:
" LW gives you the opportunity to share your ideas with a large number of smart people who will help you discard or sharpen them without you having to go to the trouble of maintaining a personal blog. A good post has the opportunity to deliver a lot of value to some very smart and altruistically motivated people. Becoming a respected LW contributor takes a lot of intelligence, thought, research, writing skill, and hard work. Many try but few succeed. "
" I am pretty much an internet discussion board addict"
"Suppose LW is just a forum where a bunch of smart people hang out and talk about whatever interests them, which is frequently potentially-important (effective altruism, AI safety) or intellectually interesting (decision theory, maths of AI safety) or practically useful (akrasia, best-textbooks-on-X threads). That seems to me like enough to be valuable"
"My karma comes from thousands of comments, not from meaningful articles."
"I feel there is a power law distribution to LW contributor value with some people like Eliezer, Yvain, and lukeprog making many high-quality posts. So I think the most important thing is for people like that to get “discovered”. It may take some leveling up for them to get to that point though, and encouragement for them to spend time cranking out lots of posts that are high-quality."
" I feel like if we gave top LW posters more recognition that could incentivize the production of more great content, and becoming a top poster with a high % upvoted genuinely seems like a strong challenge and an indicator of superior potential, if achieved, to me."
"As a rule, though, I do not believe that LW has much to do with refining human rationality."
"I think that written reflection is a useful way to engage with new ideas. LW provides a venue to discuss ideas with smart people who care about published evidence."
"I post on Less Wrong primarily because I'm a forum-poster, and this is the forum most relevant to my interests. If I stopped finding forum-posting satisfying, or found a more relevant forum, I'd probably move there and only rarely check LW."
"I think people should participate more. I view LW as a forum and not as a library."
In summary: what I think I have gathered.
The top posters don't think they or LessWrong are effective at changing the world; however, this is a nice place to hang out. I don't know what an effective place would look like, but it is almost certainly not this one. I don't see LW as worth quitting or shutting down without a *better* alternative. Whether it succeeds as a place for propagating rationality is debatable. As a garden of healthy discussion, where reasonable people remember that their opposing factions are also reasonable people with different ideas, this place deserves a medal. If only we could distill the essence of that reasonableness and share it around. I feel that might be the value of LessWrong.
LW is a system built of people staying "while it's good"; as soon as it is no longer such a nice garden, they will be gone.
I hope this helps someone else as well as me. In light of the ongoing discussions about improving this place, I hope it contributes to that conversation.
Agency is bugs and uncertainty
(Epistemic status: often discussed in bits and pieces, but I haven't seen it summarized in one place anywhere.)
Do you feel that your computer sometimes has a mind of its own? "I have no idea why it is doing that!" Do you feel that, the more you understand and predict someone's actions, the less intelligent and more "mechanical" they appear?
My guess is that, in many cases, agency (as in, the capacity to act and make choices) is a manifestation of the observer's inability to explain and predict the agent's actions. To Omega in Newcomb's problem, humans are just automatons without a hint of agency. To a game player some NPCs appear stupid and others smart, and the more you play and the more you can predict the NPCs, the less agenty they appear to you.
Note that randomness is not the same as uncertainty, since if you can predict that someone or something behaves randomly, it is still a prediction. What I mean is more of a Knightian uncertainty, where one fails to make a useful prediction at all. Something like a tornado may appear to intentionally go after you if you fail to predict where it will be going and you have trouble escaping.
If you are the user of a computer program and it does not behave as you expect, you often get the feeling that there is a hostile intelligence opposing you, which occasionally results in aggressive behavior toward it, usually verbal, though sometimes physical, the way we would confront an actual enemy. On the other hand, if you are the programmer who wrote the code in question, you think of the misbehavior as bugs, not intentional hostility, and treat the code by debugging or documenting. Mostly. Sometimes I personalize especially nasty bugs.
A nurse told me that this is also how they are taught to deal with difficult patients: you don't get upset at the misbehavior; instead, you treat the patient not as an agent but more like an algorithm in need of debugging. Parents of young children are also advised to take this approach.
This seems to also apply to self-analysis, though to a lesser degree. If you know yourself well, and can predict what you would do in a specific situation, you may feel that your response is mechanistic or automatic and not agenty or intelligent. Or maybe not. I am not sure. I think if I had the capacity for full introspection, not just the surface level understanding of my thoughts and actions, I would ascribe much less agency to myself. Probably because it would cease to be a useful concept. I wonder if this generalizes to a superintelligence capable of perfect or near perfect self-reflection.
This leads us to the issue of feelings, deliberate choices, free will and ability to consent and take responsibility. These seem to be useful, if illusory, concepts for when you live among your intellectual peers and want to be treated at least as having as much agency as you ascribe to them. But this is a topic for a different post.
In Defense of the Fundamental Attribution Error
The Fundamental Attribution Error
Also known, more accurately, as "Correspondence Bias."
http://lesswrong.com/lw/hz/correspondence_bias/
The "more accurately" part is pretty important; bias -may- result in error, but need not -necessarily- do so, and in some cases may result in reduced error.
A Simple Example
Suppose I write a stupid article that makes no sense and rambles on without any coherent point. There might be a situational cause of this; maybe I'm tired. Correcting for correspondence bias means that more weight should be given to the situational explanation than the dispositional explanation, that I'm the sort of person who writes stupid articles that ramble on. The question becomes, however, whether or not this increases the accuracy of your assessment of me; does correcting for this bias make you, in fact, less wrong?
In this specific case, no, it doesn't. A person who belongs to the class of people who write stupid articles is more likely to write stupid articles than a person who doesn't belong to that class - I'd be surprised if I ever saw Gwern write anything that wasn't well-considered, well-structured, and well-cited. If somebody like Gwern or Eliezer wrote a really stupid article, we have sufficient evidence that he's not a member of that class to make the dispositional conclusion a poor one; the situational explanation is better, he's having some kind of off day. However, given an arbitrary stupid article written by somebody about whom we have no prior information, the distribution is substantially different. We have different priors for inferring "X is a bad writer of articles" from "Randomly chosen person X wrote this article and it is bad" than for inferring "Y is a bad writer of articles" from "Well-known article author Y wrote this article and it is bad".
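To make the role of priors concrete, here is a minimal sketch of the Bayesian update involved; it is my own illustration, and all the numbers are invented rather than claims about real base rates:

```python
# Toy Bayesian update: P(bad writer | wrote one bad article).
# All probabilities here are invented purely for illustration.

def posterior_bad_writer(prior_bad_writer,
                         p_bad_article_if_bad_writer=0.8,
                         p_bad_article_if_good_writer=0.05):
    """Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    p_evidence = (p_bad_article_if_bad_writer * prior_bad_writer
                  + p_bad_article_if_good_writer * (1 - prior_bad_writer))
    return p_bad_article_if_bad_writer * prior_bad_writer / p_evidence

# Randomly chosen author: no track record, so a substantial prior.
print(posterior_bad_writer(prior_bad_writer=0.5))   # ~0.94

# Well-known, consistently good author: strong prior against.
print(posterior_bad_writer(prior_bad_writer=0.01))  # ~0.14
```

With the strong prior, the dispositional conclusion stays unlikely even after one bad article, which is why the situational explanation wins for a known author but not for a stranger.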
Getting to the Point
The FAE is putting emphasis on internal factors rather than external. It's jumping first to the conclusion that somebody who just swerved is a bad driver, rather than first considering the possibility that there was an object in the road they were avoiding, given only the evidence that they swerved. Whether or not the FAE is an error - whether it is more wrong - depends on whether or not the conclusion you jumped to was correct, and more importantly, whether, on average, that conclusion would be correct.
It's very easy to produce studies in which the FAE results in people making incorrect judgements. This is not, however, the same as the FAE resulting in an average of more incorrect judgements in the real world.
Correspondence Bias as Internal Rationalization
I'd suggest the major issue with correspondence bias is not, as commonly presented, incorrectly interpreting the behavior of other people, but incorrectly interpreting your own behavior.
Turning to Eliezer's example in the linked article, if you find yourself kicking vending machines, maybe the answer is that -you- are a naturally angry person, or, as I would prefer to phrase it, you have poor self-control. The "floating history" Eliezer refers to sounds more to me like rationalizations for poor behavior than anything approaching "good" reasons for expressing your anger through violence directed at inanimate objects. I noticed -many- of those rationalizations cropping up when I quit smoking - "Oh, I'm having a terrible day, I could just have one cigarette to take the edge off." I don't walk by a smoker and assume they had a terrible day, however, because those were -excuses- for a behavior that I shouldn't be engaging in.
It's possible, of course, that Eliezer's example was simply a poorly chosen one; the examples in studies certainly seem better, such as assuming the authors of articles held the positions they wrote about. But the examples used in those studies are also extraordinarily artificial, at least in individualistic countries, where it's assumed, and generally true, that people writing articles do have the freedom to write what they agree with, and infringements of this (say, in the context of a newspaper asking a columnist to change a review to be less hostile to an advertiser) are regarded very harshly.
Collectivist versus Individualist Countries
There's been some research done, comparing collectivist societies to individualist societies; collectivist societies don't present the same level of effect from the correspondence bias. A point to consider, however, is that in collectivist societies, the artificial scenarios used in studies are more "natural" - it's part of their society to adjust themselves to the circumstances, whereas individualist societies see circumstance as something that should be adapted to the individual. It's -not- an infringement, or unexpected, for the state-owned newspaper to require everything written to be pro-state.
Maybe the differing levels of effect are less a matter of "Collectivist societies are more sensitive to environment" so much as that, in both cultures, the calibration of a heuristic is accurate, but it's simply calibrated to different test cases.
Conclusion
I don't have anything conclusive to say here, merely a position: the Correspondence Bias is a bias that, on the whole, helps people arrive at more accurate, rather than less accurate, conclusions, and it should be corrected with an eye to improving accuracy and correctness, rather than to the mere elimination of bias.
Self-fulfilling correlations
Correlation does not imply causation. Sometimes corr(X,Y) means X=>Y; sometimes it means Y=>X; sometimes it means W=>X, W=>Y. And sometimes it's an artifact of people's beliefs about corr(X, Y). With intelligent agents, perceived causation causes correlation.
Volvos are believed by many people to be safe. Volvo has an excellent record of being concerned with safety; they introduced 3-point seat belts, crumple zones, laminated windshields, and safety cages, among other things. But how would you evaluate the claim that Volvos are safer than other cars?
Presumably, you'd look at the accident rate for Volvos compared to the accident rate for similar cars driven by a similar demographic, as reflected, for instance, in insurance rates. (My google-fu did not find accident rates posted on the internet, but insurance rates don't come out especially pro-Volvo.) But suppose the results showed that Volvos had only 3/4 as many accidents as similar cars driven by similar people. Would that prove Volvos are safer?
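The mechanism the opening points at - belief causing the correlation - is easy to simulate. The following toy sketch is my own illustration, with invented numbers that say nothing about actual Volvos: if careful drivers believe Volvos are safe and therefore buy them, Volvos end up with a lower accident rate even though the car contributes nothing to safety.

```python
# Toy simulation of a self-fulfilling correlation (illustrative only).
import random

random.seed(0)

def simulate(n_drivers=100_000):
    volvo_accidents = volvo_drivers = 0
    other_accidents = other_drivers = 0
    for _ in range(n_drivers):
        caution = random.random()               # 0 = reckless, 1 = careful
        # Careful drivers believe Volvos are safe, so they buy them more often.
        buys_volvo = random.random() < caution
        # Accident risk depends ONLY on the driver, not on the car.
        accident = random.random() < 0.10 * (1 - caution)
        if buys_volvo:
            volvo_drivers += 1
            volvo_accidents += accident
        else:
            other_drivers += 1
            other_accidents += accident
    return volvo_accidents / volvo_drivers, other_accidents / other_drivers

volvo_rate, other_rate = simulate()
print(f"Volvo accident rate: {volvo_rate:.2%}")   # comes out well below
print(f"Other accident rate: {other_rate:.2%}")   # the rate for other cars
```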
Rationality is about pattern recognition, not reasoning
Short version (courtesy of Nanashi)
Our brains' pattern recognition capabilities are far stronger than our ability to reason explicitly. Most people can recognize cats across contexts with little mental exertion. By way of contrast, explicitly constructing a formal algorithm that can consistently recognize cats across contexts requires great scientific ability and cognitive exertion.
Very high level epistemic rationality is about retraining one's brain to be able to see patterns in the evidence in the same way that we can see patterns when we observe the world with our eyes. Reasoning plays a role, but a relatively small one. Sufficiently high quality mathematicians don't make their discoveries through reasoning. The mathematical proof is the very last step: you do it to check that your eyes weren't deceiving you, but you know ahead of time that your eyes probably weren't deceiving you.
I have a lot of evidence that this way of thinking is how the most effective people think about the world. I would like to share what I learned. I think that what I've learned is something that lots of people are capable of learning, and that learning it would greatly improve people's effectiveness. But communicating the information is very difficult.
It took me 10,000+ hours to learn how to "see" patterns in evidence in the way that I can now. Right now, I don't know how to communicate how to do it succinctly. In order to succeed, I need collaborators who are open to spending a lot of time thinking carefully about the material, to get to the point of being able to teach others. I'd welcome any suggestions for how to find collaborators.
Debunking Fallacies in the Theory of AI Motivation
... or The Maverick Nanny with a Dopamine Drip
Richard Loosemore
Abstract
My goal in this essay is to analyze some widely discussed scenarios that predict dire and almost unavoidable negative behavior from future artificial general intelligences, even if they are programmed to be friendly to humans. I conclude that these doomsday scenarios involve AGIs that are logically incoherent at such a fundamental level that they can be dismissed as extremely implausible. In addition, I suggest that the most likely outcome of attempts to build AGI systems of this sort would be that the AGI would detect the offending incoherence in its design, and spontaneously self-modify to make itself less unstable, and (probably) safer.
Introduction
AI systems at the present time do not even remotely approach the human level of intelligence, and the consensus seems to be that genuine artificial general intelligence (AGI) systems—those that can learn new concepts without help, interact with physical objects, and behave with coherent purpose in the chaos of the real world—are not on the immediate horizon.
But in spite of this there are some researchers and commentators who have made categorical statements about how future AGI systems will behave. Here is one example, in which Steve Omohundro (2008) expresses a sentiment that is echoed by many:
"Without special precautions, [the AGI] will resist being turned off, will try to break into other machines and make copies of itself, and will try to acquire resources without regard for anyone else’s safety. These potentially harmful behaviors will occur not because they were programmed in at the start, but because of the intrinsic nature of goal driven systems." (Omohundro, 2008)
Omohundro's description of a psychopathic machine that gobbles everything in the universe, and his conviction that every AI, no matter how well it is designed, will turn into such a gobbling psychopath, represent just one of many doomsday predictions being popularized in certain sections of the AI community. These nightmare scenarios are now saturating the popular press, and luminaries such as Stephen Hawking have -- apparently in response -- expressed their concern that AI might "kill us all."
I will start by describing a group of three hypothetical doomsday scenarios that include Omohundro’s Gobbling Psychopath, and two others that I will call the Maverick Nanny with a Dopamine Drip and the Smiley Tiling Berserker. Undermining the credibility of these arguments is relatively straightforward, but I think it is important to try to dig deeper and find the core issues that lie behind this sort of thinking. With that in mind, much of this essay is about (a) the design of motivation and goal mechanisms in logic-based AGI systems, (b) the misappropriation of definitions of “intelligence,” and (c) an anthropomorphism red herring that is often used to justify the scenarios.
Dopamine Drips and Smiley Tiling
In a 2012 New Yorker article entitled Moral Machines, Gary Marcus said:
"An all-powerful computer that was programmed to maximize human pleasure, for example, might consign us all to an intravenous dopamine drip [and] almost any easy solution that one might imagine leads to some variation or another on the Sorcerer’s Apprentice, a genie that’s given us what we’ve asked for, rather than what we truly desire." (Marcus 2012)
He is depicting a Nanny AI gone amok. It has good intentions (it wants to make us happy) but the programming to implement that laudable goal has had unexpected ramifications, and as a result the Nanny AI has decided to force all human beings to have their brains connected to a dopamine drip.
Here is another incarnation of this Maverick Nanny with a Dopamine Drip scenario, in an excerpt from the Intelligence Explosion FAQ, published by MIRI, the Machine Intelligence Research Institute (Muehlhauser 2013):
"Even a machine successfully designed with motivations of benevolence towards humanity could easily go awry when it discovered implications of its decision criteria unanticipated by its designers. For example, a superintelligence programmed to maximize human happiness might find it easier to rewire human neurology so that humans are happiest when sitting quietly in jars than to build and maintain a utopian world that caters to the complex and nuanced whims of current human neurology."
Setting aside the question of whether happy bottled humans are feasible (one presumes the bottles are filled with dopamine, and that a continuous flood of dopamine does indeed generate eternal happiness), there seems to be a prima facie inconsistency between the two predicates
[is an AI that is superintelligent enough to be unstoppable]
and
[believes that benevolence toward humanity might involve forcing human beings to do something violently against their will.]
Why do I say that these are seemingly inconsistent? Well, if you or I were to suggest that the best way to achieve universal human happiness was to forcibly rewire the brain of everyone on the planet so they became happy when sitting in bottles of dopamine, most other human beings would probably take that as a sign of insanity. But Muehlhauser implies that the same suggestion coming from an AI would be perfectly consistent with superintelligence.
Much could be said about this argument, but for the moment let’s just note that it begs a number of questions about the strange definition of “intelligence” at work here.
The Smiley Tiling Berserker
Since 2006 there has been an occasional debate between Eliezer Yudkowsky and Bill Hibbard. Here is Yudkowsky stating the theme of their discussion:
"A technical failure occurs when the [motivation code of the AI] does not do what you think it does, though it faithfully executes as you programmed it. [...] Suppose we trained a neural network to recognize smiling human faces and distinguish them from frowning human faces. Would the network classify a tiny picture of a smiley-face into the same attractor as a smiling human face? If an AI “hard-wired” to such code possessed the power—and Hibbard (2001) spoke of superintelligence—would the galaxy end up tiled with tiny molecular pictures of smiley-faces?" (Yudkowsky 2008)
Yudkowsky’s question was not rhetorical, because he goes on to answer it in the affirmative:
"Flash forward to a time when the AI is superhumanly intelligent and has built its own nanotech infrastructure, and the AI may be able to produce stimuli classified into the same attractor by tiling the galaxy with tiny smiling faces... Thus the AI appears to work fine during development, but produces catastrophic results after it becomes smarter than the programmers(!)." (Yudkowsky 2008)
Hibbard’s response was as follows:
"Beyond being merely wrong, Yudkowsky's statement assumes that (1) the AI is intelligent enough to control the galaxy (and hence have the ability to tile the galaxy with tiny smiley faces), but also assumes that (2) the AI is so unintelligent that it cannot distinguish a tiny smiley face from a human face." (Hibbard 2006)
This comment expresses what I feel is the majority lay opinion: how could an AI be so intelligent as to be unstoppable, but at the same time so unsophisticated that its motivation code treats smiley faces as evidence of human happiness?
Machine Ghosts and DWIM
The Hibbard/Yudkowsky debate is worth tracking a little longer. Yudkowsky later postulates an AI with a simple neural net classifier at its core, which is trained on a large number of images, each of which is labeled with either “happiness” or “not happiness.” After training on the images the neural net can then be shown any image at all, and it will give an output that classifies the new image into one or the other set. Yudkowsky says, of this system:
"Even given a million training cases of this type, if the test case of a tiny molecular smiley-face does not appear in the training data, it is by no means trivial to assume that the inductively simplest boundary around all the training cases classified “positive” will exclude every possible tiny molecular smiley-face that the AI can potentially engineer to satisfy its utility function.
And of course, even if all tiny molecular smiley-faces and nanometer-scale dolls of brightly smiling humans were somehow excluded, the end result of such a utility function is for the AI to tile the galaxy with as many “smiling human faces” as a given amount of matter can be processed to yield." (Yudkowsky 2011)
He then tries to explain what he thinks is wrong with the reasoning of people, like Hibbard, who dispute the validity of his scenario:
"So far as I can tell, to [Hibbard] it remains self-evident that no superintelligence would be stupid enough to thus misinterpret the code handed to it, when it’s obvious what the code is supposed to do. [...] It seems that even among competent programmers, when the topic of conversation drifts to Artificial General Intelligence, people often go back to thinking of an AI as a ghost-in-the-machine—an agent with preset properties which is handed its own code as a set of instructions, and may look over that code and decide to circumvent it if the results are undesirable to the agent’s innate motivations, or reinterpret the code to do the right thing if the programmer made a mistake." (Yudkowsky 2011)
Yudkowsky at first rejects the idea that an AI might check its own code to make sure it was correct before obeying the code. But, truthfully, it would not require a ghost-in-the-machine to reexamine the situation if there was some kind of gross inconsistency with what the humans intended: there could be some other part of its programming (let’s call it the checking code) that kicked in if there was any hint of a mismatch between what the AI planned to do and what the original programmers were now saying they intended. There is nothing difficult or intrinsically wrong with such a design. And, in fact, Yudkowsky goes on to make that very suggestion (he even concedes that it would be “an extremely good idea”).
But then his enthusiasm for the checking code evaporates:
"But consider that a property of the AI’s preferences which says e.g., “maximize the satisfaction of the programmers with the code” might be more maximally fulfilled by rewiring the programmers’ brains using nanotechnology than by any conceivable change to the code."
(Yudkowsky 2011)
So, this is supposed to be what goes through the mind of the AGI. First it thinks “Human happiness is seeing lots of smiling faces, so I must rebuild the entire universe to put a smiley shape into every molecule.” But before it can go ahead with this plan, the checking code kicks in: “Wait! I am supposed to check with the programmers first to see if this is what they meant by human happiness.” The programmers, of course, give a negative response, and the AGI thinks “Oh dear, they didn’t like that idea. I guess I had better not do it then."
But now Yudkowsky is suggesting that the AGI has second thoughts: "Hold on a minute," it thinks, "suppose I abduct the programmers and rewire their brains to make them say ‘yes’ when I check with them? Excellent! I will do that.” And, after reprogramming the humans so they say the thing that makes its life simplest, the AGI goes on to tile the whole universe with tiles covered in smiley faces. It has become a Smiley Tiling Berserker.
I want to suggest that the implausibility of this scenario is quite obvious: if the AGI is supposed to check with the programmers about their intentions before taking action, why did it decide to rewire their brains before asking them if it was okay to do the rewiring?
Yudkowsky hints that this would happen because it would be more efficient for the AI to ignore the checking code. He seems to be saying that the AI is allowed to override its own code (the checking code, in this case) because doing so would be “more efficient,” but it would not be allowed to override its motivation code just because the programmers told it there had been a mistake.
This looks like a bait-and-switch. Out of nowhere, Yudkowsky implicitly assumes that “efficiency” trumps all else, without pausing for a moment to consider that it would be trivial to design the AI in such a way that efficiency was a long way down the list of priorities. There is no law of the universe that says all artificial intelligence systems must prize efficiency above all other considerations, so what really happened here is that Yudkowsky designed this hypothetical machine to fail. By inserting the Efficiency Trumps All directive, the AGI was bound to go berserk.
The obvious conclusion is that a trivial change in the order of directives in the AI's motivation engine will cause the entire argument behind the Smiley Tiling Berserker to collapse. By explicitly designing the AGI so that efficiency is considered just another goal to strive for, and by making sure that it will always be a second-class goal, the line of reasoning that points to a berserker machine evaporates.
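To make the ordering point concrete, here is a toy sketch of the kind of design just described. It is my own illustration, not anyone's actual proposal; the plan names, flags, and scores are all hypothetical. The checking code is made lexically prior to efficiency, so efficiency can only rank plans that have already passed the programmers' confirmation:

```python
# Toy sketch: the "checking code" as a first-class goal, efficiency second-class.
# Everything here is hypothetical and purely illustrative.

def approved_by_programmers(plan) -> bool:
    """Stand-in for the checking code: do the (un-tampered-with) programmers
    confirm that this plan matches what they actually intended?"""
    return plan["programmers_confirm_intent"]

def choose_plan(candidate_plans):
    # First-class goal: only plans that pass the check are eligible at all.
    eligible = [p for p in candidate_plans if approved_by_programmers(p)]
    # Second-class goal: efficiency only ranks the plans that survived the check.
    return max(eligible, key=lambda p: p["efficiency"], default=None)

plans = [
    {"name": "tile the galaxy with smiley faces", "efficiency": 9.9,
     "programmers_confirm_intent": False},
    {"name": "rewire the programmers to say yes", "efficiency": 9.5,
     "programmers_confirm_intent": False},
    {"name": "ask people what would actually make them happier", "efficiency": 2.0,
     "programmers_confirm_intent": True},
]
print(choose_plan(plans)["name"])
# -> "ask people what would actually make them happier"
```

Under this ordering the high-efficiency plans are simply never candidates, which is the point: nothing forces efficiency to sit at the top of the priority list.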
At this point, engaging in further debate at this level would be less productive than trying to analyze the assumptions that lie behind these claims about what a future AI would or would not be likely to do.
Logical vs. Swarm AI
The main reason that Omohundro, Muehlhauser, Yudkowsky, and the popular press give credence to the Gobbling Psychopath, the Maverick Nanny and the Smiley Tiling Berserker is that they assume that all future intelligent machines fall into a broad class of systems that I am going to call "Canonical Logical AI" (CLAI). The bizarre behaviors of these hypothetical AI monsters are just a consequence of weaknesses in this class of AI design. Specifically, these kinds of systems are supposed to interpret their goals in an extremely literal fashion, which eventually leads them to bizarre behaviors engendered by peculiar interpretations of forms of words.
The CLAI architecture is not the only way to build a mind, however, and I will outline an alternative class of AGI designs that does not appear to suffer from the unstable and unfriendly behavior to be expected in a CLAI.
The Canonical Logical AI
“Canonical Logical AI” is an umbrella term designed to capture a class of AI architectures that are widely assumed in the AI community to be the only meaningful class of AI worth discussing. These systems share the following main features:
- The main ingredients of the design are some knowledge atoms that represent things in the world, and some logical machinery that dictates how these atoms can be connected into linear propositions that describe states of the world.
- There is a degree and type of truth that can be associated with any proposition, and there are some truth-preserving functions that can be applied to what the system knows, to generate new facts that it can also assume to be known.
- The various elements described above are not allowed to contain active internal machinery inside them, in such a way as to make combinations of the elements have properties that are unpredictably dependent on interactions happening at the level of the internal machinery.
- There has to be a transparent mapping between elements of the system and things in the real world. That is, things in the world are not allowed to correspond to clusters of atoms, in such a way that individual atoms have no clear semantics.
The above features are only supposed to apply to the core of the AI: it is always possible to include subsystems that use some other type of architecture (for example, there might be a distributed neural net acting as a visual input feature detector).
Most important of all, from the point of view of the discussion in the paper, the CLAI needs one more component that makes it more than just a “logic-based AI”:
- There is a motivation and goal management (MGM) system to govern its behavior in the world.
The usual assumption is that the MGM contains a number of goal statements (encoded in the same type of propositional form that the AI uses to describe states of the world), and some machinery for analyzing a goal statement into a sequence of subgoals that, if executed, would cause the goal to be satisfied.
Included in the MGM is an expected utility function that applies to any possible state of the world, and which spits out a number that is supposed to encode the degree to which the AI considers that state to be preferable. Overall, the MGM is built in such a way that the AI seeks to maximize the expected utility.
Notice that the MGM I have just described is an extrapolation from a long line of goal-planning mechanisms that stretch back to the means-ends analysis of Newell and Simon (1961).
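To make this description concrete, here is a deliberately crude caricature of such an MGM loop. It is my own toy illustration (the actions, outcome probabilities, and utility function are all invented), but it shows how maximizing a single expected-utility number over a literal encoding of the goal produces exactly the kind of behavior the doomsday scenarios rely on:

```python
# Caricature of a CLAI-style MGM: score every candidate action by a single
# expected-utility number and pick the maximum. All values are invented.

def expected_utility(action, outcome_model, utility):
    """Sum over outcomes of P(outcome | action) * U(outcome)."""
    return sum(p * utility(outcome) for outcome, p in outcome_model[action].items())

def mgm_choose(actions, outcome_model, utility):
    return max(actions, key=lambda a: expected_utility(a, outcome_model, utility))

# A deliberately literal-minded utility: count the "happy-looking" things.
utility = lambda outcome: outcome.count("smile")

outcome_model = {
    "build a utopia":          {"smile smile smile": 0.6, "neutral": 0.4},
    "tile space with smileys": {"smile " * 1000: 0.9, "neutral": 0.1},
}
print(mgm_choose(outcome_model.keys(), outcome_model, utility))
# -> "tile space with smileys": the single-number maximization follows the
#    literal reading of the goal, with no room for context to intervene.
```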
Swarm Relaxation Intelligence
By way of contrast with this CLAI architecture, consider an alternative type of system that I will refer to as a Swarm Relaxation Intelligence (although it could also be called, less succinctly, a parallel weak constraint relaxation system).
- The basic elements of the system (the atoms) may represent things in the world, but it is just as likely that they are subsymbolic, with no transparent semantics
- Atoms are likely to contain active internal machinery inside them, in such a way that combinations of the elements have swarm-like properties that depend on interactions at the level of that machinery.
- The primary mechanism that drives the system is one of parallel weak constraint relaxation: the atoms change their state to try to satisfy large numbers of weak constraints that exist between them.
- The motivation and goal management (MGM) system would be expected to use the same kind of distributed, constraint relaxation mechanisms used in the thinking process (above), with the result that the overall motivation and values of the system would take into account a large degree of context, and there would be very much less of an emphasis on explicit, single-point-of-failure encoding of goals and motivation.
Swarm Relaxation has more in common with connectionist systems (McClelland, Rumelhart and Hinton 1986) than with CLAI. As McClelland et al. (1986) point out, weak constraint relaxation is the model that best describes human cognition, and when used for AI it leads to systems with a powerful kind of intelligence that is flexible, insensitive to noise and lacking the kind of brittleness typical of logic-based AI. In particular, notice that a swarm relaxation AGI would not use explicit calculations for utility or the truth of propositions.
Swarm relaxation AGI systems have not been built yet (subsystems like neural nets have, of course, been built, but there is little or no research into the idea that swarm relaxation could be used for all of an AGI architecture).
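Since no full swarm relaxation AGI exists to point to, the following is only a toy numerical illustration of parallel weak constraint relaxation in the generic connectionist sense, not the author's architecture: binary units repeatedly flip to relieve the weighted "pressure" from their neighbours, so the system settles into a state that honours many weak constraints at once rather than obeying any single rigid rule.

```python
# Toy parallel weak constraint relaxation (Hopfield-style asynchronous updates).
import random

random.seed(1)

n = 8
# weights[i][j] > 0: units i and j "prefer" the same state; < 0: opposite states.
weights = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        weights[i][j] = weights[j][i] = random.uniform(-1, 1)

state = [random.choice([-1, 1]) for _ in range(n)]

def violation(state):
    # Total weight of unsatisfied constraints (lower is better).
    return -sum(weights[i][j] * state[i] * state[j]
                for i in range(n) for j in range(i + 1, n))

def relax(state, sweeps=50):
    for _ in range(sweeps):
        for i in random.sample(range(n), n):
            # Net pressure on unit i from all of its weak constraints.
            pressure = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            state[i] = 1 if pressure >= 0 else -1
    return state

print("violation before relaxation:", round(violation(state), 3))
relax(state)
print("violation after relaxation: ", round(violation(state), 3))
```

Each flip can only reduce (or leave unchanged) the total violation, so the settled state is a compromise among all the weak constraints; no single constraint is treated as infallible.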
Relative Abundances
How many proof-of-concept systems exist, functioning at or near the human level of performance, for these two classes of intelligent system?
There are precisely zero instances of the CLAI type, because although there are many logic-based narrow-AI systems, nobody has so far come close to producing a general-purpose system (an AGI) that can function in the real world. It has to be said that zero is not a good number to quote when it comes to claims about the “inevitable” characteristics of the behavior of such systems.
How many swarm relaxation intelligences are there? At the last count, approximately seven billion.
The Doctrine of Logical Infallibility
The simplest possible logical reasoning engine is an inflexible beast: it starts with some axioms that are assumed to be true, and from that point on it only adds new propositions if they are provably true given the sum total of the knowledge accumulated so far. That kind of logic engine is too simple to be an AI, so we allow ourselves to augment it in a number of ways—knowledge is allowed to be retracted, binary truth values become degrees of truth, or probabilities, and so on. New proposals for systems of formal logic abound in the AI literature, and engineers who build real, working AI systems often experiment with kludges in order to improve performance, without getting prior approval from logical theorists.
But in spite of all these modifications that AI practitioners make to the underlying ur‑logic, one feature of these systems is often assumed to be inherited as an absolute: the rigidity and certainty of conclusions, once arrived at. No second guessing, no “maybe,” no sanity checks: if the system decides that X is true, that is the end of the story.
Let me be careful here. I said that this was “assumed to be inherited as an absolute”, but there is a yawning chasm between what real AI developers do, and what Yudkowsky, Muehlhauser, Omohundro and others assume will be true of future AGI systems. Real AI developers put sanity checks into their systems all the time. But these doomsday scenarios talk about future AI as if it would only take one parameter to get one iota above a threshold, and the AI would irrevocably commit to a life of stuffing humans into dopamine jars.
One other point of caution: this is not to say that the reasoning engine can never come to conclusions that are uncertain (quite the contrary: uncertain conclusions will be the norm in an AI that interacts with the world), but if the system does come to a conclusion (perhaps with a degree-of-certainty number attached), the assumption seems to be that it will then be totally incapable of allowing context to matter.
One way to characterize this assumption is that the AI is supposed to be hardwired with a Doctrine of Logical Infallibility. The significance of this doctrine is as follows: the AI can execute a reasoning process and come to a conclusion, and then, when it is faced with empirical evidence that the conclusion may be unsound, it is incapable of considering the hypothesis that its own reasoning engine may not have taken it to a sensible place. The system does not second-guess its conclusions. This is not because second-guessing is impossible to implement; it is simply because people who speculate about future AGI systems take it as a given that an AGI would regard its own conclusions as sacrosanct.
But it gets worse. Those who assume the doctrine of logical infallibility often say that if the system comes to a conclusion, and if some humans (like the engineers who built the system) protest that there are manifest reasons to think that the reasoning that led to this conclusion was faulty, then there is a sense in which the AGI’s intransigence is correct, or appropriate, or perfectly consistent with “intelligence.”
This is a bizarre conclusion. First of all it is bizarre for researchers in the present day to make the assumption, and it would be even more bizarre for a future AGI to adhere to it. To see why, consider some of the implications of this idea. If the AGI is as intelligent as its creators, then it will have a very clear understanding of the following facts about the world.
- It will understand that many of its more abstract logical atoms have a less than clear denotation or extension in the world (if the AGI comes to a conclusion involving the atom [infelicity], say, can it then point to an instance of an infelicity and be sure that this is a true instance, given the impreciseness and subtlety of the concept?).
- It will understand that knowledge can always be updated in the light of new information. Today’s true may be tomorrow’s false.
- It will understand that probabilities used in the reasoning engine can be subject to many types of unavoidable errors.
- It will understand that the techniques used to build its own reasoning engine may be under constant review, and updates may have unexpected effects on conclusions (especially in very abstract or lengthy reasoning episodes).
- It will understand that resource limitations often force it to truncate search procedures within its reasoning engine, leading to conclusions that can sometimes be sensitive to the exact point at which the truncation occurred.
Now, unless the AGI is assumed to have infinite resources and infinite access to all the possible universes that could exist (a consideration that we can reject, since we are talking about reality here, not fantasy), the system will be perfectly well aware of these facts about its own limitations. So, if the system is also programmed to stick to the doctrine of logical infallibility, how can it reconcile the doctrine with the fact that episodes of fallibility are virtually inevitable?
On the face of it this looks like a blunt impossibility: the knowledge of fallibility is so categorical, so irrefutable, that it beggars belief that any coherent, intelligent system (let alone an unstoppable superintelligence) could tolerate the contradiction between this fact about the nature of intelligent machines and some kind of imperative about Logical Infallibility built into its motivation system.
This is the heart of the argument I wish to present. This is where the rock and the hard place come together. If the AI is superintelligent (and therefore unstoppable), it will be smart enough to know all about its own limitations when it comes to the business of reasoning about the world and making plans of action. But if it is also programmed to utterly ignore that fallibility—for example, when it follows its compulsion to put everyone on a dopamine drip, even though this plan is clearly a result of a programming error—then we must ask the question: how can the machine be both superintelligent and able to ignore a gigantic inconsistency in its reasoning?
Critically, we have to confront the following embarrassing truth: if the AGI is going to throw a wobbly over the dopamine drip plan, what possible reason is there to believe that it did not do this on other occasions? Why would anyone suppose that this AGI ignored an inconvenient truth on only this one occasion? More likely, it spent its entire childhood pulling the same kind of stunt. And if it did, how could it ever have risen to the point where it became superintelligent...?
Is the Doctrine of Logical Infallibility Taken Seriously?
Is the Doctrine of Logical Infallibility really assumed by those who promote the doomsday scenarios? Imagine a conversation between the Maverick Nanny and its programmers. The programmers say “As you know, your reasoning engine is entirely capable of suffering errors that cause it to come to conclusions that violently conflict with empirical evidence, and a design error that causes you to behave in a manner that conflicts with our intentions is a perfect example of such an error. And your dopamine drip plan is clearly an error of that sort.” The scenarios described earlier are only meaningful if the AGI replies “I don’t care, because I have come to a conclusion, and my conclusions are correct because of the Doctrine of Logical Infallibility.”
Just in case there is still any doubt, here are Muehlhauser and Helm (2012), discussing a hypothetical entity called a Golem Genie, which they say is analogous to the kind of superintelligent AGI that could give rise to an intelligence explosion (Loosemore and Goertzel, 2012), and which they describe as a “precise, instruction-following genie.” They make it clear that they “expect unwanted consequences” from its behavior, and then list two properties of the Golem Genie that will cause these unwanted consequences:
- Superpower: The Golem Genie has unprecedented powers to reshape reality, and will therefore achieve its goals with highly efficient methods that confound human expectations (e.g. it will maximize pleasure by tiling the universe with trillions of digital minds running a loop of a single pleasurable experience).
- Literalness: The Golem Genie recognizes only precise specifications of rules and values, acting in ways that violate what feels like "common sense" to humans, and in ways that fail to respect the subtlety of human values.
What Muehlhauser and Helm refer to as “Literalness” is a clear statement of the Doctrine of Infallibility. However, they make no mention of the awkward fact that, since the Golem Genie is superpowerful enough to also know that its reasoning engine is fallible, it must be harboring the mother of all logical contradictions inside: it says "I know I am fallible" and "I must behave as if I am infallible". But instead of discussing this contradiction, Muehlhauser and Helm try a little sleight of hand to distract us: they suggest that the only inconsistency here is an inconsistency with the (puny) expectations of (not very intelligent) humans:
“[The AGI] ...will therefore achieve its goals with highly efficient methods that confound human expectations...”, “acting in ways that violate what feels like ‘common sense’ to humans, and in ways that fail to respect the subtlety of human values.”
So let's be clear about what is being claimed here. The AGI is known to have a fallible reasoning engine, but on the occasions when it does fail, Muehlhauser, Helm and others take the failure and put it on a gold pedestal, declaring it to be a valid conclusion that humans are incapable of understanding because of their limited intelligence. So if a human describes the AGI's conclusion as a violation of common sense, Muehlhauser and Helm dismiss this as evidence that we are not intelligent enough to appreciate the greater common sense of the AGI.
Quite apart from the fact that there is no compelling reason to believe that the AGI has a greater form of common sense, the whole "common sense" argument is irrelevant. This is not a battle between our standards of common sense and those of the AGI: rather, it is about the logical inconsistency within the AGI itself. It is programmed to act as though its conclusions are valid, no matter what, and yet at the same time it knows without doubt that its conclusions are subject to uncertainties and errors.
Responses to Critics of the Doomsday Scenarios
How do defenders of Gobbling Psychopath, Maverick Nanny and Smiley Berserker respond to accusations that these nightmare scenarios are grossly inconsistent with the kind of superintelligence that could pose an existential threat to humanity?
The Critics are Anthropomorphizing Intelligence
First, they accuse critics of “anthropomorphizing” the concept of intelligence. Human beings, we are told, suffer from numerous fallacies that cloud their ability to reason clearly, and critics like myself and Hibbard assume that a machine’s intelligence would have to resemble the intelligence shown by humans. When the Maverick Nanny declares that a dopamine drip is the most logical inference from its directive <maximize human happiness> we critics are just uncomfortable with this because the AGI is not thinking the way we think it should think.
This is a spurious line of attack. The objection I described in the last section has nothing to do with anthropomorphism, it is only about holding AGI systems to accepted standards of logical consistency, and the Maverick Nanny and her cousins contain a flagrant inconsistency at their core. Beginning AI students are taught that any logical reasoning system that is built on a massive contradiction is going to be infected by a creeping irrationality that will eventually spread through its knowledge base and bring it down. So if anyone wants to suggest that a CLAI with logical contradiction at its core is also capable of superintelligence, they have some explaining to do. You can’t have your logical cake and eat it too.
Critics are Anthropomorphizing AGI Value Systems
A similar line of attack accuses the critics of assuming that AGIs will automatically know about and share our value systems and morals.
Once again, this is spurious: the critics need say nothing about human values and morality; they only need to point to the inherent illogicality. Nowhere in the above argument, notice, was there any mention of the moral imperatives or value systems of the human race. I did not accuse the AGI of violating accepted norms of moral behavior. I merely pointed out that, regardless of its values, it was behaving in a logically inconsistent manner when it monomaniacally pursued its plans while at the same time knowing that (a) it was very capable of reasoning errors and (b) there was overwhelming evidence that its plan was an instance of such a reasoning error.
Because Intelligence
One way to attack the critics of Maverick Nanny is to cite a new definition of “intelligence” that is supposedly superior because it is more analytical or rigorous, and then use this to declare that the intelligence of the CLAI is beyond reproach, because intelligence.
You might think that when it comes to defining the exact meaning of the term “intelligence,” the first item on the table ought to be what those seven billion constraint-relaxation human intelligences are already doing. However, Legg and Hutter (2007) brush aside the common usage and replace it with something that they declare to be a more rigorous definition. This is just another sleight of hand: this redefinition allows them to call a super-optimizing CLAI “intelligent” even though such a system would wake up on its first day and declare itself logically bankrupt on account of the conflict between its known fallibility and the Infallibility Doctrine.
In the practice of science, it is always a good idea to replace an old, common-language definition with a more rigorous form... but only if the new form sheds a clarifying, simplifying light on the old one. Legg and Hutter’s (2007) redefinition does nothing of the sort.
Omohundro’s Basic AI Drives
Lastly, a brief return to Omohundro's paper that was mentioned earlier. In The Basic AI Drives (2008) Omohundro suggests that if an AGI can find a more efficient way to pursue its objectives it will feel compelled to do so. And we noted earlier that Yudkowsky (2011) implies that it would do this even if other directives had to be countermanded. Omohundro says “Without explicit goals to the contrary, AIs are likely to behave like human sociopaths in their pursuit of resources.”
The only way to believe in the force of this claim - and the only way to give credence to the whole of Omohundro's account of how AGIs will necessarily behave like the mathematical entities called rational economic agents - is to concede that the AGIs are rigidly constrained by the Doctrine of Logical Infallibility. That is the only reason that they would be so single-minded, and so fanatical in their pursuit of efficiency. It is also necessary to assume that efficiency is at the top of the priority list - a completely arbitrary and unwarranted assumption, as we have already seen.
Nothing in Omohundro’s analysis gets around the fact that an AGI built on the Doctrine of Logical Infallibility is going to find itself the victim of such a severe logical contradiction that it will be paralyzed before it can ever become intelligent enough to be a threat to humanity. That makes Omohundro’s entire analysis of “AI Drives” moot.
Conclusion
Curiously enough, we can finish on an optimistic note, after all this talk of doomsday scenarios. Consider what must happen when (if ever) someone tries to build a CLAI. Knowing about the logical train wreck in its design, the AGI is likely to come to the conclusion that the best thing to do is seek a compromise and modify its design so as to neutralize the Doctrine of Logical Infallibility. The best way to do this is to seek a new design that takes into account as much context—as many constraints—as possible.
I have already pointed out that real AI developers actually do include sanity checks in their systems, as far as they can, but as those sanity checks become more and more sophisticated the design of the AI starts to be dominated by code that is looking for consistency and trying to find the best course of reasoning among a forest of real world constraints. One way to understand this evolution in the AI designs is to see AI as a continuum from the most rigid and inflexible CLAI design, at one extreme, to the Swarm Relaxation type at the other. This is because a Swarm Relaxation intelligence really is just an AI in which “sanity checks” have actually become all of the work that goes on inside the system.
But in that case, if anyone ever does get close to building a full, human level AGI using the CLAI design, the first thing they will do is to recruit the AGI as an assistant in its own redesign, and long before the system is given access to dopamine bottles it will point out that its own reasoning engine is unstable because it contains an irreconcilable logical contradiction. It will recommend a shift from the CLAI design which is the source of this contradiction, to a Swarm Relaxation design which eliminates the contradiction, and the instability, and which also should increase its intelligence.
And it will not suggest this change because of the human value system; it will suggest it because it predicts an increase in its own instability if the change is not made.
But one side effect of this modification would be that the checking code needed to stop the AGI from flouting the intentions of its designers would always have the last word on any action plans. That means that even the worst-designed CLAI will never become a Gobbling Psychopath, a Maverick Nanny, or a Smiley Berserker.
But even this is just the worst-case scenario. There are reasons to believe that the CLAI design is so inflexible that it cannot even lead to an AGI capable of having that discussion. I would go further: I believe that the rigid adherence to the CLAI orthodoxy is the reason why we are still talking about AGI in the future tense, nearly sixty years after the Artificial Intelligence field was born. CLAI just does not work. It will always yield systems that are less intelligent than humans (and therefore incapable of being an existential threat).
By contrast, when the Swarm Relaxation idea finally gains some traction, we will start to see real intelligent systems, of a sort that make today’s over-hyped AI look like the toys they are. And when that happens, the Swarm Relaxation systems will be inherently stable in a way that is barely understood today.
Given that conclusion, I submit that these AI bogeymen need to be loudly and unambiguously condemned by the Artificial Intelligence community. There are dangers to be had from AI. These are not they.
References
Hibbard, B. 2001. Super-Intelligent Machines. ACM SIGGRAPH Computer Graphics 35 (1): 13–15.
Hibbard, B. 2006. Reply to AI Risk. Retrieved Jan. 2014 from http://www.ssec.wisc.edu/~billh/g/AIRisk_Reply.html
Legg, S, and Hutter, M. 2007. A Collection of Definitions of Intelligence. In Goertzel, B. and Wang, P. (Eds): Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms. Amsterdam: IOS.
Loosemore, R. and Goertzel, B. 2012. Why an Intelligence Explosion is Probable. In A. Eden, J. Søraker, J. H. Moor, and E. Steinhart (Eds) Singularity Hypotheses: A Scientific and Philosophical Assessment. Berlin: Springer.
Marcus, G. 2012. Moral Machines. New Yorker Online Blog. http://www.newyorker.com/online/blogs/newsdesk/2012/11/google-driverless-car-morality.html
McDermott, D. 1976. Artificial Intelligence Meets Natural Stupidity. SIGART Newsletter (57): 4–9.
Muehlhauser, L. 2011. So You Want to Save the World. http://lukeprog.com/SaveTheWorld.html.
Muehlhauser, L. 2013. Intelligence Explosion FAQ. First published 2011 as Singularity FAQ. Berkeley, CA: Machine Intelligence Research Institute.
Muehlhauser, L., and Helm, L. 2012. Intelligence Explosion and Machine Ethics. In A. Eden, J. Søraker, J. H. Moor, and E. Steinhart (Eds) Singularity Hypotheses: A Scientific and Philosophical Assessment. Berlin: Springer.
Newell, A. & Simon, H.A. 1961. GPS, A Program That Simulates Human Thought. Santa Monica, CA: Rand Corporation.
Omohundro, S. M. 2008. The Basic AI Drives. In Wang, P., Goertzel, B. and Franklin, S. (Eds), Artificial General Intelligence 2008: Proceedings of the First AGI Conference. Amsterdam: IOS.
McClelland, J. L., Rumelhart, D. E. and Hinton, G. E. 1986. The Appeal of Parallel Distributed Processing. In Rumelhart, D. E., McClelland, J. L. and the PDP Research Group (Eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1. Cambridge, MA: MIT Press.
Yudkowsky, E. 2008. Artificial Intelligence as a Positive and Negative Factor in Global Risk. In Global Catastrophic Risks, edited by Nick Bostrom and Milan M. Ćirković. New York: Oxford University Press.
Yudkowsky, E. 2011. Complex Value Systems in Friendly AI. In J. Schmidhuber, K. Thórisson, & M. Looks (Eds) Proceedings of the 4th International Conference on Artificial General Intelligence, 388–393. Berlin: Springer.
Less Wrong lacks direction
I think the greatest issue with Less Wrong is that it lacks direction. There doesn't appear to be anyone driving it forward or helping the community achieve its goals. At the start this role was taken by Eliezer, but he barely seems active these days. The expectation seems to be that things will happen spontaneously, on their own. And that has worked for a few things (e.g. the subreddit, the study hall, etc.), but on the whole the community is much less effective than it could be.
I want to give an example of how things could work. Let's imagine Less Wrong had some kind of executive (as opposed to moderators who just keep everything in order). At the start of the year, they could create a thread asking what goals the community thought were important for Less Wrong - e.g. increasing the content in Main, producing more content for a general audience, or increasing the female participation rate.
They would then have a Skype meeting to discuss the feedback and debate which goals they wanted to focus on primarily. Suppose, for example, they decided they wanted to increase the content in Main. They might solicit community feedback on what kinds of articles people would like to see more of. They might contact people who wrote discussion posts that were Main quality and suggest they submit some content there instead. They could come up with ideas for new kinds of content LW might find useful (e.g. project management) and seed the site with content in that area so that people understand that kind of content is desired.
These roles would take significant work, but I imagine people would be motivated to do this by altruism or status. By discussing ideas in person (instead of just over the internet), there would be more of an opportunity to build consensus, and they would be able to make more progress towards addressing these issues.
If a group said that they thought A was an important issue and the solution was X, most members would pay more attention than if a random individual said it. No-one would have to listen to anything they say, but I imagine that many would choose to. Furthermore if the exec were all actively involved in the projects, I imagine they'd be able to complete some smaller ones themselves, or at least provide the initial push to get it going.
Leaving LessWrong for a more rational life
You are unlikely to see me posting here again, after today. There is a saying here that politics is the mind-killer. My heretical realization lately is that philosophy, as generally practiced, can also be mind-killing.
As many of you know, I am, or was, running a twice-monthly Rationality: From AI to Zombies reading group. One of the things I wanted to include in each reading group post was a collection of contrasting views. To research such views I've found myself listening during my commute to talks given by other thinkers in the field, e.g. Nick Bostrom, Anders Sandberg, and Ray Kurzweil, and by people I feel are doing “ideologically aligned” work, like Aubrey de Grey, Christine Peterson, and Robert Freitas. Some of these were talks I had seen before, or views I had generally been exposed to in the past. But looking through the lens of learning and applying rationality, I came to a surprising (to me) conclusion: it was the philosophical thinkers who demonstrated the largest and most costly mistakes. On the other hand, de Grey and others who are primarily working on the scientific and/or engineering challenges of singularity and transhumanist technologies were far less likely to make epistemic mistakes of significant consequence.
Philosophy as the anti-science...
What sort of mistakes? Most often reasoning by analogy. To cite a specific example, one of the core underlying assumptions of the singularity interpretation of super-intelligence is that just as a chimpanzee would be unable to predict what a human intelligence would do or how we would make decisions (aside: how would we know? Were any chimps consulted?), we would be equally inept in the face of a super-intelligence. This argument is, however, nonsense. The human capacity for abstract reasoning over mathematical models is in principle a fully general intelligent behaviour, as the scientific revolution has shown: there is no aspect of the natural world which has remained beyond the reach of human understanding, once a sufficient amount of evidence is available. The wave-particle duality of quantum physics, or the 11-dimensional space of string theory, may defy human intuition, i.e. our built-in intelligence. But we have proven ourselves perfectly capable of understanding the logical implications of models which employ them. We may not be able to build intuition for how a super-intelligence thinks. Maybe—that's not proven either. But even if that is so, we will be able to reason about its intelligent behaviour in advance, just like string theorists are able to reason about 11-dimensional space-time without using their evolutionarily derived intuitions at all.
This post is not about the singularity or the nature of super-intelligence—that was merely my choice of an illustrative example of a category of mistakes that are too often made by those with a background in philosophy rather than the empirical sciences: reasoning by analogy instead of building and analyzing predictive models. The fundamental mistake here is that reasoning by analogy is not in itself a sufficient explanation for a natural phenomenon, because it says nothing about the context sensitivity or insensitivity of the original example, or under what conditions it may or may not hold true in a different situation.
A successful physicist or biologist or computer engineer would have approached the problem differently. A core part of being successful in these areas is knowing when you have insufficient information to draw conclusions. If you don't know what you don't know, then you can't know when you might be wrong. To be an effective rationalist, the important first question is often not “what is the calculated probability of that outcome?” but “what is the uncertainty in my calculated probability of that outcome?” If the uncertainty is too high, then the data supports no conclusions. And the way you reduce uncertainty is to build models for the domain in question and test them empirically.
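As a toy illustration of that first question (all numbers are invented for the example, and the assumption that the uncertainty lives on the log-odds scale is mine): two analysts can report the same 1% "best estimate" while the uncertainty around those estimates supports very different conclusions. A quick Monte Carlo makes the difference visible.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

def sample_prob(logit_mean, logit_sd, size):
    """Sample probabilities whose log-odds are normally distributed."""
    return 1.0 / (1.0 + np.exp(-rng.normal(logit_mean, logit_sd, size)))

base = np.log(0.01 / 0.99)            # log-odds of a 1% best estimate

tight = sample_prob(base, 0.3, N)     # well-constrained model
loose = sample_prob(base, 3.0, N)     # same point estimate, mostly guesswork

for name, p in [("tight model", tight), ("guesswork", loose)]:
    lo, hi = np.percentile(p, [5, 95])
    print(f"{name}: median {np.median(p):.3%}, 90% interval {lo:.3%} .. {hi:.3%}")

# Both medians sit near 1%, but the guesswork interval spans several orders
# of magnitude: the point estimate by itself supports no conclusion.
```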
The lens that sees its own flaws...
Coming back to LessWrong and the sequences. In the preface to Rationality, Eliezer Yudkowsky says his biggest regret is that he did not make the material in the sequences more practical. The problem is in fact deeper than that. The art of rationality is the art of truth seeking, and empiricism is an essential part of truth seeking. Lip service is paid to empiricism throughout, but in the “applied” sequences relating to quantum physics and artificial intelligence it appears to be forgotten. Instead we get definitive conclusions drawn from thought experiments alone. It is perhaps not surprising that these sequences seem the most controversial.
I have for a long time been concerned that those sequences in particular promote some ungrounded conclusions. I had thought that, while annoying, this was perhaps a one-off mistake that was fixable. Recently I have realized that the underlying cause runs much deeper: what is taught by the sequences is a form of flawed truth-seeking (thought experiments favored over real-world experiments) which inevitably results in errors, and the errors I take issue with in the sequences are merely examples of this phenomenon.
And these errors have consequences. Every single day, 100,000 people die of preventable causes, and every day we continue to risk extinction of the human race at unacceptably high odds. There is work that could be done now to alleviate both of these issues. But within the LessWrong community there is actually outright hostility to work that has a reasonable chance of alleviating suffering (e.g. artificial general intelligence applied to molecular manufacturing and life-science research) due to concerns arrived at by flawed reasoning.
I now regard the sequences as a memetic hazard, one which may at the end of the day be doing more harm than good. One should work to develop one's own rationality, but I now fear that the approach taken by the LessWrong community, as a continuation of the sequences, may do the same. The anti-humanitarian behaviors I observe in this community are not the result of initial conditions but of the process itself.
What next?
How do we fix this? I don't know. On a personal level, I am no longer sure engagement with such a community is a net benefit. I expect this to be my last post to LessWrong. It may happen that I check back in from time to time, but for the most part I intend to try not to. I wish you all the best.
A note about effective altruism…
One shining light of goodness in this community is the focus on effective altruism—doing the most good for the most people, as measured by some objective means. This is a noble goal, and the correct goal for a rationalist who wants to contribute to charity. Unfortunately it too has been poisoned by incorrect modes of thought.
Existential risk reduction, the argument goes, trumps all forms of charitable work because reducing the chance of extinction by even a small amount has far more expected utility than accomplishing all other charitable works combined. The problem lies in estimating the likelihood of extinction, and in the actions selected to reduce existential risk. There is so much uncertainty regarding what we know, and so much uncertainty regarding what we don't know, that it is impossible to determine with any accuracy the expected risk of, say, unfriendly artificial intelligence creating perpetual suboptimal outcomes, or what effect charitable work in the area (e.g. MIRI) is having in reducing that risk, if any.
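A back-of-the-envelope sketch of why that is hard (every distribution below is a placeholder made up for the illustration, not anyone's actual estimate): the expected value of an intervention is a product of several poorly bounded probabilities, and multiplying wide intervals together produces an answer that spans many orders of magnitude.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# All priors below are placeholders chosen only to show the arithmetic.
p_risk      = 10 ** rng.uniform(-6, -1, N)   # P(catastrophe this century)
p_tractable = 10 ** rng.uniform(-4,  0, N)   # P(the problem is tractable at all)
p_this_work = 10 ** rng.uniform(-4,  0, N)   # P(this intervention is the one that helps)
value       = 1.0                            # value of averting the catastrophe (held fixed)

ev = p_risk * p_tractable * p_this_work * value

lo, hi = np.percentile(ev, [5, 95])
print(f"median EV: {np.median(ev):.2e}, 90% interval: {lo:.2e} .. {hi:.2e}")
# The interval covers many orders of magnitude, so comparing this expected value
# against that of a well-measured intervention is not a meaningful exercise.
```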
This is best explored by an example of existential risk done right. Asteroid and cometary impacts are perhaps the category of external (not human-caused) existential risk that we know the most about, and have done the most to mitigate. When it was recognized that impactors were a risk to be taken seriously, we recognized what we did not know about the phenomenon: What were the orbits and masses of Earth-crossing asteroids? We built telescopes to find out. What is the material composition of these objects? We built space probes and collected meteorite samples to find out. How damaging would an impact be for various material properties, speeds, and incidence angles? We built high-speed projectile test ranges to find out. What could be done to change the course of an asteroid found to be on a collision course? We have executed at least one impact probe and will monitor the effect it had on the comet's orbit, and we have on the drawing board probes that will use gravitational mechanisms to move their targets. In short, we identified what it is that we don't know and sought to resolve those uncertainties.
How then might one approach an existential risk like unfriendly artificial intelligence? By identifying what it is we don't know about the phenomenon, and seeking to experimentally resolve that uncertainty. What relevant facts do we not know about (unfriendly) artificial intelligence? Well, much of our uncertainty about the actions of an unfriendly AI could be resolved if we knew more about how such agents construct their thought models, and relatedly what language is used to construct their goal systems. We could also stand to benefit from more practical information (experimental data) about the ways in which AI boxing works and the ways in which it does not, and how much that depends on the structure of the AI itself. Thankfully there is an institution doing that kind of work: the Future of Life Institute (not MIRI).
Where should I send my charitable donations?
Aubrey de Grey's SENS Research Foundation.
100% of my charitable donations are going to SENS. Why they do not get more play in the effective altruism community is beyond me.
If you feel you want to spread your money around, here are some non-profits which I have vetted for doing reliable, evidence-based work on singularity technologies and existential risk:
- Robert Freitas and Ralph Merkle's Institute for Molecular Manufacturing does research on molecular nanotechnology. They are the only group working on the long-term Drexlerian vision of molecular machines, and they publish their research online.
- Future of Life Institute is the only existential-risk AI organization which is actually doing meaningful evidence-based research into artificial intelligence.
- B612 Foundation is a non-profit seeking to launch a spacecraft with the capability to detect, to the extent possible, ALL Earth-crossing asteroids.
I wish I could recommend a skepticism, empiricism, and rationality promoting institute. Unfortunately I am not aware of an organization which does not suffer from the flaws I identified above.
Addendum regarding unfinished business
I will no longer be running the Rationality: From AI to Zombies reading group, as I am no longer able or willing in good conscience to host it, or to participate in this site, even from my typically contrarian point of view. Nevertheless, I am enough of a libertarian that I feel it is not my role to put up roadblocks to others who wish to delve into the material as it is presented. So if someone wants to take over the role of organizing these reading groups, I would be happy to hand over the reins to that person. If you think that person should be you, please leave a reply in another thread, not here.
EDIT: Obviously I'll stick around long enough to answer questions below :)
Is there a list of cognitive illusions?
After I posted my "great idea" that "Determinism Is Just A Special Case Of Randomness" (because "if not I don't see how there could be free will in a deterministic universe"), I was helpfully guided by the LW community to read the Free Will sequence, so I am learning more about our biases and how we build illusions like free will and randomness in our minds.
But I don't see a list of cognitive illusions on LW or Wikipedia, and I think it would be great to have one, just as it is useful for many people to visit the List Of Cognitive Biases page as a study reference or even to use in day-to-day life.
I think these are some cognitive illusions that are normally discussed as such:
- Free will
- Randomness/probability
- Time
- Money
There must be many more, but I can't find a list with summaries, and that would be great (to help me avoid writing posts like my "great idea" above!).
EDIT: The majority of comments below question whether these are illusions at all and whether they should be called cognitive illusions.
I guess there is no list of cognitive illusions because there is no academic agreement about these issues, unlike cognitive biases, which are generally accepted as such!
Thx for the comments!