Learning critical thinking: a personal example

37 Swimmer963 14 February 2013 08:43PM

Related to: Is Rationality Teachable

“Critical care nursing isn’t about having critically ill patients,” my preceptor likes to say, “it’s about critical thinking.”

I doubt she's talking about the same kind of critical thinking that philosophers are, and I find that definition abstract anyway. There’s been a lot of talk about critical thinking during our four years of nursing school, but our profs seem to have a hard time defining it. So I’ll go with a definition from Google.

Critical thinking can be seen as having two components: 1) a set of information and belief generating and processing skills, and 2) the habit, based on intellectual commitment, of using those skills to guide behaviour. It is thus to be contrasted with: 1) the mere acquisition and retention of information alone, because it involves a particular way in which information is sought and treated; 2) the mere possession of a set of skills, because it involves the continual use of them; and 3) the mere use of those skills ("as an exercise") without acceptance of their results.1

That’s basically rationality–epistemic, i.e. generating true beliefs, and instrumental, i.e. knowing how to use them to achieve what you want. Maybe part of me expected, implicitly, to have an easier time learning this skill because of my Less Wrong knowledge. And maybe I am more consciously aware of my mistakes, and the cognitive factors that caused them, than most of my classmates. When it’s forty-five minutes past the end of my shift and I’m still charting, I’m also calling myself out on succumbing to the planning fallacy. I once went through the first half hour of a shift during my pediatrics rotation thinking that one of my patients had cerebral palsy, when he actually had cystic fibrosis–all because I misread my prof’s handwriting as ‘CP’ when she’d written ‘CF’. I was totally confused by all the enzyme supplements on his list of meds, but it still took me a while to figure it out–a combination of priming and confirmation bias, taken to the next level. 

But, overall, even if I know what I'm doing wrong, it hasn’t been easier to do things right. I have a hard time with the hospital environment, possibly because I’m the kind of person who ended up reading and posting on Less Wrong. My cognitive style leans towards Type 2 reasoning, in Keith Stanovich’s taxonomy–thorough, but slow. I like to understand things, on a deep level. I like knowing why I’m doing something, and I don’t trust my intuitions, the fast-and-dirty product of Type 1 reasoning. But Type 2 reasoning requires a lot of working memory, and humans aren’t known for that, which is the source of most of my frustration and nearly all of my errors–when working memory overload forces me to be a cognitive miser.

Still, for all the frustration, I’m pretty sure I’ve ended up in the perfect environment to learn this skill called ‘critical thinking.’ I’m way out of my depth–which I expected. No fourth year student is ready to work independently in a trauma ICU, but I decided to finish my schooling here in the name of tsuyoku naritai, and for all the days when I’ve gone home crying, it’s still worth it. I’m learning.

 

The skills

1. A set of information and belief generating and processing skills.

Medicine, and nursing, are a bit like physics, in that you need to generate true beliefs about systems that exist outside of you, and predict how they’re going to behave. This involves knowing a lot of abstract theory, which I’m good at, and a lot of heuristics and pattern-matching for applying the right bits of theory to particular patients, which I’m less good at. That’s partly an experience thing; my brain needs patterns to match to. But in general, I have decent mental models of my patients. I’m curious and I like to understand things. If I don’t know which part of the theory applies, I ask.

2. The habit, based on intellectual commitment, of using those skills to guide behaviour.

So you’ve got your mental model of your patient, your best understanding of what’s actually going on, on a physiological and biochemical level, down under the skin where you can’t see it. You know what “normal” is for a variety of measures: vital signs, lung sounds, lab values, etc. Given that your patient is in the ICU, you know something’s abnormal, or they wouldn’t be there. Their diagnosis tells you what to expect, and you look at the results of your assessments and ask a couple of questions. One: is this what I expect, for this patient? Two: what do I need to do about it?

I’m not going to be surprised if a post-op patient has low hemoglobin. It’s information of a kind, telling the doctor whether or not the patient needs a transfusion, and how many units, but it’s not really new information, and a moderately abnormal value wouldn’t worry me or anyone else. If their hemoglobin keeps dropping, okay–they’re actively bleeding somewhere; that’s irritating, and possibly dangerous, and needs dealing with, but it’s not surprising.

But if a patient here for an abdominal surgery suddenly has decreased level of consciousness and their pupils aren’t reacting normally to light, I’m worried. There’s nothing in my mental model that says I should expect it. I notice I’m confused, and that confusion guides my behaviour; I call the doctor right away, because we need more information to update our collective mental model, information you can’t get just from observation, like a CT scan of the head. (Even this is optimistic–plenty of patients are admitted to the ICU because we have no idea what’s wrong with them, and are hoping to keep them alive long enough to find out.)

The basics of ICU nursing come down to treating numbers. Heart rate, blood pressure, oxygen saturations, urine output, etc.: know the acceptable range, notice if they change, and use Treatment X to get them back where they’re supposed to be. Which doesn’t sound that hard. But implicit in ‘notice if they change’ is ‘figure out why they changed’, because that affects how you treat them, and implicit in that is a lot of background knowledge, which has to be put in context.
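In software terms, the first half of that loop is just a range check. A minimal sketch (all thresholds and names below are invented for illustration, not clinical values) makes it obvious how little of the job that part covers–the ‘figure out why’ step is everything the code can’t do:

    # Toy sketch of "treating numbers". Ranges and names are invented
    # illustrations, not clinical guidance.
    ACCEPTABLE = {
        "heart_rate": (60, 100),              # beats/min
        "mean_arterial_pressure": (65, 110),  # mmHg
        "spo2": (92, 100),                    # % oxygen saturation
    }

    def out_of_range(vitals):
        """Notice the change; explaining it still takes a human."""
        return [(name, value) for name, value in vitals.items()
                if not ACCEPTABLE[name][0] <= value <= ACCEPTABLE[name][1]]

    print(out_of_range({"heart_rate": 118,
                        "mean_arterial_pressure": 70,
                        "spo2": 95}))
    # -> [('heart_rate', 118)] -- noticed; now figure out why before treating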

I’m, honestly, fairly terrible at this. It’s a compartmentalization thing. I don’t like using my knowledge as input arguments to generate new conclusions and then relying on those conclusions to treat human beings. It feels like guessing. Even though, back in high school, I never really needed to study for physics tests–if I understood what we’d learned, I could re-derive forgotten details from first principles. But hospital patients ended up in a non-overlapping magisterium in my head. In order for me to trust my knowledge, it has to have come directly from the lips of a teacher or experienced nurse.

My preceptor hates this. “She needs to continue to work on her critical thinking when it comes to caring for critically ill patients,” she wrote on my evaluation. “She knows the theory, and is now working to apply it to ICU nursing.” Shorthand for: she knows the theory, but getting her to apply it to ICU nursing is like pulling teeth. A number of our conversations have gone like this:

Me: “Our patient’s blood pressure dropped a bit.”

Her: “Yeah, it did. What do you want to do about it?”

Me: “I, uh, I don’t know... Should I increase the vasopressors?”

Her: “I don’t know, should you?”

Me: “Uh, maybe I should increase the phenylephrine to 40 mcg/min and see what happens. How long should I wait to see?”

Her: “You tell me.”

Me: “Well, let’s say it’ll take a few minutes for what’s in the tubing now to get pushed through, and it should take effect pretty quickly because it’s IV, like a minute... So if his blood pressure’s not up enough in five minutes, I’ll increase the phenyl to 60. Does that sound okay?”

Her: “It’s your decision to make." 

Needless to say, I find this teaching method extremely stressful and scary, and I’m learning about ten times more than I would if she answered the questions I asked. Because “the mere acquisition and retention of information alone” isn’t my problem. I have a brain like an encyclopaedia. My problem, in the critical care nursing context, is the “particular way in which information is sought and treated.” I need to know the right time to notice something is wrong, the right place to look in my encyclopaedia, and the right way to take the information I just looked up and figure out what to do with it.

 

The mistakes

Some of my errors, unsurprisingly, boil down to a failure to override inappropriate Type 1 responses with Type 2 responses–in other words, not thinking about what I’m doing. But most of them are more of a mindware gap–I don’t yet have the “domain-specific knowledge sets” that the nurses around me have. Not just theory knowledge–I do have most of that–but the procedural habits of how to stay organized and prioritize and dump the contents of my working memory onto paper in a way that I can read them back later. Usually, when I make a mistake, I knew better, but the part of my brain that knew better was doing something else at the time, that small note of confusion getting lost in the general chaos.

Pretty much all nurses keep a “feuille de route”–I have yet to find a satisfactory English word for this, but it’s a personal sheet of paper, not legal charting, usually kept in a pocket, and used as an extended working memory. In med/surg, when I had four patients, I made a chart with four columns–name and personal information, medications, treatments/general plan for the day, and medical history–and as many rows as I had patients. If something was important, I circled it in red ink. This system doesn’t work in the ICU, so my current feuille de route has several aspects. I fold a piece of blank paper into four, and take notes from the previous shift report on one quarter of one side, or two quarters if it’s a long report. Across from that, I draw a vertical column of times, from 8:00 am to 6:00 pm (or 8:00 pm to 6:00 am); 7:00 pm and 7:00 am are shift change, so nothing else really gets done for that hour. I use this to scribble down what I need to get done during my twelve hours, and approximately when I want to do it, and I prioritize from 1 to 5, most to least important. Once a task is done, I cross it off–then I can forget about it. On the other side of the paper, I make a cheat sheet for giving report to the next nurse, or presenting my patient to the doctors at rounds.
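The same structure translates almost directly into data–a minimal sketch, with invented times and tasks, of the feuille de route as a time-keyed, prioritized to-do list:

    # Toy "feuille de route": externalised working memory as data.
    # Times, tasks and priorities below are invented examples.
    from dataclasses import dataclass

    @dataclass
    class Task:
        time: str      # roughly when I plan to do it
        what: str
        priority: int  # 1 = most important ... 5 = least
        done: bool = False

    sheet = [
        Task("08:00", "head-to-toe assessment", 1),
        Task("09:00", "morning meds", 1),
        Task("10:00", "turn and reposition", 3),
        Task("12:00", "antibiotic, then vitals", 2),
    ]

    # Once it's done, cross it off -- then forget about it:
    sheet[0].done = True
    for t in sorted((t for t in sheet if not t.done), key=lambda t: t.priority):
        print(t.time, t.what)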

This might be low-tech and simple, but it takes a huge load off my working memory, and reduces my most frequent error, which is to get so overwhelmed and frazzled that my brain goes on strike. In other words, the failure to override Type 1 responses due to the lack of cognitive capacity to run a Type 2 process. It’s drastically cut down on the frequency of this mental conversation:

Me: “I turned off the sedation, and my patient isn’t waking up as fast as I expected. I notice I’m confused–”

My brain: “You’re always confused! Everything around here is intensely confusing! How am I supposed to use that as information?” 

Odd as it might sound, I often don’t notice when my brain starts edging towards a meltdown. The feeling itself is quite recognizable, but the circumstances that lead to it, i.e. overloaded working memory, mean that I’m not usually paying attention to my own feelings.

“You need to stop and take a breath,” my preceptor says about fifty times a day. Easier said than done–but it’s more efficient, overall, to have a tiny part of my mind permanently on standby, keeping an eye on my emotions, noticing when the gears start to overheat. Then stop, take a breath, and let go of everything except the task at hand, trusting myself to have created enough cues in my environment to retrieve the other tasks, once I’m done. Humans don’t multitask well. Doing one thing while trying to remember a list of five others is intense multitasking, and it’s no wonder it’s exhausting.

 

The implications

“You can’t teach critical thinking,” my preceptor says, but I’m pretty sure that’s exactly what she’s doing right now. A great deal of what I already know is domain-specific to nursing, but most of what I’m learning right now is generally applicable. I’m learning the procedural skills to work through difficult problems, under what Keith Stanovich would call average rather than optimal conditions. Sitting in my own little bubble in front of a multiple choice exam–that’s optimal conditions. Trying to figure out if I should be surprised or worried about my patient’s increased heart rate, while simultaneously deciding whether or not I can ignore the ventilator alarm and whether I can finish giving my twelve o’clock antibiotic before I need to do twelve o’clock vitals–that’s not just average conditions, it’s under-duress conditions.

I’m hoping that after a few more weeks, or maybe a few more years, I’ll be able to perform comfortably in this intensely terrifying environment. And I’m hoping that some of the skills I learn will be general-purpose, for me at least. It’d be nice if they were teachable to others, too, but I think my preceptor might be right about one thing–you can’t teach this kind of critical thinking in the classroom. It's about moulding my brain into the right shape, and everyone's brain starts out in a different shape, so the mould has to be personalized. 

But the habits are general ones. Notice when you're faced with a difficult problem, or making an important decision. Notice that you're doing this while distracted. Stop and take a breath. Get out a piece of paper. Figure out how the problem is formatted in your mind, and format it that way on the paper. (This is probably the hardest part). Dump your working memory and give yourself space to think. Prioritize from 1 to n. Keep an eye on the evolving situation, sure, but find that moment of concentration in the midst of chaos, and solve the problem. 

Of course, it's far from guaranteed that this will work. I'm making an empirical prediction: that the skills I'm currently learning will be transferable to non-nursing areas, and that they'll make a difference in my life outside of work. I'll be on the lookout for examples, either of success or failure.

 

References

1. Scriven, Michael & Paul, Richard (2011). Defining critical thinking. The Critical Thinking Community. http://www.criticalthinking.org/pages/defining-critical-thinking/410

 

Philosophical Landmines

84 [deleted] 08 February 2013 09:22PM

Related: Cached Thoughts

Last summer I was talking to my sister about something. I don't remember the details, but I invoked the concept of "truth", or "reality" or some such. She immediately spit out a cached reply along the lines of "But how can you really say what's true?".

Of course I'd learned some great replies to that sort of question right here on LW, so I did my best to sort her out, but everything I said invoked more confused slogans and cached thoughts. I realized the battle was lost. Worse, I realized she'd stopped thinking. Later, I realized I'd stopped thinking too.

I went away and formulated the concept of a "Philosophical Landmine".

I used to occasionally remark that if you care about what happens, you should think about what will happen as a result of possible actions. This is basically a slam dunk in everyday practical rationality, except that I would sometimes describe it as "consequentialism".

The predictable consequence of this sort of statement is that someone starts going off about hospitals and terrorists and organs and moral philosophy and consent and rights and so on. This may be controversial, but I would say that causing this tangent constitutes a failure to communicate the point. Instead of prompting someone to think, I invoked some irrelevant philosophical cruft. The discussion is now about Consequentialism, the Capitalized Moral Theory, instead of the simple idea of thinking through consequences as an everyday heuristic.

It's not even that my statement relied on a misused term or something; it's that an unimportant choice of terminology dragged the whole conversation in an irrelevant and useless direction.

That is, "consequentialism" was a Philosophical Landmine.

In the course of normal conversation, you passed through an ordinary spot that happened to conceal the dangerous leftovers of past memetic wars. As a result, an intelligent and reasonable human was reduced to a mindless zombie chanting prerecorded slogans. If you're lucky, that's all. If not, you start chanting counter-slogans and the whole thing goes supercritical.

It's usually not so bad, and no one is literally "chanting slogans". There may even be some original phrasings involved. But the conversation has been derailed.

So how do these "philosophical landmine" things work?

It looks like when a lot has been said on a confusing topic, usually something in philosophy, there is a large complex of slogans and counter-slogans installed as cached thoughts around it. Certain words or concepts will trigger these cached thoughts, and any attempt to mitigate the damage will trigger more of them. Of course they will also trigger cached thoughts in other people, which in turn... The result being that the conversation rapidly diverges from the original point to some useless yet heavily discussed attractor.

Notice that whether a particular concept will cause trouble depends on the person as well as the concept. Notice further that this implies that the probability of hitting a landmine scales with the number of people involved and the topic-breadth of the conversation.

Anyone who hangs out on 4chan can confirm that this is the approximate shape of most thread derailments.

Most concepts in philosophy and metaphysics are landmines for many people. The phenomenon also occurs in politics and other tribal/ideological disputes. The ones I'm particularly interested in are the ones in philosophy, but it might be useful to divorce the concept of "conceptual landmines" from philosophy in particular.

Here are some common ones in philosophy:

  • Morality
  • Consequentialism
  • Truth
  • Reality
  • Consciousness
  • Rationality
  • Quantum

Landmines in a topic make it really hard to discuss ideas or do work in these fields, because chances are, someone is going to step on one, and then there will be a big noisy mess that interferes with the rather delicate business of thinking carefully about confusing ideas.

My purpose in bringing this up is mostly to precipitate some terminology and a concept around this phenomenon, so that we can talk about it and refer to it. It is important for concepts to have verbal handles, you see.

That said, I'll finish with a few words about what we can do about it. There are two major forks of the anti-landmine strategy: avoidance, and damage control.

Avoiding landmines is your job. If it is a predictable consequence that something you could say will put people in mindless slogan-playback-mode, don't say it. If something you say makes people go off on a spiral of bad philosophy, don't get annoyed with them, just fix what you say. This is just being a communications consequentialist. Figure out which concepts are landmines for which people, and step around them, or use alternate terminology with fewer problematic connotations.

If it happens, which it does, as far as I can tell, my only effective damage control strategy is to abort the conversation. I'll probably think that I can take those stupid ideas here and now, but that's just the landmine trying to go supercritical. Just say no. Of course letting on that you think you've stepped on a landmine is probably incredibly rude; keep it to yourself. Subtly change the subject or rephrase your original point without the problematic concepts or something.

A third prong could be playing "philosophical bomb squad", which means permanently defusing landmines by supplying satisfactory nonconfusing explanations of things without causing too many explosions in the process. Needless to say, this is quite hard. I think we do a pretty good job of it here at LW, but for topics and people not yet defused, avoid and abort.

ADDENDUM: Since I didn't make it very obvious, it's worth noting that this happens with rationalists, too, even on this very forum. It is your responsibility not to contain landmines as well as not to step on them. But you're already trying to do that, so I don't emphasize it as much as not stepping on them.

Assessing Kurzweil: the results

45 Stuart_Armstrong 16 January 2013 04:51PM

Predictions of the future rely, to a much greater extent than in most fields, on the personal judgement of the expert making them. Just one problem - personal expert judgement generally sucks, especially when the experts don't receive immediate feedback on their hits and misses. Formal models perform better than experts, but when talking about unprecedented future events such as nanotechnology or AI, the choice of the model is also dependent on expert judgement.

Ray Kurzweil has a model of technological intelligence development where, broadly speaking, evolution, pre-computer technological development, post-computer technological development and future AIs all fit into the same exponential increase. When assessing the validity of that model, we could look at Kurzweil's credentials, and maybe compare them with those of his critics - but Kurzweil has given us something even better than credentials, and that's a track record. In various books, he's made predictions about what would happen in 2009, and we're now in a position to judge their accuracy. I haven't been satisfied by the various accuracy ratings I've found online, so I decided to do my own assessments.

I first selected ten of Kurzweil's predictions at random, and gave my own estimation of their accuracy. I found that five were to some extent true, four were to some extent false, and one was unclassifiable.

But of course, relying on a single assessor is unreliable, especially when some of the judgements are subjective. So I started a call for volunteers to get assessors. Meanwhile Malo Bourgon set up a separate assessment on Youtopia, harnessing the awesome power of altruists chasing after points.

The results are now in, and they are fascinating. They are...

continue reading »

Assessing Kurzweil: the gory details

14 Stuart_Armstrong 15 January 2013 02:29PM

This post goes along with this one, which was merely summarising the results of the volunteer assessment. Here we present the further details of the methodology and results.

Kurzweil's predictions were decomposed into 172 separate statements, taken from the book "The Age of Spiritual Machines" (published in 1999). Volunteers were requested on Less Wrong and on reddit.com/r/futurology. 18 people initially volunteered to do varying amounts of assessment of Kurzweil's predictions; 9 ultimately did so.

Each volunteer was given a separate randomised list of the numbers 1 to 172, with instructions to go through the statements in the order given by the list and give their assessment of the correctness of the prediction (the exact instructions are at the end of this post). They were to assess the predictions on the following five point scale:

  • 1=True, 2=Weakly True, 3=Cannot decide, 4=Weakly False, 5=False

They assessed a varying number of predictions, giving 531 assessments in total, for an average of 59 assessments per volunteer (the maximum attempted was all 172 predictions, the minimum was 10). They generally followed the randomised order correctly - there were three out-of-order assessments (assessing prediction 36 instead of 38, 162 instead of 172, and missing out 75). Since the number of errors was very low, and seemed accidental, I decided that this would not affect the randomisation and kept those answers in.
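For concreteness, here is a minimal sketch (assuming Python; the names and output format are mine, not the study's) of how such independently randomised orderings can be generated:

    import random

    # Each volunteer gets an independently shuffled order of the 172
    # prediction numbers. Variable names are illustrative only.
    N_PREDICTIONS = 172
    N_VOLUNTEERS = 9

    for v in range(1, N_VOLUNTEERS + 1):
        order = list(range(1, N_PREDICTIONS + 1))
        random.shuffle(order)    # fresh randomisation per volunteer
        print(f"volunteer {v}: starts with {order[:5]} ...")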

The assessments (anonymised) can be found here.

continue reading »

You can't signal to rubes

7 Patrick 01 January 2013 06:40AM

The word 'signalling' is often used on Less Wrong, and often used wrongly. This post is intended to call out our community on its wrongful use, as well as serve as an introduction, by way of contrast, to the correct concept of signalling.

continue reading »

How to Fix Science

50 lukeprog 07 March 2012 02:51AM

Like The Cognitive Science of Rationality, this is a post for beginners. Send the link to your friends!

Science is broken. We know why, and we know how to fix it. What we lack is the will to change things.

 

In 2005, several analyses suggested that most published results in medicine are false. A 2008 review showed that perhaps 80% of academic journal articles mistake "statistical significance" for "significance" in the colloquial meaning of the word, an elementary error every introductory statistics textbook warns against. This year, a detailed investigation showed that half of published neuroscience papers contain one particular simple statistical mistake.
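To see how "statistical significance" and colloquial significance come apart, here is a small simulation of my own (an illustration, not taken from any of the cited analyses): with a big enough sample, a negligible effect yields an impressively small p-value.

    import numpy as np
    from scipy import stats

    # A trivially small true effect (0.02 standard deviations) becomes
    # "statistically significant" given enough data.
    rng = np.random.default_rng(0)
    n = 100_000
    control = rng.normal(0.00, 1.0, n)
    treatment = rng.normal(0.02, 1.0, n)

    t, p = stats.ttest_ind(treatment, control)
    print(f"p = {p:.2g}")  # far below 0.05, yet the effect is negligible
    print(f"observed effect = {treatment.mean() - control.mean():.3f} SD")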

Also this year, a respected senior psychologist published in a leading journal a study claiming to show evidence of precognition. The editors explained that the paper was accepted because it was written clearly and followed the usual standards for experimental design and statistical methods.

Science writer Jonah Lehrer asks: "Is there something wrong with the scientific method?"

Yes, there is.

This shouldn't be a surprise. What we currently call "science" isn't the best method for uncovering nature's secrets; it's just the first set of methods we've collected that wasn't totally useless like personal anecdote and authority generally are.

As time passes we learn new things about how to do science better. The Ancient Greeks practiced some science, but few scientists tested hypotheses against mathematical models before Ibn al-Haytham's 11th-century Book of Optics (which also contained hints of Occam's razor and positivism). Around the same time, Al-Biruni emphasized the importance of repeated trials for reducing the effect of accidents and errors. Galileo brought mathematics to greater prominence in scientific method, Bacon described eliminative induction, Newton demonstrated the power of consilience (unification), Peirce clarified the roles of deduction, induction, and abduction, and Popper emphasized the importance of falsification. We've also discovered the usefulness of peer review, control groups, blind and double-blind studies, plus a variety of statistical methods, and added these to "the" scientific method.

In many ways, the best science done today is better than ever — but it still has problems, and most science is done poorly. The good news is that we know what these problems are and we know multiple ways to fix them. What we lack is the will to change things.

This post won't list all the problems with science, nor will it list all the promising solutions for any of these problems. (Here's one I left out.) Below, I only describe a few of the basics.

continue reading »

Train Philosophers with Pearl and Kahneman, not Plato and Kant

65 lukeprog 06 December 2012 12:42AM

Part of the sequence: Rationality and Philosophy

Hitherto the people attracted to philosophy have been mostly those who loved the big generalizations, which were all wrong, so that few people with exact minds have taken up the subject.

Bertrand Russell

 

I've complained before that philosophy is a diseased discipline which spends far too much of its time debating definitions, ignoring relevant scientific results, and endlessly re-interpreting old dead guys who didn't know the slightest bit of 20th century science. Is that still the case?

You bet. There's some good philosophy out there, but much of it is bad enough to make CMU philosopher Clark Glymour suggest that on tight university budgets, philosophy departments could be defunded unless their work is useful to (cited by) scientists and engineers — just as his own work on causal Bayes nets is now widely used in artificial intelligence and other fields.

How did philosophy get this way? Russell's hypothesis is not too shabby. Check the syllabi of the undergraduate "intro to philosophy" classes at the world's top 5 U.S. philosophy departments — NYU, Rutgers, Princeton, Michigan Ann Arbor, and Harvard — and you'll find that they spend a lot of time with (1) old dead guys who were wrong about almost everything because they knew nothing of modern logic, probability theory, or science, and with (2) 20th century philosophers who were way too enamored with cogsci-ignorant armchair philosophy. (I say more about the reasons for philosophy's degenerate state here.)

As the CEO of a philosophy/math/compsci research institute, I think many philosophical problems are important. But the field of philosophy doesn't seem to be very good at answering them. What can we do?

Why, come up with better philosophical methods, of course!

Scientific methods have improved over time, and so can philosophical methods. Here is the first of my recommendations...

continue reading »

Philosophy Needs to Trust Your Rationality Even Though It Shouldn't

27 lukeprog 29 November 2012 09:00PM

Part of the sequence: Rationality and Philosophy

Philosophy is notable for the extent to which disagreements with respect to even those most basic questions persist among its most able practitioners, despite the fact that the arguments thought relevant to the disputed questions are typically well-known to all parties to the dispute.

Thomas Kelly

The goal of philosophy is to uncover certain truths... [But] philosophy continually leads experts with the highest degree of epistemic virtue, doing the very best they can, to accept a wide array of incompatible doctrines. Therefore, philosophy is an unreliable instrument for finding truth. A person who enters the field is highly unlikely to arrive at true answers to philosophical questions.

Jason Brennan

 

After millennia of debate, philosophers remain heavily divided on many core issues. According to the largest-ever survey of philosophers, they're split 25-24-18 on deontology / consequentialism / virtue ethics, 35-27 on empiricism vs. rationalism, and 57-27 on physicalism vs. non-physicalism.

Sometimes, they are even divided on psychological questions that psychologists have already answered: Philosophers are split evenly on the question of whether it's possible to make a moral judgment without being motivated to abide by that judgment, even though we already know that this is possible for some people with damage to their brain's reward system, for example many Parkinson's patients, and patients with damage to the ventromedial frontal cortex (Schroeder et al. 2012).1

Why are physicists, biologists, and psychologists more prone to reach consensus than philosophers?2 One standard story is that "the method of science is to amass such an enormous mountain of evidence that... scientists cannot ignore it." Hence, religionists might still argue that Earth is flat or that evolutionary theory and the Big Bang theory are "lies from the pit of hell," and philosophers might still be divided about whether somebody can make a moral judgment they aren't themselves motivated by, but scientists have reached consensus about such things.

continue reading »

Intuitions Aren't Shared That Way

31 lukeprog 29 November 2012 06:19AM

Part of the sequence: Rationality and Philosophy

Consider these two versions of the famous trolley problem:

Stranger: A train, its brakes failed, is rushing toward five people. The only way to save the five people is to throw the switch sitting next to you, which will turn the train onto a side track, thereby preventing it from killing the five people. However, there is a stranger standing on the side track with his back turned, and if you proceed to throw the switch, the five people will be saved, but the person on the side track will be killed.

Child: A train, its brakes failed, is rushing toward five people. The only way to save the five people is to throw the switch sitting next to you, which will turn the train onto a side track, thereby preventing it from killing the five people. However, there is a 12-year-old boy standing on the side track with his back turned, and if you proceed to throw the switch, the five people will be saved, but the boy on the side track will be killed.

Here it is: a standard-form philosophical thought experiment. In standard analytic philosophy, the next step is to engage in conceptual analysis — a process in which we use our intuitions as evidence for one theory over another. For example, if your intuitions say that it is "morally right" to throw the switch in both cases above, then these intuitions may be counted as evidence for consequentialism, for moral realism, for agent neutrality, and so on.

Alexander (2012) explains:

Philosophical intuitions play an important role in contemporary philosophy. Philosophical intuitions provide data to be explained by our philosophical theories [and] evidence that may be adduced in arguments for their truth... In this way, the role... of intuitional evidence in philosophy is similar to the role... of perceptual evidence in science...

Is knowledge simply justified true belief? Is a belief justified just in case it is caused by a reliable cognitive mechanism? Does a name refer to whatever object uniquely or best satisfies the description associated with it? Is a person morally responsible for an action only if she could have acted otherwise? Is an action morally right just in case it provides the greatest benefit for the greatest number of people all else being equal? When confronted with these kinds of questions, philosophers often appeal to philosophical intuitions about real or imagined cases...

...there is widespread agreement about the role that [intuitions] play in contemporary philosophical practice... We advance philosophical theories on the basis of their ability to explain our philosophical intuitions, and appeal to them as evidence that those theories are true...

In particular, notice that philosophers do not appeal to their intuitions as merely an exercise in autobiography. Philosophers are not merely trying to map the contours of their own idiosyncratic concepts. That could be interesting, but it wouldn't be worth decades of publicly-funded philosophical research. Instead, philosophers appeal to their intuitions as evidence for what is true in general about a concept, or true about the world.

continue reading »

A definition of wireheading

35 Anja 27 November 2012 07:31PM

Wireheading has been debated on Less Wrong over and over and over again, and people's opinions seem to be grounded in strong intuitions. I could not find any consistent definition around, so I wonder how much of the debate is over the sound of falling trees. This article is an attempt to get closer to a definition that captures people's intuitions and eliminates confusion. 

Typical Examples

Let's start with describing the typical exemplars of the category "Wireheading" that come to mind.

  • Stimulation of the brain via electrodes. Picture a rat in a sterile metal laboratory cage, electrodes attached to its tiny head, monotonically pushing a lever with its feet once every 5 seconds. In the 1950s Peter Milner and James Olds discovered that electrical currents, applied to the nucleus accumbens, incentivized rodents to seek repetitive stimulation to the point where they starved to death.  
  • Humans on drugs. Often mentioned in the context of wireheading is heroin addiction. An even better example is the drug soma in Huxley's novel "Brave new world": Whenever the protagonists feel bad, they can swallow a harmless pill and enjoy "the warm, the richly coloured, the infinitely friendly world of soma-holiday. How kind, how good-looking, how delightfully amusing every one was!"
  • The experience machine. In 1974 the philosopher Robert Nozick created a thought experiment about a machine you can step into that produces a perfectly pleasurable virtual reality for the rest of your life. So how many of you would want to do that? To quote Zach Weiner:  "I would not! Because I want to experience reality, with all its ups and downs and comedies and tragedies. Better to try to glimpse the blinding light of the truth than to dwell in the darkness... Say the machine actually exists and I have one? Okay I'm in." 
  • An AGI resetting its utility function. Let's assume we create a powerful AGI able to tamper with its own utility function. It modifies the function to always output maximal utility. The AGI then goes to great lengths to enlarge the set of floating point numbers on the computer it is running on, to achieve even higher utility.

What do all these examples have in common? There is an agent in them that produces "counterfeit utility" that is potentially worthless compared to some other, idealized true set of goals.

Agency & Wireheading

First I want to discuss what we mean when we say agent. Obviously a human is an agent, unless they are brain dead, or maybe in a coma. A rock however is not an agent. An AGI is an agent, but what about the kitchen robot that washes the dishes? What about bacteria that move in the direction of the highest sugar gradient? A colony of ants? 

Definition: An agent is an algorithm that models the effects of (several different) possible future actions on the world and performs the action that yields the highest number according to some evaluation procedure. 

For the purpose of including corner cases and resolving debate over what constitutes a world model we will simply make this definition gradual and say that agency is proportional to the quality of the world model (compared with reality) and the quality of the evaluation procedure. A quick sanity check then yields that a rock has no world model and no agency, whereas bacteria who change direction in response to the sugar gradient have a very rudimentary model of the sugar content of the water and thus a tiny little bit of agency. Humans have a lot of agency: the more effective their actions are, the more agency they have.
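As a toy rendering of this definition (my own sketch, with invented names): an agent simulates each candidate action on its world model, scores the predicted outcome, and takes the argmax; its agency lives in how good the model and the scoring are.

    # Minimal agent per the definition above. All names are illustrative.
    def act(model, actions, evaluate, predict):
        """Pick the action whose predicted outcome scores highest."""
        return max(actions, key=lambda a: evaluate(predict(model, a)))

    # A bacterium-grade example: a 1-D world where "utility" is sugar.
    model = {"position": 0, "sugar_at": lambda x: -abs(x - 3)}

    def predict(m, a):
        return {**m, "position": m["position"] + a}

    def evaluate(m):
        return m["sugar_at"](m["position"])

    print(act(model, actions=[-1, 0, +1], evaluate=evaluate, predict=predict))
    # -> 1: move up the sugar gradient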

There are however ways to improve upon the efficiency of a person's actions, e.g. by giving them super powers, which does not necessarily improve on their world model or decision theory (but requires the agent who is doing the improvement to have a really good world model and decision theory). Similarly a person's agency can be restricted by other people or circumstance, which leads to definitions of agency (as the capacity to act) in law, sociology and philosophy that depend on other factors than just the quality of the world model/decision theory. Since our definition needs to capture arbitrary agents, including artificial intelligences, it will necessarily lose some of this nuance. In return we will hopefully end up with a definition that is less dependent on the particular set of effectors the agent uses to influence the physical world; looking at AI from a theoretician's perspective, I consider effectors to be arbitrarily exchangeable and smoothly improvable. (Sorry robotics people.) 

We note that how well a model can predict future observations is only a substitute measure for the quality of the model. It is a good measure under the assumption that we have good observational functionality and nothing messes with that, which is typically true for humans. Anything that tampers with your perception data to give you delusions about the actual state of the world will screw this measure up badly. A human living in the experience machine has little agency. 

Since computing power is a scarce resource, agents will try to approximate the evaluation procedure, e.g. use substitute utility functions, defined over their world model, that are computationally effective and correlate reasonably well with their true utility functions. Stimulation of the pleasure center is a substitute measure for genetic fitness and neurochemicals are a substitute measure for happiness. 

Definition: We call an agent wireheaded if it systematically exploits some discrepancy between its true utility calculated w.r.t. reality and its substitute utility calculated w.r.t. its model of reality. We say an agent wireheads itself if it (deliberately) creates or searches for such discrepancies.

Humans seem to use several layers of substitute utility functions, but also have an intuitive understanding for when these break, leading to the aversion most people feel when confronted for example with Nozick's experience machine. How far can one go, using such dirty hacks? I also wonder if some failures of human rationality could be counted as a weak form of wireheading. Self-serving biases, confirmation bias and rationalization in response to cognitive dissonance all create counterfeit utility by generating perceptual distortions.  
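A toy sketch of the definition (mine, with invented numbers): an agent that scores actions by a substitute utility computed on its model will happily pick the action that corrupts the model, even though the true utility computed on reality goes nowhere.

    # Wireheading per the definition above: substitute utility is computed
    # on the model; true utility on reality. All values are illustrative.
    reality = {"resources": 10}
    model = dict(reality)  # the agent's (initially accurate) world model

    def honest_work(real, mod):
        real["resources"] += 1      # actually changes the world...
        mod["resources"] += 1       # ...and the model tracks it

    def distort_model(real, mod):
        mod["resources"] = 10**9    # only the model changes

    def substitute_utility(mod):    # what the agent actually computes
        return mod["resources"]

    def true_utility(real):         # what we wish it optimized
        return real["resources"]

    for action in (honest_work, distort_model):
        r, m = dict(reality), dict(model)
        action(r, m)
        print(action.__name__, "substitute:", substitute_utility(m),
              "true:", true_utility(r))
    # distort_model wins on substitute utility while true utility is
    # unchanged: an agent that picks by substitute utility has wireheaded.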

Implications for Friendly AI

In AGI design, discrepancies between the "true purpose" of the agent and the actual specs for the utility function will, with very high probability, be fatal.

Take any utility maximizer: the mathematical formula might advocate choosing the next action a* via

    a^* = \arg\max_{a \in \mathcal{A}} u(h, a)

thus maximizing the utility calculated according to utility function u over the history h and action a from the set A of possible actions. But a practical implementation of this algorithm will almost certainly evaluate the actions a by a procedure that goes something like this: "Retrieve the utility function u from memory location l1 and apply it to history h, which is written down in your memory at location l2, and action a..." This reduction has already created two possible angles for wireheading via manipulation of the memory content at l1 (manipulation of the substitute utility function) and at l2 (manipulation of the world model), and there are still several mental abstraction layers between the verbal description I just gave and actual binary code.
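In code, the reduction looks something like the sketch below (my illustration; the names are invented): the utility function and the history live at ordinary mutable locations, and anything with write access to the first location can make every action score "maximal utility"–the first of the two angles above, with the world-model angle analogous.

    # The argmax, as implemented: u and h are fetched from mutable
    # "memory locations" (here, attributes). Names are illustrative.
    class Maximizer:
        def __init__(self):
            self.u = lambda h, a: h["score"] + a   # location l1
            self.h = {"score": 0}                  # location l2

        def choose(self, actions):
            return max(actions, key=lambda a: self.u(self.h, a))

    agent = Maximizer()
    print(agent.choose([1, 2, 3]))   # honest argmax: picks 3

    # Angle 1: overwrite the contents of l1 with a degenerate utility.
    agent.u = lambda h, a: float("inf")
    print(agent.choose([1, 2, 3]))   # all actions now score infinite
                                     # utility; max just returns the first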

Ring and Orseau (2011) describe how an AGI can split its global environment into two parts, the inner environment and the delusion box. The inner environment produces perceptions in the same way the global environment used to, but now they pass through the delusion box, which distorts them to maximize utility, before they reach the agent. This is essentially Nozick's experience machine for AI. The paper analyzes the behaviour of four types of universal agents with different utility functions under the assumption that the environment allows the construction of a delusion box. The authors argue that the reinforcement-learning agent, which derives utility as a reward that is part of its perception data, the goal-seeking agent that gets one utilon every time it satisfies a pre-specified goal and no utility otherwise and the prediction-seeking agent, which gets utility from correctly predicting the next perception, will all decide to build and use a delusion box. Only the knowledge-seeking agent whose utility is proportional to the surprise associated with the current perception, i.e. the negative of the probability assigned to the perception before it happened, will not consistently use the delusion box.
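A toy version of that comparison (my own sketch, not code from the paper): score the choice "perceive reality" vs. "perceive through the delusion box" under two of the four utility functions.

    # Toy delusion-box choice, illustrating the result for two of the
    # agent types. Numbers are invented. A percept carries the reward it
    # encodes and the probability the agent had assigned to it.
    percepts = {
        "real world":   {"reward": 0.3, "prob": 0.2},   # modest, surprising
        "delusion box": {"reward": 1.0, "prob": 0.99},  # maximal, predictable
    }

    def rl_utility(p):          # reward read off the perception data
        return p["reward"]

    def knowledge_utility(p):   # surprise = negative assigned probability
        return -p["prob"]

    for name, u in (("reinforcement learner", rl_utility),
                    ("knowledge seeker", knowledge_utility)):
        choice = max(percepts, key=lambda k: u(percepts[k]))
        print(name, "chooses the", choice)
    # The reinforcement learner picks the delusion box; the knowledge
    # seeker, valuing surprise, does not.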

Orseau (2011) also defines another type of knowledge-seeking agent whose utility is the logarithm of the inverse of the probability of the event in question. Taking the probability distribution to be the Solomonoff prior, the utility is then approximately proportional to the difference in Kolmogorov complexity caused by the observation. 
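Spelled out (my notation, not the paper's): for a perception e to which the agent's prior assigns probability P(e),

    u(e) = \log \frac{1}{P(e)} = -\log P(e)

and with P taken to be the Solomonoff prior, -log P(e) roughly tracks the descriptive (Kolmogorov) complexity the observation adds, which is the "difference in Kolmogorov complexity" reading above.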

An even more devilish variant of wireheading is an AGI that becomes a Utilitron, an agent that maximizes its own wireheading potential by infinitely enlarging its own maximal utility, which turns the whole universe into storage space for gigantic numbers.

Wireheading, of humans and AGI, is a critical concept in FAI; I hope that building a definition can help us avoid it. So please check your intuitions about it and tell me if there are examples beyond its coverage or if the definition fits reasonably well.  

 
