FAI and the Information Theory of Pleasure

8 johnsonmx 08 September 2015 09:16PM

Previously, I talked about the mystery of pain and pleasure, and how little we know about what sorts of arrangements of particles intrinsically produce them.

 

Up now: should FAI researchers care about this topic? Is research into the information theory of pain and pleasure relevant for FAI? I believe so! Here are the top reasons I came up with while thinking about this topic.

 

An important caveat: much depends on whether pain and pleasure (collectively, 'valence') are simple or complex properties of conscious systems. If they’re on the complex end of the spectrum, many points on this list may not be terribly relevant for the foreseeable future. On the other hand, if they have a relatively small “Kolmogorov complexity” (e.g., if a ‘hashing function’ to derive valence could fit on a t-shirt), crisp knowledge of valence may be possible sooner rather than later, and could have some immediate relevance to current FAI research directions.
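Kolmogorov complexity itself is uncomputable. However, the length of a lossless compression gives a computable upper bound on it, up to a constant for the decompressor. The sketch below illustrates what a “small” complexity means, using Python's standard `zlib` as a stand-in compressor. The two inputs are arbitrary examples, and nothing here measures valence.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data.

    A crude, computable upper bound on Kolmogorov complexity,
    up to an additive constant for the decompressor itself.
    """
    return len(zlib.compress(data, 9))


# A highly regular pattern with a short description ("repeat 'ab' 500 times").
regular = b"ab" * 500

# Seeded pseudo-random bytes of the same length. A short description does exist
# (this program), but a general-purpose compressor cannot find it.
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(1000))

print(len(regular), compressed_length(regular))  # shrinks to a tiny fraction
print(len(noisy), compressed_length(noisy))      # barely compresses, if at all
```

A bound like this only ever overestimates complexity. A pattern that compresses poorly under `zlib` may still have a short description that the compressor misses.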

Additional caveats: it’s important to note that none of these ideas is a grand, sweeping panacea, nor are they intended to address deep metaphysical questions or to reinvent the wheel. Instead, they’re intended to help resolve empirical ambiguities and modestly enlarge the current FAI toolbox.

 

 

1. Valence research could simplify the Value Problem and the Value Loading Problem. If pleasure/happiness is an important core part of what humanity values, or should value, having the exact information-theoretic definition of it on-hand could directly and drastically simplify the problems of what to maximize, and how to load this value into an AGI.

 

2. Valence research could form the basis for a well-defined ‘sanity check’ on AGI behavior. Even if pleasure isn’t a core terminal value for humans, it could still serve as a useful indirect heuristic for detecting value destruction. I.e., if we’re considering having an AGI carry out some intervention, we could ask it what the expected effect would be on whatever pattern precisely corresponds to pleasure/happiness. If there would be a lot less of that pattern, the intervention is probably a bad idea.
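Here is a minimal sketch of such a check. It assumes, hypothetically, that a function were available that measures the amount of the valence pattern in a predicted world state. No such measure exists yet. The `measure` argument, the dictionary state representation, and the 10% tolerance are all illustrative assumptions.

```python
from typing import Callable, Dict

# Hypothetical: maps a (predicted) world state to a scalar amount of the
# valence pattern. No such measure exists yet; this is the key assumption.
ValenceMeasure = Callable[[Dict[str, float]], float]


def passes_sanity_check(
    measure: ValenceMeasure,
    current_state: Dict[str, float],
    predicted_state: Dict[str, float],
    max_relative_drop: float = 0.1,  # illustrative tolerance, not a proposal
) -> bool:
    """Return False (flag the intervention) if the predicted amount of the
    valence pattern falls by more than max_relative_drop relative to now."""
    before = measure(current_state)
    after = measure(predicted_state)
    if before <= 0:
        # No positive baseline for a relative comparison; flag any decrease.
        return after >= before
    return (before - after) / before <= max_relative_drop


# Toy stand-in measure: total of per-person scores in the state.
def toy_measure(state: Dict[str, float]) -> float:
    return sum(state.values())


now = {"alice": 5.0, "bob": 5.0}
print(passes_sanity_check(toy_measure, now, {"alice": 5.0, "bob": 4.8}))  # True
print(passes_sanity_check(toy_measure, now, {"alice": 1.0, "bob": 0.0}))  # False
```

The check works as a veto only. It says nothing about whether an intervention that passes is good. It just flags predicted losses of the pattern.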

 

3. Valence research could help us be humane to AGIs and WBEs. There’s going to be a lot of experimentation involving intelligent systems, and although many of these systems won’t be “sentient” in the way humans are, some system types will approach or even surpass human capacity for suffering. Unfortunately, many of these early systems won’t work well; i.e., they’ll be insane. It would be great if we had a good way to detect profound suffering in such cases and halt the system.

 

4. Valence research could help us prevent Mind Crimes. Nick Bostrom suggests in Superintelligence that AGIs might simulate virtual humans to reverse-engineer human preferences, but that these virtual humans might be of sufficiently high fidelity that they themselves could meaningfully suffer. We can tell AGIs not to do this, but knowing the exact information-theoretic pattern of suffering would make it easier to specify what not to do.

 

5. Valence research could enable radical forms of cognitive enhancement. Nick Bostrom has argued that there are hard limits on traditional pharmaceutical cognitive enhancement, since if the presence of some simple chemical would help us think better, our brains would probably already be producing it. On the other hand, there seem to be fewer a priori limits on motivational or emotional enhancement. And sure enough, the most effective “cognitive enhancers,” such as Adderall, modafinil, and so on, appear to work by making cognitive tasks feel less unpleasant or more interesting. If we had a crisp theory of valence, this might enable particularly powerful versions of these sorts of drugs.

 

6. Valence research could help align an AGI’s nominal utility function with visceral happiness. There seems to be a lot of confusion with regard to happiness and utility functions. In short: they are different things! Utility functions are goal abstractions, generally realized either explicitly through high-level state variables or implicitly through dynamic principles. Happiness, on the other hand, seems like an emergent, systemic property of conscious states, and, like other qualia but unlike utility functions, it’s probably highly dependent upon low-level architectural and implementational details and dynamics. In practice, most people most of the time can be said to have rough utility functions which are often consistent with increasing happiness, but this is an awfully leaky abstraction.

 

My point is that constructing an AGI whose utility function is to make paperclips, and constructing a sentient AGI that is viscerally happy when it makes paperclips, are very different tasks. Moreover, I think there could be value in being able to align these two factors: to make an AGI which is viscerally happy to the exact extent it’s maximizing its nominal utility function.

(Why would we want to do this in the first place? There is the obvious semi-facetious-but-not-completely-trivial answer: if an AGI turns me into paperclips, I at least want it to be happy while doing so. But I think there’s real potential for safety research here also.)

 

7. Valence research could help us construct makeshift utility functions for WBEs and Neuromorphic AGIs. How do we make WBEs or Neuromorphic AGIs do what we want? One approach would be to piggyback off of what they already partially and imperfectly optimize for, and build a makeshift utility function out of pleasure. Trying to shoehorn a utility function onto any evolved, emergent system is going to involve terrible imperfections, uncertainties, and dangers, but if research trends make neuromorphic AGI likely to occur before other options, it may be a case of “something is probably better than nothing.”

 

One particular application: constructing a “cryptographic reward token” control scheme for WBEs/neuromorphic AGIs. Carl Shulman has suggested we could incentivize an AGI to do what we want by giving it a steady trickle of cryptographic reward tokens that fulfill its utility function; it knows that if it misbehaves (e.g., if it kills all humans), it’ll stop getting these tokens. But if we want to construct reward tokens for types of AGIs that don’t intrinsically have crisp utility functions (such as WBEs or neuromorphic AGIs), we’ll have to understand, on a deep mathematical level, what they do optimize for, which will at least partially involve pleasure.
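The proposal as summarized here doesn't specify a token format. One way to make tokens unforgeable, assumed purely for illustration, is a hash commitment. The overseer publishes SHA-256 hashes of secret random strings in advance and reveals one string per period of acceptable behavior. An agent whose utility function counts valid preimages cannot mint new tokens itself. `TokenIssuer` and `count_valid_tokens` are hypothetical names. The sketch does not touch the hard part raised above, which is making a WBE or neuromorphic AGI actually value the tokens.

```python
import hashlib
import secrets
from typing import List, Optional


class TokenIssuer:
    """Overseer side (hypothetical): commits to a batch of secret tokens up
    front, then releases them one per period while behavior is acceptable."""

    def __init__(self, n_tokens: int) -> None:
        self._secrets: List[bytes] = [secrets.token_bytes(32) for _ in range(n_tokens)]
        # Published in advance; the hashes do not reveal the secrets.
        self.commitments: List[str] = [hashlib.sha256(s).hexdigest() for s in self._secrets]
        self._next = 0
        self.halted = False

    def release(self, behaved_well: bool) -> Optional[bytes]:
        """Return the next token, or None once misbehavior has been observed
        (the trickle stops permanently) or the batch is exhausted."""
        if not behaved_well:
            self.halted = True
        if self.halted or self._next >= len(self._secrets):
            return None
        token = self._secrets[self._next]
        self._next += 1
        return token


def count_valid_tokens(tokens: List[bytes], commitments: List[str]) -> int:
    """Agent side: count distinct tokens whose hash matches a published commitment."""
    published = set(commitments)
    return len({t for t in tokens if hashlib.sha256(t).hexdigest() in published})


issuer = TokenIssuer(n_tokens=5)
received: List[bytes] = []
for ok in [True, True, False, True]:  # the third period is misbehavior
    token = issuer.release(ok)
    if token is not None:
        received.append(token)

print(count_valid_tokens(received, issuer.commitments))  # 2
```

A randomly guessed 32-byte string matches a commitment only with negligible probability. So the only practical source of tokens is the overseer's continued release.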

 

8. Valence research could help us better understand, and perhaps prevent, AGI wireheading. How can AGI researchers prevent their AGIs from wireheading (direct manipulation of their utility functions)? I don’t have a clear answer, and it seems like a complex problem which will require complex, architecture-dependent solutions, but understanding the universe’s algorithm for pleasure might help clarify what kind of problem it is, and how evolution has addressed it in humans.

 

9. Valence research could help reduce general metaphysical confusion. We’re going to be facing some very weird questions about philosophy of mind and metaphysics when building AGIs, and everybody seems to have their own pet assumptions on how things work. The better we can clear up the fog which surrounds some of these topics, the lower our coordinational friction will be when we have to directly address them.


Successfully reverse-engineering a subset of qualia (valence, perhaps the easiest type to reverse-engineer?) would be a great step in this direction.

 

10. Valence research could change the social and political landscape AGI research occurs in. This could take many forms: at best, a breakthrough could lead to a happier society where many previously nihilistic individuals suddenly have “skin in the game” with respect to existential risk. At worst, it could be a profound information hazard, and irresponsible disclosure or misuse of such research could lead to mass wireheading, mass emotional manipulation, and totalitarianism. Either way, it would be an important topic to keep abreast of.

 

 

These are not all independent issues, and not all are of equal importance. But, taken together, they do seem to imply that reverse-engineering valence will be decently relevant to FAI research, particularly with regard to the Value Problem, reducing metaphysical confusion, and perhaps making the hardest safety cases (e.g., neuromorphic AGIs) a little bit more tractable.

Causal Reference

30 Eliezer_Yudkowsky 20 October 2012 10:12PM

Followup to: The Fabric of Real Things, Stuff That Makes Stuff Happen

Previous meditation: "Does your rule forbid epiphenomenalist theories of consciousness, in which consciousness is caused by neurons but doesn't affect those neurons in turn? The classic argument for epiphenomenal consciousness is that we can imagine a universe where people behave exactly the same way, but there's nobody home - no awareness, no consciousness, inside the brain. For all the atoms in this universe to be in the same place - for there to be no detectable difference internally, not just externally - 'consciousness' would have to be something created by the atoms in the brain, but which didn't affect those atoms in turn. It would be an effect of atoms, but not a cause of atoms. Now, I'm not so much interested in whether you think epiphenomenal theories of consciousness are true or false - rather, I want to know if you think they're impossible or meaningless a priori based on your rules."

Is it coherent to imagine a universe in which a real entity can be an effect but not a cause?

Well... there are a couple of senses in which it seems imaginable. It's important to remember that imagining things yields info primarily about what human brains can imagine. It only provides info about reality to the extent that we think imagination and reality are systematically correlated for some reason.

That said, I can certainly write a computer program in which there's a tier of objects affecting each other, and a second tier - a lower tier - of epiphenomenal objects which are affected by them, but don't affect them. For example, I could write a program to simulate some balls that bounce off each other, and then some little shadows that follow the balls around.

But then I only know about the shadows because I'm outside that whole universe, looking in. So my mind is being affected by both the balls and shadows - to observe something is to be affected by it. I know where the shadow is, because the shadow makes pixels be drawn on screen, which make my eye see pixels. If your universe has two tiers of causality - a tier with things that affect each other, and another tier of things that are affected by the first tier without affecting them - then could you know that fact from inside that universe?
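The balls-and-shadows program described above can be sketched in a few lines of Python. The 1-D physics, the class names, and the starting values are illustrative assumptions. What matters is the direction of data flow. `step` writes shadow positions from ball positions but never reads them, so the ball trajectories come out identical whether or not the shadow tier exists. Only something outside the program, which can read both lists, is affected by the shadows.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ball:
    x: float  # position on a 1-D track
    v: float  # velocity


@dataclass
class Shadow:
    x: float = 0.0  # copied from a ball; nothing in the ball tier reads it


def step(balls: List[Ball], shadows: Optional[List[Shadow]],
         dt: float = 0.1, width: float = 10.0) -> None:
    # Upper tier: balls move and bounce off the ends of the track.
    for b in balls:
        b.x += b.v * dt
        if (b.x < 0 and b.v < 0) or (b.x > width and b.v > 0):
            b.v = -b.v
    # Upper tier: equal-mass elastic collisions swap velocities when two
    # nearby balls are approaching each other.
    for i in range(len(balls)):
        for j in range(i + 1, len(balls)):
            a, c = balls[i], balls[j]
            approaching = (c.x - a.x) * (c.v - a.v) < 0
            if abs(a.x - c.x) < 0.5 and approaching:
                a.v, c.v = c.v, a.v
    # Lower tier: shadows are written from the balls but never read back.
    if shadows is not None:
        for s, b in zip(shadows, balls):
            s.x = b.x


def run(with_shadows: bool, steps: int = 500) -> List[float]:
    balls = [Ball(1.0, 1.0), Ball(5.0, -0.5), Ball(8.0, 0.7)]
    shadows = [Shadow() for _ in balls] if with_shadows else None
    for _ in range(steps):
        step(balls, shadows)
    return [b.x for b in balls]


# Nothing in the ball tier can tell whether the shadow tier exists.
print(run(with_shadows=True) == run(with_shadows=False))  # True
```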

continue reading »

Nature: Red, in Truth and Qualia

35 orthonormal 29 May 2011 11:50PM

Previously: Seeing Red: Dissolving Mary's Room and Qualia, A Study of Scarlet: The Conscious Mental Graph

When we left off, we'd introduced a hypothetical organism called Martha whose actions are directed by a mobile graph of simple mental agents. The tip of the iceberg, consisting of the agents that are connected to Martha's language centers, we called the conscious subgraph. Now we're going to place Martha into a situation like Mary's Room: we'll say that a large unconscious agent of hers (like color vision) has never been active, we'll grant her an excellent conscious understanding of that agent, and then we'll see what happens when we activate it for the first time.
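Here is a toy version of the conscious subgraph, under stated assumptions. The agent names and edges below are invented for illustration. The phrase "connected to Martha's language centers" is read as "has a direct edge into the language agent." "Mobile" is taken to mean that connections can change over time.

```python
from typing import Dict, Set

# Hypothetical agent graph: each agent maps to the agents it sends signals to.
edges: Dict[str, Set[str]] = {
    "color_vision": {"object_recognizer"},
    "object_recognizer": {"language"},
    "hunger": {"language"},
    "balance": {"motor"},
    "motor": set(),
    "language": set(),
}


def conscious_subgraph(graph: Dict[str, Set[str]], language: str = "language") -> Set[str]:
    """Agents with a direct edge into the language agent."""
    return {agent for agent, targets in graph.items() if language in targets}


print(sorted(conscious_subgraph(edges)))  # ['hunger', 'object_recognizer']

# Because the graph is mobile, rewiring color vision straight into language
# changes which agents fall inside the conscious subgraph.
edges["color_vision"].add("language")
print(sorted(conscious_subgraph(edges)))  # ['color_vision', 'hunger', 'object_recognizer']
```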

But first, there's one more mental agent we need to introduce, one which serves a key purpose in Martha's evolutionary history: a simple agent that identifies learning.

continue reading »

A Study of Scarlet: The Conscious Mental Graph

29 orthonormal 27 May 2011 08:13PM

Sequel to: Seeing Red: Dissolving Mary's Room and Qualia

Seriously, you should read first: Dissolving the Question, How an Algorithm Feels From Inside

In the previous post, we introduced the concept of qualia and the thought experiment of Mary's Room, set out to dissolve the question, and decided that we were seeking a simple model of a mind which includes both learning and a conscious/subconscious distinction. Since for now we're just trying to prove a philosophical point, we don't need to worry whether our model corresponds well to the human mind (though it would certainly be convenient if it did); we'll therefore pick an abstract mathematical structure that we can analyze more easily.

continue reading »

Seeing Red: Dissolving Mary's Room and Qualia

38 orthonormal 26 May 2011 05:47PM

Essential Background: Dissolving the Question

How could we fully explain the difference between red and green to a colorblind person?

Well, we could of course draw the analogy between colors of the spectrum and tones of sound; have them learn which objects are typically green and which are typically red (or better yet, give them a video camera with a red filter to look through); explain many of the political, cultural and emotional associations of red and green, and so forth... but it seems that the actual difference between our experience of redness and our experience of greenness is something much harder to convey. If we focus in on that aspect of experience, we end up with the classic philosophical concept of qualia, and the famous thought experiment known as Mary’s Room[1].

Mary is a brilliant neuroscientist who has been colorblind from birth (due to a retina problem; her visual cortex would work normally if it were given the color input). She’s an expert on the electromagnetic spectrum, optics, and the science of color vision. We can postulate, since this is a thought experiment, that she knows and fully understands every physical fact involved in color vision; she knows precisely what happens, on various levels, when the human eye sees red (and the optic nerve transmits particular types of signals, and the visual cortex processes these signals, etc).

One day, Mary gets an operation that fixes her retinas, so that she finally sees in color for the first time. And when she wakes up, she looks at an apple and exclaims, "Oh! So that's what red actually looks like."[2]

Now, this exclamation poses a challenge to any physical reductionist account of subjective experience. For if the qualia of seeing red could be reduced to a collection of basic facts about the physical world, then Mary would have learned those facts earlier and wouldn't learn anything extra now. But of course it seems that she really does learn something when she sees red for the first time. This is not merely the god-of-the-gaps argument that we haven't yet found a full reductionist explanation of subjective experience, but an intuitive proof that no such explanation would be complete.

The argument in academic philosophy over Mary's Room remains unsettled to this day (though it has an interesting history, including a change of mind on the part of its originator). If we ignore the topic of subjective experience, the arguments for reductionism appear to be quite overwhelming; so why does this objection, in a domain in which our ignorance is so vast[3], seem so difficult for reductionists to convincingly reject?

Veterans of this blog will know where I'm going: a question like this needs to be dissolved, not merely answered.

continue reading »

Dennett's heterophenomenology

5 RichardKennaway 16 January 2010 08:40PM

In an earlier comment, I conflated heterophenomenology in the general sense of taking introspective accounts as data to be explained rather than direct readouts of the truth, with Dennett's particular approach to explaining those data.  So to correct myself, I say that it is Dennett, rather than heterophenomenology, that claims that there is no such thing as consciousness. Dennett denies that he does, but I disagree. I defend this view here.

I have to admit at this point that I have not read "Consciousness Explained".  Had either of the library's copies been on the shelves last Tuesday, I would have done so by now, but instead I found his later book (and his most recent on the topic), "Sweet Dreams: Philosophical Obstacles to a Science of Consciousness".  The subtitle suggests a drawing back from the confidence of the earlier title, as does that of the book in between.  The book confirms me in my impression that the ideas of "C.E." have been in the air so long (the air of hard SF, sciblogs, and the like, not to mention Phil Goetz's recent posts) that reading the primary source 19 years on would be nothing more than an exercise in checkbox-ticking.

I'll give a brief run-through of "Sweet Dreams" and then carry on the argument.

continue reading »

ESR's New Take on Qualia

3 billswift 21 August 2009 09:26AM

http://esr.ibiblio.org/?p=1192#more-1192

ADDED: Even if you disagree with ESR's take, and many will, this is the clearest definition I have seen of what qualia are. So it should provide a useful starting point to argue from, even for those who strongly disagree.

View more: Next