FAI and the Information Theory of Pleasure

8 johnsonmx 08 September 2015 09:16PM

Previously, I talked about the mystery of pain and pleasure, and how little we know about what sorts of arrangements of particles intrinsically produce them.

 

Up now: should FAI researchers care about this topic? Is research into the information theory of pain and pleasure relevant for FAI? I believe so! Here are the top reasons I came up with while thinking about this topic.

 

An important caveat: much depends on whether pain and pleasure (collectively, 'valence') are simple or complex properties of conscious systems. If they’re on the complex end of the spectrum, many points on this list may not be terribly relevant for the foreseeable future. On the other hand, if they have a relatively small “Kolmogorov complexity” (e.g., if a ‘hashing function’ to derive valence could fit on a t-shirt), crisp knowledge of valence may be possible sooner rather than later, and could have some immediate relevance to current FAI research directions.

Additional caveats: it’s important to note that none of these ideas are grand, sweeping panaceas, nor are they intended to address deep metaphysical questions or to reinvent the wheel. Instead, they’re intended to help resolve empirical ambiguities and modestly enlarge the current FAI toolbox.

 

 

1. Valence research could simplify the Value Problem and the Value Loading Problem. If pleasure/happiness is an important core part of what humanity values, or should value, having the exact information-theoretic definition of it on-hand could directly and drastically simplify the problems of what to maximize, and how to load this value into an AGI.

 

2. Valence research could form the basis for a well-defined ‘sanity check’ on AGI behavior. Even if pleasure isn’t a core terminal value for humans, it could still be used as a useful indirect heuristic for detecting value destruction. I.e., if we’re considering having an AGI carry out some intervention, we could ask it what the expected effect is on whatever pattern precisely corresponds to pleasure/happiness. If there would be a lot less of that pattern, the intervention is probably a bad idea.

 

3. Valence research could help us be humane to AGIs and WBEs. There’s going to be a lot of experimentation involving intelligent systems, and although many of these systems won’t be “sentient” in the way humans are, some system types will approach or even surpass human capacity for suffering. Unfortunately, many of these early systems won’t work well, i.e., they’ll be insane. It would be great if we had a good way to detect profound suffering in such cases and halt the system.

 

4. Valence research could help us prevent Mind Crimes. Nick Bostrom suggests in Superintelligence that AGIs might simulate virtual humans to reverse-engineer human preferences, but that these virtual humans might be sufficiently high-fidelity that they themselves could meaningfully suffer. We can tell AGIs not to do this, but knowing the exact information-theoretic pattern of suffering would make it easier to specify what not to do.

 

5. Valence research could enable radical forms of cognitive enhancement. Nick Bostrom has argued that there are hard limits on traditional pharmaceutical cognitive enhancement, since if the presence of some simple chemical would help us think better, our brains would probably already be producing it. On the other hand, there seem to be fewer a priori limits on motivational or emotional enhancement. And sure enough, the most effective “cognitive enhancers” such as Adderall, modafinil, and so on seem to work by making cognitive tasks seem less unpleasant or more interesting. If we had a crisp theory of valence, this might enable particularly powerful versions of these sorts of drugs.

 

6. Valence research could help align an AGI’s nominal utility function with visceral happiness. There seems to be a lot of confusion with regard to happiness and utility functions. In short: they are different things! Utility functions are goal abstractions, generally realized either explicitly through high-level state variables or implicitly through dynamic principles. Happiness, on the other hand, seems like an emergent, systemic property of conscious states. Like other qualia, but unlike utility functions, it’s probably highly dependent upon low-level architectural and implementational details and dynamics. In practice, most people most of the time can be said to have rough utility functions which are often consistent with increasing happiness, but this is an awfully leaky abstraction.

 

My point is that constructing an AGI whose utility function is to make paperclips, and constructing a sentient AGI who is viscerally happy when it makes paperclips, are very different tasks. Moreover, I think there could be value in being able to align these two factors: to make an AGI which is viscerally happy to the exact extent it’s maximizing its nominal utility function.

(Why would we want to do this in the first place? There is the obvious semi-facetious-but-not-completely-trivial answer— that if an AGI turns me into paperclips, I at least want it to be happy while doing so—but I think there’s real potential for safety research here also.)

 

7. Valence research could help us construct makeshift utility functions for WBEs and Neuromorphic AGIs. How do we make WBEs or Neuromorphic AGIs do what we want? One approach would be to piggyback off of what they already partially and imperfectly optimize for, and build a makeshift utility function out of pleasure. Trying to shoehorn a utility function onto any evolved, emergent system is going to involve terrible imperfections, uncertainties, and dangers, but if research trends make neuromorphic AGI likely to occur before other options, it may be a case of “something is probably better than nothing.”

 

One particular application: constructing a “cryptographic reward token” control scheme for WBEs/neuromorphic AGIs. Carl Shulman has suggested we could incentivize an AGI to do what we want by giving it a steady trickle of cryptographic reward tokens that fulfill its utility function: it knows that if it misbehaves (e.g., if it kills all humans), it’ll stop getting these tokens. But if we want to construct reward tokens for types of AGIs that don’t intrinsically have crisp utility functions (such as WBEs or neuromorphic AGIs), we’ll have to understand, on a deep mathematical level, what they do optimize for, which will at least partially involve pleasure.
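
To make the "steady trickle of tokens" idea a bit more concrete, here is a minimal sketch of one way unforgeable reward tokens could be issued, using a one-way hash chain. To be clear, this is my own toy illustration and not Shulman's actual construction: the names `Overseer`, `Agent`, `release_token`, `accept`, and `anchor` are all hypothetical, and the sketch assumes the agent already has a crisp utility function that counts valid tokens. It says nothing about the hard part discussed above, namely what a WBE or neuromorphic AGI actually optimizes for.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    """One step of the hash chain (SHA-256)."""
    return hashlib.sha256(data).digest()


class Overseer:
    """Hypothetical token issuer: keeps a secret seed, releases preimages."""

    def __init__(self, n_tokens: int):
        self._seed = secrets.token_bytes(32)
        self._remaining = n_tokens
        # Public commitment: the seed hashed n_tokens times.
        self.anchor = self._chain(n_tokens)

    def _chain(self, k: int) -> bytes:
        value = self._seed
        for _ in range(k):
            value = h(value)
        return value

    def release_token(self) -> bytes:
        """Reveal the next preimage; meant to be called only while the agent behaves."""
        if self._remaining == 0:
            raise RuntimeError("token supply exhausted")
        self._remaining -= 1
        return self._chain(self._remaining)


class Agent:
    """Hypothetical agent whose (assumed) utility is the count of valid tokens."""

    def __init__(self, anchor: bytes):
        self._last = anchor  # most recently verified link in the chain
        self.reward = 0

    def accept(self, token: bytes) -> bool:
        # A token is valid only if it hashes to the previously accepted value.
        # Producing one without the overseer requires inverting SHA-256.
        if h(token) != self._last:
            return False
        self._last = token
        self.reward += 1
        return True
```

In this sketch the agent can check every token against a public commitment but cannot mint its own, so the supply stops the moment the overseer stops releasing preimages. A real scheme would need far more than this (for a start, some account of why the agent values tokens at all).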

 

8. Valence research could help us better understand, and perhaps prevent, AGI wireheading. How can AGI researchers prevent their AGIs from wireheading (direct manipulation of their utility functions)? I don’t have a clear answer, and it seems like a complex problem which will require complex, architecture-dependent solutions, but understanding the universe’s algorithm for pleasure might help clarify what kind of problem it is, and how evolution has addressed it in humans.

 

9. Valence research could help reduce general metaphysical confusion. We’re going to be facing some very weird questions about philosophy of mind and metaphysics when building AGIs, and everybody seems to have their own pet assumptions on how things work. The better we can clear up the fog which surrounds some of these topics, the lower our coordinational friction will be when we have to directly address them.


Successfully reverse-engineering a subset of qualia (valence, perhaps the easiest type to reverse-engineer?) would be a great step in this direction.

 

10. Valence research could change the social and political landscape AGI research occurs in. This could take many forms: at best, a breakthrough could lead to a happier society where many previously nihilistic individuals suddenly have “skin in the game” with respect to existential risk. At worst, it could be a profound information hazard, and irresponsible disclosure or misuse of such research could lead to mass wireheading, mass emotional manipulation, and totalitarianism. Either way, it would be an important topic to keep abreast of.

 

 

These are not all independent issues, and not all are of equal importance. But, taken together, they do seem to imply that reverse-engineering valence will be decently relevant to FAI research, particularly with regard to the Value Problem, reducing metaphysical confusion, and perhaps making the hardest safety cases (e.g., neuromorphic AGIs) a little bit more tractable.

The mystery of pain and pleasure

8 johnsonmx 01 March 2015 07:47PM

 

Some arrangements of particles feel better than others. Why?

We have no general theories, only descriptive observations within the context of the vertebrate brain, about what produces pain and pleasure. It seems like there's a mystery here, a general principle to uncover.

Let's try to chart the mystery. I think we should, in theory, be able to answer the following questions:


(1) What are the necessary and sufficient properties for a thought to be pleasurable?

(2) What are the characteristic mathematics of a painful thought?

(3) If we wanted to create an artificial neural network-based mind (i.e., using neurons, but not slavishly patterned after a mammalian brain) that could experience bliss, what would the important design parameters be?

(4) If we wanted to create an AGI whose nominal reward signal coincided with visceral happiness -- how would we do that?

(5) If we wanted to ensure an uploaded mind could feel visceral pleasure of the same kind a non-uploaded mind can, how could we check that? 

(6) If we wanted to fill the universe with computronium and maximize hedons, what algorithm would we run on it?

(7) If we met an alien life-form, how could we tell if it was suffering?


It seems to me these are all empirical questions that should have empirical answers. But we don't seem to have much in the way of handholds to give us a starting point.

Where would *you* start on answering these questions? Which ones are good questions, and which ones aren't? And if you think certain questions aren't good, could you offer some you think are?

 

As suggested by shminux, here's some research I believe is indicative of the state of the literature (though this falls quite short of a full literature review):

Tononi's IIT seems relevant, though it only addresses consciousness and explicitly avoids valence. Max Tegmark has a formal generalization of IIT which he claims should apply to non-neural substrates. And although Tegmark doesn't address valence either, he recently posted a paper on arXiv noting that there *is* a mystery here, and that it seems topical for FAI research.

Current models of emotion based on brain architecture and neurochemicals (e.g., EMOCON) are somewhat relevant, though ultimately correlative or merely descriptive, and seem to have little universalization potential.

There's also a great deal of quality literature about specific correlates of pain and happiness, e.g., "Building a neuroscience of pleasure and well-being" and "An fMRI-Based Neurologic Signature of Physical Pain". Luke covers Berridge's research in his post, "The Neuroscience of Pleasure". Short version: 'liking', 'wanting', and 'learning' are all handled by different systems in the brain. Opioids within very small regions of the brain seem to induce the 'liking' response; elsewhere in the brain, opioids only produce 'wanting'. We don't know how or why yet. This sort of research constrains a general principle, but doesn't really hint toward one.

 

In short, there's plenty of research around the topic, but it's focused exclusively on humans/mammals/vertebrates: our evolved adaptations, our emotional systems, and our architectural quirks. Nothing on general or universal principles that would address any of (1)-(7). There is interesting information-theoretic / patternist work being done, but it's highly concentrated around consciousness research.

 

---

 

Bottom line: there seems to be a critically important general principle as to what makes certain arrangements of particles innately preferable to others, and we don't know what it is. Exciting!