All of johnsonmx's Comments + Replies

I think the elephant in the room is the purpose of the simulation.

Bostrom takes it as a given that future intelligences will be interested in running ancestor simulations. Why is that? If some future posthuman civilization truly masters physics, consciousness, and technology, I don't see them using it to play SimUniverse. That's what we would do with limitless power; it's taking our unextrapolated, 2017 volition and asking what we'd do if we were gods. But that's like asking a 5-year-old what he wants to do when he grows up, then taking the answer seriousl... (read more)

We don't live in a universe that's nice or just all the time, so perhaps there are nightmare scenarios in our future. Not all traps have an escape. However, I think this one does, for two reasons.

(1) all the reasons that RobinHanson mentioned;

(2) we seem really confused about how consciousness works, which suggests there are large 'unknown unknowns' in play. It seems very likely that if we extrapolate our confused models of consciousness into extreme scenarios such as this, we'll get even more confused results.

0RedMan
Addressing 2... this argument is compelling. I read it as equivalent to the statement that 'human ethics do not apply to ems, or to human behavior regarding ems', so acting from the standpoint of 'ems are not human, therefore human ethics do not apply, and em suffering is not human suffering, so effective altruism does not apply to ems' is a way out of the trap. Taking it to its conclusion, we can view ems as vampires (they consume resources, produce no human children, and are not alive but also not dead), and like all such abominations they must be destroyed to preserve the lives and futures of humans!

A rigorous theory of valence wouldn't involve cultural context, much as a rigorous theory of electromagnetism doesn't involve cultural context.

Cultural context may matter a great deal in terms of how to build a friendly AGI that preserves what's valuable about human civilization-- or this may mostly boil down to the axioms that 'pleasure is good' and 'suffering is bad'. I'm officially agnostic on whether value is simple or complex in this way.

One framework for dealing with the stuff you mention is Coherent Extrapolated Volition (CEV)- it's not the last word on anything but it seems like a good intuition pump.

0cameroncowan
And I guess I'm saying that the sooner we think about these sorts of things, the better off we'll be. Going for pleasure-good/suffering-bad reduces the mindset of AI to about that of a 2-year-old. Cultural context gives us a sense of maturity, valence or no.

We're not on the same page. Let's try this again.

  • The assertion I originally put forth is about AI safety; it is not about reverse-engineering qualia. I'm willing to briefly discuss some intuitions on how one might make meaningful progress on reverse-engineering qualia as a courtesy to you, my anonymous conversation partner here, but since this isn't what I originally posted about, I don't have a lot of time to address radical skepticism, especially when it seems like you want to argue against some strawman version of IIT.

  • You ask for references (in a somewhat ru

... (read more)
0TheAncientGeek
I actually agree with the aim of using some basic, "visceral" drive for AI safety. I have argued that making an AI's top-level drive the same as its ostensible purpose, paperclipping or whatever, is a potential disaster, because any kind of cease-and-desist command has to be a "non-maskable interrupt" that overrides everything else. But if all you are doing is trying to constrain an AI's behaviour, you have the opportunity to use methodological behaviourism, because you are basically trying to get a certain kind of response to a certain kind of input... you can sidestep the Hard Problem. But that isn't anything very new. The functional/behavioural equivalents of pleasure and pain are positive and negative reinforcement, which machine learning systems already have. (That's somewhat new to MIRIland, because MIRI tends not to take much notice of that large and important class of AIs, but otherwise it isn't new.)

You list a number of useful things one could do with an understanding of pain and pleasure as qualia. The hypotheticals are true enough, because there are a lot of things one could do with an understanding of qualia. But valence isn't really a simplification of the Hard Problem... it just appears to be one. In other words, if you are aiming at AI control, then bringing in qualia just makes things considerably more difficult for yourself.

IIT made a prediction about what it does, which is a scale from more consciousness to less consciousness. That isn't particularly relevant to understanding how qualia are implemented. It's not clear that an artificial system implemented to have high consciousness according to IIT would have qualia at all. But, while IIT isn't clearly relevant to qualia, qualia aren't clearly relevant to AI control.

You don't have data about my overall approach. What I'm doing is noting that, historically, the problem remains unsolved, and that, historically, people who think there is some relatively easy answer have misunderstood the question, or are enga
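The behaviourist equivalence above can be made concrete: a reinforcement learner acquires approach and avoidance behaviour from bare scalar rewards, with no claim about qualia at all. Here is a minimal illustrative sketch; the two actions, the ±1 rewards, the learning rate, and the exploration rate are all arbitrary choices invented for the example, not anything from the original discussion.

```python
import random

random.seed(0)

# Action 0 yields "pain" (reward -1), action 1 yields "pleasure" (+1).
# These values are arbitrary; only their sign matters for the point.
REWARDS = {0: -1.0, 1: +1.0}

q = [0.0, 0.0]            # action-value estimates
alpha, epsilon = 0.1, 0.1  # learning rate, exploration rate

for _ in range(1000):
    # epsilon-greedy: mostly exploit the best-looking action, occasionally explore
    if random.random() < epsilon:
        a = random.randrange(2)
    else:
        a = max((0, 1), key=lambda i: q[i])
    r = REWARDS[a]
    q[a] += alpha * (r - q[a])  # move the estimate toward the observed reinforcement

print(q)  # q[1] ends near +1, q[0] near -1
```

The learner ends up seeking the positively reinforced action and avoiding the negatively reinforced one purely through value updates; in behaviourist terms, that is all "pleasure" and "pain" need to mean for control purposes.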

If you're looking for a Full, Complete Data-Driven And Validated Solution to the Qualia Problem, I fear we'll have to wait a long, long time. This seems squarely in the 'AI complete' realm of difficulty.

But if you're looking for clever ways of chipping away at the problem, then yes, Casali's Perturbational Complexity Index should be interesting. It doesn't directly say anything about qualia, but it does indirectly support Tononi's approach, which says much about qualia. (Of course, we don't yet know how to interpret most of what it says, nor can we validat... (read more)

0TheAncientGeek
That's what I am disputing. You are taking a problem we don't know how to make a start on, and turning it into a smaller problem we also don't know how to make a start on. That isn't an advance. Reducing or simplifying a problem isn't an unconditional, universal solvent; it only works where the simpler problem is one you can actually make progress on. IIT isn't going to be of any real use unless it is confirmed, and how are you going to confirm it, as a theory of qualia, without qualiometers? If we are going to continue not having qualiometers, we may have to give up on testing consciousness objectively in favour of subjective measures... phenomenology and heterophenomenology. But you can only do heterophenomenology on a system that can report its subjective states. Starting with simpler systems, like a single simulated pain receptor, is not going to work.

The stuff by Casali is pretty topical, e.g. his 2013 paper with Tononi.

0TheAncientGeek
You mean this? But that isn't really saying anything about qualia. The authors can relate their PCI measure to consciousness as judged medically... in humans. But would that scale be applicable to very simple systems or artificial systems? There is a real possibility that qualia could go missing in computational simulations, even assuming strict physicalism. In fact, we standardly assume that AIs embedded in games don't suffer.

Testing hypotheses derived from or inspired by IIT will probably be on a case-by-case basis. But given some of the empirical work on coma patients that IIT has made possible, I think it may be stretching things to critique IIT as wholly reliant on circular reasoning.

That said, yes there are deep methodological challenges with qualia that any approach will need to overcome. I do see your objection quite clearly- I'm confident that I address this in my research (as any meaningful research on this must do) but I don't expect you to take my word for it. The positio... (read more)

0TheAncientGeek
I think it's smallish, and that's philosophy, because I don't have a qualiometer.
0TheAncientGeek
refs?

I do have some detailed thoughts on your two questions-- in short, given certain substantial tweaks, I think IIT (or variants by Tegmark/Griffiths) can probably be salvaged from its (many) problems in order to provide a crisp dataset on which to base testable hypotheses about qualia.

(If you're around the Bay Area I'd be happy to chat about this over a cup of coffee or something.)

I would emphasize, though, that this post only talks about the value results in this space would have for FAI, and tries to be as agnostic as possible on how any reverse-engineering may happen.

0TheAncientGeek
I'm still not seeing how IIT would help with confirming that an attempt at reverse engineering had succeeded, absent circular reasoning along the lines of "IIT says the system will have qualia, therefore the system will have qualia".

Are you referring to any specific "current research into qualia", or just the idea of qualia research in general? I definitely agree that valence research is a subset of qualia research- but there's not a whole lot of either going on at this point, or at least not much that has produced anything quantitative/predictive.

I suspect valence is actually a really great path to approach more 'general' qualia research, since valence could be a fairly simple property of conscious systems. If we can reverse-engineer one type of qualia (valence), it'll help us reverse-engineer other types.

0TheAncientGeek
There's a lot of philosophical research, and very little scientific research. That confirms the impression among philosophers that qualia are a Hard Problem. How do you reverse-engineer a quale? How do you tell you have succeeded? I think that you have underestimated the hardness of the problem.

It would probably be highly dependent on the AI's architecture. The basic idea comes from Shulman and Bostrom - Superintelligence, chapter 9, in the "Incentive methods" section (loc 3131 of 8770 on kindle).

My understanding is that such a strategy could help as part of a comprehensive strategy of limitations and incentivization but wouldn't be viable on its own.

Right, absolutely. These are all things that we don't know, but should.

Are you familiar with David Pearce's Hedonistic Imperative movement? He makes a lot of the same points and arguments, basically outlining that it doesn't seem impossible that we could (and should) radically reduce, and eventually eliminate, suffering via technology.

But the problem is, we don't know what suffering is. So we have to figure that out before we can make much radical progress on this sort of work. I.e., I think a rigorous definition of suffering will be an information-theoret... (read more)

Although life, sin, disease, redness, maleness, and dogness are (I believe) inherently 'leaky' / 'fuzzy' abstractions that don't belong with electromagnetism, this is a good comment. If a hypothesis is scientific, it will make falsifiable predictions. I hope to have something more to share on this soon.

I think we're still not seeing eye-to-eye on the possibility that valence, i.e., whatever pattern within conscious systems innately feels good, can be described crisply.

If it's clear a priori that it can't, then yes, this whole question is necessarily confused. But I see no argument to that effect, just an assertion. From your perspective, my question takes the form: "what's the thing that all dogs have in common?"- and you're trying to tell me it's misguided to look for some platonic 'essence of dogness'. Concepts don't work like that. I do get ... (read more)

0minusdash
Life, sin, disease, redness, maleness and indeed dogness "may" also be like electromagnetism. The English language may also be a fundamental part of the universe and maybe you could tell if "irregardless" or "wanna" are real English words by looking into a microscope or turning your telescope to certain parts of the sky, or maybe by looking at chicken intestines, who knows. I know some people think like this. Stuart Hameroff says that morality may be encoded into the universe at the Planck scale. So maybe that's where you should look for "good", maybe "pleasure" is there as well. But anyway, research into electromagnetism was done using the scientific method, which means that the hypothesis had to produce predictions that were tested and replicated numerous times. What sort of experiment would you envision for testing something about "inherently pleasurable" arrangements of atoms? Would the atoms make you feel warm and fuzzy inside when you look at them? Or would you try to put that pattern into different living creatures and see if they react with their normal joyful reactions?

Right- good questions.

First, I think getting a rigorous answer to this 'mystery of pain and pleasure' is contingent upon having a good theory of consciousness. It's really hard to say anything about which patterns in conscious systems lead to pleasure without a clear definition of what our basic ontology is.

Second, I've been calling this "The Important Problem of Consciousness", a riff off Chalmers' distinction between the Easy and Hard problems. I.e., if someone switched my red and green qualia in some fundamental sense it wouldn't matter; if so... (read more)

Right. It might be a little bit more correct to speak of 'temporal arrangements of arrangements of particles', for which 'processes' is a much less awkward shorthand.

But saying "pleasure is a neurological process" seems consistent with saying "it all boils down to physical stuff- e.g., particles, eventually", and doesn't seem to necessarily imply that "you can't find a 'pleasure pattern' that's fully generalized. The information is always contextual."

Good is a complex concept, not an irreducible basic constituent of the universe. It's deeply rooted in our human stuff like metabolism (food is good), reproduction (sex is good), social environment (having allies is good) etc

It seems like you're making two very distinct assertions here: first, that valence is not a 'natural kind', that it doesn't 'carve reality at the joints', and is impossible to form a crisp, physical definition of; and second, that valence is highly connected to drives that have been evolutionarily advantageous to have. The second is... (read more)

2minusdash
I don't like the expression "carve reality at the joints"; I think it's very vague and hard to verify whether a concept carves it there or not. The best way I can imagine this is that you have lots of events or 'things' in some description space and you can notice some clusterings, and you pick those clusters as concepts. But a lot depends on which subspace you choose and on what scale you're working... 'Good' may form a cluster or may not; I just don't even know how you could give evidence either way. It's unclear how you could formalize this in practice.

My thoughts on pleasure and the concept of good: your problem is that you're trying to discover the sharp edges of these categories, whereas concepts don't work like that. Take a look at this LW post and this one from Slatestarcodex. From the second one, the concept of a behemah/dag exists because fishing and hunting exist.

Try to make it clearer what you're trying to ask. "What is pleasure really?" is a useless question. You may ask "what is going on in my body when I feel pleasure?" or "how could I induce that state again?" You seem to be looking for some mathematical description of the pattern of pleasure that would unify pleasure in humans and aliens with totally unknown properties (that may be based on fundamentally different chemistry, or maybe instead of electromagnetism-based chemistry their processes work over the strong nuclear force or whatever). What do you really have in mind here? A formula, like a part of space giving off pulses at rate X and another part of space at 1 cm distance pulsating at rate Y?

You may just as well ask how we would detect alien life at all. And then I'd say "life" is a human concept, not a divine platonic object out there that you can go to and see what it really is. We even have edge cases here on Earth, like viruses or prions. But the importance of these sorts of questions disappears if you think about what you'd do with the answer. If it's "I just want to know

I see the argument, but I'll note that your comments seem to run contrary to the literature on this: see, e.g., Berridge on "Dissecting components of reward: ‘liking’, ‘wanting’, and learning", as summed up by Luke in The Neuroscience of Pleasure. In short, behavior, memory, and enjoyment ('seeking', 'learning', and 'liking' in the literature) all seem to be fairly distinct systems in the brain. If we consider a being with a substantially different cognitive architecture, whether through divergent evolution or design, it seems problematic to view... (read more)

0TheAncientGeek
Yes, this is the qualia problem, and no, it isn't easy to imagine pain and pleasure being inverted. Spectrum inversion isn't a necessary criterion for something being a quale. You seem to have landed on the easy end of the hard problem.
0minusdash
I don't know how limited plasticity is. Speculation: maybe if we put on some color-filter glasses that change red to green or somehow mix up the colors, then maybe even after a long time we'd still have the experience of the original red, even when looking at outside green material. Okay, let's say it's not plastic enough; we'd still feel an internal red quale. But in what sense? What if the brain truly rewired to recognize plants and moldy fruit etc. in the presence of "red" perception, and the original "green" pattern fed into visceral avoidance of "green" liquids (blood) and wired into the speech areas in such a way that the nominal "green" sensation is strongly linked to the word "red" (for example, as measured by those experiments where color words are printed in mismatched colors, e.g. the word blue written in yellow)? In this case, how could we say that the person is still "seeing green" when presented with objectively red things? What would be our anticipation under this hypothesis?

Now, I think emotions are the same thing. Of course it could be that the brain architecture cannot rewire itself to start sweating and shouting and producing adrenaline in the presence of the previously pleasure-associated pattern. Maybe the two modules are too far away or there is some other physical limitation. Then the question is pointless; it's about an impossible scenario. If the brain can't rewire itself then it still produces the old kind of behavior that is inconsistent with reality, so it is observable (e.g. smiling when we would expect a normal person to shout in pain).

I don't think we can view pleasure as simply existing inside the brain without considering the environment. Similarly, the motor cortex doesn't contain the actual information of what the limbs look like. It's a relay station. It only works because the muscles are where they are. You can't tell what a motor neuron controls unless you follow its axon

Surely neurological processes are "arrangements of particles" too, though.

I think your question gets to the heart of the matter- is there a general principle to be found with regard to which patterns within conscious systems innately feel good, or isn't there? It would seem very surprising to me if there wasn't.

3Lumifer
Processes are not "arrangements", it's a dynamic vs static difference.
1minusdash
Good is a complex concept, not an irreducible basic constituent of the universe. It's deeply rooted in our human stuff like metabolism (food is good), reproduction (sex is good), social environment (having allies is good) etc. We can generalize from this and say that the general pattern of "good" things is that they tend to reinforce themselves. If you feel good, you'll strive to achieve the same later. If you feel bad, you'll strive to avoid feeling that in the future. So if an experience makes more of itself then it's good, otherwise it's bad. Note that we could also ask: "Is there a general principle to be found with regard to which patterns within conscious systems innately feel like smelling a rose, or isn't there?" We could build rose-smell-detecting machines in various ways. How can you say that one is really having the experience of smelling it while another isn't?

I had posted the original in 2013, and did a major revision today, before promoting it (leaving the structure of the questions intact, to preserve previous discussion referents).

I hope I haven't committed any faux pas in doing this.

Thank you- that paper is extremely relevant and I appreciate the link.

To reiterate, mostly for my own benefit: As Tegmark says- whether we're talking about a foundation to ethics, or a "final goal", or we simply want to not be confused about what's worth wanting, we need to figure out what makes one brain-state innately preferable to another, and ultimately this boils down to arrangements of particles. But what makes one arrangement of particles superior to another? (This is not to give credence to moral relativism- I do believe this has a crisp answer).

Very interesting. No objections to your main points, but a few comments on side points and conclusions:

  • You say "it's not like we know of a specific technological innovation that would solve poverty, if only someone would develop it." I would identify Greg Cochran's 'genetic spellcheck' as such a tech, along with what other people are suggesting. http://westhunt.wordpress.com/2012/02/27/typos/

  • "We might have exhausted the low-hanging fruits in our desires." I think this is right, but it's complicated. I think the Robin Hanson way to f

... (read more)
0Stuart_Armstrong
This is definitely relevant to the point here - thanks!

On the first point-- what you say is clearly right, but is also consistent with the notion that there are certain mathematical commonalities which hold across the various 'flavors' of pleasure, and different mathematical commonalities in pain states.

Squashing the richness of human emotion into a continuum of positive and negative valence sounds like a horribly lossy transform, but I'm okay with that in this context. I expect that experiences at the 'pleasure' end of the continuum will have important commonalities 'under the hood' with others at that same ... (read more)

I understand the type of criticism generally, but could you say more about this specific case?

I'm curious if the objection stems from some mismatch of abstraction layers, or just the habit of not speaking about certain topics in certain terms.

0minusdash
This all seems to be about the "qualia" problem. Take another example. How would you know if an alien was having the experience of seeing the color red? Well, you could show it red and see what changes. You could infer it from its behavior (for example if you trained it that red means food - if indeed the alien eats food). Similarly you could tell that it's suffering when it does something to avoid an ongoing situation, and if later on it would very much prefer not to go under the same conditions ever again. I don't think there is anything special about the actual mechanism and neural pattern that expresses pain or suffering in our brains. It's that pattern's relation to memories, sensory inputs and motor outputs that's important. Probably you could even retrain the brain to consider a certain fixed brain stimulus to be pleasure even though it was previously associated with pain. It's like putting on those corrective glasses that turn the visual input by 180° and the brain can adapt to that situation and the person is feeling normal after some time.
427chaos
Pleasure is not a static "arrangement of particles". Pleasure is a neurological process. You can't find a "pleasure pattern" that's fully generalized. The information is always contextual. This isn't a perfect articulation of my objections, but this is a difficult subject.
1falenas108
A possible answer: There are many different kinds of pain and pleasure, and trying to categorize all of them together loses information. For starters, the difference between physical and mental pain and pleasure. To get more nuanced, the difference between the stinging pain of a slap, the thudding pain of a punch, the searing pain of fire, and the pain from electricity are all very distinct feelings, which could have very different circuitry. I'm not as sure on the last paragraph; I would place that at 60% probability.

It does, and thank you for the reply.

How should we define "pleasure"? -- A difficult question. As you mention, it is a cloud of concepts, not a single one. It's even more difficult because there appears to be precious little driving the standardization of the word-- e.g., if I use the word 'chair' differently than others, it's obvious, people will correct me, and our usages will converge. If I use the word 'pleasure' differently than others, that won't be as obvious because it's a subjective experience, and there'll be much less convergence towar... (read more)

0minusdash
"what are the characteristic mathematics of (i.e., found disproportionally in) self-identified pleasurable brain states?" Certain areas of the brain get more active and certain hormones get into the bloodstream. How does this help you out?

We seem to be talking past each other, to some degree. To clarify, my six questions were chosen to illustrate how much we don't know about the mathematics and science behind psychological valence. I tried to have all of them point at this concept, each from a slightly different angle. Perhaps you interpret them as 'disguised queries' because you thought my intent was other than to seek clarity about how to speak about this general topic of valence, particularly outside the narrow context of the human brain?

I am not trying to "Learn how to manipulate p... (read more)

6gjm
I'm not nyan_sandwich, but here is what I believe to be his point about asking for necessary and sufficient conditions. Part of your question (maybe not all) appears to be: how should we define "pleasure"?

Aside from precise technical definitions ("an abelian group is a set A together with a function from AxA to A, such that ..."), the meaning of a word is hardly ever accurately given by any necessary-and-sufficient conditions that can be stated explicitly in a reasonable amount of space, because that just isn't the way human minds work. We learn the meaning of a word by observing how it's used. We see, and hear, a word like "pleasure" or "pain" applied to various things, and not to others. What our brains do with this is approximately to consider something an instance of "pleasure" in so far as it resembles other things that are called "pleasure". There's no reason why any manageable set of necessary and sufficient conditions should be equivalent to that.

Further, different people are exposed to different sets of uses of the word, and evaluate resemblance in different ways. So your idea of "pleasure" may not be the same as mine, and there's no reason why there need be any definite answer to the question of whose is better.

Typically, lots of different things will contribute to our considering something sufficiently like other instances of "pleasure" to deserve that name itself. In some particular contexts, some will be more important than others. So if you're trying to pin down a precise definition for "pleasure", the features you should concentrate on will depend on what that definition is going to be used for.

Does any of that help?

Tononi's Phi theory seems somewhat relevant, though it only addresses consciousness and explicitly avoids valence. It does seem like something that could be adapted toward answering questions like this (somehow).
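To give a rough sense of what integration-style measures quantify, here is a toy "whole-minus-sum" proxy: the predictive information the whole system carries about its own next state, minus what its parts carry individually. This is only an illustrative sketch, not Tononi's actual φ (which minimizes over all partitions, among many other differences); the two-node binary system and its update rules are invented for the example.

```python
from itertools import product
from math import log2

def mutual_information(pairs):
    """Mutual information (bits) between x and y over equiprobable (x, y) samples."""
    n = len(pairs)
    px, py, pxy = {}, {}, {}
    for x, y in pairs:
        px[x] = px.get(x, 0) + 1 / n
        py[y] = py.get(y, 0) + 1 / n
        pxy[(x, y)] = pxy.get((x, y), 0) + 1 / n
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())

def phi_proxy(update):
    """Whole-minus-sum proxy: I(whole_t; whole_t+1) minus the per-node sum."""
    states = list(product([0, 1], repeat=2))       # all states of 2 binary nodes
    nexts = [update(s) for s in states]
    whole = mutual_information(list(zip(states, nexts)))
    parts = sum(
        mutual_information([(s[i], t[i]) for s, t in zip(states, nexts)])
        for i in range(2)
    )
    return whole - parts

swap = lambda s: (s[1], s[0])         # each node copies the *other* node
independent = lambda s: (s[0], s[1])  # each node copies only itself

print(phi_proxy(swap))         # 2.0: all predictive information is relational
print(phi_proxy(independent))  # 0.0: the parts already explain everything
```

The swap system scores high because neither node alone predicts anything about its own future, while the independent system scores zero; that "information the whole has over its parts" flavor is the intuition IIT formalizes, though the real theory differs in many details.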

Current models of emotion based on brain architecture and neurochemicals (e.g., EMOCON) are relevant, though ultimately correlative and thus not applicable outside of the human brain.

There's also a great deal of quality literature about specific correlates of pain and happiness- e.g., Building a neuroscience of pleasure and well-being and An fMRI-B... (read more)

I'd just like to say thanks for posting this. Cogent, researched, cheerful, and helpful.

Another view of Philosophy, which I believe Russell also subscribed to (but I can't seem to find a reference for presently) is that philosophy was the 'mother discipline'. It was generative. You developed your branch of Philosophy until you got your ontology and methodology sorted out, and then you stopped calling what you were doing philosophy. (This has the amusing side-effect of making anything philosophers say wrong by definition-- sometimes useful, but always wrong.)

The Natural Sciences, Psychology, Logic, Mathematics, Linguistics-- they all got their... (read more)

We can speak of different tiers of stuff, interacting (or not) through unknown causal mechanisms, but Occam's Razor would suggest these different tiers of stuff might actually be fundamentally the same 'stuff', just somehow viewed from different angles. (This would in turn suggest some form of panpsychism.)

In short, I have trouble seeing how we make these metaphysical hierarchizations pay rent. Perhaps that's your point also.

Yes, and I would say finding bunnies cuter than human babies isn't a strong argument against Dennett's hypothesis. Supernormal Stimuli are quite common in humans and non-humans.

I think this argument could be analogously phrased: "The reason why exercise makes us feel good can't be to get us to exercise more, because cocaine feels even better than exercise." Seems wrong when we put it that way.

You probably get a much richer sensation of zebra-ness under some conditions (being there, touching the zebra, smelling the zebra, seeing it move) than just seeing a picture of one on flickr. Experiencing zebra-ness isn't a binary value, and some types of exposures will tend to commandeer many more neurons than others.

I think the first 3/4ths are very well stated. I couldn't agree more.

On the last bit, my personal intuition is there are plenty of things people can do for FAI research beyond raising money. Moreover, such intangibles are likely often more important to the cause of FAI than cash.

(Also, the argument that "some of those willing and able to do FAI research are spending their time raising money, right now, for lack of other ways to get money" may be undermined by the paragraph above it; e.g., I'd rather be thinking about FAI than raising money for others to think about FAI.)

I would suggest that a breakdown in social order (without a singularity occurring) is another scenario that might be roughly as probable as the others you mentioned. In such a case, it would seem the manner by which you invest in equities would matter. I.e., the value of most abstract investments may vanish, and the value of equities held in trust by various institutions (or counterparties) may also vanish.

1gwern
Which falls in the 'not valuable'/'not Singularity' cell of the 2x2 table.

I think it's possible that any leaky abstraction used in designing FAI might doom the enterprise. But if that's not true, we can use this "qualia translation function" to make leaky abstractions in an FAI context a tiny bit safer(?).

E.g., if we're designing an AGI with a reward signal, my intuition is we should either (1) align our reward signal with actual pleasurable qualia (so if our abstractions leak it matters less, since the AGI is drawn to maximize what we want it to maximize anyway); (2) implement the AGI in an architecture/substrate whi... (read more)

I'd say nobody does! But a little less glibly, I personally think the most productive strategy in biologically-inspired AGI would be to focus on tools that help quantify the unquantified. There are substantial side-benefits to such a focus on tools: what you make can be of shorter-term practical significance, and you can test your assumptions.

Chalmers and Tononi have done some interesting work, and Tononi's work has also had real-world uses. I don't see Tononi's work as immediately applicable to FAI research but I think it'll evolve into something that wil... (read more)

I don't think an AGI failing to behave in the anticipated manner due to its qualia* (orgasms during cat creation, in this case) is a special or mysterious problem, one that must be treated differently than errors in its reasoning, prediction ability, perception, or any aspect of its cognition. On second thought, I do think it's different: it actually seems less important than errors in any of those systems. (And if an AGI is Provably Safe, it's safe-- we need only worry about its qualia from an ethical perspective.) My original comment here is (I believe) ... (read more)

1TheOtherDave
Ah, I see. Yeah, agreed that what we are calling qualia here (not to be confused with its usage elsewhere) underlie a class of practical problems. And what you're calling a qualia translation function (which is related to what EY called a non-person predicate elsewhere, though finer-grained) is potentially useful for a number of reasons.

I definitely agree with your first paragraph (and thanks for the tip on SIAI vs SI). The only caveat is if evolved/brain-based/black-box AGI is several orders of magnitude easier to create than an AGI with a more modular architecture where SI's safety research can apply, that's a big problem.

On the second point, what you say makes sense. Particularly, AGI feelings haven't been completely ignored at LW; if they prove important, SI doesn't have anything against incorporating them into safety research; and AGI feelings may not be material to AGI behavior any... (read more)

2TheOtherDave
I agree that, in order for me to behave ethically with respect to the AGI, I need to know whether the AGI is experiencing various morally relevant states, such as pain or fear or joy or what-have-you. And, as you say, this is also true about other physical systems besides AGIs; if monkeys or dolphins or dogs or mice or bacteria or thermostats have morally relevant states, then in order to behave ethically it's important to know that as well. (It may also be relevant for non-physical systems.) I'm a little wary of referring to those morally relevant states as "qualia" because that term gets used by so many different people in so many different ways, but I suppose labels don't matter much... we can call them that for this discussion if you wish, as long as we stay clear about what the label refers to.

Leaving that aside... so, OK. We have a complex AGI with a variety of internal structures that affect its behavior in various ways. One of those structures is such that creating a cat gives the AGI an orgasm, which it finds rewarding. It wants orgasms, and therefore it wants to create cats. Which we didn't expect.

So, OK. If the AGI is designed such that it creates more cats in this situation than it ought to (regardless of our expectations), that's a problem. 100% agreed. But it's the same problem whether the root cause lies within the AGI's emotions, or its reasoning, or its qualia, or its ability to predict the results of creating cats, or its perceptions, or any other aspect of its cognition.

You seem to be arguing that it's a special problem if the failure is due to emotions or qualia or feelings? I'm not sure why. I can imagine believing that if I were overgeneralizing from my personal experience. When it comes to my own psyche, my emotions and feelings are a lot more mysterious than my surface-level reasoning, so it's easy for me to infer some kind of intrinsic mysteriousness to emotions and feelings that reasoning lacks. But I reject that overgeneralizatio

Thank you.

I'd frame why I think biology matters in FAI research in terms of research applicability and toolbox dividends.

On the first reason--- applicability--- I think more research focus on biologically-inspired AGI would make a great deal of sense because the first AGI might be a biologically-inspired black box, and axiom-based FAI approaches may not particularly apply to such. I realize I'm (probably annoyingly) retreading old ground here with regard to which method will/should win the AGI race, but SIAI's assumptions seem to run counter to the ass... (read more)

0hairyfigment
As a layman I don't have a clear picture of how to start doing that. How would it differ from this? Looks like you can find the paper in question here (WARNING: out-of-date 2002 content).
2TheOtherDave
(nods) Regarding your first point... as I understand it, SI (it no longer refers to itself as SIAI, incidentally) rejects as too dangerous to pursue any approach (biologically inspired or otherwise) that leads to a black-box AGI, because a black-box AGI will not constrain its subsequent behavior in ways that preserve the things we value except by unlikely chance. The idea is that we can get safety only by designing safety considerations into the system from the ground up; if we give up control of that design, we give up the ability to design a safe system.

Regarding your second point... there isn't any assumption that AGIs won't feel stuff, or that its feelings can be ignored. (Nor even that they are mere "feelings" rather than genuine feelings.) Granted, Yudkowsky talks here about going out of his way to ensure something like that, but he treats this as an additional design constraint that adequate engineering knowledge will enable us to implement, not as some kind of natural default or simplifying assumption. (Also, I haven't seen any indication that this essay has particularly informed SI's subsequent research. Those more closely -- which is to say, at all -- affiliated with SI might choose to correct me here.) And there certainly isn't an expectation that its behavior will be predictable at any kind of granular level.

What there is is the expectation that a FAI will be designed such that its unpredictable behaviors (including feelings, if it has feelings) will never act against its values, and such that its values won't change over time. So, maybe you're right that explicitly modeling what an AGI feels (again, no scare-quotes needed or desired) is critically important to the process of AGI design. Or maybe not. If it turns out to be, I expect that SI is as willing to approach design that way as any other. (Which should not be taken as an expression of confidence in their actual ability to design an AGI, Friendly or otherwise.)

Personally, I find it unlikely
2Kawoomba
If that were the case (and it may very well be), there goes provably friendly AI, for to guarantee a property under all circumstances, it must be upheld from the bottom layer upwards.
johnsonmx140

I'm Mike Johnson. I'd estimate I come across a reference to LW from trustworthy sources every couple of weeks, and after working my way through the sequences it feels like the good outweighs the bad and it's worth investing time into.

My background is in philosophy, evolution, and neural nets for market prediction; I presently write, consult, and am in an early-stage tech startup. Perhaps my high-water mark in community exposure has been a critique of the word Transhumanist at Accelerating Future. In the following years, my experience has been more mixed, bu... (read more)

5TheOtherDave
FWIW, I find your unvarnished thoughts, and the cogency with which you articulate them, refreshing. (The thoughts aren't especially novel, but the cogency is.) In particular, I'm interested in your thoughts on what benefits a greater focus on biologically inspired AGI might provide that a distaste for it would limit LW from concluding/achieving.