Secrets of the eliminati

Scott Alexander

Anyone who does not believe mental states are ontologically fundamental - ie anyone who denies the reality of something like a soul - has two choices about where to go next. They can try reducing mental states to smaller components, or they can stop talking about them entirely.

In a utility-maximizing AI, mental states can be reduced to smaller components. The AI will have goals, and those goals, upon closer examination, will be lines in a computer program.

But in the blue-minimizing robot, its "goal" isn't even a line in its program. There's nothing that looks remotely like a goal in its programming, and goals appear only when you make rough generalizations from its behavior in limited cases.

Philosophers are still very much arguing about whether this applies to humans; the two schools call themselves reductionists and eliminativists (with a third school of wishy-washy half-and-half people calling themselves revisionists). Reductionists want to reduce things like goals and preferences to the appropriate neurons in the brain; eliminativists want to prove that humans, like the blue-minimizing robot, don't have anything of the sort until you start looking at high level abstractions.

I took a similar tack asking ksvanhorn's question in yesterday's post - how can you get a more accurate picture of what your true preferences are? I said:

I don't think there are true preferences. In one situation you have one tendency, in another situation you have another tendency, and "preference" is what it looks like when you try to categorize tendencies. But categorization is a passive and not an active process: if every day of the week I eat dinner at 6, I can generalize to say "I prefer to eat dinner at 6", but it would be non-explanatory to say that a preference toward dinner at 6 caused my behavior on each day. I think the best way to salvage preferences is to consider them as tendencies currently in reflective equilibrium.

A more practical example: when people discuss cryonics or anti-aging, the following argument usually comes up in one form or another: if you were in a burning building, you would try pretty hard to get out. Therefore, you must strongly dislike death and want to avoid it. But if you strongly dislike death and want to avoid it, you must be lying when you say you accept death as a natural part of life and think it's crass and selfish to try to cheat the Reaper. And therefore your reluctance to sign up for cryonics violates your own revealed preferences! You must just be trying to signal conformity or something.

The problem is that not signing up for cryonics is also a "revealed preference". "You wouldn't sign up for cryonics, which means you don't really fear death so much, so why bother running from a burning building?" is an equally good argument, although no one except maybe Marcus Aurelius would take it seriously.

Both these arguments assume that somewhere, deep down, there's a utility function with a single term for "death" in it, and all decisions just call upon this particular level of death or anti-death preference.

More explanatory of the way people actually behave is that there's no unified preference for or against death, but rather a set of behaviors. Being in a burning building activates fleeing behavior; contemplating death from old age does not activate cryonics-buying behavior. People guess at their opinions about death by analyzing these behaviors, usually with a bit of signalling thrown in. If they desire consistency - and most people do - maybe they'll change some of their other behaviors to conform to their hypothesized opinion.

One more example. I've previously brought up the case of a rationalist who knows there's no such thing as ghosts, but is still uncomfortable in a haunted house. So does he believe in ghosts or not? If you insist on there being a variable somewhere in his head marked $belief_in_ghosts = (0,1) then it's going to be pretty mysterious when that variable looks like zero when he's talking to the Skeptics Association, and one when he's running away from a creaky staircase at midnight.

But it's not at all mysterious that the thought "I don't believe in ghosts" gets reinforced because it makes him feel intelligent and modern, and staying around a creaky staircase at midnight gets punished because it makes him afraid.

Behaviorism was one of the first and most successful eliminationist theories. I've so far ignored the most modern and exciting eliminationist theory, connectionism, because it involves a lot of math and is very hard to process on an intuitive level. In the next post, I want to try to explain the very basics of connectionism, why it's so exciting, and why it helps justify discussion of behaviorist principles.

I took a similar tack asking ksvanhorn's question in yesterday's post - how can you get a more accurate picture of what your true preferences are? I said:

I don't think there are true preferences. In one situation you have one tendency, in another situation you have another tendency, and "preference" is what it looks like when you try to categorize tendencies. But categorization is a passive and not an active process: if every day of the week I eat dinner at 6, I can generalize to say "I prefer to eat dinner at 6", but it would be non-explanatory to say that a preference toward dinner at 6 caused my behavior on each day. I think the best way to salvage preferences is to consider them as tendencies currently in reflective equilibrium.

I enjoyed reading that, in the same way that I enjoyed reading Roko's Banned Post - I don't believe it for a moment, but it stretches the mind a little. This one is much more metaphysical, and it also has an eschatological optimism that Roko's didn't. I think such optimism has no rational basis whatsoever, and in any case it means little to those unfortunates stuck in Hell to be told that Heaven is coming at the end of the computation, but unfortunately it can't come any faster because of logical incompressibility. I'm thinking of Raymond Smullyan's dialogue, in which God says that the devil is "the unfortunate length of time" that the process of "enlightenment" inevitably takes, and I think Tipler might make similar apologies for his Omega Point on occasion. All possible universes eventually reach the Omega Point (because, according to a sophistical argument of Tipler's, space-time itself is inconsistent otherwise, so it's logically impossible for this not to happen), so goodness and justice will inevitably triumph in every part of the multiverse, but in some of them it will take a really long time.

So, if I approach your essay anthropologically, it's a mix of the very new cosmology and crypto-metaphysics (of Singularities in the multiverse, of everything as computation) with a much older thought-form - and of course you know this, having mentioned Neoplatonism - but I'd go further and say that the contents of this philosophy are being partly determined by a wishful thinking, which in turn is made possible by the fundamental uncertainty about the nature of reality. In other words, all sorts of terrible things may happen and may keep happening, but if you embrace Humean skepticism about induction, you can still say, nonetheless, reality might start functioning differently at any moment, therefore I have license to hope. In that case, uncertainty about the future course of mundane events provides the epistemic license for the leap of optimism.

Here, we have the new cosmological vision, of a universe (or multiverse) dominated by the rise of superintelligence in diverse space-time locations. It hasn't happened locally yet, but it's supposed to lie ahead of us in time. Then, we have the extra ingredient of acausal interaction between these causally remote (or even causally disjoint) superintelligences, who know about each other through simulation, reasoning, or other logically and mathematically structured explorations of the multiverse. And here is where the unreasonable optimism enters. We don't know what these superintelligences choose to do, once they sound out the structure of the multiverse, but it is argued that they will come to a common, logically preordained set of values, and that these values will be good. Thus, the idea of a pre-established harmony, as in Tipler (and I think in Leibniz too, and surely many others), complete with a reason why the past and present are so unharmonious (our local singularity hasn't happened yet), and also with an extra bit of hope that's entirely new and probably doesn't make sense: maybe the evil things that already happened will be cancelled out by reversing the computation - as if something can both have happened and could nonetheless be made to have never happened. Still, I bet Spinoza never thought of that one; all he could come up with was that evil is always an absence, that all things which actually exist are good, and so there's nothing that's actually bad.

The Stoics had a tendency to equate the order of nature with a cosmic Reason that was also a cosmic Good. Possibly Bertrand Russell was the one who pointed out that this is a form of power worship: just because this is the universal order, or this is the way that things have always been, does not in itself make it good. This point can easily be carried across to the picture of superintelligences arriving at their decisional equilibrium via mutual simulation: What exactly foreordains that the resulting equilibrium deserves the name of Good? Wouldn't the concrete outcome depend on the distribution of superintelligence value systems arising in the multiverse - something we know nothing about - and on the resources that each superintelligence brings to the table of acausal trade and negotiation? It's intriguing that even when confronted by such a bizarrely novel world-concept, the human mind is nonetheless capable, not only of interpreting it in a way originating from cultures which didn't even know that the sun is a star, but of finding a way to affirm the resulting cosmology as good and as predestined to be so.

I have mentioned Russell's reason for scorning the Stoic equation of the cosmic order with the cosmic good (it's just worship of overwhelming force), but I will admit that, from an elemental perspective which values personal survival (and perhaps the personal gains that can come from siding with power), it does make sense to ask oneself what the values of the hypothetical future super-AI might be. That is, even if one scorns the beatific cyber-vision as wishful thinking, one might agree that a future super-purge of the Earth, conducted according to the super-AI's value system, is a possibility, and attempt to shape oneself so as to escape it. But as we know, that might require shaping oneself to be a thin loop, a few centimeters long, optimized for the purpose of holding together several sheets of paper.

I acknowledge your points about not equating Goodness with Power, which is probably the failure mode of lusting for reflective consistency. (The lines of reasoning I go through in that link are pretty often missed by people who think they understand the nature of direction of morality, I think.) Maybe I should explicitly note that I was not at all describing my own beliefs, just trying to come up with a modern rendition of old-as-dirt Platonistic religionesque ideas. (Taoism is admirable in being more 'complete' and human-useful than the Big Good Metaphysi... (read more)

3lessdazed15y

So hell is a slow internet connection? Hmm, maybe there's something to this after all.

137

Secrets of the eliminati

137

137

137

Secrets of the eliminati

137

137