Bleh, I think there may be too much equivocation going on, even though your comment is basically correct. My original "insane" comment is not representative of my comments, nor is it a good example of the skill of charitable interpretation.
When I give justifications they do tend to be pretty related to the causes of my actions, though often in weird double-negative ways. Sometimes I do something because I am afraid of the consequences of doing something, in a self-defeating manner. I think a lot of my trying to appear discreditable is a defense mechanism put up because I am afraid of what would happen if I let myself flinch away from the prospect of appearing discreditable, like, afraid of the typical default failure mode where people get an identity as someone who is "reasonable" and then stop signalling and thus stop thinking thoughts that are "unreasonable", where "reason" is only a very loose correlate of sanity. My favorite LW article ever is "Cached Selves", and that has been true for two years now. Also one of my closest friends co-wrote that article, and his thinking has had a huge effect on mine.
I think saying it was "fun" is actually the rationalization, and I knew it was a rationalization, and so I was lying. It's a lot more complex than that. I wrote it more because I was feeling frustrated at what I perceived to be an unjustified level of contempt in the Less Wrong community. (/does more reflection to make sure I'm not making things up.) Okay. Also, relatedly, part of it was wanting to signal insanity for the reasons outlined above, or reasons similar to the ones outlined above in the sense of being afraid of some consequence of not doing something that I feel is principled, or something that I feel would make me a bad person if I didn't attempt to do. Part of it was wanting to signal something like cleverness, which is maybe where some of the "fun" happens to be, though I can only have so much fun when I'm forced to type very quickly. Part of it was trolling for its own sake on top of the aforementioned anti-anti-virtuous rationale, though where the motivation for "trolling for its own sake" came from might be the same as that anti-anti-virtuous rationale but stemming from a more fundamental principle. I would be suspicious if any of these reasons claimed to be the real reason. Actions tend to follow many reasons in conjunction. (/avoids going off on a tangent about the principle of sufficient reason and Leibniz's theodicy for irony's sake.)
It's interesting because others seem to be much more attached to certain kinds of language than I am, and so when they model me they model me as being unhealthily attached to the language of religion or spirituality or something for its own sake, and think that this is dangerous. I think this may be at least partially the typical mind fallacy. I am interested in these languages because I like trolling people (and I like trolling people for many reasons, as outlined above), but personally I much prefer the language of algorithmic probability and computationalism generally, which can actually be used precisely to talk about well-defined things. I only talk in terms of theism when I'm upset at people for being contemptuous of theism. Again there are many reasons for these things, often at different levels of abstraction, and it's all mashed together.
I wrote it more because I was feeling frustrated at what I perceived to be an unjustified level of contempt in the Less Wrong community.
I'm still not clear on what makes it unjustified.
Anyone who does not believe mental states are ontologically fundamental - i.e., anyone who denies the reality of something like a soul - has two choices about where to go next. They can try reducing mental states to smaller components, or they can stop talking about them entirely.
In a utility-maximizing AI, mental states can be reduced to smaller components. The AI will have goals, and those goals, upon closer examination, will be lines in a computer program.
But in the blue-minimizing robot, its "goal" isn't even a line in its program. There's nothing that looks remotely like a goal in its programming, and goals appear only when you make rough generalizations from its behavior in limited cases.
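The contrast can be made concrete with a toy Python sketch. Everything in it is an illustrative assumption rather than a description of any real system: the world is just a list of colour names, `utility` is a hypothetical goal function, and the reflex robot is assumed to do nothing more than fire at the first thing its camera reports as blue.

```python
from typing import List


def utility(world: List[str]) -> int:
    """The explicit goal: worlds with fewer blue objects score higher."""
    return -world.count("blue")


def utility_maximizer(world: List[str]) -> List[str]:
    """Considers every available action and picks the best outcome by `utility`."""
    candidates = [world]  # option: do nothing
    for i in range(len(world)):
        candidates.append(world[:i] + world[i + 1:])  # option: destroy object i
    return max(candidates, key=utility)


def reflex_robot(world: List[str]) -> List[str]:
    """No goal anywhere in the code: if the camera sees blue, fire at it."""
    for i, colour in enumerate(world):
        if colour == "blue":
            return world[:i] + world[i + 1:]
    return world


world = ["red", "blue", "blue"]
print(utility_maximizer(world))
print(reflex_robot(world))
```

On this input both functions remove one blue object, so an observer would happily say both "want" less blue. But only the first has a line (`utility`) that can be pointed to as the goal; in the second, "minimizing blue" exists only as a summary of its behaviour.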
Philosophers are still very much arguing about whether this applies to humans; the two schools call themselves reductionists and eliminativists (with a third school of wishy-washy half-and-half people calling themselves revisionists). Reductionists want to reduce things like goals and preferences to the appropriate neurons in the brain; eliminativists want to prove that humans, like the blue-minimizing robot, don't have anything of the sort until you start looking at high level abstractions.
I took a similar tack in answering ksvanhorn's question in yesterday's post - how can you get a more accurate picture of what your true preferences are? I said:
A more practical example: when people discuss cryonics or anti-aging, the following argument usually comes up in one form or another: if you were in a burning building, you would try pretty hard to get out. Therefore, you must strongly dislike death and want to avoid it. But if you strongly dislike death and want to avoid it, you must be lying when you say you accept death as a natural part of life and think it's crass and selfish to try to cheat the Reaper. And therefore your reluctance to sign up for cryonics violates your own revealed preferences! You must just be trying to signal conformity or something.
The problem is that not signing up for cryonics is also a "revealed preference". "You wouldn't sign up for cryonics, which means you don't really fear death so much, so why bother running from a burning building?" is an equally good argument, although no one except maybe Marcus Aurelius would take it seriously.
Both these arguments assume that somewhere, deep down, there's a utility function with a single term for "death" in it, and all decisions just call upon this particular level of death or anti-death preference.
More explanatory of the way people actually behave is that there's no unified preference for or against death, but rather a set of behaviors. Being in a burning building activates fleeing behavior; contemplating death from old age does not activate cryonics-buying behavior. People guess at their opinions about death by analyzing these behaviors, usually with a bit of signalling thrown in. If they desire consistency - and most people do - maybe they'll change some of their other behaviors to conform to their hypothesized opinion.
One more example. I've previously brought up the case of a rationalist who knows there's no such thing as ghosts, but is still uncomfortable in a haunted house. So does he believe in ghosts or not? If you insist on there being a variable somewhere in his head marked $belief_in_ghosts = (0,1) then it's going to be pretty mysterious when that variable looks like zero when he's talking to the Skeptics Association, and one when he's running away from a creaky staircase at midnight.
But it's not at all mysterious that the thought "I don't believe in ghosts" gets reinforced because it makes him feel intelligent and modern, and staying around a creaky staircase at midnight gets punished because it makes him afraid.
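As a minimal sketch of that picture, here is a toy Python agent with no belief variable at all, only separately reinforced behaviours in separate contexts. The `Agent` class, the context and behaviour names, and the learning rate are all hypothetical choices made for illustration, not a model taken from the behaviorist literature.

```python
from collections import defaultdict
from typing import List


class Agent:
    """Stores a strength per (context, behaviour) pair; no global 'belief' exists."""

    def __init__(self) -> None:
        self.strength = defaultdict(float)

    def reinforce(self, context: str, behaviour: str, reward: float,
                  rate: float = 0.5) -> None:
        # Positive reward strengthens the behaviour in this context only;
        # negative reward (punishment) weakens it.
        self.strength[(context, behaviour)] += rate * reward

    def act(self, context: str, options: List[str]) -> str:
        # Pick whichever option is currently strongest in this context.
        return max(options, key=lambda b: self.strength[(context, b)])


agent = Agent()
# Saying "I don't believe in ghosts" makes him feel intelligent and modern.
agent.reinforce("skeptics_meeting", "say_no_ghosts", reward=1.0)
# Staying near the creaky staircase makes him afraid.
agent.reinforce("creaky_staircase", "stay", reward=-1.0)

print(agent.act("skeptics_meeting", ["say_no_ghosts", "say_ghosts_exist"]))
print(agent.act("creaky_staircase", ["stay", "run"]))
```

The agent denies ghosts at the meeting and runs from the staircase, and nothing in its state is inconsistent, because there was never a single `belief_in_ghosts` value for the two behaviours to disagree about.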
Behaviorism was one of the first and most successful eliminativist theories. I've so far ignored the most modern and exciting eliminativist theory, connectionism, because it involves a lot of math and is very hard to process on an intuitive level. In the next post, I want to try to explain the very basics of connectionism, why it's so exciting, and why it helps justify discussion of behaviorist principles.