The former is incredibly stupid: an agent that consistently confuses its imagination with reality and cannot, even in principle, separate them would be utterly incapable of abstract thought.
Welcome to evolution. Have you looked at humanity lately?
(Ok, enough snide remarks. I do agree that this is a fairly stupid design, but it would still work in many cases. The fact that it can't handle advanced neuroscience is unfortunate, but it worked really well on the savannah.)
How about the fact that I claim not to want to wirehead? My beliefs about my desires are surely correlated with my desires.
(I strongly disagree that "most of humanity" is against wireheading. The only evidence for that consists of very flawed intuition pumps that can easily be reversed.)
However, I do take your disagreement (and that of others here) seriously. It is a major reason why I don't just go and endorse wireheading, and why I wrote the post in the first place. Believe me, I'm listening. I'm sorry if I gave the impression that I just discard your opinion as confused.
You are privileging the hypothesis. Your view has a low prior (most of the matter in the universe is not part of my mind, so given that I might care about anything it is not very likely that I will care about one specific lump of meat?).
It would have a low prior if human minds were pulled out of mind space at random. They aren't. We do know that they are reinforcement-based, and we have good evolutionary accounts of how complex minds built on that basis would arise. Reinforcement-based minds, however, are exactly like the first kind of mind I described and, it seems to me, should always wirehead if they can.
As such, assuming nothing more, we should have no problem with wireheading. The fact that we do needs to be explained. Assuming there's an additional complex utility calculation would answer the question, but that's a fairly expensive hypothesis, which is why I asked for evidence. On the other hand, assuming (unconscious) signaling, mistaken introspection and so on relies only on mechanisms we already know exist and explains the data equally well, but it favors wireheading.
Economic models that do assume complex calculations like that, if I understand it correctly, work badly, while simpler models (PCT, behavioral economics in general) work much better.
You don't present any evidence of your own, and yet you demand that I present mine.
You are correct that I have not presented any evidence in favor of wireheading. I'm not endorsing wireheading and even though I think there are good arguments for it, I deliberately left them out. I'm not interested in "my pet theory about values is better than your pet theory and I'm gonna convince you of that". Looking at models of human behavior and inferred values, however, wireheading seems like a fairly obvious choice. The fact that you (and others) disagree makes me think I'm missing something.
The fact that it can't handle advanced neuroscience is unfortunate, but it worked really well on the savannah.
What do you mean it can't handle advanced neuroscience? Who do you think invented neuroscience!
One of the points I was trying to make was that humans can, in principle, separate out the two concepts; if they couldn't, we wouldn't even be having this conversation.
Since we can separate these concepts, it seems like our final reflective equilibrium, whatever that looks like, is perfectly capable of treating them differently. I think that wire-he...
I've been thinking about wireheading and the nature of my values. Many people here have defended the importance of external referents or complex desires. My problem is, I can't understand these claims at all.
To clarify, I mean wireheading in the strict "collapsing into orgasmium" sense. A successful implementation would identify all the reward circuitry and directly stimulate it, or do something equivalent. It would essentially be a vastly improved heroin. A good argument for either keeping complex values (e.g. by requiring at least a personal matrix) or external referents (e.g. by showing that a simulation can never suffice) would work for me.
Also, I use "reward" as shorthand for any enjoyable feeling, as "pleasure" tends to be used for a specific one of them, among bliss, excitement and so on, and "it's not about feeling X, but X and Y" is still wireheading after all.
I tried collecting all related arguments I could find. (Roughly sorted from weak to very weak, as I understand them, plus links to example instances. I also searched any literature and other sites I could think of, but didn't find other (not blatantly incoherent) arguments.)
(There have also been technical arguments against specific implementations of wireheading. I'm not concerned with those, as long as they don't show impossibility.)
Overall, none of this sounds remotely plausible to me. Most of it is outright question-begging or relies on intuition pumps that don't even work for me.
It confuses me that others might be convinced by arguments of this sort, so it seems likely that I have a fundamental misunderstanding or there are implicit assumptions I don't see. I fear that I have a large inferential gap here, so please be explicit and assume I'm a Martian. I genuinely feel like Gamma in A Much Better Life.
To me, all this talk about "valuing something" sounds like someone talking about "feeling the presence of the Holy Ghost". I don't mean this in a derogatory way, but the pattern "sense something funny, therefore some very specific and otherwise unsupported claim" matches. How do you know it's not just, you know, indigestion?
What is this "valuing"? How do you know that something is a "value", terminal or not? How do you know what it's about? How would you know if you were mistaken? What about unconscious hypocrisy or confabulation? Where do these "values" come from (i.e. what process creates them)? Overall, it sounds to me like people are confusing their feelings about (predicted) states of the world with caring about states directly.
To me, it seems like it's all about anticipating and achieving rewards (and avoiding punishments, but for the sake of the wireheading argument, that's equivalent). I make predictions about what actions will trigger rewards (or instrumentally help me pursue those actions) and then engage in them. If my prediction was wrong, I drop the activity and try something else. If I "wanted" something, but getting it didn't trigger a rewarding feeling, I wouldn't take that as evidence that I "value" the activity for its own sake. I'd assume I suck at predicting or was ripped off.
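For concreteness, here is a toy sketch of the loop I have in mind. It is only an illustration under my own assumptions, not a claim about how brains actually implement this: the action names, the reward numbers, the `learning_rate` parameter and the function `choose_and_update` are all made up for the example.

```python
def choose_and_update(predicted, actual, learning_rate=0.5, steps=10):
    """Toy reward-prediction loop (hypothetical, for illustration only).

    predicted: dict mapping action -> currently anticipated reward
    actual:    dict mapping action -> reward the action really delivers
    Returns the updated predictions and the list of actions taken.
    """
    predicted = dict(predicted)  # copy so the caller's dict is untouched
    history = []
    for _ in range(steps):
        # Engage in whatever is currently predicted to be most rewarding.
        action = max(predicted, key=predicted.get)
        reward = actual[action]
        # Move the prediction toward the reward that was actually felt.
        predicted[action] += learning_rate * (reward - predicted[action])
        history.append(action)
    return predicted, history


# Made-up numbers: one activity is "wanted" a lot but delivers nothing.
predicted = {"chase_status": 1.0, "read_book": 0.6}
actual = {"chase_status": 0.0, "read_book": 0.5}
final, history = choose_and_update(predicted, actual)
```

With these numbers the agent tries `chase_status` once, gets no reward, lowers its prediction, and switches to `read_book` for the remaining steps. Nothing in the loop ever concludes that the disappointing activity was "valued for its own sake"; the wrong prediction just gets corrected.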
Can someone give a reason why wireheading would be bad?