Vladimir_Nesov comments on Hedging our Bets: The Case for Pursuing Whole Brain Emulation to Safeguard Humanity's Future - Less Wrong
Any notion of progress (what we want is certainly not evolution) can be captured as a deterministic criterion.
Obviously I meant 'evolution' in the sense of change over time, not change specifically induced by natural selection.
As to a deterministic criterion, I agree that such a thing is probably possible. But... so what? I'm not arguing that FAI isn't possible. The topic at hand is FAI research relative to WBE. I'm assuming a priori that both are possible. The question is which basket should get more eggs.
You said:
This is misuse of the term "preference". "Preference", in the context of this discussion, refers specifically to that which isn't to be changed, ever. This point isn't supposed to be related to WBE vs. FAI discussion, it's about a tool (the term "preference") used in leading this discussion.
Your definition is too narrow for me to accept. Humans are complicated. I doubt we have a core set of "preferences" (by your definition) which can be found with adequate introspection. The very act of introspection itself changes the human and potentially their deepest preferences (normal definition)!
I have some preferences which satisfy your definition, but I wouldn't consider them my core, underlying preferences. The vast majority of preferences I hold do not qualify. I'm perfectly OK with them changing over time, even the ones that guide the overarching path of my life. Yes, the change in preferences is often caused by other preferences, but to think that this causal chain can be traced back to a core preference is unjustified, in my opinion. There could just as well be closed loops in the causal graph.
You are disputing definitions! Of course, there are other natural ways to give meaning to the word "preference", but they are not as useful in discussing FAI as the comprehensive, unchanging preference. It's not supposed to have much in common with likes or wants and their changes, though it does need to describe, in particular, what they should be and how they should change. Think of your preference as that particular formal goal system that it is optimal, from your point of view (on reflection, if you knew more, etc.), to give to a Strong AI.
Your dislike for applying the label "preference" to this concept, and the ambiguity that might introduce, need to be separated from consideration of the concept itself.
I specifically dispute the usefulness of your definition. It may be a useful definition in the context of FAI theory. We aren't discussing FAI theory.
And, to be fair, you were originally the one disputing definitions. In my post I used the standard definition of 'preference', which you decided was 'wrong', rather than accepting the implied (normal!) definition I had obviously used.
Regardless, it seems unlikely we'll be making any progress on the on-topic discussion even if we resolve this quibble.
But we do. Whether a particular action is going to end well for humanity is a core consideration in Friendliness. When you say that "the route of WBE simply takes the guess work out", if that's read as implying that this road is OK, it is a factual claim about how preferable (in my sense) the outcome is going to be. The concept of preference (in my sense) is central to evaluating the correctness of your factual claim.
Your concept of preference is one way of evaluating the correctness of my claim, I agree. If you can resolve the complex web of human preferences (in my sense) into a clean, non-contradictory, static preference system (your sense) then you can use that system to judge the value of the hypothetical future in which WBE research overran FAI research.
It's not clear to me that this is the only way to evaluate my claim, or that it is even a reasonable way. My understanding of FAI is that arriving at such a resolution of human preferences is a central ingredient in building an FAI, hence using your method to evaluate my claim would require more progress on FAI. But the entire point of this discussion is to decide whether we should be pushing harder for progress on FAI or WBE. I'll grant that this is a point in favor of FAI -- that it allows for a clearer evaluation of the very problem we're discussing -- but, beyond that, I think we must rely on the actual preferences we have access to now (in my sense: the messy, human ones) to further our evaluations of FAI and WBE.
If your statement ("The route of WBE simply takes the guess work out") were a comparison between two routes similar in approach, e.g. WBE and neuroenhancement, then you could argue that a better formal understanding of preference would be required before we could use the idea of "precise preference" to argue for one approach or the other.
Since we are comparing one option which does not try to capture preference precisely with an option that does, it does not matter what exactly precise preference says about the second option: Whatever statement our precise preferences make, the second option tries to capture it whereas the first option makes no such attempt.
The first option tries to capture our best current guess as to our fundamental preference. It then updates the agent (us) based on that guess. Afterwards, the next guess as to our fundamental preference is likely different, so the process iterates. The iteration is trying to evolve towards what the agent thinks is its exact preference; it is simply doing so via some sort of "first-order" approximation.
For the first option, I think self-modification under the direction of current, apparent preferences should be done with extreme caution, so as to get a better 'approximation' at each step. For the second option though, it's hard for me to imagine ever choosing to self-modify into an agent with exact, unchanging preferences.
We do understand something about exact preferences in general, without knowing which one of them is ours. In particular, we do know that drifting from whatever preference we have is not preferable.
I agree. If our complex preferences can be represented as exact preferences then any drift from those exact preferences would be necessarily bad. However, it's not clear to me that we actually would be drifting from our exact preference were we to follow the path of WBE.
It's clear that the preferences we currently express most likely aren't our exact preferences. The path of WBE could potentially lead to humans with fundamentally different exact preferences (bad), or it could simply lead to humans with the same exact preferences but with a different, closer expression of them in the surface preferences they actually present and are consciously aware of (good). Or the path could lead to someplace in between, obviously. Any drift is bad, I agree, but a small enough drift could be acceptable if the trade-off is good enough (such as preventing a negative singularity).
By the way, I move to label your definition "exact preference" and mine "complex preference". Unless the context is clear, in which case we can just write "preference". Thoughts?
You are right, I was wrong to claim authority over the meaning of the term as you used it. The actual problem was in you misinterpreting its use in andreas's comment, where it was used in my sense: