Jordan comments on Hedging our Bets: The Case for Pursuing Whole Brain Emulation to Safeguard Humanity's Future - Less Wrong

11 points. Post author: inklesspen, 01 March 2010 02:32AM




Comment author: Jordan 12 March 2010 10:00:07PM, 0 points

I specifically dispute the usefulness of your definition. It may be a useful definition in the context of FAI theory. We aren't discussing FAI theory.

And, to be fair, you were originally the one disputing definitions. In my post I used the standard definition of 'preference', which you decided was 'wrong', saying

This is misuse of the term "preference"

rather than accepting the implied (normal!) definition I had obviously used.

Regardless, it seems unlikely we'll be making any progress on the on-topic discussion even if we resolve this quibble.

Comment author: Vladimir_Nesov 13 March 2010 01:25:22AM, 0 points

I specifically dispute the usefulness of your definition. It may be a useful definition in the context of FAI theory. We aren't discussing FAI theory.

But we are. Whether a particular action is going to end well for humanity is a core consideration in Friendliness. When you say

The route of WBE simply takes the guess work out: actually make people smarter, and then see what the drifted values are.

if it's read as implying that this road is OK, it is a factual claim about how preferable (in my sense) the outcome is going to be. The concept of preference (in my sense) is central to evaluating the correctness of your factual claim.

Comment author: Jordan 13 March 2010 02:19:31AM, 0 points

The concept of preference (in my sense) is central to evaluating the correctness of your factual claim.

Your concept of preference is one way of evaluating the correctness of my claim, I agree. If you can resolve the complex web of human preferences (in my sense) into a clean, non-contradictory, static preference system (your sense) then you can use that system to judge the value of the hypothetical future in which WBE research overran FAI research.

It's not clear to me that this is the only way to evaluate my claim, or even that it is a reasonable way. My understanding of FAI is that arriving at such a resolution of human preferences is a central ingredient in building an FAI, hence using your method to evaluate my claim would require more progress on FAI. But the entire point of this discussion is to decide whether we should be pushing harder for progress on FAI or on WBE. I'll grant that this is a point in favor of FAI -- that it allows for a clearer evaluation of the very problem we're discussing -- but, beyond that, I think we must rely on the actual preferences we have access to now (in my sense: the messy, human ones) to further our evaluations of FAI and WBE.

Comment author: andreas 13 March 2010 03:05:04AM, 1 point

It's not clear to me that this is the only way to evaluate my claim, or that it is even a reasonable way. My understanding of FAI is that arriving at such a resolution of human preferences is a central ingredient to building an FAI, hence using your method to evaluate my claim would require more progress on FAI.

If your statement ("The route of WBE simply takes the guess work out") were a comparison between two routes similar in approach, e.g. WBE and neuroenhancement, then you could argue that a better formal understanding of preference would be required before we could use the idea of "precise preference" to argue for one approach or the other.

Since we are comparing one option which does not try to capture preference precisely with an option that does, it does not matter what exactly precise preference says about the second option: Whatever statement our precise preferences make, the second option tries to capture it whereas the first option makes no such attempt.

Comment author: Jordan 14 March 2010 06:06:25AM, 0 points

The first option tries to capture our best current guess as to our fundamental preference. It then updates the agent (us) based on that guess. Afterwards the next guess as to our fundamental preference is likely different, so the process iterates. The iteration is trying to evolve towards what the agent thinks is its exact preference. The iteration is simply doing so to some sort of "first order" approximation.

For the first option, I think self-modification under the direction of current, apparent preferences should be done with extreme caution, so as to get a better 'approximation' at each step. For the second option though, it's hard for me to imagine ever choosing to self-modify into an agent with exact, unchanging preferences.

Comment author: andreas 14 March 2010 08:52:09AM, 1 point

The first option tries to capture our best current guess as to our fundamental preference. It then updates the agent (us) based on that guess.

This guess may be awful. The process of emulation and attempts to increase the intelligence of the emulations may introduce subtle psychological changes that could affect the preferences of the persons involved.

For subsequent changes based on "trying to evolve towards what the agent thinks is its exact preference" I see two options: Either they are like the first change, open to the possibility of being arbitrarily awful due to the fact that we do not have much introspective insight into the nature of our preferences, and step by step we lose part of what we value — or subsequent changes consist of the formalization and precise capture of the object preference, in which case the situation must be judged depending on how much value was lost in the first step vs how much value was gained by having emulations work on the project of formalization.

For the second option though, it's hard for me to imagine ever choosing to self-modify into an agent with exact, unchanging preferences.

This is not the proposal under discussion. The proposal is to build a tool that ensures that things develop according to our wishes. If it turns out that our preferred (in the exact, static sense) route of development is through a number of systems that are not reflectively consistent themselves, then this route will be realized.

Comment author: Jordan 15 March 2010 01:21:33AM, 2 points

This guess may be awful.

It may be horribly awful, yes. The question is "how likely is it to be awful?"

If FAI research can advance fast enough, then we will have the luxury of implementing a coherent preference system that will guarantee the long-term stability of our exact preferences. In an ideal world, that would be the path we took. In the real world, there is a downside to the FAI path: it may take too long. The benefit of other paths is that, although they would have some potential to fail even if executed in time, they offer a potentially faster timetable.

I'll reiterate: yes, of course FAI would be better than WBE, if both were available. No, WBE provides no guarantee and could lead to horrendous preference drift. The questions are: How likely is WBE to go wrong? How long is FAI likely to take? How long is WBE likely to take? And, ultimately, combining the answers to those questions: where should we be directing our research?

Your post points out very well that WBE might go wrong. It gives no clue to the likelihood though.

Comment author: andreas 15 March 2010 02:00:07AM, 0 points

Good, this is progress; your comment clarified your position greatly. However, I do not know what you mean by "how long is WBE likely to take?" — take until what happens?

Comment author: Jordan 15 March 2010 11:05:21PM, 0 points

The amount of time until we have high-fidelity emulations of human brains. At that point we can start modifying/enhancing humans, seeking to create a superintelligence, or at least sufficiently intelligent humans who can then create an FAI. The time from first emulation to superintelligence is nonzero, but it is probably small compared to the time to first emulation. If we have reason to believe that the additional time is not small, we should factor in our predictions for it as well.

Comment author: andreas 15 March 2010 11:39:25PM, 1 point

My conclusion from this discussion is that our disagreement lies in the probability we assign that uploads can be applied safely to FAI as opposed to generating more existential risk. I do not see how to resolve this disagreement right now. I agree with your statement that we need to make sure that those involved in running uploads understand the problem of preserving human preference.

Comment author: Vladimir_Nesov 13 March 2010 02:31:02AM, 0 points

We do understand something about exact preferences in general, without knowing which one of them is ours. In particular, we do know that drifting from whatever preference we have is not preferable.

Comment author: Jordan 14 March 2010 06:00:01AM, 0 points

I agree. If our complex preferences can be represented as exact preferences then any drift from those exact preferences would be necessarily bad. However, it's not clear to me that we actually would be drifting from our exact preference were we to follow the path of WBE.

It's clear that the preferences we currently express most likely aren't our exact preferences. The path of WBE could potentially lead to humans with fundamentally different exact preferences (bad), or it could simply lead to humans with the same exact preferences but with a different, closer expression of them in the surface preferences they actually present and are consciously aware of (good). Or the path could lead to someplace in between, obviously. Any drift is bad, I agree, but a small enough drift could be acceptable if the trade-off is good enough (such as preventing a negative singularity).

By the way, I move to label your definition "exact preference" and mine "complex preference". Unless the context is clear, in which case we can just write "preference". Thoughts?

Comment author: Vladimir_Nesov 13 March 2010 01:16:54AM, 0 points

And, to be fair, you were originally the one disputing definitions. In my post I used the standard definition of 'preference', which you decided was 'wrong', [...] rather than accepting the implied (normal!) definition I had obviously used.

You are right; I was wrong to claim authority over the meaning of the term as you used it. The actual problem was your misinterpretation of its use in andreas's comment, where it was used in my sense:

We do want our current preferences to be made reality (because that's what the term preference describes)