Jeffs - LessWrong

Orthogonality or the "Human Worth Hypothesis"?

Thank you. You are helping my thinking.

(I'm liking my analogy even though it is an obvious one.)

To me, it feels like we're at the moment when Szilard has conceived of the chain reaction, letters to presidents are getting written, and GPT-3 was a Fermi pile-like moment.

I would give it a 97% chance you feel we are not nearly there yet. (And I should quit creating scientific-by-association feelings. Fair point.)

I am convinced intelligence is a superpower because of the power and control we have over all the other animals. That is enough evidence for me to believe the boom could be big. Humanity was a pretty big "boom" if you are a chimpanzee.

The empiricist in me (and probably you) says: "Feelings are worthless. Do an experiment."

The rationalist in me says: "Be careful which experiments you do." (Yes, I hope the stick is long enough, as you say.)

In any event, we agree on: "Do some experiments with a long stick. Quickly." Agreed!

Orthogonality or the "Human Worth Hypothesis"?

Jeffs1y10

I am applying myself to try to come up with experiments. I have a kernel of an idea that I'm going to hound some Eval experts with, to check whether it is already being performed.

Orthogonality or the "Human Worth Hypothesis"?

Jeffs1y10

A rationalist and an empiricist went backpacking together. They got lost, ended up in a desert, and were on the point of death from thirst. They wandered to a point where they could see a cool, clear stream in the distance, but unfortunately a sign told them to BEWARE THE MINE FIELD between them and the stream.

The rationalist says, "Let's reason through this and find a path." The empiricist says, "What? No. We're going to be empirical. Follow me." He starts walking through the mine field and gets blown to bits a few steps in.

The rationalist sits down and dies of thirst.

Alternate endings:

The rationalist gets killed by flying shrapnel along with the empiricist.
The rationalist grabs the empiricist and stops him. He carefully analyzes dirt patterns, draws a map, and tells the empiricist to start walking. The empiricist blows up. The rationalist sits down and dies of thirst chanting "The map is not the territory."
The rationalist grabs the empiricist, analyzes dirt patterns, draws map, tells empiricist to start walking. Empiricist blows up. Rationalist says, "Hmmm. Now I understand the dirt patterns better." Rationalist redraws map. Walks through mine field. While drinking water, takes off fleece to reveal his "Closet Empiricist" t-shirt.
They sit down together, figure out how to find some magnetic rocks, build a very crude metal detector, put it on the end of a stick, and start making their way slowly through the mine field. They step on a mine and a nuclear mushroom cloud erupts.

So how powerful are those dad gum land mines?? Willingness to perform certain experiments should be a function of the expected size of the boom.

If you think you are walking over sand burs and not land mines, you are more willing to be an empiricist exploring the space: "Ouch, don't step there" instead of "Boom. <black screen>"

If one believes that smarter things will see >0 value in humanity, that is, if you believe some version of the Human Worth Hypothesis, then you believe the land mines are less deadly and it makes sense to proceed...especially for that clear, cool water that could save your life.

I'm not really making a point, here, but just turning the issues into a mental cartoon, I guess.

Okay, well, I guess I am trying to make one point: There are experiments one should not perform.

Orthogonality or the "Human Worth Hypothesis"?

Jeffs1y10

Totally agreed that we are fumbling in the dark. (To me, though, I'm fairly convinced there is a cliff out there somewhere given that intelligence is a superpower.)

And, I also agree on the need to be empirical. (Of course, there are some experiments that scare me.)

I am hoping that, just maybe, this framing (Human Worth Hypothesis) will lead to experiments.

Orthogonality or the "Human Worth Hypothesis"?

Jeffs1y30

I would predict your probability of doom is <10%. Am I right? And no judgment here!! I'm testing myself.

Orthogonality or the "Human Worth Hypothesis"?

Jeffs1y10

I interpret people who disbelieve Orthogonality as thinking there is some cosmic guardrail that protects against process failures such as poor seeking. How? What mechanism? No idea. But I believe they believe that. Hence my inclusion of "...regardless of the process to create the intelligence."

Most readers of Less Wrong believe Orthogonality.

But, I think the term is confusing and we need to talk about it in simpler terms like Human Worth Hypothesis. (Put the cookies on the low shelf for the kids.)

And it's worth some creative effort to design experiments to test the Human Worth Hypothesis.

Imagine the headline: "Experiments demonstrate that frontier AI models do not value humanity."

If it were believable, a lot of people would update.

Orthogonality or the "Human Worth Hypothesis"?

Jeffs1y21

Well, if it doesn't really value humans, it could demonstrate good behavior, deceptively, to make it out of training. If it is as smart as a human, it will understand that.

I think there are a lot of people banking on the good behavior towards humans being intrinsic: Intelligence > Wisdom > Benevolence towards these sentient humans. That's what I take Scott Aaronson to be arguing.

In addition to people like Scott who engage directly with the concept of Orthogonality, I feel like everyone saying things like "Those terminator sci-fi scenarios are crazy!" is expressing a version of the Human Worth Hypothesis. They are saying approximately: "Oh, c'mon, we made it. It's going to like us. Why would it hate us?"

I'm suggesting we try and put this Human Worth Hypothesis to the test.

It feels like a lot is riding on it.

Orthogonality or the "Human Worth Hypothesis"?

Jeffs1y43

I believe you are predicting that resource constraints will be unlikely. To use my analogy from the post, you are saying we will likely be safer because the ASI will not require our habitat for its highway. There are so many other places for it to build roads.

I do not think that makes the case that it values our wellbeing, just that it will not get around to depriving us of resources because of a cost/benefit analysis.

Do you think the Human Worth Hypothesis is likely true? That the more intelligent an agent is, the more it will positively value human wellbeing?

Orthogonality or the "Human Worth Hypothesis"?

Jeffs1y63

One experiment is worth more than all the opinions.

IMHO, no, there is not a coherent argument for the Human Worth Hypothesis. My money is on it being disproven.

But I assert the Human Worth Hypothesis is the explicit belief of smart people like Scott Aaronson and the implicit belief of a lot of other people who think AI will be just fine. As Scott says, Orthogonality is "a central linchpin" of the doom argument.

Can we be clearer about what people do believe and get at it with experiments? That's the question I'm asking.

It's hard to construct experiments to prove all kinds of minds are possible, that is, to prove Orthogonality.

I think it may be less hard to quantify what an agent values. (Deception, yes. Still...)
