(I'm liking my analogy even though it is an obvious one.)
To me, it feels like we're at the moment when Szilard has just conceived of the chain reaction and letters to presidents are being written, with GPT-3 as the Fermi-pile-like moment.
I would give it a 97% chance you feel we are not nearly there yet. (And I should quit creating scientific-by-association feelings. Fair point.)
I am convinced intelligence is a superpower because of the power and control we have over all the other animals. That is enough evidence for me to believe the boom co...
I am applying myself to try to come up with experiments. I have a kernel of an idea I'm going to hound some eval experts with, to see whether it is already being performed.
A rationalist and an empiricist went backpacking together. They got lost, ended up in a desert, and were on the point of death from thirst. They wandered to a point where they could see a cool, clear stream in the distance, but unfortunately there was a sign telling them to BEWARE THE MINEFIELD between them and the stream.
The rationalist says, "Let's reason through this and find a path." The empiricist says, "What? No. We're going to be empirical. Follow me." He starts walking through the minefield and gets blown to bits ...
Totally agreed that we are fumbling in the dark. (Though I'm fairly convinced there is a cliff out there somewhere, given that intelligence is a superpower.)
And, I also agree on the need to be empirical. (Of course, there are some experiments that scare me.)
I am hoping that, just maybe, this framing (Human Worth Hypothesis) will lead to experiments.
I would predict your probability of doom is <10%. Am I right? And no judgment here!! I'm testing myself.
I interpret people who disbelieve Orthogonality as thinking there is some cosmic guardrail that protects against process failures like poor seeking. How? What mechanism? No idea. But I believe they believe that. Hence my inclusion of "...regardless of the process to create the intelligence."
Most readers of Less Wrong believe Orthogonality.
But, I think the term is confusing and we need to talk about it in simpler terms like Human Worth Hypothesis. (Put the cookies on the low shelf for the kids.)
And, it's worth some creative effort ...
Well, if it doesn't really value humans, it could demonstrate good behavior, deceptively, to make it out of training. If it is as smart as a human, it will understand that.
I think there are a lot of people banking on good behavior towards humans being intrinsic: Intelligence > Wisdom > Benevolence towards these sentient humans. That's what I take Scott Aaronson to be arguing.
In addition to people like Scott who engage directly with the concept of Orthogonality, I feel like everyone saying things like "Those Terminator sci-fi scenarios...
I believe you are predicting that resource conflicts will be unlikely. To use my analogy from the post, you are saying we will likely be safer because the ASI will not require our habitat for its highway. There are so many other places for it to build roads.
I do not think that is a case for it valuing our wellbeing...just that it will not get around to depriving us of resources because of a cost/benefit analysis.
Do you think the Human Worth Hypothesis is likely true? That the more intelligent an agent is, the more it will positively value human wellbeing?
One experiment is worth more than all the opinions.
IMHO, no, there is not a coherent argument for the Human Worth Hypothesis. My money is on it being disproven.
But, I assert the Human Worth Hypothesis is the explicit belief of smart people like Scott Aaronson and the implicit belief of a lot of other people who think AI will be just fine. As Scott says, Orthogonality is "a central linchpin" of the doom argument.
Can we be more clear about what people do believe and get at it with experiments?? That's the question I'm asking.
It's hard ...
Okay, a "hard zone" rather than a no-go zone. Which begs the question "How hard?" and consequently how much comfort should one take in the belief?
Thank you for reading and commenting.
Yes. Valid. How to avoid reducing to a toy problem, or making such narrowing assumptions (in order to achieve a proof) that Mr. CEO can dismiss it.
When I revise, I'm going to work backwards with CEO/Senator dialog in mind.
Agreed. Proof or disproof should win.
"All the way up" meaning at increasing levels of intelligence…your 10,000X becomes 100,000X, etc.
At some level of performance, a moral person faces new temptations because of increased capabilities and greater power for damage, right?
In other words, your simulation may fail to be aligned at 20,000X...30,000X...
Okay, maybe I'm moving the bar (hopefully not), and hopefully this thread is helpful...
Your counter-example, your simulation, would prove that examples of aligned systems - at a high level - are possible. Alignment at some level is possible, of course. Functioning thermostats are aligned.
What I'm trying to propose is the search for a proof that a guarantee of alignment - all the way up - is mathematically impossible. We could then make the statement: "If we proceed down this path, no one will ever be able to guarantee that humans remain in control." ...
Great question. I think the answer must be "yes." The alignment-possible provers must get the prize, too.
And, that would be fantastic. Proving a thing is possible accelerates development. (The US uses the atomic bomb; Russia has it 4 years later.) Okay, it would be fantastic if the possibility proof did not create false security in the short term. What matters is when alignment actually gets solved. A peer-reviewed paper can't get the coffee. (That thought is an aside and not enough to kill the value of the prize, IMHO. I...
I envision that the org offering the prize would, after broad expert input, set the definitions and criteria.
Yes, surely the definition/criteria exercise would be a hard thing...but hopefully valuable.
Yes, surely the proof would be very difficult or impossible. However, enough people have the nagging worry that guaranteed alignment is impossible that it justifies the effort to see whether we can prove it impossible...and update.
But, if the effort required for a proof is - I don't know - 120 person-months - let's please, Humanity, not walk right past that one into the blades.
I am not advocating that we divert dozens of people from promising alignment work.
Even if it failed, I would hope the prove-impossibility effort would throw off beneficial by-products like:
As dr_s stated, I'm contending that a proof would be qualitatively different from "very hard" and would be powerful ammunition for advocating a pause...
Senator X: “Mr. CEO, your company continues to push the envelope and yet we now have proof that neither you nor anyone else will ever be able to guarantee that humans remain in control. You talk about safety and call for regulation but we seem to now have the answer. Human control will ultimately end. I repeat my question: Are you consciously working to replace humanity? Do you have children, sir?”...
I would love to see a video or transcript of this technique in action in a 1:1 conversation about AI x-risk.
Answer to my own question: https://www.youtube.com/watch?v=0VBowPUluPc
Thank you. You are helping my thinking.