Is super-intelligent AI necessarily AGI (for this amazing future), or can it be ANI?

I.e., why insist on all of the workarounds that pursuing AGI forces on us when, with ANI, don't we already have Safety, Alignment, Corrigibility, Reliability, and super-human ability today?

Eugene

OK thanks, I guess I missed him differentiating between 'solve alignment first, then trust' versus 'trust first, given enough intelligence'. Although I think one issue with having a proof is that we (or a million monkeys, to paraphrase him) still won't understand the decisions of the AGI...? I.e., we'll be asked to trust the prior proof instead of understanding the logic behind each future decision/step which the AGI takes. That also bothers me, because what are the tokens which comprise a "step"? Does it stop 1,000 times to check with us that we're comfortable with, or understand, its next move?

However, since it seems we can't explain many of the decisions of our current ANI, how do we expect to understand future ones? He mentions that we may be able to, but only by becoming trans-human.

:) 

Thank you -- btw, before I try responding to other points, here's the Ben Goertzel video to which I'm referring (starting around 52m, for a few minutes, for that particular part anyway):

I've heard a few times that AI experts both 1) admit we don't know much about what goes on inside, even as things stand today, and 2) expect us to extend more trust to the AI even as capabilities increase (most recently Ben Goertzel).

I'm curious to know whether you expect explainability to increase in correlation with capability. Or can we use Ben's analogy: 'I expect my dog to trust me, both because I'm that much smarter and because I have a track record of providing food and water for him'?

thanks!

Eugene

I wonder when Alignment and Capability will finally be considered synonymous, so that the efforts merge into one -- because that's where any potential AI safety lives, I would surmise.

I for one really appreciate the 'dumb-question' area :) 

When AI experts call upon others to ponder, as EY just did, "[an AGI] meant to carry out some single task" (emphasis mine), how do they categorize all the other important considerations besides this single task?  

Or, asked another way, where do priorities come into play relative to the "single" goal? E.g., a human goes to get milk from the fridge in the other room, and there are plenty of considerations to weigh in parallel with accomplishing that one goal -- some of which should immediately derail the task due to priority (I notice the power is out, I stub my toe, someone specific calls for me with a sense of urgency from a different room, etc.).

And does this relate at all to our understanding of how to make AGI corrigible? 

many thanks,

Eugene

https://www.lesswrong.com/posts/AqsjZwxHNqH64C2b6/let-s-see-you-write-that-corrigibility-tag

Does this remind you of what I'm trying to get at? Because it sure does, to me:

https://twitter.com/ESYudkowsky/status/1537842203543801856?s=20&t=5THtjV5sUU1a7Ge1-venUw

but I'm probably going to stay in the "dumb questions" area and not comment :)

ie. "the feeling I have when someone tries to teach me that human-safety is orthogonal to AI-Capability -- in a real implementation, they'd be correlated in some way" 

That makes sense. My intention was not to argue from the position of it becoming a psychopath, though (my apologies if it came out that way)...but instead from the perspective of an entity which starts out as supposedly Aligned (centered on human safety, let's say) but then, because it's orders of magnitude smarter than we are (by definition), quickly develops a different perspective. But you're saying it will remain 'aligned' in some vitally important way, even when it discovers ways the code could've been written differently?

Thank you. Makes some sense...but does "rewriting its own code" (the very code we thought would perhaps permanently influence it before it got going) nullify our efforts at hardcoding our intentions?
