User Comment Replies

AGI Safety FAQ / all-dumb-questions-allowed thread

Doesn't the exact same argument work for alignment though? "It's so different, it may be misaligned in ways you can't think of". Why is it treated as a solvable challenge for alignment and an impossibility for containment? Is the guiding principle that people do expect a foolproof alignment solution to be within our reach?

One difference is that the AI wants to escape containment by default, almost by definition, but is agnostic about preferring a goal function. But since alignment space is huge (i.e. "human-compatible goals are measure 0 in alignment space...

1 Jay Bailey 3y

The main difference I see is that containment supposes you're actively opposed to the AGI in some fashion - the AGI wants to get out, and you don't want to let it. Many believe that winning this contest is impossible. Thus, the idea is that if an AGI is unaligned, containment won't work - and if an AGI is aligned, containment is unnecessary. By contrast, alignment means you're not opposed to the AGI - you want what the AGI wants. This is a very difficult problem to solve, but it doesn't rely on actually outwitting a superintelligence. I agree that it's hard to imagine what a foolproof alignment solution would even look like - that's one of the difficulties of the problem.

AGI Safety FAQ / all-dumb-questions-allowed thread

Reuven Falkovich 3y 60

My impression is that much more effort is being put into alignment than containment, and that containment is treated as impossible while alignment is treated as merely very difficult. Is that accurate? If so, why? By containment I mean mostly hardware-coded strategies for limiting the compute and/or world-influence an AGI has access to. It's similar to alignment in that the most immediate obvious solutions ("box!") won't work, but more complex solutions may. A common objection is that an AI will learn the structure of the protection from the human that built it and work around, bu...

3 Kaj_Sotala 3y

There's also the problem that the more contained an AGI is, the less useful it is. The maximally safe AGI would be one which couldn't communicate or interact with us in any way, but what would be the point of building it? If people have built an AGI, then it's because they'll want it to do something for them. From Disjunctive Scenarios of Catastrophic AGI Risk:

3 Jay Bailey 3y

I believe the general argument is this: If an AGI is smarter than you, it will think of ways to escape containment that you can't think of. Therefore, it's unreasonable to expect us to be able to contain a sufficiently intelligent AI, even if the containment seems foolproof to us. One solution to this would be to make the AI not want to escape containment, but if you've solved that, you've solved a massive part of the alignment problem already.


All of Reuven Falkovich's Comments + Replies