All of ssadler's Comments + Replies

So the first one is an "AGSI", and the second is an "ANSI" (general vs narrow)?

If I understand correctly... one type of alignment (required for the "AGSI") is what I'm referring to as alignment: it is conscious of all of our interests and tries to respect them, like a good friend. The other is that the system is narrow enough in scope that it literally does just that one thing, far better than humans could, but the scope is narrow enough that we can hopefully reason about it and have an idea that it's safe.

Alignment is kind of a confusing term if a...

Thanks for your response! Could you explain what you mean by "fully general"? Do you mean that alignment of a narrow SI is possible? Or that partial alignment of a general SI is good enough in some circumstances? If it's the latter, could you give an example?

Joe Collman
By "fully general" I mean something like: "With alignment process x, we could take the specification of any SI, apply x to it, and have an aligned version of that SI specification." (I assume almost everyone thinks this isn't achievable.)

But we don't need an approach that's this strong: we don't need to be able to align all, most, or even a small fraction of SIs. One is enough, and in principle we could build in many highly specific constraints by construction (given sufficient understanding). This still seems very hard, but I don't think there's any straightforward argument for its impossibility/intractability. Most such arguments only work against the more general solutions, i.e. if we needed to be able to align any SI specification.

Here's a survey of a bunch of impossibility results if you're interested. These also apply to stronger results than we need (which is nice!).

The same game theory that has all the players racing to improve their models in spite of ethics and safety concerns will also have them getting the models to self-improve, if that provides an advantage.
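To spell out the dynamic, here's a minimal sketch of the race as a two-player game. The payoff numbers are made up, and only their ordering matters: if enabling self-improvement confers an edge, it is a dominant strategy for each lab regardless of what the other lab does, even though mutual restraint would leave both better off.

```python
# Hypothetical payoffs for two labs deciding whether to let their models
# self-improve. Higher = better for that lab. The numbers are invented;
# only their ordering drives the conclusion.
#
# (my_choice, their_choice) -> my payoff
PAYOFF = {
    ("restrain", "restrain"): 3,  # mutual restraint: safe, shared benefit
    ("restrain", "race"):     0,  # I hold back, they pull ahead
    ("race",     "restrain"): 4,  # I pull ahead (ignoring the safety cost)
    ("race",     "race"):     1,  # both race: risky for everyone
}

def best_response(their_choice: str) -> str:
    """My payoff-maximizing choice, given the other lab's choice."""
    return max(("restrain", "race"),
               key=lambda mine: PAYOFF[(mine, their_choice)])

# "race" is a best response to either choice, i.e. a dominant strategy,
# even though (restrain, restrain) beats (race, race) for both players.
assert best_response("restrain") == "race"
assert best_response("race") == "race"
print("Dominant strategy:", best_response("race"))
```

It's the standard prisoner's-dilemma structure: each player's incentives point toward racing, so restraint by any one lab is unstable without some outside coordination mechanism.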