If a superintelligent AI is guaranteed to be manipulative (by instrumental convergence), how can we validate any solution to the alignment problem? As far as I know, we can't even guarantee that a model optimizes for the defined objective, due to mesa-optimizers. That adds more complexity to an already seemingly unanswerable problem.

My other question is that people here seem to think of intelligence as a single-dimensional kind of thing. But I have always maintained that the kind of reasoning useful in scientific discovery does not necessarily unlock the secrets of human communication or of understanding people; I think we have distinct inner mechanisms that enable those abilities. For example, we can't assimilate into a chimpanzee society, or get chimpanzees to act through pure manipulation, even though we're more intelligent than they are. We're good at some things and entirely incompetent at others. If an intelligence that understands the world better than we do across all domains is possible, is it at all likely that it will arrive through a single breakthrough or a runaway effect?