Does it make sense to consider misuse and autonomy risks under the same framework? Potential for misuse does not appear difficult to detect and address. Autonomy risk is much harder to assess because we currently understand so little about the dynamics of emergence. The Knightian uncertainty around what conditions may give rise to autonomy deserves, at the very least, a different approach from the one used for more quantifiable misuse risks. ARC's autonomous replication evals address this partially, but a sufficiently advanced agent may be able to evade detection.