GPT-4's answer fwiw (with the extra prompt "please, respond as if you were the most brilliant longtermist AI safety researcher"):
> When considering the development of artificial general intelligence (AGI) and the implications for humanity, a key concern is ensuring that AGIs act in ways that are beneficial rather than harmful. One proposal to address this concern is to make AGIs scope-insensitive. To analyze whether this approach could make AGIs less likely to harm humanity, we'll first define and then discuss the implications of scope-insensitivity within AGI goals.
>
> Scope-insensitivity in this context refers to goals that do not excessively amplify their aims with scale. For instance, a scope-insensitive AGI designed to reduce suffering might not distinguish much between preventing a minor inconvenience for millions of people and alleviating a major disaster for a few people. These AGIs would have a form of goal-setting where small-scale improvements and large-scale interventions do not differ significantly in perceived value, avoiding the classic trap of prioritizing extreme versions of their objectives at the cost of human well-being.
>
> Training AGIs to have scope-insensitive or modest goals could significantly reduce the likelihood of them incapacitating humanity, as their approach would inherently avoid extreme, radical solutions. Instead, these AGIs would favor incremental and potentially more cooperative strategies. Nonetheless, comprehensive safeguards and iterative monitoring are essential to ensure these systems remain aligned and truly beneficial in all scales of their actions. Integrating scope-insensitivity with broader alignment frameworks might offer a robust path forward in AGI safety research, ensuring that AGIs act in ways that uphold and enhance human values and well-being.
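To make the "scope-insensitivity" definition above concrete, here's a minimal toy sketch (my own illustration, not from GPT-4's answer or the rest of the thread): a scope-sensitive utility grows linearly with the number of people helped, while a scope-insensitive one saturates, so a millionfold increase in scale adds little perceived value. The function shapes and the `saturation` parameter are arbitrary choices made purely for illustration.

```python
import math

# Toy comparison of scope-sensitive vs. scope-insensitive utility (illustrative only).

def scope_sensitive_utility(people_helped: float) -> float:
    # Value scales linearly with the size of the intervention.
    return people_helped

def scope_insensitive_utility(people_helped: float, saturation: float = 100.0) -> float:
    # Value saturates: beyond roughly `saturation` people, extra scale barely matters.
    return 1.0 - math.exp(-people_helped / saturation)

for n in (10, 1_000, 1_000_000):
    print(n, scope_sensitive_utility(n), round(scope_insensitive_utility(n), 4))
```

A saturating (or more generally bounded) utility is one way to cash out "small-scale improvements and large-scale interventions do not differ significantly in perceived value".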
I think the key word you want to search for is "Myopia". This is plausibly a beneficial alignment property, but like every plausibly beneficial alignment property, we don't yet know how to instill it in a system via ML training.
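For readers unfamiliar with the term, one crude way to gesture at myopia (my framing, not necessarily how the literature defines it) is an agent that optimizes only immediate reward, i.e., a discounted-return objective with discount factor gamma = 0, so long-horizon plans earn no credit:

```python
from typing import Sequence

def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    # Standard discounted sum: r_0 + gamma*r_1 + gamma^2*r_2 + ...
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

rewards = [1.0, 5.0, 100.0]                    # the big payoff only arrives later
print(discounted_return(rewards, gamma=0.99))  # far-sighted agent values the later payoff
print(discounted_return(rewards, gamma=0.0))   # "myopic" agent (gamma = 0) only counts the first reward
```

This is only the simplest temporal-scope reading; the actual definitions discussed in alignment work are subtler.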
Thanks! I guess myopia is a specific example of one form of scope-insensitivity (the one that has to do with long-term thinking, according to this at least), yes.
> This is plausibly a beneficial alignment property, but like every plausibly beneficial alignment property, we don't yet know how to instill it in a system via ML training.
I hadn't followed the discussions around myopia and didn't have this context (e.g., I thought maybe people hadn't found myopia promising at all to begin with), so thanks a lot. That's very helpful.
I was listening to Anders Sandberg talk about "humble futures" (i.e., futures that may be considered good in a sober, non-"let's tile the universe with X" way), and started wondering: might training (not-yet-proven-safe) AIs to have such "humble", scope-insensitive-ish goals, which seems more tractable than (complete) value alignment, disincentivize them from incapacitating humans?
Why would it have this disincentivizing effect? I have some ideas, but I thought I wouldn't flesh them out here, so that people don't anchor on the particular scenarios I have in mind.
Here's an AI-generated image of a scope-insensitive AI chilling with a cup of tea to help you think: