I was listening to Anders Sandberg talk about "humble futures" (i.e., futures that may be considered good in a sober, non-"let's tile the universe with X" way), and started wondering: might training (not-yet-proven-safe) AIs to have such "humble", scope-insensitive-ish goals -- which seems more tractable than (complete) value alignment -- disincentivize the AI from incapacitating humans?
Why would it have this disincentivizing effect? I have some ideas, but I thought I wouldn't flesh them out here, to make sure people don't anchor on the particular scenarios I have in mind.
Here's an AI-generated image of a scope-insensitive AI chilling with a cup of tea to help you think:
Thanks! I guess myopia is a specific example of one form of scope-insensitivity (the one that has to do with long-term thinking, according to this at least), yes.
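
To make the distinction concrete for myself, here's a toy sketch (my own illustration, not from the linked post; the functional forms and constants are arbitrary): myopia caps how far into the future the agent cares, while the scope-insensitivity I had in mind caps how much more the agent cares about ever-bigger outcomes, e.g. via a bounded/saturating utility.

```python
import math

# Toy illustration only -- numbers and functional forms are arbitrary.

def myopic_value(rewards, horizon=1):
    """Myopia: only rewards within a short horizon count; later ones are ignored."""
    return sum(rewards[:horizon])

def scope_insensitive_value(resources, saturation=10.0):
    """Scope-insensitivity as a bounded/saturating utility: once the 'humble'
    target scale is reached, grabbing 1000x more adds almost nothing."""
    return 1.0 - math.exp(-resources / saturation)

print(myopic_value([1, 1, 1, 1], horizon=1))  # 1 -- the future is ignored entirely
print(scope_insensitive_value(10))            # ~0.63
print(scope_insensitive_value(10_000))        # ~1.0 -- vastly more resources, barely more utility
```

The point being that the second agent still cares about the future; it just doesn't gain much from grabbing everything.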
> This is plausibly a beneficial alignment property, but like every plausibly beneficial alignment property, we don't yet know how to instill them in a system via ML training.
I hadn't followed the discussions around myopia and didn't have this context (e.g., I thought maybe people didn't find myopia promising at all to begin with, or something), so thanks a lot. That's very helpful.