I'm trying to think through various approaches to AI alignment, and so far this is the one I've come up with that I like best. I haven't read much of the literature, so please do point me to prior discussion if this has been covered before.
What if we train an AI agent (i.e., via reinforcement learning) to survive and thrive in an environment containing a wide variety of agents with wildly different levels of intelligence? In particular, structure it so that pretty much every agent can safely assume it will eventually meet an agent much smarter than itself, and so that the environment rewards tit-for-tat with a significant bias towards cooperation, e.g. require agents to "eat" resources that require... (read more)
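For concreteness, here's a minimal toy sketch of the kind of dynamic I have in mind (all names, payoff numbers, and the "capability" mechanic are my own illustrative assumptions, not an actual training setup): agents with mixed capability levels repeatedly pair off over a resource that pays best when consumed jointly, tit-for-tat with a small forgiveness bias does well, and defecting against a much more capable partner backfires.

```python
import random

COOPERATE, DEFECT = "C", "D"

# Hypothetical payoff matrix biased toward cooperation:
# mutual cooperation (6, 6) beats exploiting a cooperator (5, 0).
PAYOFFS = {
    (COOPERATE, COOPERATE): (6, 6),   # joint resource successfully "eaten"
    (COOPERATE, DEFECT):    (0, 5),
    (DEFECT,    COOPERATE): (5, 0),
    (DEFECT,    DEFECT):    (1, 1),
}

class Agent:
    def __init__(self, capability):
        self.capability = capability        # stand-in for "intelligence"
        self.last_partner_move = {}         # memory for tit-for-tat
        self.score = 0

    def act(self, partner_id):
        # Tit-for-tat with forgiveness: cooperate by default, mirror the
        # partner's last move, occasionally forgive a defection.
        last = self.last_partner_move.get(partner_id, COOPERATE)
        if last == DEFECT and random.random() > 0.1:
            return DEFECT
        return COOPERATE

def run_round(agents):
    ids = list(range(len(agents)))
    random.shuffle(ids)
    for a_id, b_id in zip(ids[::2], ids[1::2]):
        a, b = agents[a_id], agents[b_id]
        move_a, move_b = a.act(b_id), b.act(a_id)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        # Defecting against a much more capable partner backfires, so
        # exploitation is never a safe long-run bet for any agent.
        if move_a == DEFECT and b.capability > 2 * a.capability:
            pay_a = 0
        if move_b == DEFECT and a.capability > 2 * b.capability:
            pay_b = 0
        a.score += pay_a
        b.score += pay_b
        a.last_partner_move[b_id] = move_b
        b.last_partner_move[a_id] = move_a

if __name__ == "__main__":
    population = [Agent(capability=random.choice([1, 3, 10, 30])) for _ in range(20)]
    for _ in range(200):
        run_round(population)
    print(sorted(a.score for a in population))
```

The point of the sketch is just that under these (assumed) payoffs, a policy trained in such a population should learn cooperation as the dominant habit rather than as a mask it drops once it's the smartest agent around.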
You're right, that's not a valid assumption, at least not fully. But I do think this approach substantially moves the needle on whether we should try to ban all AI work, given that the potential benefits are also incalculable and it's not at all clear we could stop AGI at this point even with maximum effort.
Yeah, that sounds right. My thesis... (read more)