(My first post on LessWrong. It seems the most recent Welcome Thread is from 2020, so I'm making a top-level post. This is an edited version of my submission to the AI Alignment Awards.)
Abstract: First, we offer a formalisation of the shutdown problem from [1], and we show that solutions are essentially unique. Second, we formally define ad-hoc constructions ("hacks"). Last, we present one trivial ad-hoc construction for the shutdown problem and show that every solution to the shutdown problem must come from an ad-hoc construction.
1. Introduction
The shutdown problem is the problem of programming an agent so that it behaves usefully during normal operation and facilitates a shutdown if and only if the creator...