dp

Only a hack can solve the shutdown problem

(My first post on LessWrong. It seems the most recent Welcome Thread is from 2020, so I'm making a top-level post. This is an edited version of my submission to the AI Alignment Awards.) Abstract: First, we offer a formalisation of the shutdown problem from [1], and we show that solutions...

Jul 15, 2023•5

4 karma

1 post

Member for 3 years

dp — LessWrong

Only a hack can solve the shutdown problem

(My first post on LessWrong. It seems the most recent Welcome Thread is from 2020, so I'm making a top-level post. This is an edited version of my submission to the AI Alignment Awards.)

Abstract: First, we offer a formalisation of the shutdown problem from [1], and we show that solutions are essentially unique. Second, we formally define ad-hoc constructions ("hacks"). Last, we present one trivial ad-hoc construction for the shutdown problem and show that every solution to the shutdown problem must come from an ad-hoc construction.

1. Introduction

The shutdown problem is the problem of programming an agent so that it behaves usefully during normal operation and facilitates a shutdown if and only if the creator...
