Meet Up: [Montreal] AI Safety: Corrigibility

MathieuRoy 06 August 2017 09:02AM

WHEN: 10 August 2017 07:00:00AM (-0400)

WHERE: 2920 Ch de la Tour, Montréal, QC

Meetup Event: https://goo.gl/XRHGS6

Location: room 3195, Pavillon André-Aisenstadt, Université de Montréal

Our next AI Safety meeting will discuss corrigibility: being able to correct the behavior of an AGI. Basically, we want to know whether we can design a utility function under which the AI accepts human intervention, even though that intervention might interfere with its other goals. More specifically, we will discuss the three papers below. All results are, by necessity, about simple toy models; they are meant to direct future research.

Here's a short description of each paper. I recommend you read as much as you can in this order.

Corrigibility (https://intelligence.org/files/Corrigibility.pdf): This sets up the problem and discusses issues with the most obvious approaches.

The Off-Switch Game (https://people.eecs.berkeley.edu/~dhm/papers/off_switch_AAAI_ws.pdf): If the AI is uncertain about the user's utility function and treats any action by the user as information about that utility function (as in inverse reinforcement learning), then the AI may be amenable to correction. A more exact statement (with caveats) is found in this paper.

Safely Interruptible Agents (http://intelligence.org/files/Interruptibility.pdf): Similar to the previous paper, but with a different approach and focus. They ask whether it's possible to interrupt an AI without damaging its learning (i.e., will it learn as if it had never been interrupted?). As above, they show this is possible in the case of an uncertain AI. They even manage to show an example of an ideal 'perfect' intelligence for which this is true.
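
To make the intuition behind The Off-Switch Game a bit more concrete before the meeting, here is a rough toy sketch. It is not code from the paper. It assumes a simplified one-shot setting in which the AI holds a belief (represented as samples) over the utility of a single action, switching off is worth 0, and the human is perfectly rational, allowing the action only when its true utility is positive. The function name `off_switch_values` and the Gaussian belief are made up for illustration.

```python
import random


def off_switch_values(utility_samples):
    """Expected value of each option, given samples from the AI's belief
    about the utility of its action (hypothetical toy model)."""
    n = len(utility_samples)
    # Option 1: act immediately, bypassing the human.
    act = sum(utility_samples) / n
    # Option 2: switch itself off, which is worth 0 by assumption.
    switch_off = 0.0
    # Option 3: defer to a human who is assumed to be perfectly rational:
    # they let the action through only when its true utility is positive.
    defer = sum(max(u, 0.0) for u in utility_samples) / n
    return {"act": act, "switch_off": switch_off, "defer": defer}


# An uncertain AI: it thinks the action is probably good, but is not sure.
rng = random.Random(0)
belief = [rng.gauss(0.5, 1.0) for _ in range(10_000)]
values = off_switch_values(belief)

# An AI that is certain the action has utility 1.0.
certain = off_switch_values([1.0] * 5)
```

Under these assumptions, deferring is never worse than acting or switching off, because max(u, 0) is at least u and at least 0 for every sample. For the certain AI, deferring and acting come out equal, so it has no positive incentive to leave the off switch in human hands. The caveats in the paper (for example, a human who is not perfectly rational) are exactly what this sketch leaves out.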
