This project is part of the upcoming round of AI Safety Camp.
SatisfIA – AI that satisfies without overdoing it
Summary
This project will contribute to some fundamental design aspects of AI systems. We will explore novel designs for generic AI agents – AI systems that can be trained to act autonomously in a variety of environments – and their implementation in software.
Our designs deviate from most existing designs in that they are not based on the idea that the agent should aim to maximize some kind of objective function (which I argue is inherently unsafe if the agent is powerful enough and one cannot be absolutely sure of having found exactly the right objective function). Rather than maximizing an objective function, our agents will aim to fulfill goals that are specified via constraints called “aspirations” (which I argue implies a much lower probability of taking “extreme” actions and is therefore likely much safer).
For example, I might want my AI butler to prepare 100–150 ml of tea at a temperature of 70–80°C, taking at most 10 minutes, spending at most $1 worth of resources, and succeeding with at least 95% probability (rather than: prepare as much tea as possible, as fast and as cheaply as possible, with the highest possible probability of success).
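To make the contrast with maximization concrete, a goal of this kind could be written down roughly as a set of aspiration intervals, one per outcome dimension. The following Python sketch is purely illustrative; the names and structure are made up for this example and are not the project’s actual design:

```python
# Purely illustrative: a goal as a set of aspiration intervals, not a score to maximize.
from dataclasses import dataclass

@dataclass(frozen=True)
class Aspiration:
    """An acceptable interval for one measurable aspect of the outcome."""
    low: float
    high: float

    def satisfied_by(self, value: float) -> bool:
        return self.low <= value <= self.high

# The tea-making goal from the example above (numbers as in the text):
tea_goal = {
    "volume_ml": Aspiration(100, 150),
    "temperature_C": Aspiration(70, 80),
    "duration_min": Aspiration(0, 10),
    "cost_usd": Aspiration(0, 1),
    "success_prob": Aspiration(0.95, 1.0),
}

def goal_fulfilled(outcome: dict) -> bool:
    # A plan is acceptable iff every aspiration holds; there is nothing left to "push further".
    return all(asp.satisfied_by(outcome[key]) for key, asp in tea_goal.items())
```

The point is that once all aspirations are met, the agent has no remaining incentive to squeeze out more tea, more speed, or more certainty.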
For a lightweight introduction to this way of thinking, you can watch this interview.
We will study several versions of such “non-maximizing” agent designs and the corresponding learning algorithms (mostly variants of Reinforcement Learning in Markov Decision Processes).
This involves designing agents and algorithms in theory, implementing them in software (mostly Python), simulating their behavior in selected test environments (e.g., the AI safety gridworlds), formulating hypotheses about that behavior, especially about its safety-relevant consequences, trying to prove or disprove these hypotheses formally and/or to support them with numerical evidence, and writing up the results in blog posts and an academic paper.
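As a rough illustration of what a “non-maximizing” policy might look like in a simple MDP setting, consider the following Python sketch. It replaces the usual argmax over estimated returns with a choice among actions that are good enough for a given aspiration; the details are simplified and hypothetical, not the algorithms we will actually develop:

```python
# Illustrative only: choose among actions that meet an aspiration on expected return,
# instead of always picking the maximizing action.
import random

def aspiration_policy(q_values: dict, aspiration: float) -> str:
    """q_values maps each action to its estimated expected return in the current state."""
    # Any action whose estimate meets the aspiration is acceptable; pick one at random
    # rather than optimizing further among them.
    acceptable = [a for a, q in q_values.items() if q >= aspiration]
    if acceptable:
        return random.choice(acceptable)
    # If no action meets the aspiration, fall back to the best available estimate.
    return max(q_values, key=q_values.get)

# Toy usage, e.g. in one state of a gridworld:
print(aspiration_policy({"up": 0.4, "down": 0.9, "stay": 0.7}, aspiration=0.6))
```

Simulating such policies in the gridworld environments lets us test hypotheses such as whether they take fewer “extreme” actions than their maximizing counterparts.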
Details
For details, please consult the full proposal.
Applying
To apply to work on this project, please visit the AI Safety Camp webpage and apply for project #21.