[Aspiration-based designs] A. Damages from misaligned optimization – two more models

There are already many concrete examples of how maximizing a misaligned proxy utility function can go wrong (e.g., the "no clickbait" database or Gao et al., 2022), as well as some theoretical models (e.g., Zhuang et al., 2021) and discussions (e.g., this post, this AISC team report).

In the context of the SatisfIA project, we came up with two more models, one motivated by a pure exchange model (a standard model of a market), the other assuming that the agent estimates utility from the provided ranking among a sample of candidate actions.
Although these are only toy models of real situations, they may be useful for further investigating the conditions under which Goodhart-style behavior occurs.

Model 1: Purchasing

[Aspiration-based designs] Outlook: dealing with complexity

Jobst Heitzig, jossoliver, thomasfinn, Simon Dima

Summary. This teaser post sketches our current ideas for dealing with more complex environments. It will ultimately be replaced by one or more longer posts describing these in more detail. Reach out if you would like to collaborate on these issues.

Multi-dimensional aspirations

For real-world tasks that are specified in terms of more than one evaluation metric, e.g., how many apples to buy and how much money to spend at most, we can generalize Algorithm 2 from aspiration intervals to convex aspiration sets as follows:

Assume there are $d > 1$ evaluation metrics $u_{i}$, combined into a vector-valued evaluation metric $u = (u_{1}, \dots, u_{d})$.
Preparation: Pick $d + 1$ many linear combinations $f_{j}$ in the space spanned by these metrics so that their convex hull is full-dimensional and contains

[Aspiration-based designs] 2. Formal framework, basic algorithm

Jobst Heitzig, Simon Dima, Simon Fischer

Summary. In this post, we present the formal framework we adopt throughout the sequence, and the simplest form of the type of aspiration-based algorithms we study. We do this for a simple form of aspiration-type goals: making the expectation of some variable equal to some given target value. The algorithm is based on the idea of propagating aspirations over time, and we prove that the algorithm gives a performance guarantee if the goal is feasible. Later posts discuss safety criteria, other types of goals, and variants of the basic algorithm.
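
To make the goal type in the summary concrete, here is a minimal one-step sketch of "making the expectation of some variable equal to a target value". It is not the algorithm from the post: the names `mixing_probability` and `choose_action`, the dictionary of per-action expected values, and the rule of mixing the two actions closest to the target are all assumptions made for illustration. The idea shown is only that, if one action's expected value lies at or below the target and another's at or above it, randomizing between them with the right probability hits the target in expectation.

```python
import random


def mixing_probability(q_low, q_high, aspiration):
    """Probability of picking the 'high' action so that the mixture's
    expected value equals the aspiration (hypothetical helper)."""
    if not q_low <= aspiration <= q_high:
        raise ValueError("aspiration is not feasible with these two actions")
    if q_high == q_low:
        return 1.0  # both actions already meet the aspiration exactly
    return (aspiration - q_low) / (q_high - q_low)


def choose_action(q_values, aspiration, rng=random):
    """Pick an action so that the expected value equals the aspiration.

    q_values maps each action to the expected value of the target variable
    (assumed known here). Mixing the two actions closest to the aspiration
    is an illustrative choice, not something taken from the post.
    """
    below = [a for a, q in q_values.items() if q <= aspiration]
    above = [a for a, q in q_values.items() if q >= aspiration]
    if not below or not above:
        raise ValueError("aspiration is not feasible")
    low = max(below, key=q_values.get)
    high = min(above, key=q_values.get)
    p = mixing_probability(q_values[low], q_values[high], aspiration)
    return high if rng.random() < p else low
```

For example, with expected values 0 and 10 and a target of 4, `mixing_probability(0.0, 10.0, 4.0)` gives a mixing weight whose resulting expectation is 4. A multi-step version would additionally have to pass an updated target on to later time steps, which this sketch does not attempt.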

Assumptions

In line with the working hypotheses stated in the previous post, we assume more specifically the following in this post:

The agent is a

[Aspiration-based designs] 1. Informal introduction

B Jacobs, Jobst Heitzig, Simon Fischer, Simon Dima

Sequence Summary. This sequence documents research by SatisfIA, an ongoing project on non-maximizing, aspiration-based designs for AI agents that fulfill goals specified by constraints ("aspirations") rather than maximizing an objective function. We aim to contribute to AI safety by exploring design approaches and their software implementations that we believe might be promising but neglected or novel. Our approach is roughly related to but largely complementary to concepts like quantilization and satisficing (sometimes called "soft-optimization"), Decision Transformers, and Active Inference.

This post describes the purpose of the sequence, motivates the research, describes the project status, our working hypotheses and theoretical framework, and has a short glossary of terms. It does not contain results and...
