Context
I'm interested in working on alignment, coming from a programmer/Haskell background. I have two ideas that are unlikely to be info hazards, which I want to post here to get feedback and pointers to prior work.
Both are very rough/early stage; view this as me presenting a pre-MVP to get some very early feedback.
Idea 1 - "Love is that which enables choice" (Inspired by Forrest Landry)
This is an idea for a potential goal/instruction for an AI (I can't recall the fancy term). The idea is to make an AI that optimizes for optionality: maximizing the total sum of agency across all human and non-human agents. Agency is loosely defined here as "the optional ability to make changes to the world".
Making it the sum total would discourage situations where one agent's ability to effect change hampers someone else's.
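To make this a bit more concrete, here is a toy Haskell sketch of what "total agency" could mean. Everything in it is a stand-in I made up for illustration (the world is just an Int, an agent is just a bag of actions, and "agency" is counted as the number of distinct states an agent could reach on its own within a few moves); it is not a proposal for the real objective.

```haskell
import qualified Data.Set as Set

-- Toy world: states are Ints, actions are state transitions.
type State  = Int
type Action = State -> State

-- Hypothetical agent: in this toy model, just the actions available to it.
newtype Agent = Agent { actions :: [Action] }

-- Crude optionality proxy: how many distinct states can this agent reach
-- from the current state within `depth` of its own moves?
reachable :: Int -> State -> Agent -> Set.Set State
reachable 0     s _     = Set.singleton s
reachable depth s agent =
  Set.insert s . Set.unions $
    [ reachable (depth - 1) (act s) agent | act <- actions agent ]

-- The proposed objective in miniature: agency summed over all agents,
-- human and non-human alike.
totalAgency :: Int -> State -> [Agent] -> Int
totalAgency depth s agents =
  sum [ Set.size (reachable depth s a) | a <- agents ]
```

Because the objective is a sum over everyone, an action that expands one agent's reachable set while shrinking another's can score worse than one that leaves both intact, which is the point of making it the sum total.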
Idea 2 - Segmented gradient descent training optimized for collaboration between different agents
This is an idea for a potential training method that I think may have a big attractor basin for collaborative traits. The idea is to have some kind of gradient descent-esque training where AI agents of varying calibres/types are put in training scenarios that put a premium on collaboration. This runs over multiple iterations, where the AIs that successfully collaborate with other agents get to continue to the next round.
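As a minimal sketch of the selection loop, assuming the collaboration task and scoring are given (both are left abstract here; in a real setup each round would also include a gradient-descent or mutation step that produces new candidate agents rather than just culling the old ones):

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

type Score = Double

-- One round: score every agent by how well it collaborates with the rest
-- of the population, and keep only the top half (the cutoff is arbitrary).
-- `collaborate a b` stands in for running the two agents in a shared
-- scenario and measuring how well the pair did together.
selectionRound :: (a -> a -> Score) -> [a] -> [a]
selectionRound collaborate population =
  let scored = [ (agent, sum [ collaborate agent partner
                             | partner <- population ])
               | agent <- population ]
      keep   = length population `div` 2
  in  take keep (map fst (sortOn (Down . snd) scored))

-- Run several rounds of selection-for-collaboration.
train :: Int -> (a -> a -> Score) -> [a] -> [a]
train rounds collaborate =
  foldr (.) id (replicate rounds (selectionRound collaborate))
```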
The hardest thing about this is that we want an AI that is cooperative, but we do not want an AI that is naive, as that would lead to situations where terrorists could convince the AI to do stupid shit. We could try to model this on human (cultural/biological) evolution.
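One toy way to see the "cooperative but not naive" distinction: if the training population includes exploiters, an unconditional cooperator gets farmed while a conditional cooperator (tit-for-tat-like) stays cooperative without being exploitable. The iterated prisoner's dilemma below is only an illustration of that selection pressure, not a proposal for the actual training signal.

```haskell
data Move = Cooperate | Defect deriving (Eq, Show)

-- Standard prisoner's dilemma payoffs for the first player.
payoff :: Move -> Move -> Double
payoff Cooperate Cooperate = 3
payoff Cooperate Defect    = 0
payoff Defect    Cooperate = 5
payoff Defect    Defect    = 1

-- A strategy picks its next move from the opponent's past moves
-- (most recent first).
type Strategy = [Move] -> Move

alwaysCooperate, alwaysDefect, titForTat :: Strategy
alwaysCooperate _            = Cooperate
alwaysDefect    _            = Defect
titForTat       []           = Cooperate
titForTat       (lastMove:_) = lastMove

-- Play n rounds and return the first strategy's total payoff.
iterated :: Int -> Strategy -> Strategy -> Double
iterated n me them = go n [] [] 0
  where
    go 0 _      _         acc = acc
    go k myHist theirHist acc =
      let m = me theirHist
          t = them myHist
      in  go (k - 1) (m : myHist) (t : theirHist) (acc + payoff m t)

-- Against a population that includes defectors, the naive cooperator is
-- exploited, while tit-for-tat cooperates without being exploitable.
scoreAgainstPopulation :: Int -> [Strategy] -> Strategy -> Double
scoreAgainstPopulation n population me =
  sum [ iterated n me opponent | opponent <- population ]
```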
One thing I like about this idea is that it might lead to AI that develops behavioural patterns akin to those found in herd animals (including humans). That would make the AI easier to reason about and more likely to develop something akin to ethical behaviour.