Definition of AI Friendliness

How will we know if future AI’s (or even existing planners) are making decisions that are bad for humans unless we spell out what we think is unfriendly?

At a machine level the AI would be recursively minimising cost functions to produce the most effective plan of action to achieve the goal, but how will we know if its decision is going to cause harm?

Is there a model or dataset which describes what is friendly to humans? e.g.

Context

0 - running a simulation in a VM

2 - physical robot with vacuum attachment

9 - full control of a plane

Actions

0 - selecting a song to play

5 - deciding which section of floor to vacuum

99 - deciding who is an ‘enemy’

9999 - aiming a gun at an ‘enemy’

Impact

1 - poor song selected to play, human mildly annoyed

2 - ineffective use of resources (vacuuming the same floor section twice)

99 - killing a human

99999 - killing all humans

This may not be possible to get agreement from all countries/cultures/beliefs, but it is something we should discuss and attempt to get some agreement.

How will we know if future AI’s (or even existing planners) are making decisions that are bad for humans unless we spell out what we think is unfriendly?

At a machine level the AI would be recursively minimising cost functions to produce the most effective plan of action to achieve the goal, but how will we know if its decision is going to cause harm?

Is there a model or dataset which describes what is friendly to humans? e.g.

Context

0 - running a simulation in a VM

2 - physical robot with vacuum attachment

9 - full control of a plane

Actions

0 - selecting a song to play

5 - deciding which section of floor to vacuum

99 - deciding who is an ‘enemy’

9999 - aiming a gun at an ‘enemy’

Impact

1 - poor song selected to play, human mildly annoyed

2 - ineffective use of resources (vacuuming the same floor section twice)

99 - killing a human

99999 - killing all humans

This may not be possible to get agreement from all countries/cultures/beliefs, but it is something we should discuss and attempt to get some agreement.

The problem here in not appearing incompetent, but being wrong/confused. This is the problem that should be fixed by reading the literature. It is more efficient to fix it by reading the literature rather than by engaging in a discussion, even given good intentions. Fixing the appearances might change the attitude of other people towards preferring the option of discussion, but I don't think the attitude should change on that basis, reading the literature is still more efficient, so fixing appearances would mislead rather than help.

(I use parentheticals to indicate that an observation doesn't work as a natural element of the preceding conversation, but instead raises a separate point that is more of a one-off, probably not worthy of further discussion.)

-9

Definition of AI Friendliness

-9

How will we know if future AI’s (or even existing planners) are making decisions that are bad for humans unless we spell out what we think is unfriendly?

-9

-9

Definition of AI Friendliness

-9

How will we know if future AI’s (or even existing planners) are making decisions that are bad for humans unless we spell out what we think is unfriendly?

-9