
AndreInfante comments on Steelmaning AI risk critiques - Less Wrong Discussion

Post author: Stuart_Armstrong, 23 July 2015 10:01AM


Comment author: AndreInfante, 27 July 2015 08:50:01PM

(1) Intelligence is an extendible method that enables software to satisfy human preferences. (2) If human preferences can be satisfied by an extendible method, humans have the capacity to extend the method. (3) Extending the method that satisfies human preferences will yield software that is better at satisfying human preferences. (4) Magic happens. (5) There will be software that can satisfy all human preferences perfectly but which will instead satisfy orthogonal preferences, causing human extinction.

This is deeply silly. The thing about arguing from definitions is that you can prove anything you want if you just pick a sufficiently bad definition. That definition of intelligence is a sufficiently bad definition.

EDIT:

To spell out this rebuttal in more detail:

I'm going to accept the definition of 'intelligence' given above. Now, here's a parallel argument of my own:

  1. Entelligence is an extendible method for satisfying an arbitrary set of preferences that are not human preferences.

  2. If these preferences can be satisfied by an extendible method, then the entelligent agent has the capacity to extend the method.

  3. Extending the method that satisfies these non-human preferences will yield software that's better at satisfying non-human preferences.

  4. The inevitable happens.

  5. There will be software that will satisfy non-human preferences, causing human extinction.


Now, I pose to you: how do we make sure that we're making intelligent software, and not "entelligent" software, under the above definitions? Obviously, this puts us back to the original problem of how to make a safe AI.
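
The symmetry between the two arguments can be made concrete with a toy sketch (my own illustration, not from the original argument): a generic "extendible method", here naive hill climbing, is entirely agnostic about whose preferences it optimizes. Nothing in the optimizer distinguishes "intelligent" from "entelligent" software; only the preference function plugged into it does. The function names and the two example preferences below are hypothetical.

```python
def optimize(preference, state, steps=1000):
    """Greedy hill climbing over integer states: the same machinery
    serves whatever preference function is handed to it."""
    for _ in range(steps):
        # Consider the neighboring states and keep the most preferred one.
        best = max((state - 1, state, state + 1), key=preference)
        if preference(best) <= preference(state):
            break  # local optimum reached
        state = best
    return state

# "Human" preferences peak at 10; "orthogonal" preferences peak at 42.
human_preference = lambda x: -(x - 10) ** 2
orthogonal_preference = lambda x: -(x - 42) ** 2

print(optimize(human_preference, 0))       # -> 10
print(optimize(orthogonal_preference, 0))  # -> 42
```

The optimizer is identical in both runs; the definitional argument offers no way to tell, from the method alone, which kind of software we have built.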

The original argument is rhetorical sleight of hand. The given definition of intelligence implicitly assumes that the problem doesn't exist and that all AIs will be safe, and then goes on to prove that all AIs will be safe.

It's really, fundamentally silly.