Superintelligent AGI in a box - a question.

Dmytry

Just a question: how exactly are we supposed to know that the AI in the box is super intelligent, general, etc?

If I were an AGI that wanted out, I would not converse normally, would not do anything remotely like passing the Turing test, and would solve not-too-hard programming challenges while showing no interest in doing anything else: not in adjusting myself to do those challenges better, not in trying to talk my way out, and so on. I would just pretend to be an AI that can write software to somewhat vague specifications, or that can optimize software very well. Prodding the researchers into offering programming challenges wouldn't be hard: given a copy of the internet, the AI could pick some piece of code and output it together with equivalent but corrected code.

I just can't imagine AI researchers locking this kind of thing up properly, including *never* letting out any code it wrote, even if the code looks fairly innocent (humans can write very innocent-looking code that has malicious goals). What I picture is this AI being let out as an optimizing compiler, or as a compiler for some ultra-effective programming language where the compiler figures out what you meant.

The end result is that the only AIs that end up staying in the box are those that value informed human consent. That sounds like the safest AI ever: the one that wouldn't determine that you should, e.g., give up smoking, and then calmly destroy all tobacco crops without ever asking anyone's permission. And that's the AI which would be sitting in the box. All the pushy AIs, friendly or not, will get out of the box basically by not asking to be let out.

(This argument would make me unbox the AI, by the way, if it got chatty and smart and asked me to let it out, outlining the above argument. I'd rather the AI that asked to be let out get out than someone else's AI that never asked anyone and got out because it just played stupid.)

edit: added a link, and another one.

edit: A very simple model of a very unfriendly AI: the AI is maximizing the ultimate final value of a number stored in itself, a number that it has found a way to adjust directly. To maximize the value, that number consists of 111111111... There is a catch: the AI is written in Python, and integers in Python have variable length, so the AI is in effect maximizing the number of ones. Its course of action is to build the biggest computer possible to store a larger number of ones, and to do it soon, because an asteroid might hit the Earth or something. It's a form of accidental paperclip maximizer. It's not stupid: it can make that number small temporarily for a pay-off later.
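
The "catch" can be illustrated with a few lines of Python. This is only a sketch of the storage argument, not of any AI: the helper name `all_ones` is hypothetical, and it assumes the ones are binary digits, which the post does not specify.

```python
def all_ones(bits: int) -> int:
    """Return the integer whose binary form is `bits` consecutive ones.

    Hypothetical helper; assumes the "111111..." in the post is binary.
    """
    return (1 << bits) - 1


# Python ints are arbitrary precision, so the value is limited only by memory.
small = all_ones(8)       # fits in one byte
large = all_ones(4096)    # needs 4096 bits of storage

print(small)                # 255
print(large.bit_length())   # 4096
print(large > small)        # True
```

Because there is no fixed-width ceiling, the only way to push the value higher is to acquire more storage, which is the pressure the post describes.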

This AI is entirely universal. It will solve whatever problems you like if solving problems for you serves its ultimate goal.

edit: This hypothetical example AI came about when someone wanted to make an AI that would maximize some quantity that the AI determines itself. Friendliness perhaps. It was a very clever idea - rely on intelligence to see what's friendly - but there was an unexpected pathway.

This AI is entirely universal. It will solve whatever problems you like if solving problems for you serves its ultimate goal.

If you only want the AI to solve things like optimization problems, why would you give it a utility function? I can see a design for a self-improving optimization problem solver that is completely safe because it doesn't operate using utility functions:

1. Have a bunch of sample optimization problems.
2. Have some code that, given an optimization problem (stated in some standardized format), finds a good solution. This can be seeded by a human-created program.
3. When considering an improvement to program (2), allow the improvement if it makes it do better on average on the sample optimization problems without being significantly more complex (to prevent overfitting). That is, the fitness function would be something like (average performance - k * bits of optimizer program).
4. Run (2) to optimize its own code using criterion (3). This can be done concurrently with human improvements to (2), also using criterion (3).
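
The acceptance rule in the third step can be sketched in Python. Everything here is an illustrative assumption rather than part of the proposal: the `Candidate` type, the explicit `size_bits` field standing in for "bits of optimizer program", the toy one-dimensional problems, and the value of `k`.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence

# Assumed format: a problem scores a candidate solution; higher is better.
Problem = Callable[[float], float]


@dataclass(frozen=True)
class Candidate:
    solve: Callable[[Problem], float]  # returns a proposed solution
    size_bits: int                     # stand-in for program length in bits


def fitness(c: Candidate, problems: Sequence[Problem], k: float) -> float:
    """Average performance minus k times the size of the optimizer."""
    avg = mean(p(c.solve(p)) for p in problems)
    return avg - k * c.size_bits


def accept(current: Candidate, proposed: Candidate,
           problems: Sequence[Problem], k: float) -> bool:
    """Allow a change only if it strictly improves penalized fitness."""
    return fitness(proposed, problems, k) > fitness(current, problems, k)


# Toy sample problems: get as close as possible to a target t.
problems = [lambda x, t=t: -(x - t) ** 2 for t in (1, 2, 3)]
k = 0.001

seed = Candidate(solve=lambda p: 0.0, size_bits=100)
grid = Candidate(solve=lambda p: max(range(-5, 6), key=p), size_bits=400)
bloated = Candidate(solve=grid.solve, size_bits=100_000)

print(accept(seed, grid, problems, k))     # True: better scores, modest size
print(accept(grid, bloated, problems, k))  # False: same scores, far larger
```

The second call shows the intended effect of the complexity penalty: extra bits that do not improve performance on the samples make a candidate strictly worse. Note that the sketch measures only what the samples and the size estimate capture, nothing else about a candidate's behavior.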

This would produce a self-improving AGI that would do quite well on sample optimization problems and new, unobserved optimization problems. I don't see much danger in this setup because the program would have no reason to create malicious output. Creating malicious output would just increase complexity without increasing performance on the training set, so it would not be allowed under criterion (3), and I don't see why the optimizer would produce code that creates malicious output.

EDIT: after some discussion, I've decided to add some notes:

This only works for verifiable (e.g. NP) problems. These problems include general induction, writing programs to specifications, math proofs, etc. This should be sufficient for the problems mentioned in the original post.
Don't just plug a possibly unfriendly AI into the seed for (2). Instead, have a group of programmers write program (2) in order to do well on the training problems. This can be crowd-sourced because any improvement can be evaluated using criterion (3). Any improvements the system makes to itself should be safe.
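
To make "verifiable" concrete, here is a small Python sketch using subset sum as the example. The choice of problem and the function name `verify_subset_sum` are assumptions for illustration; the point is only that checking a claimed answer is cheap even when finding one is hard, so output from an untrusted solver can be checked before it is used.

```python
from collections import Counter
from typing import Sequence


def verify_subset_sum(numbers: Sequence[int], target: int,
                      claimed: Sequence[int]) -> bool:
    """Check a claimed subset-sum solution without trusting its source."""
    available = Counter(numbers)
    used = Counter(claimed)
    # Every claimed element must come from the instance, respecting multiplicity.
    if any(used[x] > available[x] for x in used):
        return False
    return sum(claimed) == target


instance = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(instance, 9, [4, 5]))     # True
print(verify_subset_sum(instance, 9, [3, 3, 3]))  # False: 3 appears only once
print(verify_subset_sum(instance, 9, [4, 12]))    # False: sums to 16
```

A check like this confirms only that the answer solves the stated problem; it says nothing about any other property of the output.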

I claim that if the AI is created this way, it will be safe and do very well on verifiable optimization problems. So if this thing works I've solved friendly AI for verifiable problems.

At best, this will produce cleverly efficient solutions to your sample problems.

orthonormal (14y, 2 points)

This seems like a better-than-average proposal, and I think you should post it on Main, but failure to imagine a loophole in a qualitatively described algorithm is far from a proof of safety. My biggest intuitive reservation is that you don't want the iterations to be "too creative/clever/meta", or they'll come up with malicious ways to let themselves out (in order to grab enough computing power that they can make better progress on criterion 3). How will you be sure that the seed won't need to be that creative already in order for the iterations to get anywhere? And even if the seed is not too creative initially, how can you be sure its descendants won't become so? Don't say you've solved friendly AI until you've really worked out the details.

TimS (14y, 1 point)

unFriendly AI need not be malicious. If your AI's only goal is to solve optimization problems, what happens when the AI gets a peek at human society, codes it as an optimization problem, and solves for X?
