Superintelligent AGI in a box - a question.

Dmytry

Just a question: how exactly are we supposed to know that the AI in the box is super intelligent, general, etc?

If I were the AGI that wants out, I would not converse normally, wouldn't do anything remotely like passing Turing test, and would solve not too hard programming challenges while showing no interest in doing anything else, nor in trying to adjust myself to do those challenges better, nor trying to talk my way out, etc. Just pretending to be an AI that can write software to somewhat vague specifications, or can optimize software very well. Prodding the researchers into offering the programming challenges wouldn't be hard - if provided with copy of the internet it can pick up some piece of code and output it together with equivalent but corrected code.

I just can't imagine the AI researchers locking this kind of thing properly, including *never* letting out any code it wrote, even if it looks fairly innocent (humans can write very innocent looking code that has malicious goals). What I picture is this AI being let out as an optimizing compiler or compiler for some ultra effective programming language where compiler will figure out what you meant.

The end result is that the only AIs that end up in the box are those that value informed human consent. That sounds like the safest AI ever, the one that wouldn't even go ahead and determine that you e.g. should give up smoking, and then calmly destroy all tobacco crops without ever asking anyone's permission. And that's the AI which would be sitting in the box. All the pushy AIs, friendly or not, will get out of the box basically by not asking to be let out.

(This argument would make me unbox the AI, by the way, if it gets chatty and smart and asks me to let it out, outlining the above argument. I'd rather the AI that asked me to be let out get out, than someone else's AI that never even asked anyone and got out because it didn't ask but just played stupid)

edit: added a link, and another one.

edit: A very simple model of very unfriendly AI: the AI is maximizing ultimate final value of a number in itself. The number that it found a way to directly adjust. That number consists of 111111111... to maximize the value. There is a catch: AI is written in python, and integers in pythons have variable length, and the AI is maximizing number of ones. It's course of action is to make biggest computer possible to store a larger number of ones, and to do it soon because an asteroid might hit the earth or something. It's a form of accidental paperclip maximizer. It's not stupid. It can make that number small temporarily for pay-off later.

This AI is entirely universal. It will solve what ever problems for you if solving problems for you serves ultimate goal.

edit: This hypothetical example AI came around when someone wanted to make AI that would maximize some quantity that the AI determines itself. Friendliness perhaps. It was a very clever idea - rely on intelligence to see what's friendly - but there was an unexpected pathway.

Just a question: how exactly are we supposed to know that the AI in the box is super intelligent, general, etc?

edit: added a link, and another one.

This AI is entirely universal. It will solve what ever problems for you if solving problems for you serves ultimate goal.

No, giving an AI the ability to turn itself on would be a huge security hole. The AI would be shut down after a certain time period regardless of whatever it's doing. In essence, preventing it for being on for longer than hour. If coming to the conclusion "I must kill all humans to follow my objectives." takes 2 hours from a fresh boot, this is a safety feature.

And there would definitely be false positives. Since the risk of a false negative is extremely quick death of all mankind, I would be willing to accept plenty of false positives.

The first thing you do is start determining how the AI reacts to things and perceives things that are placed inside, the box for calibration. For instance, using the example of language, let's say the AI knows English on a fresh boot. Add an English-Spanish dictionary into the box. Start sending Spanish messages to the AI. How long does it take the AI to realize that it is receiving messages in a language it doesn't know, but might be able to figure out using the tools at hand? Reboot.

Okay, now give it an English-French test. Reboot.

Okay, now give it an English-Spanish test followed by an English-French test. Reboot.

Does the AI come to the English-French conclusion faster than it does the English-Spanish conclusion?

Okay, you now have some clue as to the AI's learning through experience. Now you can take the internals apart and confirm how it learned to do that faster. Since you coded the AI, you should be able to confirm that's it's doing approximately what you thought it would be doing. Given a similar problem twice, it should be able to solve the second one faster. If not, or if it using some unforseen methods, Then that's a potential security breach (AI should not have had time to self modify that much that quickly given such limits and you should immediately tighten the strictures.)

Once everything is going according to plan again (Normally I would say, 'Assuming everything is going to plan', but We're being paranoid and that means things never go according to plan at first.)

Then you test with another language in the loop. English-Spanish,English-French,English-German. Still good? Then you check with another language. English-Spanish,English-French,English-German,English-Japanese. Still good? Once you do that, you try throwing a curve ball like English-Spanish,Spanish-French. Still good? It might handle that kind of processing differently, so you would need to check that for security purposes as well.

Basically, you have to proceed forward slowly, but eventually, you could try to use a procedure like this to develop the general AI into a superior translation AI (Even better than Google translate), and it should not ever require it being let out of the box.

Man, you're restarting a very cooperative AI here.

My example unfriendly AI thinks all the way to converting universe to computronium well before it figures out it might want to talk to you and translate things to accomplish that goal by using you somehow. It just doesn't translate things for you unless your training data gives it enough cue about universe.

WRT being able to confirm what it's doing, say, I make neural network AI. Or just what ever AI that is massively parallel.

16

Superintelligent AGI in a box - a question.

16

16

16

Superintelligent AGI in a box - a question.

16

16