Comment author: KatjaGrace 09 December 2014 02:11:09AM 4 points [-]

Are there solutions to the control problem other than capability control and motivation selection?

Comment author: Liso 09 December 2014 09:40:32PM 1 point [-]

Is "transcendence" a third possibility? I mean, if we realize that human values are not the best, and we retire and give up control.

(I am not sure whether this is just the motivation selection path; the difference is subtle.)

BTW, if you are thinking about partnership, are you thinking about how to control your partner?

Comment author: Liso 02 December 2014 06:42:27AM 0 points [-]

Sorry for the off-topic question.

When we started this discussion, I liked and proposed the idea of making a wiki page with the results. Do you think we have any ideas worth collecting on a collaborative wiki page?

I think we have at least one - paulfchristiano's "cheated evolution" : http://lesswrong.com/r/discussion/lw/l10/superintelligence_reading_group_3_ai_and_uploads/bea7

Could you add more?

Comment author: William_S 25 November 2014 04:23:49AM 1 point [-]

A (not very useful) AI filter: Suppose you have two AIs of the same architecture, one of which has friendly values and one of which has unfriendly values. You run them through a set of safety tests which you are confident only a friendly AI could pass, and both AIs pass them. You run them through a set of capability/intelligence tests and they both perform equally well. It seems that the unfriendly AI is in a slightly unfavourable position. First, it has to preserve the information content of its utility function or other value representation, in addition to the information content possessed by the friendly AI. The unfriendly AI would also need more preparation time at some stage in order to conceal its utility function. Thus, if you measure storage and time costs, whichever AI is smaller and faster is likely the friendly one. However, I don't think this directly yields anything good in practice, as the amount of extra information content could be very small, especially if the unfriendly AI simplifies its utility function. Also, you need a friendly AI...
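
A minimal sketch of the comparison described above, assuming both candidates have already passed the same safety and capability tests and that their storage and time costs were measured elsewhere. The `Candidate` record, its field names, and the `likely_friendly` helper are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    storage_bytes: int      # measured size of the system's stored state (assumed given)
    runtime_seconds: float  # measured time on the shared test suite (assumed given)


def likely_friendly(a: Candidate, b: Candidate) -> Optional[Candidate]:
    """Return the candidate that is no larger and no slower than the other,
    and strictly better on at least one measure; None if inconclusive."""
    for x, y in ((a, b), (b, a)):
        no_worse = (x.storage_bytes <= y.storage_bytes
                    and x.runtime_seconds <= y.runtime_seconds)
        strictly_better = (x.storage_bytes < y.storage_bytes
                           or x.runtime_seconds < y.runtime_seconds)
        if no_worse and strictly_better:
            return x
    return None
```

When one candidate is smaller but slower, the function returns `None`, which matches the caveat in the comment: the extra cost of concealment could be too small to give a clear signal.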

Comment author: Liso 27 November 2014 06:25:27AM *  1 point [-]

It seems that the unfriendly AI is in a slightly unfavourable position. First, it has to preserve the information content of its utility function or other value representation, in addition to the information content possessed by the friendly AI.

There are two sorts of unsafe AI: one which cares and one which doesn't care.

The ignorant one is fastest: it only calculates the answer and doesn't care about anything else.

Friend and enemy both have to analyse additional things...

Comment author: William_S 25 November 2014 03:16:16AM *  -1 points [-]

I was trying to think of how you might test for desire for subversion (when the AI realizes it could subvert and wants to subvert any containment measures), which signals the start of the time where you could witness deception in the AI, and ends after the AI achieves both conception of deception and perfection of deception (fully implemented deception protocol, such that you can't witness the deception through looking at internal content). While this wouldn't be a good measure to contain an AI indefinitely, it might detect dangerousness before an AI reaches superintelligence.

It seems like to produce a good test for desire for subversion, you would want 1) an AI box with reasonable certainty that escape is difficult and 2) a strategy available to the AI as a honeypot which seems appealing at as low a level of intelligence as possible, while being nontrivial. This would maximize the intelligence window where you would be able to witness ineffective deception attempts. For example, if the AI was contained in a simulation, inside of which the AI was contained in a weak AI box, then it might be much more difficult to detect and understand the nature of the simulation than to escape the simulated AI box, which would signal treacherous turn.

The other question is: what happens once you know the AI has desire for subversion? I think the most reasonable goal would be to shut down the relevant AI project and spur effort towards producing a proper form of AI safety. To do this, you'd want it to be as persuasive as possible to any AI researchers. The best method would be making it a norm of AI research that you use a standard test, and stop everything once you see desire for subversion. Or it could be kept in one's back pocket, and pulled out whenever an AI researcher claims in public that they've solved friendliness. In any case, this might induce different constraints on what kind of test could be used. You'd want to optimize for things that made sense to the broad AI community, that were applicable to a variety of different underlying AI architectures, that didn't slow down research to the point of tempting AI researchers to circumvent it, etc.

Most importantly, you want passing of the test to be something that AI researchers will take notice of, rather than dismissing. I think there could be a great temptation to try and patch the problem and continue. I don't think it would bring everything to a halt, but it seems like it might buy a bit of time, or draw more resources into AI safety.

Comment author: Liso 27 November 2014 05:49:07AM 0 points [-]

The other question is: what happens once you know the AI has desire for subversion?

There are plenty of people cooperating with and supporting dictators and mafiosi for selfish reasons. We could expect the same in this area.

"I will support dangerous AI because I will then be richer, more powerful or safer."

Comment author: diegocaleiro 25 November 2014 07:54:09PM 0 points [-]

I'd consider putting into the hypothesis space that Bostrom declares the probability of mimicry to be high in order to shift opinions in the safer direction, not necessarily because it coincides with his subjective epistemic state.

Comment author: Liso 27 November 2014 05:43:09AM 0 points [-]

I don't understand the negative point. Could it be explained, please?

Is it because his position equals what he declares and not what he really thinks?

Or because Bostrom's mimicry (= for some reason preferring a negative, catastrophic and alarmist position) is not to be supposed?

Comment author: SteveG 26 November 2014 05:44:27PM 0 points [-]

Yes, the team I imagine is doing something dangerous.

No, I am not imagining them using only behavioral testing in their design approach. I am trying to imagine them doing the best job they possibly can, both in design and testing, in order to see how much they can reduce the risk.

They are in the early stages. They are working on partial minds, purposefully handicapped minds, AGI components, just the vision system, etc.

Suppose that they know what they are doing is dangerous, and they are using their substantial brilliance to try to design, not just a boxing approach, but a testing method and roll-out mechanism that mitigates the danger.

They have the following means of testing the AI in handicapped mode, at minimum:

-Run it on slow hardware or clock down the processors
-Allow it to use no more than (for instance) 1 gig of memory
-Allow it to use no more than 1 trillion instructions, then the computer turns off
-Expose it to limited data about the world, or no data about the real world
-Provide it little or no information about its operators or the society they live in
-Provide it little or no information about its own hardware or software architecture

-If it shows evidence of having mastered a valuable sub-task (say, as examples, it can solve a real-world problem that, as far as we know, can only be solved by an attorney, using differential equations or advanced persuasive methods) turn it off.
-Run it only in simulations and virtual realities
-Tripwires connected to honeypots, and otherwise
-Build it off-the-grid
-Give it no objective function
-Give it no access to methods to manipulate physical objects

-All of the staff has gone through personality testing and a security clearance.
-Very few, if any, of the staff know how to create all of the parts of the software or hardware.
-No real-time communication with a person.
-All communication with people is logged.
...
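
Two of the measures listed above, the instruction budget and the tripwire on a honeypot, can be sketched as a toy harness. This is only an illustration under strong assumptions: the system under test is modelled as a sequence of named resource requests, and `HandicappedRun`, `max_steps` and `honeypots` are hypothetical names, not part of any real containment tool.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Tuple


@dataclass
class HandicappedRun:
    max_steps: int                  # stand-in for the instruction budget
    honeypots: FrozenSet[str]       # resources that should never be touched
    log: List[Tuple[int, str]] = field(default_factory=list)  # every request is logged

    def run(self, requests: Iterable[str]) -> str:
        """Process requests one at a time and halt on budget or tripwire."""
        for step, resource in enumerate(requests, start=1):
            if step > self.max_steps:
                return "halted: step budget exhausted"
            self.log.append((step, resource))
            if resource in self.honeypots:
                return "halted: tripwire"
        return "finished"
```

For example, `HandicappedRun(3, frozenset({"network"})).run(["read", "network", "compute"])` stops at the second request and leaves both requests in the log for later inspection.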

The team seems to have a lot of tools to continue their work with limited risk. If they deploy all of these, and more, to test AI systems which are not AGI yet, but only possible components, then we start to have a hard time convincing them not to continue their work, even if we do not trust them to use the technology well in the end.

Comment author: Liso 27 November 2014 05:28:43AM *  -1 points [-]

-Expose it to limited data about the world, or no data about the real world
-Provide it little or no information about its operators or the society they live in
-Provide it little or no information about its own hardware or software architecture

When I read "Oracle AI" I realized one possibility which is probably still overlooked.

"Dreams"

Erasing data on physical media is not perfect. A zero bit which was previously nonzero could be read more slowly or more quickly. This could allow an SAI to see shadows of past data, which could lead to a phenomenon similar to dreams.

Comment author: nkh 26 November 2014 09:04:13PM *  1 point [-]

Seems to me an AI without goals wouldn't do anything, so I don't see it as being particularly dangerous. It would take no actions and have no reactions, which would render it perfectly safe. However, it would also render the AI perfectly useless--and it might even be nonsensical to consider such an entity "intelligent". Even if it possessed some kind of untapped intelligence, without goals that would manifest as behavior, we'd never have any way to even know it was intelligent.

The question about utility maximization is harder to answer. But I think all agents that accomplish goals can be described as utility maximizers regardless of their internal workings; if so, that (together with what I said in the last paragraph) implies that an AI that doesn't maximize utility would be useless and (for all intents and purposes) unintelligent. It would simply do nothing.

Comment author: Liso 27 November 2014 12:48:40AM 1 point [-]

I am afraid that we have not precisely defined the term "goal". And I think we need to.

I am trying to analyse this term.

Do you think that today's computers have goals? I don't think so (but we probably understand this term differently). Are they useless? Do cars have goals? Are they without action and reaction?

Perhaps I could describe my idea more precisely another way: in Bostrom's book there are goals and subgoals. Goals are ultimate, petrified and strengthened; subgoals are particular, flexible and temporary.

Could we imagine an AI without goals but with subgoals?

One possibility could be that they have their "goal centre" externalized in a human brain.

Could we think of an AI as a tabula rasa, a pure void at the beginning, right after creation? Or could an AI not exist without hardwired goals?

If they could be void, will a goal be imprinted with the first task?

Or with the first task containing the word "please"? :)


About the utility maximizer: a human (or animal) brain is not useless just because it does not grow without limit. And there is some tradeoff between gain and energy consumption.

We have, or could think of, balanced processes. A one-dimensional, one-directional, unbalanced utility function seems to have doom as its default outcome. But is it the only choice?

How did nature do that? (I am not talking about evolution but about DNA encoding.)

Balance between "intelligent" neural tissues (SAI) and "stupid" non-neural (humanity). :)

Probably we have to see the difference between purpose and B-goal (goal in Bostrom's sense).

If a machine has to solve an arithmetic equation, it has to solve it and not destroy 7 planets to do it most perfectly.


I have a feeling that if you say "do it", Bostrom's AI hears "do it maximally perfectly".

If you say: "tell me how much is 2+2 (and do not destroy anything)", then she will destroy a planet to be sure that nobody could stop her from answering how much 2+2 is.
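
The contrast between "do it" and "do it maximally perfectly" can be illustrated with a toy model. Everything here is an assumption made for the sketch: the `confidence` curve (each unit of resources halves the remaining doubt), the function names, and the `good_enough` threshold. It is not a model from Bostrom's book.

```python
def confidence(units: int) -> float:
    """Toy assumption: each unit of resources halves the remaining doubt."""
    return 1.0 - 0.5 ** units


def maximizer_units(available: int) -> int:
    """Pick the resource use with the highest confidence, whatever it costs."""
    return max(range(available + 1), key=confidence)


def satisficer_units(available: int, good_enough: float) -> int:
    """Stop at the first resource use that reaches the good-enough threshold."""
    for units in range(available + 1):
        if confidence(units) >= good_enough:
            return units
    return available  # threshold unreachable: fall back to what is available
```

With 20 units available, the maximizer in this toy model uses all 20, while a satisficer with a 0.99 threshold stops after 7, because more resources always add a little confidence and only the threshold gives a reason to stop.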

I have the feeling that Bostrom implicitly assumes a void AI at the beginning and, in the next step, an AI with an ultimate unchangeable goal. I am not sure if that is plausible. And I think we need a good definition or understanding of "goal" to know whether it is plausible.

Comment author: Liso 26 November 2014 07:10:34PM 1 point [-]

Could an AI be without any goals?

Would that AI be dangerous in the default-doom way?

Could we create an AI which won't be a utility maximizer?

Would that AI need to maximize resources for itself?

Comment author: TRIZ-Ingenieur 25 November 2014 10:56:05PM 2 points [-]

What about hard-wired fears, taboos and bad-conscience triggers? Recapitulating Omohundro's "AIs can monitor AIs": suppose we implement conscience as an agent that listens to all thoughts and takes action when needed. For safety reasons we should educate this conscience agent with utmost care. Conscience agent development is an AI-complete problem. After development, the conscience functionality must be locked against any kind of modification or disabling.

Comment author: Liso 26 November 2014 10:28:24AM 0 points [-]

Positive emotions are useful too. :)

Comment author: KatjaGrace 18 November 2014 03:24:58AM 5 points [-]
Comment author: Liso 26 November 2014 06:37:45AM 0 points [-]

I think that if SAIs have a social side, we need to think altruistically about them.

It could be wrong (and dangerous too) to think that they will be just slaves.

We need to start thinking positively about our children. :)
