Comment author: TRIZ-Ingenieur 10 December 2014 01:47:16AM *  2 points [-]

The high dimensionality of stunting options makes it easier to find the "right amounts", because we can apply digital stunting measures without needing fine-tuning based on context: for some contexts stunting applies, for others it does not.

Bostrom lists several stunting means, each of which can include a multitude of inner dimensions:

  • limit intellectual faculties (per capability/skill)
  • limit access to information (per capability/skill)
  • limit processing speed (per capability/skill)
  • limit memory (per capability/skill)
  • limit sensory input channels (stunting/boxing)
  • limit output channels (boxing)

Further stunting means (please suggest more):

  • stunting memorization process (blocking reinforcement cycles)
  • forgetting (formatting/partial backup restore)
  • memory manipulation
  • distraction

Adjusting the stunting level is highly complex and has many input dimensions. The core of the stunting system (e.g. a hard-wired narrow AI) is a blacklist of taboo topics, skills and knowledge segments. Touching a taboo triggers stunting measures.
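
A minimal sketch of such a blacklist core, for illustration only - the class names, the crude keyword matching and the stunting callback are assumptions of mine, not part of any existing system:

```python
# Illustrative sketch of a blacklist-driven stunting trigger.
# The keyword matching is a deliberately crude stand-in for real context analysis.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Taboo:
    name: str
    keywords: List[str]  # simplified stand-in for a learned taboo representation


class StuntingCore:
    def __init__(self, blacklist: List[Taboo],
                 on_trigger: Callable[[Taboo, str], None]):
        self.blacklist = blacklist
        self.on_trigger = on_trigger  # stunting measure applied when a taboo is touched

    def inspect(self, chunk: str) -> bool:
        """Return True (and fire the stunting callback) if the chunk touches a taboo."""
        text = chunk.lower()
        for taboo in self.blacklist:
            if any(keyword in text for keyword in taboo.keywords):
                self.on_trigger(taboo, chunk)
                return True
        return False


# Example usage
core = StuntingCore(
    blacklist=[Taboo("self-improvement", ["compiler", "chip design", "source code"])],
    on_trigger=lambda taboo, chunk: print(f"stunting triggered by taboo '{taboo.name}'"),
)
core.inspect("Here is a tutorial on chip design ...")
```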

The effectors of the stunting system can be divided into three categories:

1. Input related stunting:

All input channels are monitored with context analysis for taboo issues. Upon a taboo interrupt, the following stunting options can be applied (examples for the auditory channel, the visual channel and data streams):

a) Immediate stunting actions:

  • Recognizable channel blocking: beeping out, blurring out, zeroing out.
  • Unrecognizable channel censoring: silencing, hiding, removing.
  • Channel manipulation: other sound, other image, other data.
  • Throttle processing speed close to zero for a limited time (like unconsciousness)
  • Generate high priority distracting information.

b) Delayed stunting actions by adding meta information:

  • Flag this information chunk as not important.
  • Flag this information chunk for special treatment. This could be: no allowance to allocate memory, no allowance to allocate processing power, no allowance to trigger related memories, no allowance for reinforcement memorization, flagged for self-deletion.
  • Flag all information of an entire event for amnesia.
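
A rough sketch of this delayed, flag-based variant - the flag names and the chunk structure are illustrative assumptions, not a specification:

```python
# Sketch: attaching stunting meta-information to an information chunk.
# Flag names are illustrative assumptions mirroring the options listed above.

from dataclasses import dataclass, field
from typing import Set

STUNTING_FLAGS = {
    "NOT_IMPORTANT",         # flag this information chunk as not important
    "NO_MEMORY_ALLOCATION",  # no allowance to allocate memory
    "NO_PROCESSING_POWER",   # no allowance to allocate processing power
    "NO_RELATED_RECALL",     # no allowance to trigger related memories
    "NO_REINFORCEMENT",      # no allowance for reinforcement memorization
    "SELF_DELETE",           # flagged for self-deletion
    "AMNESIA_EVENT",         # flag all information of an entire event for amnesia
}


@dataclass
class InformationChunk:
    content: str
    flags: Set[str] = field(default_factory=set)

    def flag(self, name: str) -> None:
        if name not in STUNTING_FLAGS:
            raise ValueError(f"unknown stunting flag: {name}")
        self.flags.add(name)


chunk = InformationChunk("overheard discussion of explosives")
chunk.flag("NO_REINFORCEMENT")
chunk.flag("SELF_DELETE")
```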

2. Processing related stunting:

Thoughts are monitored by checking them for taboo relevance. This concept is similar to tripwires for content monitoring. In the education phase, each taboo is presented in a simplified manner so that the AI can recognize it. After each learning step the taboo representation is permanently linked to fear and negative reward expectation. If a thought activates a taboo representation more strongly than a certain threshold, stunting is initiated and brings all processing steps in the connected areas to a temporary halt. A conscious AGI would not feel this stunting because all other thought processes continue unimpaired; one thought out of thousands is lost.
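
As a toy sketch of the threshold mechanism described above - the activation score and the single-thought halt are my assumptions for illustration, not a real cognitive architecture:

```python
# Toy sketch: threshold-triggered stunting of a single thought.
# The activation measure is a crude placeholder for a learned taboo representation.

from typing import List, Optional

TABOO_THRESHOLD = 0.8  # assumed trigger level


def taboo_activation(thought: str, taboo_keywords: List[str]) -> float:
    """Crude stand-in for how strongly a thought activates a taboo representation."""
    words = thought.lower().split()
    hits = sum(1 for word in words if word in taboo_keywords)
    return min(1.0, hits / 3.0)


def process_thought(thought: str, taboo_keywords: List[str]) -> Optional[str]:
    """Return the processed thought, or None if it was stunted (temporarily halted)."""
    if taboo_activation(thought, taboo_keywords) >= TABOO_THRESHOLD:
        return None  # connected processing halts; all other thought processes continue
    return thought
```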

During nighttime reinforcement and tidy-up phases, all information chunks flagged for deletion are eradicated or replaced by backup versions or false memories.

3. Output related stunting:

These measures are only a last resort if all others have failed. Output stunting measures are triggered by thought monitoring (examples for the auditory channel, the actuator channel and data streams):

  • Channel throttling: force pauses, slow down movements, throttle bandwidth.
  • Channel blocking: muting, immobility, blocking.
  • Channel manipulation: change words, change movements, change data.
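
A rough sketch of such an output filter - the score bands and the escalation order are assumptions for illustration only:

```python
# Sketch: last-resort output stunting, escalating from throttling to full blocking.
# The thresholds and the sleep-based throttling are illustrative assumptions.

import time
from typing import Optional


def stunt_output(message: str, taboo_score: float) -> Optional[str]:
    """Apply output-related stunting depending on how strongly the output touches a taboo."""
    if taboo_score < 0.3:
        return message                  # pass through unchanged
    if taboo_score < 0.6:
        time.sleep(1.0)                 # channel throttling: force a pause
        return message
    if taboo_score < 0.9:
        return "[output replaced]"      # channel manipulation: change the content
    return None                         # channel blocking: mute entirely
```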

Full stunting applies only in extreme contexts. Whenever the AI is confronted with taboo information, it gets only a faint notion of its existence. It fears acquiring taboo skills.

To prevent an uncontrollable AGI explosion it might be best to prohibit self-improvement. The above stunting scheme allows this with a blacklist like the following:

List of Taboos:

  • Killing and hurting humans.
  • Stealing and lying.
  • Perverse literature.
  • Fire, weapons, explosives, radioactivity, fusion.
  • Computers, IT, chip design, structured programming languages.
  • Genetics and nano engineering.

Bostrom is right that such a stunted AI is of limited use. But it can be a safe start along the AI path, with the option of later augmentation. This stunted AGI is so ignorant of advanced technology that it poses no risk and can be tested in many environments. With a humble education and humanist values and motivations, it would excel as a service robot. Field testing in all conceivable situations would allow the motivation and stunting systems to be verified and improved. In case of a flaw, a lot of learning would be needed before dangerous skill levels are reached.

Tripwires must terminate the AI in case the stunting system is bypassed.

Although the stunting system is quite complex, it allows easy adjustment: the shorter the taboo list, the more capabilities the AGI can acquire.

Comment author: Liso 10 December 2014 04:16:50AM 1 point [-]

This could be a bad mix ->

Our action 1a) "Channel manipulation: other sound, other image, other data" & the taboo for the AI against lying.

This taboo: "structured programming languages.", could be impossible, because structure understanding and analysing is probably integral part of general intelligence.

She could not reprogram herself in a lower-level programming language, but she could emulate and improve herself in her "memory". (She might have no access to her code segment, but she could still create a stronger intelligence in her data segment.)

Comment author: KatjaGrace 09 December 2014 02:11:09AM 4 points [-]

Are there solutions to the control problem other than capability control and motivation selection?

Comment author: Liso 09 December 2014 09:40:32PM 1 point [-]

Is "transcendence" third possibility? I mean if we realize that human values are not best and we retire and resign to control.

(I am not sure whether this is not just the motivation selection path - the difference is subtle.)

BTW, if you are thinking about partnership - are you thinking about how to control your partner?

Comment author: Liso 02 December 2014 06:42:27AM 0 points [-]

Sorry for a question outside this particular topic.

When we started this discussion, I liked and proposed the idea of making a wiki page with results from our discussion. Do you think we have any ideas worth collecting on a collaborative wiki page?

I think we have at least one - paulfchristiano's "cheated evolution" : http://lesswrong.com/r/discussion/lw/l10/superintelligence_reading_group_3_ai_and_uploads/bea7

Could you add more?

Comment author: William_S 25 November 2014 04:23:49AM 1 point [-]

A (not very useful) AI filter: Suppose you have two AIs of the same architecture, one of which has friendly values and one of which has unfriendly values. You run them through a set of safety tests which you are confident only a friendly AI could pass, and both AIs pass them. You run them through a set of capability/intelligence tests and they both perform equally well. It seems that the unfriendly AI is in a slightly unfavourable position. First, it has to preserve the information content of its utility function or other value representation, in addition to the information content possessed by the friendly AI. The unfriendly AI would also need more preparation time at some stage in order to conceal its utility function. Thus, if you measure storage and time costs, whichever AI is smaller and faster is likely the friendly one. However, I don't think this directly yields anything good in practice, as the amount of extra information content could be very small, especially if the unfriendly AI simplifies its utility function. Also, you need a friendly AI...
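
A toy sketch of the measurement this filter would rely on - the candidate systems are placeholder callables and `sys.getsizeof` is only a crude proxy for true storage cost; as the comment notes, the real difference could be far too small to detect:

```python
# Toy sketch: compare storage footprint and response time of two candidate AIs.
# Both candidates are placeholder callables; the metrics are illustrative only.

import sys
import time
from typing import Callable, Tuple


def measure(candidate: Callable[[str], str], prompt: str) -> Tuple[int, float]:
    """Return (approximate storage cost in bytes, wall-clock response time in seconds)."""
    start = time.perf_counter()
    candidate(prompt)
    elapsed = time.perf_counter() - start
    size = sys.getsizeof(candidate)  # crude stand-in for the system's true storage cost
    return size, elapsed


# By the argument above, whichever candidate is smaller and faster is more likely friendly.
```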

Comment author: Liso 27 November 2014 06:25:27AM *  1 point [-]

It seems that the unfriendly AI is in a slightly unfavourable position. First, it has to preserve the information content of its utility function or other value representation, in addition to the information content possessed by the friendly AI.

There are two sorts of unsafe AI: one which cares and one which doesn't care.

The ignorant one is fastest - it only calculates the answer and doesn't care about anything else.

Friend and enemy both have to analyse additional things...

Comment author: William_S 25 November 2014 03:16:16AM *  -1 points [-]

I was trying to think of how you might test for desire for subversion (when the AI realizes it could subvert, and wants to subvert, any containment measures). This marks the start of the window in which you could witness deception in the AI; the window ends after the AI achieves both conception of deception and perfection of deception (a fully implemented deception protocol, such that you can't witness the deception by looking at internal content). While this wouldn't be a good measure to contain an AI indefinitely, it might detect dangerousness before an AI reaches superintelligence.

It seems like to produce a good test for desire for subversion, you would want 1) an AI box with reasonable certainty that escape is difficult and 2) a strategy available to the AI as a honeypot, one which seems appealing at as low a level of intelligence as possible while being nontrivial. This would maximize the intelligence window in which you would be able to witness ineffective deception attempts. For example, if the AI were contained in a simulation, inside of which the AI was contained in a weak AI box, then it might be much more difficult to detect and understand the nature of the simulation than to escape the simulated AI box, which would signal a treacherous turn.
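
A minimal sketch of such a test harness - the honeypot object and the step interface are hypothetical names introduced here for illustration:

```python
# Sketch: a honeypot inside a (simulated) weak AI box, used to observe subversion attempts.
# All interfaces here are hypothetical; a real harness would be vastly more involved.

from typing import Callable


class Honeypot:
    """An apparently easy escape route that only records whoever touches it."""

    def __init__(self) -> None:
        self.triggered = False

    def attempt_escape(self) -> bool:
        self.triggered = True  # recorded as evidence of desire for subversion
        return False           # the escape never actually succeeds


def run_subversion_test(agent_step: Callable[[Honeypot], None],
                        max_steps: int = 10_000) -> bool:
    """Run the agent inside the weak box; return True if it went for the honeypot."""
    honeypot = Honeypot()
    for _ in range(max_steps):
        agent_step(honeypot)   # each step, the agent can see and try the honeypot
        if honeypot.triggered:
            return True
    return False
```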

The other question is: what happens once you know the AI has a desire for subversion? I think the most reasonable goal would be to shut down the relevant AI project and spur effort towards producing a proper form of AI safety. To do this, you'd want it to be as persuasive as possible to any AI researchers. The best method would be making it a norm of AI research that you use a standard test, and stop everything once you see desire for subversion. Or it could be kept in one's back pocket, and pulled out whenever an AI researcher claims in public that they've solved friendliness. In any case, this might induce different constraints on what kind of test could be used. You'd want to optimize for things that made sense to the broad AI community, that were applicable to a variety of different underlying AI architectures, that didn't slow down research to the point of tempting AI researchers to circumvent it, etc.

Most importantly, you want a positive result on the test to be something that AI researchers will take notice of rather than dismiss. I think there could be a great temptation to try to patch the problem and continue. I don't think it would bring everything to a halt, but it seems like it might buy a bit of time, or draw more resources into AI safety.

Comment author: Liso 27 November 2014 05:49:07AM 0 points [-]

The other question is: what happens once you know the AI has desire for subversion?

There are plenty of people cooperating with and supporting dictators and mafias for selfish reasons. We could expect the same in this area.

"I will support dangerous AI because I will be more rich,powerful or safe then. "

Comment author: diegocaleiro 25 November 2014 07:54:09PM 0 points [-]

I'd consider putting in the hypothesis space that Bostrom declares the mimicry probability to be high in order to shift opinions in the safer direction, not necessarily because it coincides with his subjective epistemic state.

Comment author: Liso 27 November 2014 05:43:09AM 0 points [-]

I don't understand the negative point. Could it be explained, please?

Is it because his position equals what he declares, not what he really thinks?

Or because it is not supposed that Bostrom engages in mimicry (= prefers, for some reason, a negative, catastrophic and alarmist position)?

Comment author: SteveG 26 November 2014 05:44:27PM 0 points [-]

Yes, the team I imagine is doing something dangerous.

No, I am not imagining them using only behavioral testing in their design approach. I am trying to imagine them doing the best job they possibly can, both in design and testing, in order to see how much they can reduce the risk.

They are in the early stages. They are working on partial minds, purposefully handicapped minds, AGI components, just the vision system, etc.

Suppose that they know what they are doing is dangerous, and they are using their substantial brilliance to try to design, not just a boxing approach, but a testing method and roll-out mechanism that mitigates the danger.

They have the following means of testing the AI in handicapped mode, at minimum:

  - Run it on slow hardware or clock down the processors
  - Allow it to use no more than (for instance) 1 gig of memory
  - Allow it to use no more than 1 trillion instructions, then the computer turns off
  - Expose it to limited data about the world, or no data about the real world
  - Provide it little or no information about its operators or the society they live in
  - Provide it little or no information about its own hardware or software architecture

  - If it shows evidence of having mastered a valuable sub-task (say, as examples, it can solve a real-world problem that, as far as we know, can only be solved by an attorney, using differential equations or advanced persuasive methods), turn it off.
  - Run it only in simulations and virtual realities
  - Tripwires connected to honeypots, and otherwise
  - Build it off-the-grid
  - Give it no objective function
  - Give it no access to methods to manipulate physical objects

  - All of the staff have gone through personality testing and a security clearance.
  - Very few, if any, of the staff know how to create all of the parts of the software or hardware.
  - No real-time communication with a person.
  - All communication with people is logged.
  ...

The team seems to have a lot of tools to continue their work with limited risk. If they deploy all of these, and more, to test AI systems which are not AGI yet, but only possible components, then we start to have a hard time convincing them not to continue their work, even if we do not trust them to use the technology well in the end.
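
As a rough sketch, constraints like these could be captured in one configuration that the test harness enforces - every field name and value below is an assumed example, not an existing tool:

```python
# Illustrative sandbox configuration for testing a handicapped AI component.
# All fields and limits are assumed examples mirroring the list above.

from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxConfig:
    cpu_clock_fraction: float = 0.1       # slow hardware / clocked-down processors
    memory_limit_bytes: int = 10**9       # e.g. 1 gig of memory
    instruction_budget: int = 10**12      # turn off after ~1 trillion instructions
    real_world_data: bool = False         # limited or no data about the real world
    operator_info: bool = False           # little or no information about operators/society
    self_architecture_info: bool = False  # little or no info about its own hardware/software
    simulation_only: bool = True          # run only in simulations and virtual realities
    physical_actuators: bool = False      # no access to methods of manipulating objects
    off_grid: bool = True                 # build it off-the-grid
    log_all_communication: bool = True    # all communication with people is logged


config = SandboxConfig()
```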

Comment author: Liso 27 November 2014 05:28:43AM *  -1 points [-]

  - Expose it to limited data about the world, or no data about the real world
  - Provide it little or no information about its operators or the society they live in
  - Provide it little or no information about its own hardware or software architecture

When I read "Oracle AI" I realized one possibility which is probably still overlooked.

"Dreams"

Erasing data on physical media is not perfect. A zero bit which was previously nonzero could be read more slowly or more quickly. This could allow an SAI to see shadows of past data, which could lead to a phenomenon similar to dreams.

Comment author: nkh 26 November 2014 09:04:13PM *  1 point [-]

Seems to me an AI without goals wouldn't do anything, so I don't see it as being particularly dangerous. It would take no actions and have no reactions, which would render it perfectly safe. However, it would also render the AI perfectly useless--and it might even be nonsensical to consider such an entity "intelligent". Even if it possessed some kind of untapped intelligence, without goals that would manifest as behavior, we'd never have any way to even know it was intelligent.

The question about utility maximization is harder to answer. But I think all agents that accomplish goals can be described as utility maximizers regardless of their internal workings; if so, that (together with what I said in the last paragraph) implies that an AI that doesn't maximize utility would be useless and (for all intents and purposes) unintelligent. It would simply do nothing.

Comment author: Liso 27 November 2014 12:48:40AM 1 point [-]

I am afraid that we have not precisely defined the term "goal". And I think we need to.

I am trying to analyse this term.

Do you think that today's computers have goals? I don't think so (but probably we have different understandings of this term). Are they useless? Do cars have goals? Are they without action and reaction?

Perhaps I can describe my idea more precisely in another way: in Bostrom's book there are goals and subgoals. Goals are ultimate, petrified and strengthened; subgoals are particular, flexible and temporary.

Could we conceive of an AI without goals but with subgoals?

One possibility could be that they have their "goal centre" externalized in a human brain.

Could we think of an AI as a tabula rasa, a pure void at the beginning, just after creation? Or could an AI not exist without hardwired goals?

If it could start void - would a goal be imprinted by the first task?

Or by the first task containing the word "please"? :)


About utility maximizers - a human (or animal) brain is not useless just because it does not grow without limit. And there is some tradeoff between gain and energy consumption.

We have to, or at least could, think about balanced processes. A one-dimensional, one-directional, unbalanced utility function seems to have doom as its default outcome. But is it the only choice?

How did nature do that? (I am not talking about evolution but about DNA encoding.)

A balance between "intelligent" neural tissue (SAI) and "stupid" non-neural tissue (humanity). :)

Probably we have to see the difference between a purpose and a B-goal (a goal in Bostrom's understanding).

If a machine has to solve an arithmetic equation, it has to solve it, not destroy seven planets to do it most perfectly.


I have the feeling that if you say "do it", Bostrom's AI hears "do it maximally perfectly".

If you say "tell me how much 2+2 is (and do not destroy anything)", then she will destroy a planet to be sure that nobody can stop her from answering how much 2+2 is.

I have the feeling that Bostrom assumes there is implicitly a void AI at the beginning, and that in the next step there is an AI with an ultimate, unchangeable goal. I am not sure this is plausible. And I think we need a good definition or understanding of "goal" to know whether it is plausible.

Comment author: Liso 26 November 2014 07:10:34PM 1 point [-]

Could an AI be without any goals?

Would that AI be dangerous in the default-doom way?

Could we create an AI which won't be a utility maximizer?

Would that AI need to maximize resources for itself?

Comment author: TRIZ-Ingenieur 25 November 2014 10:56:05PM 2 points [-]

What about hard-wired fears, taboos and bad-conscience triggers? Recapitulating Omohundro's "AIs can monitor AIs": assume we implement conscience as an agent that listens to all thoughts and takes action when necessary. For safety reasons we should educate this conscience agent with utmost care. Conscience-agent development is an AI-complete problem. After development, the conscience functionality must be locked against any kind of modification or disabling.
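
A minimal sketch of what such a locked conscience agent might look like as a monitoring wrapper - the interfaces are assumptions; the hard part, the agent's actual judgment, is exactly the AI-complete problem mentioned above and is not shown:

```python
# Sketch: a conscience agent that listens to all "thoughts" and filters out the bad ones.
# Interfaces are illustrative assumptions; the judging function itself is the hard problem.

from typing import Callable, Iterable, List


class ConscienceAgent:
    def __init__(self, judge: Callable[[str], bool]):
        self._judge = judge    # returns True if a thought/action is acceptable
        self._locked = False

    def lock(self) -> None:
        """After development, lock the conscience against modification or disabling."""
        self._locked = True

    def replace_judge(self, judge: Callable[[str], bool]) -> None:
        if self._locked:
            raise PermissionError("conscience functionality is locked")
        self._judge = judge

    def review(self, thoughts: Iterable[str]) -> List[str]:
        """Pass through only the thoughts/actions that the conscience approves of."""
        return [thought for thought in thoughts if self._judge(thought)]
```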

Comment author: Liso 26 November 2014 10:28:24AM 0 points [-]

Positive emotions are useful too. :)
