
In response to AI arms race
Comment author: James_Miller 21 May 2017 01:48:50AM 1 point [-]

I wrote about something similar in the first Singularity Hypothesis book: "Some Economic Incentives Facing a Business that Might Bring About a Technological Singularity".

In response to comment by James_Miller on AI arms race
Comment author: Stuart_Armstrong 22 May 2017 07:30:36AM 1 point [-]

Ah, but do you have graphs and gratuitously over-simplified models (no, you just have big blue arrows)? Without those, it doesn't count! ;-)

Comment author: turchin 19 May 2017 06:45:25PM 0 points [-]

If we create AI around a human upload, or a model of the human mind, it solves some of the problems:

1) It will, by definition, have the same values and the same value structure as a human being; in short, human uploading solves value loading.

2) It will also not be an agent.

3) We could predict the upload's behaviour based on our experience with predicting human behaviour.

And it will not be very powerful or very capable of strong self-improvement, because of its messy internal structure.

However, it could still be above human level because of hardware acceleration and some tweaking. Using it, we could construct a primitive AI Police or AI Nanny, which would prevent the creation of any other types of AI.

Comment author: Stuart_Armstrong 20 May 2017 08:08:56AM 0 points [-]

Convergent instrumental goals would make agent-like things become agents if they can self-modify (humans can't do this to any strong extent).

AI safety: three human problems and one AI issue

5 Stuart_Armstrong 19 May 2017 10:48AM

Crossposted at the Intelligent Agent Foundations Forum.

There have been various attempts to classify the problems in AI safety research, from our old Oracle paper, which classified then-theoretical methods of control, to more recent classifications that grow out of modern, more concrete problems.

These all serve their purpose, but I think a more enlightening classification of the AI safety problems is to look at what issues we are trying to solve or avoid. And most of these issues are problems about humans.

Specifically, I feel AI safety issues can be classified as three human problems and one central AI issue. The human problems are:

  • Humans don't know their own values (sub-issue: humans know their values better in retrospect than in prediction).
  • Humans are not agents and don't have stable values (sub-issue: humanity itself is even less of an agent).
  • Humans have poor predictions of an AI's behaviour.

And the central AI issue is:

  • AIs could become extremely powerful.

Obviously if humans were agents and knew their own values and could predict whether a given AI would follow those values or not, there would be no problem. Conversely, if AIs were weak, then the human failings wouldn't matter so much.

The point about human values is relatively straightforward, but what's the problem with humans not being agents? Essentially, humans can be threatened, tricked, seduced, exhausted, drugged, modified, and so on, into acting seemingly against our interests and values.

If humans were clearly defined agents, then what counts as a trick or a modification would be easy to define and exclude. But since this is not the case, we're reduced to trying to figure out the extent to which something like a heroin injection is a valid way to influence human preferences. This both makes humans susceptible to manipulation and makes human values hard to define.

Finally, the issue of humans having poor predictions of AI is more general than it seems. If you want to ensure that an AI has the same behaviour in the testing and training environment, then you're essentially trying to guarantee that you can predict that the testing environment behaviour will be the same as the (presumably safe) training environment behaviour.

 

How to classify methods and problems

That's all well and good, but how do various traditional AI methods or problems fit into this framework? This should give us an idea as to whether the framework is useful.

It seems to me that:

 

  • Friendly AI is trying to solve the values problem directly.
  • IRL and Cooperative IRL are also trying to solve the values problem. The greatest weakness of these methods is the not agents problem.
  • Corrigibility/interruptibility are also addressing the issue of humans not knowing their own values, using the sub-issue that human values are clearer in retrospect. These methods also overlap with poor predictions.
  • AI transparency is aimed at getting round the poor predictions problem.
  • Laurent's work on carefully defining the properties of agents is also mainly about solving the poor predictions problem.
  • Low impact and Oracles are aimed squarely at preventing AIs from becoming powerful. Methods that restrict the Oracle's output implicitly accept that humans are not agents.
  • Robustness of the AI to changes between testing and training environment, degradation and corruption, etc... ensures that humans won't be making poor predictions about the AI.
  • Robustness to adversaries is dealing with the sub-issue that humanity is not an agent.
  • The modular approach of Eric Drexler is aimed at preventing AIs from becoming too powerful, while reducing our poor predictions.
  • Logical uncertainty, if solved, would reduce the scope for certain types of poor predictions about AIs.
  • Wireheading, when the AI takes control of the reward channel, is a problem of humans not knowing their own values (and hence using an indirect reward) and of humans making poor predictions about the AI's actions.
  • Wireheading, when the AI takes control of the human, is as above but also a problem that humans are not agents.
  • Incomplete specifications are either a problem of not knowing our own values (and hence missing something important in the reward/utility) or of making poor predictions (when we thought that a situation was covered by our specification, but it turned out not to be).
  • AIs modelling human knowledge seem to be mostly about getting round the fact that humans are not agents.

Putting this all in a table:

 

Method                           Values   Not Agents   Poor Predictions   Powerful
Friendly AI                        X
IRL and CIRL                       X
Corrigibility/interruptibility     X                          X
AI transparency                                               X
Laurent's work                                                X
Low impact and Oracles                        X                               X
Robustness                                                    X
Robustness to adversaries                     X
Modular approach                                              X               X
Logical uncertainty                                           X
Wireheading (reward channel)       X                          X
Wireheading (human)                X          X               X
Incomplete specifications          X                          X
AIs modelling human knowledge                 X
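
The same classification can also be written as a small lookup structure. A minimal sketch in Python, my own transcription of the table above, with a hypothetical helper methods_for for querying it:

```python
# Which issues (Values, Not Agents, Poor Predictions, Powerful) each method addresses.
CLASSIFICATION = {
    "Friendly AI":                    {"Values"},
    "IRL and CIRL":                   {"Values"},
    "Corrigibility/interruptibility": {"Values", "Poor Predictions"},
    "AI transparency":                {"Poor Predictions"},
    "Laurent's work":                 {"Poor Predictions"},
    "Low impact and Oracles":         {"Not Agents", "Powerful"},
    "Robustness":                     {"Poor Predictions"},
    "Robustness to adversaries":      {"Not Agents"},
    "Modular approach":               {"Poor Predictions", "Powerful"},
    "Logical uncertainty":            {"Poor Predictions"},
    "Wireheading (reward channel)":   {"Values", "Poor Predictions"},
    "Wireheading (human)":            {"Values", "Not Agents", "Poor Predictions"},
    "Incomplete specifications":      {"Values", "Poor Predictions"},
    "AIs modelling human knowledge":  {"Not Agents"},
}

def methods_for(issue):
    """All methods in the table that address the given issue."""
    return [m for m, issues in CLASSIFICATION.items() if issue in issues]

print(methods_for("Powerful"))  # ['Low impact and Oracles', 'Modular approach']
```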

 

Further refinements of the framework

It seems to me that the third category - poor predictions - is the most likely to be expandable. For the moment, it just incorporates all our lack of understanding about how AIs would behave, but it might be useful to subdivide this further.

Comment author: Lumifer 17 May 2017 02:33:23PM 0 points [-]

You could use them when building a new model in a new field with experts but little data.

Again, why would you use this particular model class instead of other alternatives?

they still outperform experts

That statement badly needs modifiers. I would suggest "some improper linear models sometimes outperform experts". Note that there is huge selection bias here. Also, your link is from 1979, where is that "still" coming from?

Comment author: Stuart_Armstrong 17 May 2017 05:50:53PM *  0 points [-]

"some improper linear models sometimes outperform experts"

Fair qualifiers.

Comment author: gwern 17 May 2017 04:25:12PM 0 points [-]

ie making the model worse.

Yes, but again, where is the qualitative difference? In what sense does this explain the performance of improper linear models versus human experts? Why does the subtle difference between a model based on an 'enriched' set of variables and a model based on a non-enriched-but-slightly-worse set 'explain' how they perform better than humans?

Comment author: Stuart_Armstrong 17 May 2017 05:50:21PM 0 points [-]

? I'm not sure what you're asking for. The basic points are: a) experts are bad at integrating information, b) experts are good at selecting important variables of roughly equal importance, and c) these variables are often highly correlated.

a) explains why experts are bad (as in worse than proper linear models), b) and c) explain why improper linear models might perform not too far off proper linear models (and hence be better than experts).
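
A minimal simulation sketch of points a), b) and c), assuming Python with numpy; the data-generating process, noise levels and the crude "expert" model below are illustrative assumptions of mine, not anything taken from the thread:

```python
# Sketch: fitted ("proper") weights vs unit ("improper") weights vs a noisy human
# integrator, on correlated predictors of roughly equal importance.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, k = 200, 10_000, 5

# c) correlated predictors, b) of roughly equal importance
cov = 0.5 * np.ones((k, k)) + 0.5 * np.eye(k)
beta = np.array([1.0, 0.9, 1.1, 0.8, 1.2])           # roughly equal true weights

def make_data(n):
    X = rng.multivariate_normal(np.zeros(k), cov, size=n)
    y = X @ beta + rng.normal(0, 2.0, size=n)         # noisy outcome
    return X, y

Xtr, ytr = make_data(n_train)
Xte, yte = make_data(n_test)

# Proper linear model: weights fitted by least squares.
w_proper, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)

# Improper linear model: unit weights, signs taken from the fit
# (standing in for an expert's sign judgments).
w_improper = np.sign(w_proper)

# a) a crude "expert": right variables, but inconsistent integration
# (the weights are jittered independently on every case).
def expert_scores(X):
    noisy_w = np.ones(k) + rng.normal(0, 1.0, size=(len(X), k))
    return (X * noisy_w).sum(axis=1)

def corr(pred, y):
    return np.corrcoef(pred, y)[0, 1]

print("proper  :", round(corr(Xte @ w_proper, yte), 3))
print("improper:", round(corr(Xte @ w_improper, yte), 3))
print("expert  :", round(corr(expert_scores(Xte), yte), 3))
# Typical result: proper and improper score almost identically, and both beat the
# noisy integrator, which is the pattern described in the comment above.
```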

Comment author: gwern 16 May 2017 08:13:02PM 0 points [-]

Adding in more irrelevant variables does change things quantitatively by lowering power due to increased variance and requiring more data, but I don't see how this leads to any qualitative transition from working to not working such that it might explain why they work.

Comment author: Stuart_Armstrong 17 May 2017 04:55:16AM 0 points [-]

by lowering power

ie making the model worse.

and requiring more data

I don't think this is true. All the useful weights are set to +1 or -1 by expert assessment, and the non-useful weights are just noise. Why would more data be required?

Comment author: gathaung 16 May 2017 08:06:51PM 0 points [-]

Second question:

Do you have a nice reference (speculative feasibility study) for non-rigid coil-guns for acceleration?

The obvious idea would be to have a swarm of satellites, each with a coil, spread out over the solar system. An outgoing probe would pass through a series of such coils, each adding some impulse to the probe (and making minor course corrections). This obviously needs a very finely tuned trajectory.

Advantage over rigid coil-gun: acceleration spread out (unevenly) over longer length (almost entire solar system). This is good for heat dissipation (no coupling is perfect), and maintaining mega-scale rigid objects appears difficult. Satellites can take their time to regain position (solar sail / solar powered ion thruster / gravity assist). Does not help with g-forces.

Disadvantage: Need a large number of satellites in order to get enough launch windows. But if we are talking dyson swarm anyway, this does not matter.

How much do we gain compared to laser acceleration? Main question is probably: How does the required amount of heat dissipation compare?
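
A back-of-envelope sketch of the per-coil numbers, in Python; the probe mass, target speed, coil count, coil length and coupling loss below are my own illustrative assumptions, not figures from the comment:

```python
# Equal velocity kicks from n_coils short coils spread along the probe's path.
m_probe = 1.0          # kg, assumed probe mass
c       = 3.0e8        # m/s
v_final = 0.05 * c     # assumed cruise speed
n_coils = 10_000       # satellites the probe passes through
l_coil  = 100.0        # active length of each coil, metres
epsilon = 0.01         # assumed fraction of transferred energy lost as heat per coil

dv = v_final / n_coils                     # velocity gained per coil
# The last coil is the worst case, since the probe is fastest there:
a_peak = v_final * dv / l_coil             # from v*dv ≈ a*l for a short coil
q_last = epsilon * m_probe * v_final * dv  # heat dumped in the last coil

print(f"per-coil dv       ≈ {dv:.0f} m/s")
print(f"peak acceleration ≈ {a_peak:.2e} m/s^2 ({a_peak/9.81:.1e} g)")
print(f"heat in last coil ≈ {q_last/1e6:.0f} MJ")
# Spacing the coils across the solar system changes neither a_peak nor the total
# coupling loss; it only divides the heat among many satellites and gives each one
# a long time to radiate and re-position, matching the advantages and the
# "does not help with g-forces" caveat above.
```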

Comment author: Stuart_Armstrong 17 May 2017 04:51:09AM 0 points [-]

Interesting idea. No, I don't have any references, sorry!

Comment author: Lumifer 16 May 2017 08:53:05PM 0 points [-]

I am not sure of the point you are making. In particular, I don't see why anyone would use those improper linear models. It's not 1979 and we can easily fit a variety of linear models, including robust ones. Under which circumstances would you prefer an improper linear model to the other alternatives available?

Comment author: Stuart_Armstrong 17 May 2017 04:50:17AM *  0 points [-]

You could use them when building a new model in a new field with experts but little data. But the point is not so much to use these models, but to note that they still outperform experts.

Comment author: gwern 16 May 2017 06:15:34PM *  1 point [-]

I don't think this is why improper linear models work. If you have a large number of variables, most of which are irrelevant in the sense of being uncorrelated with the outcome, then the irrelevant variables will be randomly assigned +1 or -1 weights and will on average cancel out, leaving the signal from the relevant variables, which do not cancel each other out.

So even without an implicit prior from an expert relevance selection effect or any explicit prior enforcing sparsity, you would still get good performance from improper linear models. (And IIRC, when you use something like ridge regression or Laplacian priors, the typical result, especially in high-dimensional settings like genomics or biology, is that most of the variables drop out or get set to zero, so even in these 'enriched' datasets most of the variables are irrelevant. What's sauce for the goose is sauce for the gander.)

Adding in more irrelevant variables does change things quantitatively by lowering power due to increased variance and requiring more data, but I don't see how this leads to any qualitative transition from working to not working such that it might explain why they work. That seems to have more to do with the human subjects overweighting noise and the 'bet on sparsity' principle.
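
A minimal simulation sketch of this point, assuming Python with numpy; the variable counts and noise level are illustrative choices of mine, not anything from the discussion:

```python
# Unit-weight score over relevant variables plus randomly signed junk variables.
import numpy as np

rng = np.random.default_rng(1)
n, k_relevant = 5000, 5

X_rel = rng.normal(size=(n, k_relevant))
y = X_rel.sum(axis=1) + rng.normal(0, 1.0, size=n)   # outcome driven by the relevant variables

for k_irr in [0, 5, 20, 100]:
    X_irr = rng.normal(size=(n, k_irr))
    signs = rng.choice([-1.0, 1.0], size=k_irr)      # random +/-1 weights on the junk
    score = X_rel.sum(axis=1) + X_irr @ signs        # improper model over all variables
    r = np.corrcoef(score, y)[0, 1]
    print(f"{k_irr:4d} irrelevant variables: r = {r:.2f}")
# The junk terms mostly cancel in expectation but add variance, so the correlation
# shrinks smoothly (roughly by a factor sqrt(k_relevant / (k_relevant + k_irr)));
# there is no sharp transition from working to not working.
```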

Comment author: Stuart_Armstrong 16 May 2017 08:03:46PM 0 points [-]

then the irrelevant variables will be randomly assigned to +1 or -1 weights and will on average cancel out, leaving the signal from the relevant variables who do not cancel each other out.

This will seriously degrade the signal. Normally there are only a few key variables, so adding more random ones with similar weights will increase the amount of spurious results.

Comment author: contravariant 16 May 2017 04:55:23PM *  0 points [-]

Why would they want to stop us from fleeing? It doesn't reduce their expansion rate, and we already established that we don't pose any serious threat to them. By fleeing, we would essentially be handing them a perfectly good planet and star, undamaged by war (if we stayed and fought, we would probably have enough time to launch at least some nuclear missiles, probably not harming them much but wrecking the ecosystem and making the planet ill-suited for colonization by biological life). Unless they're just sadistic and value the destruction of life as a final goal, I see no reason for them to care. Any planets and star systems that would be colonized by the escaping humans would be taken just as easily as Earth, with only a minor delay.

Comment author: Stuart_Armstrong 16 May 2017 06:16:37PM 0 points [-]

Why would they want to stop us from fleeing?

Because over the billions of years of our flight, we could develop technology that could be used to counter them, especially if interstellar warfare favours the defence or scorched earth is possible.
