An overall schema for the friendly AI problems: self-referential convergence criteria

17 Stuart_Armstrong 13 July 2015 03:34PM

A putative new idea for AI control; index here.

After working for some time on the Friendly AI problem, it's occurred to me that a lot of the issues seem related. Specifically, all the following seem to have commonalities:

Speaking very broadly, there are two features all of them share:

  1. The convergence criteria are self-referential.
  2. Errors in the setup are likely to cause false convergence.

What do I mean by that? Well, imagine you're trying to reach reflective equilibrium in your morality. You do this by using good meta-ethical rules, zooming up and down at various moral levels, making decisions on how to resolve inconsistencies, etc... But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup. In other words, the stopping point (and the convergence to the stopping point) is entirely self-referentially defined: the morality judges itself. It does not include any other moral considerations. You input your initial moral intuitions and values, and you hope this will cause the end result to be "nice", but the definition of the end result does not include your initial moral intuitions (note that some moral realists could see this process dependence as a positive - except for the fact that these processes have many convergent states, not just one or a small grouping).

So when the process goes nasty, you're pretty sure to have achieved something self-referentially stable, but not nice. Similarly, a nasty CEV will be coherent and have no desire to further extrapolate... but that's all we know about it.

The second feature is that any process has errors - computing errors, conceptual errors, errors due to the weakness of human brains, etc... If you visualise this as noise, you can see that noise in a convergent process is more likely to cause premature convergence, because if the process ever reaches a stable self-referential state, it will stay there (and if the process is a long one, then early noise will cause great divergence at the end). For instance, imagine you have to reconcile your belief in preserving human cultures with your belief in individual human freedom. A complex balancing act. But if, at any point along the way, you simply jettison one of the two values completely, things become much easier - and once jettisoned, the missing value is unlikely to ever come back.
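
A toy simulation can make the "noise causes premature convergence" argument concrete. Everything here is an assumption for illustration: `w` is a hypothetical weight on preserving human cultures versus individual freedom, each reconciliation step pulls it toward a balanced 0.5, Gaussian noise stands in for computing and conceptual errors, and reaching 0 or 1 means one value has been jettisoned. That jettisoned state is self-consistent, so the sketch treats it as absorbing.

```python
import random


def run_reconciliation(noise, steps=200, drift=0.1, seed=None):
    """One toy deliberation. Returns True if a value was jettisoned.

    w is a hypothetical weight on value A (e.g. preserving cultures) versus
    value B (e.g. individual freedom). Each step pulls w toward the balanced
    point 0.5, plus Gaussian noise standing in for errors. Hitting 0 or 1
    means one value has been dropped; that state is self-consistent, so
    the run ends there and the value never comes back.
    """
    rng = random.Random(seed)
    w = 0.5
    for _ in range(steps):
        w += drift * (0.5 - w) + rng.gauss(0.0, noise)
        if w <= 0.0 or w >= 1.0:
            return True  # absorbed in a one-value state
    return False


def jettison_rate(noise, runs=1000, seed=0):
    """Fraction of independent runs that end with one value jettisoned."""
    master = random.Random(seed)
    hits = sum(
        run_reconciliation(noise, seed=master.getrandbits(32)) for _ in range(runs)
    )
    return hits / runs


for noise in (0.0, 0.02, 0.1, 0.3):
    print(f"noise={noise}: one value jettisoned in {jettison_rate(noise):.1%} of runs")
```

With no noise the process stays balanced indefinitely; as the noise grows, a larger share of runs tends to end in the absorbing one-value state, and in this model none of them ever recover.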

Or, more simply, the system could get hacked. When exploring a potential future world, you could become so enamoured of it that you overwrite any objections you had. It seems very easy for humans to fall into these traps - and again, once you lose something of value in your system, you don't tend to get it back.

 

Solutions

And again, very broadly speaking, there are several classes of solutions to deal with these problems:

  1. Reduce or prevent errors in the extrapolation (eg solving the agent tiling problem).
  2. Solve all or most of the problem ahead of time (eg traditional FAI approach by specifying the correct values).
  3. Make sure you don't get too far from the starting point (eg reduced impact AI, tool AI, models as definitions).
  4. Figure out the properties of a nasty convergence, and try to avoid them (eg some of the ideas I mentioned in "crude measures", general precautions taken when defining the convergence process).
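
Solution classes 3 and 4 amount to adding non-self-referential checks to the stopping rule. The sketch below is hypothetical: values are represented as a dict of weights, `max_drift` is an assumed bound on how far the endpoint may move from the starting point (class 3), and a value whose weight collapses toward zero is treated as one assumed signature of nasty convergence (class 4). Classes 1 and 2 are not shown.

```python
def accept_endpoint(start, end, is_stable, max_drift=0.3, min_weight=0.05):
    """Decide whether an endpoint of the extrapolation should be trusted.

    start, end: dicts mapping value names to weights (hypothetical representation).
    is_stable:  the self-referential test (does the endpoint endorse itself?).
    """
    # The self-referential criterion alone: necessary, but not sufficient.
    if not is_stable(end):
        return False
    # Class 3: do not stray too far from the starting point.
    if any(abs(end.get(k, 0.0) - w) > max_drift for k, w in start.items()):
        return False
    # Class 4: reject a known signature of nasty convergence,
    # here any initially held value that has been all but jettisoned.
    if any(w > 0 and end.get(k, 0.0) < min_weight for k, w in start.items()):
        return False
    return True


def always_stable(state):
    """Stand-in self-referential test: every candidate endorses itself."""
    return True


start = {"cultures": 0.5, "freedom": 0.5}
print(accept_endpoint(start, {"cultures": 0.45, "freedom": 0.55}, always_stable))
print(accept_endpoint(start, {"cultures": 0.0, "freedom": 1.0}, always_stable))
```

Both endpoints pass the self-referential test; only the first survives the extra checks.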

 

Reductionist research strategies and their biases

16 PhilGoetz 06 February 2015 04:11AM

I read an extract of (Wimsatt 1980) [1] which includes a list of common biases in reductionist research. I suppose most of us are reductionists most of the time, so these may be worth looking at.

This is not an attack on reductionism! If you think reductionism is too sacred for such treatment, you've got a bigger problem than anything on this list.

Here's Wimsatt's list, with some additions from the parts of his 2007 book Re-engineering Philosophy for Limited Beings that I can see on Google Books. His lists often lack specific examples, so I came up with my own examples and inserted them in [brackets].


When the uncertainty about the model is higher than the uncertainty in the model

19 Stuart_Armstrong 28 November 2014 06:12PM

Most models attempting to estimate or predict some element of the world will come with their own estimates of uncertainty. It could be the Standard Model of physics predicting the mass of the Z boson as 91.1874 ± 0.0021 GeV, or the rather wider uncertainty ranges of economic predictions.

In many cases, though, the uncertainties in or about the model dwarf the estimated uncertainty in the model itself - especially for low probability events. This is a problem, because people working with models often try to use the in-model uncertainty and adjust it to get an estimate of the true uncertainty. They often realise the model is unreliable, but they don't have a better one, and they already have a measure of uncertainty, so surely doubling or tripling it should do the trick? Surely...
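
One way to see why scaling the in-model error bar cannot fix this is to split the probability of an event on whether the model itself is sound: P(event) = P(event | model sound) P(model sound) + P(event | model flawed) P(model flawed). The sketch below assumes a normal error distribution inside the model, and its scenario numbers are invented purely for illustration. For an event the model puts 6 sigma out, the model-error term dominates, and doubling or tripling sigma lands above or below the total depending on quantities the model knows nothing about.

```python
import math


def normal_tail(k):
    """One-sided probability of landing more than k standard deviations above the mean."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def total_probability(p_in_model, p_flawed, p_event_if_flawed):
    """Mix the model's own estimate with the chance that the model is wrong."""
    return p_in_model * (1 - p_flawed) + p_event_if_flawed * p_flawed


k = 6.0                      # the model puts the event 6 sigma out
p_in_model = normal_tail(k)  # roughly 1e-9

# Two hypothetical scenarios (illustrative numbers, not estimates of anything real):
# (probability the model is flawed, probability of the event if it is flawed)
scenarios = {
    "rarely flawed": (1e-3, 1e-2),
    "often flawed": (0.1, 0.5),
}
for name, (p_flawed, p_event_if_flawed) in scenarios.items():
    total = total_probability(p_in_model, p_flawed, p_event_if_flawed)
    print(f"{name}: in-model {p_in_model:.1e}, doubled sigma {normal_tail(k / 2):.1e}, "
          f"tripled sigma {normal_tail(k / 3):.1e}, with model error {total:.1e}")
```

In the first scenario doubling sigma overshoots by about two orders of magnitude; in the second, even tripling it falls short. In both, the model-error term, not the in-model tail, sets the answer.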

The following three cases are going to be my go-to examples for showing what a mistake this can be; they cover three situations: extreme error, being in the domain of a hard science, and extreme negative impact.


If reason told you to jump off a cliff, would you do it?

-12 Shalmanese 21 December 2009 03:54AM

In reply to Eliezer's Contrarian Status Catch 22 & Sufficiently Advanced Sanity. I accuse Eliezer of encountering a piece of Advanced Wisdom.

Unreason is something that we should fight against. Witch burnings, creationism & homeopathy are all things which should rightly be defended against for society to advance. But, more subtly, I think reason is, in some ways, also a dangerous phenomenon that should be guarded against. I am arguing not against the specific process of reasoning itself but against the attitude which instinctively reaches for reason as the first tool of choice when confronting a problem. Scott Aaronson called this approach bullet swallowing when he tried to explain why he was so uncomfortable with it. Jane Galt also rails against reason when explaining why she does not support gay marriage.

