AI indifference through utility manipulation

2 Stuart_Armstrong 02 September 2010 05:06PM

Indifference is a precious and rare commodity for complex systems. The most likely effect of making a change in an intricate apparatus is a whole slew of knock-on effects crowned with unintended consequences. It would be ideal if one could make a change and be sure that the effects would remain isolated - that the rest of the system would be indifferent to the change.

For instance, it might be a sensible early-AI precaution to have an extra observer somewhere, sitting with his hand upon a button, ready to detonate explosives should the AI make a visible power grab. Except, of course, the AI will become aware of this situation and will factor it into any plans it makes, either by increasing its deception or by grabbing control of the detonation system as a top priority. We would be a lot safer if the AI were somehow completely indifferent to the observer and the explosives. That is a complex wish that we don't really know how to phrase; let's make it simpler, and make it happen.

Stuart_Armstrong 02 September 2010 03:39:38PM* 1 point

When we try to estimate the number of technological civilizations that evolved on main-sequence stars in our past light cone, we must not use the presence of at least one tech civ (namely, us) as evidence of the presence of another one (namely, ET) because if that first tech civ had not evolved, we would have no way to observe that outcome (because we would not exist).

If there were two universes, one very likely to evolve life and one very unlikely, and all we knew was that we existed in one, then we are much more likely to exist in the first universe. Hence our own existence is evidence about the likelihood of life evolving, and there still is a Fermi paradox.
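The update described above can be written out as a small Bayes' rule calculation. This is only a sketch: the function name and all of the numbers (equal priors, a 0.9 versus 0.01 chance of observers arising) are hypothetical values chosen for illustration, not figures from the discussion.

```python
def posterior_life_friendly(prior_friendly, p_obs_friendly, p_obs_hostile):
    """Return P(life-friendly universe | observers exist) via Bayes' rule."""
    joint_friendly = prior_friendly * p_obs_friendly
    joint_hostile = (1.0 - prior_friendly) * p_obs_hostile
    return joint_friendly / (joint_friendly + joint_hostile)


# Hypothetical inputs: both universes equally likely a priori;
# observers arise with probability 0.9 in one and 0.01 in the other.
posterior = posterior_life_friendly(0.5, 0.9, 0.01)
print(f"P(life-friendly | we exist) = {posterior:.3f}")
```

With these assumed inputs the posterior moves well above the 0.5 prior, which is the sense in which our own existence counts as evidence.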

Stuart_Armstrong 02 September 2010 10:00:13AM 1 point

You also have to be able to deduce how much of the other agent's information is shared with you. If you and they got your posteriors by reading the same blogs and watching the same TV shows, then this is very different from the case where you reached the same conclusion through completely different channels.
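A toy calculation shows why the overlap matters. It assumes, purely for illustration, that each agent's evidence can be summarised as a likelihood ratio of 4 in favour of some hypothesis and that both start from even prior odds; the function name and numbers are hypothetical.

```python
def combine(prior_odds, likelihood_ratios):
    """Multiply prior odds by each independent likelihood ratio and
    return the resulting probability."""
    odds = prior_odds
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Independent channels: both pieces of evidence count.
independent = combine(1.0, [4.0, 4.0])

# Same blogs, same TV shows: it is one piece of evidence, counted once.
shared = combine(1.0, [4.0])

print(f"independent sources: {independent:.3f}")
print(f"shared source:       {shared:.3f}")
```

Treating shared evidence as if it were independent would double-count it and overstate the combined confidence.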

Stuart_Armstrong 13 July 2010 11:05:14AM 1 point

Upvoted those options on the website.

Stuart_Armstrong 22 May 2010 08:39:38AM 1 point

Have you seen Full Non-Indexical Conditioning? (http://www.cs.toronto.edu/~radford/ftp/anth.pdf) Though the theory is mathematically incorrect, it's very nearly right, and it's very similar to your Sleeping Beauty approach...

Stuart_Armstrong 09 May 2010 02:05:50PM 0 points

I can't say anything about this specific construction, but there is a related issue with Turing machines. The question was whether you could determine a useful subset S of the set of all Turing machines, such that the halting problem is solvable for all machines in S, and S is general enough to contain useful examples.

If I remember correctly, the answer was that you couldn't. This feels a lot like that - I'd bet that the only way of being sure that we can avoid Russell's paradox is to restrict predicates to such a narrow category that we can't do much of anything useful with them.

Stuart_Armstrong 11 April 2010 09:02:03PM 0 points

"you" could be a UDT agent. So does this example show a divergence between UDT and XDT applied to UDT?

In response to comment by wedrifid on The Last Number
Stuart_Armstrong 11 April 2010 08:57:39PM 1 point

Problem is, everything collapses with one contradiction. So rigorously, there is nothing more to tell.

Now you can conceive of some sort of world in which the truths of mathematics are empirical, contingent and changeable, for instance, so that one contradiction is not much of a biggie. That would be quite fun, but, alas, I don't have time for much fiction nowadays. Maybe someone else could try?
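The collapse referred to above is the principle of explosion: from a statement and its negation, any statement follows. A minimal Lean 4 sketch, with `P` and `Q` standing for arbitrary propositions and `explosion` a name chosen here for illustration:

```lean
-- From a proof of P and a proof of ¬P, any proposition Q follows.
theorem explosion (P Q : Prop) (hp : P) (hnp : ¬P) : Q :=
  absurd hp hnp
```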

Stuart_Armstrong 11 April 2010 08:52:42PM 0 points

Actually, it is a serious point. If you choose theories at random, according to some universal prior, then a lot of them are going to be inconsistent. And most of the theories that can quickly prove their own consistency are the inconsistent ones. So this does provide some information (depending on how the consistency proof was arrived at, of course).

Stuart_Armstrong 11 April 2010 07:50:29AM 8 points

Just an idea: what about putting a "number of votes" next to the "vote total" score for posts and comments? That would distinguish cases where a subject was highly controversial from those where no-one really cares.
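To make the distinction concrete, here is a small sketch. The `VoteTally` class and the vote counts are hypothetical and are not taken from the site's actual code or data.

```python
from dataclasses import dataclass


@dataclass
class VoteTally:
    upvotes: int
    downvotes: int

    @property
    def total(self) -> int:
        """Net score, as currently displayed."""
        return self.upvotes - self.downvotes

    @property
    def count(self) -> int:
        """Number of votes cast, the proposed extra figure."""
        return self.upvotes + self.downvotes


# Hypothetical comments with the same net score.
controversial = VoteTally(upvotes=40, downvotes=39)
ignored = VoteTally(upvotes=1, downvotes=0)

for name, tally in [("controversial", controversial), ("ignored", ignored)]:
    print(f"{name}: total={tally.total}, votes={tally.count}")
```

Both comments show the same total, and only the vote count tells them apart.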
