
Comment author: Stuart_Armstrong 20 November 2014 03:41:42PM 1 point [-]

For this problem: it's not whether the guy has solved AI, it's whether the guy is more likely than other people to have solved AI (more exactly, of all the actions you could take to increase the chance of FAI, is interacting with this guy the most helpful?).

Comment author: Stuart_Armstrong 20 November 2014 03:40:29PM 2 points [-]

Great!

Comment author: owencb 20 November 2014 01:11:25PM 1 point [-]

Thanks, nice write-up.

Solution 1 seems to see quite a lot of use in the world (often but not always in conjunction with 4): one player will set a price without reference to the other player's utility function, setting up an ultimatum.

Comment author: Stuart_Armstrong 20 November 2014 02:04:11PM 0 points [-]

But setting a price is an iterative process, depending on how much of the good is purchased...

Comment author: RichardKennaway 19 November 2014 10:51:47AM 0 points [-]

Is there a mathematical solution to this problem? Could the regulation requiring a 99.5% chance each year for each individual company to meet its obligations take into account the dependency between different companies, and set the required level to target the chance of industry-wide failure? Taleb has certainly made the point that the assumption of independent failures fails when everyone is adopting the same strategy. They needn't even be investing in each other for this to happen; it's enough that they act as fewer independent agents than they nominally are.
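
A toy calculation to illustrate the gap (made-up numbers, purely illustrative): if each of n companies fails in a given year with probability p = 0.005, then

    P(all n fail) = p^n = 0.005^2 = 2.5 x 10^-5    if failures are independent (taking n = 2)
    P(all n fail) = p   = 0.005                    if the companies effectively act as a single agent

So a per-company 99.5% standard bounds industry-wide failure only to the extent that the independence assumption actually holds.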

Comment author: Stuart_Armstrong 19 November 2014 01:22:00PM 1 point [-]

If you have excellent models, then you could have the regulators adjust requirements as dependencies change. But we don't have excellent models, far from it...

Comment author: AlexMennen 19 November 2014 01:31:29AM 0 points [-]

How does this differ from testing an AI in a simulated universe before letting it out into the real world?

Comment author: Stuart_Armstrong 19 November 2014 12:59:25PM 1 point [-]

It depends on what you define "simulated universe" to be. Here it uses the AI's own prediction module to gauge the future; if you want to call that a simulated universe, the position is arguably defensible.

Comment author: DanielLC 18 November 2014 01:50:38AM 0 points [-]

If it believes something is impossible, then when it sees proof to the contrary it assumes there's something wrong with its sensory input. If it has utility indifference, then when it sees that the universe is one it doesn't care about, it acts on the tiny chance that there's something wrong with its sensory input. I don't see a difference. If you use Solomonoff induction and set a prior to zero, everything will work fine. Even a superintelligent AI won't be able to use Solomonoff induction exactly, and in practice its beliefs won't quite follow Bayes' theorem, but that's true regardless of whether it assigns zero probability to something.
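
(To spell out the Bayesian step behind "it assumes there's something wrong with its sensory input": if P(H) = 0, then for any evidence E with P(E) > 0,

    P(H | E) = P(E | H) * P(H) / P(E) = 0,

so no observation can ever raise the hypothesis above zero, and the agent has to attribute apparently contradicting evidence to noise or corrupted input instead.)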

Comment author: Stuart_Armstrong 18 November 2014 10:39:12AM 2 points [-]

That's not how utility indifference works. I'd recommend skimming the paper ( http://www.fhi.ox.ac.uk/utility-indifference.pdf ), and then ask me if you still have questions.

Comment author: DanielLC 17 November 2014 09:54:58PM *  0 points [-]

Is there a difference between utility indifference and false beliefs? The von Neumann–Morgenstern utility theorem works entirely in terms of expected value, and does not differentiate between high probability and high magnitude of value.
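
To make that concrete (a standard expected-utility observation, nothing specific to this proposal): an agent ranking actions by

    E[U] = sum over worlds w of P(w) * U(w)

can't distinguish halving the probability it assigns to a world from halving the utility it assigns to that world; both scale the term P(w) * U(w) in exactly the same way. That's why an edit to U can reproduce the behavioral effect of an edit to the beliefs.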

Comment author: Stuart_Armstrong 17 November 2014 10:56:18PM 1 point [-]

False beliefs might be contagious (spreading to other beliefs), and lead to logic problems with things like P(A)=P(B)=1 and P(A and B)<1 (or when impossible things happen).
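
(To spell out the problem: the probability axioms give P(A and B) >= P(A) + P(B) - 1, so P(A) = P(B) = 1 forces P(A and B) = 1. An agent that nonetheless holds P(A and B) < 1 is no longer working with a probability distribution, and standard updating can go wrong in arbitrary ways from there.)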

Comment author: V_V 17 November 2014 07:09:42PM 0 points [-]

Isn't that essentially a false belief about one's own preferences?

I mean, the AI's "true" VNM utility function, to the extent that it has one, is going to be different from the utility function the AI reflectively thinks it has. In principle the AI could find out the difference, and this could cause it to alter its behavior.

Or maybe not, I don't have a strong intuition about this at the moment. But if I recall correctly, in the previous work on corrigibility (I haven't read the latest version you linked yet), Soares was thinking of using causal decision nodes to implement utility indifference for the shutdown problem. This effectively introduces false beliefs into the agent, as the agent is mistaken about what causes the button to be pressed.

Comment author: Stuart_Armstrong 17 November 2014 08:51:33PM 3 points [-]

Isn't that essentially a false belief about one's own preferences?

No. It's an adjusted preference that functions in practice just like a false belief.
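
Very roughly, and glossing over the details in the paper: rather than making the agent believe that some event X won't happen, you replace its utility U with something like

    U'(w) = U(w)                                  if X doesn't happen in w
    U'(w) = U(w) + E[U | not-X] - E[U | X]        if X does happen in w

with the expectations evaluated at a fixed reference point. The correction term equalises expected utility between X happening and not happening, so the agent has no incentive to promote or prevent X, while its probability estimates are left completely untouched.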

Comment author: Toggle 17 November 2014 07:37:24PM *  1 point [-]

A belief that M has no impact will generate extremely poor predictions of the future iff it's a UFAI; it's interesting to have a prescribed belief under which Friendly agents definitionally believe a true thing and Unfriendly agents definitionally believe a false thing.

It's a solution that seems mostly geared towards problems of active utility deception: it prevents certain cases of an AI that deliberately games a metric. To the extent that a singleton is disingenuous about its own goals, this is a neat approach. I am a little worried that this kind of deliberate deception is stretching the orthogonality thesis further than it can plausibly go; with the right kind of careful experimentation and self-analysis, a UFAI with a prescribed falsehood might derive a literally irreconcilable set of beliefs about the world. I don't know if that would crack it down the middle or what, or how 'fundamental' that crack would be to its conception of reality.

We also have the problem of not being able to specify our own values with precision. If an AI would produce catastrophic results by naively following our exact instructions, then M will presumably be using the same metric, and it will give a green light to a machine that proceeds to break down all organic molecules for additional stock market construction projects or something. I suppose that this isn't really the sort of problem that you're trying to solve, but it is a necessary limitation of M, even though M is fairly passive.

Comment author: Stuart_Armstrong 17 November 2014 08:18:53PM 0 points [-]

Whenever I use the colloquial phrase "the AI believes a false X", I mean that we are using utility indifference to accomplish that goal, without actually giving the AI false beliefs.

Comment author: V_V 17 November 2014 06:06:03PM *  1 point [-]

I'm generally skeptical about these frameworks that require agents to hold epistemically false beliefs.

What if the AI finds out about module M through a side channel? Depending on the details, either it will correctly update on the evidence and start to behave accordingly, or it will enter an inconsistent epistemic state, and thus possibly behave erratically.

Comment author: Stuart_Armstrong 17 November 2014 06:40:00PM 1 point [-]

I'd be using utility indifference, rather than incorrect beliefs. It serves a similar purpose, without causing the AI to believe anything incorrect.
