whpearson comments on Re-formalizing PD - Less Wrong

28 Post author: cousin_it 28 April 2009 12:10PM

Comment author: whpearson 28 April 2009 02:25:48PM 0 points [-]

Unless I've missed something. What say ye?

Verifying source code is not quite enough. You also need to verify that the compiler/interpreter or hardware the source code runs on treats the prefix semantically identically.

E.g., from a machine-code perspective, if 04 is the opcode for ifeq in one agent's architecture and for ifneq in another's, they can exchange source code and still not be sure what the other will do.
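
A toy illustration of the opcode point, as a sketch only: the two "architectures", the three-byte program layout, and the mapping of a taken branch to Cooperate are all made up for this example. The same bytes produce opposite moves depending on how opcode 04 is decoded.

```python
# Hypothetical opcode tables: the same byte means different things
# on the two agents' machines.
ARCH_A = {0x04: "ifeq"}   # agent A: 04 = branch if equal
ARCH_B = {0x04: "ifneq"}  # agent B: 04 = branch if not equal


def run(program: bytes, opcode_table: dict) -> str:
    """Run a one-instruction program: (opcode, left, right).

    Assumed convention for this sketch: branch taken means
    Cooperate ("C"), branch not taken means Defect ("D").
    """
    opcode, left, right = program
    op = opcode_table[opcode]
    if op == "ifeq":
        taken = left == right
    elif op == "ifneq":
        taken = left != right
    else:
        raise ValueError(f"unknown instruction: {op}")
    return "C" if taken else "D"


# Identical "source code" shipped to both agents.
program = bytes([0x04, 7, 7])

move_on_a = run(program, ARCH_A)
move_on_b = run(program, ARCH_B)
print(move_on_a, move_on_b)
```

Inspecting the bytes alone tells neither agent which of the two moves the other will actually play; the decoding table has to be verified as well.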

Comment author: cousin_it 28 April 2009 02:30:46PM *  1 point [-]

Well, technically yes, but I'd say that falls under "nitpick". In any actual tournament the execution environment would be published common knowledge, as in the shootout for example.

Comment author: whpearson 28 April 2009 02:43:02PM 0 points [-]

True, we had different uses in mind. I was trying to forestall anyone using the idea as an argument that future AIs swapping source code should necessarily cooperate.

By the way, I liked the math at the end, although I don't have time to sit down and check my intuitions.

Comment author: Eliezer_Yudkowsky 29 April 2009 01:03:51AM 1 point [-]

Then the question is whether AIs (a) can trustworthily verify each other's source code, e.g. by sending in probes to do random inspections, which include verifying that the inspected software is neither deliberately opaque nor set to accept coded overrides from outside, or (b) don't need to verify each other's source code because the vast majority of initial conditions converge on the same obvious answer to PD problems.

Comment author: Wei_Dai 29 April 2009 02:37:45AM 0 points [-]

(a) Random inspections probably won't work. It's easy to have code/hardware that look innocent as individual parts, but together have the effect of being a backdoor. You won't detect the backdoor unless you can see the entire system as a whole.

Tim Freeman's "proof by construction" method is the only viable solution to the "prove your source code" problem that I've seen so far.

(b) is interesting, and seems to be a new idea. Have you written it up in more detail somewhere? If AIs stop verifying each other's source code, won't they want to modify their source code to play Defect again?

Comment author: Eliezer_Yudkowsky 29 April 2009 06:21:14AM 2 points [-]

Look innocent to a cursory human inspection, yes. But if hardware is designed to be deterministically cooperative/coordinating and to provably not be a backdoor in combination with larger hardware, that sounds like something that should be provable if the hardware was designed with that provability in mind.

Comment author: Wei_Dai 29 April 2009 07:31:47PM 2 points [-]

Many governments, including the US, are concerned right now that their computers have hardware backdoors, so the current lack of research results on this topic is probably due not just to lack of interest but to intrinsic difficulty. Even if provable hardware is physically possible and technically feasible in the future, there is likely a cost attached, for example running slower than non-provable hardware or using more resources.

Instead of confidently predicting that AIs will Cooperate in one-shot PD, wouldn't it be more reasonable to say that this is a possibility, which may or may not occur, depending on the feasibility and economics of various future technologies?

Comment author: Vladimir_Nesov 29 April 2009 07:40:22PM *  0 points [-]

The singleton scenario seems overwhelmingly likely, so whatever multiple AIs exist will play by the singleton's rules, with native physics becoming irrelevant. (I know, I know...)

Comment author: cousin_it 29 April 2009 07:23:35AM *  2 points [-]

I believe this stuff bottoms out in physics - it's either possible or impossible to make a physically provable analog to the PREFIX program. The idea is fascinating, but I don't know enough physics to determine whether it's crazy.

Comment author: whpearson 29 April 2009 10:18:47AM *  0 points [-]

The difficulty would be to make sure nothing could interact with the atoms/physical constituents of the prefix in a way that distorts the prefix. Prefixes of programs have the benefit that they go first, and given the serial nature of most programs, things that go first have complete control.

So it is a question of isolating the prefix. I'm going to read this paper on isolation and physics, before making any comments on the subject.
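
The "goes first" point can be sketched in a few lines. This is only an assumed reading of the prefix idea, since the post's actual PREFIX program is not reproduced in this thread: the marker string, `make_agent`, and `play` are hypothetical names, and the suffix is reduced to a literal move for brevity.

```python
# Hypothetical shared marker that every prefixed agent begins with.
PREFIX = "#!shared-prefix\n"


def make_agent(suffix_move: str) -> str:
    """Build an agent's source: the shared prefix plus an arbitrary suffix."""
    return PREFIX + suffix_move


def play(own_source: str, opponent_source: str) -> str:
    """Return this agent's move against the opponent.

    The prefix check runs first and returns before the suffix is ever
    consulted, so the suffix cannot override it.
    """
    if not own_source.startswith(PREFIX):
        raise ValueError("this sketch only models agents that carry the prefix")
    if opponent_source.startswith(PREFIX):
        return "C"
    # Only now does the suffix get any say (here it is just a literal move).
    return own_source[len(PREFIX):]


defector = make_agent("D")    # suffix would defect if it were in control
cooperator = make_agent("C")
outsider = "D"                # no prefix at all

print(play(defector, cooperator), play(defector, outsider))
```

In software the ordering is enough to give the prefix control; the open question in this thread is whether a physical system can be isolated well enough that nothing tampers with the part that runs first.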

Comment author: cousin_it 29 April 2009 10:57:52AM 0 points [-]

I read the paper, and it seemed to me to be useless. We want a physically inviolable guarantee of isolation.

Comment author: whpearson 29 April 2009 11:57:07AM *  0 points [-]

It gave some ideas. It suggests we might start by specifying time limits, e.g. specifying that a system will be effectively isolated for a certain time, by scanning a region of space around that system.