AI cooperation in practice

cousin_it

45 AI cooperation in practice

30th Jul 2010

1 min read

45

You know that automated proof verifiers exist, right? And also that programs can know their own source code? Well, here's a puzzle for you:

Consider a program A that knows its own source code. The algorithm of A is as follows: generate and check all possible proofs up to some huge size (3^^^^3). If A finds a proof that A returns 1, it returns 1. If the search runs to the end and fails, it returns 0. What will this program actually return?

Wait, that was the easy version. Here's a harder puzzle:

Consider programs A and B that both know their own, and each other's, source code. The algorithm of A is as follows: generate and check all proofs up to size 3^^^^3. If A finds a proof that A returns the same value as B, it returns 1. If the search fails, it returns 0. The algorithm of B is similar, but possibly with a different proof system and different length limit. What will A and B return?

This second puzzle is a formalization of a Prisoner's Dilemma strategy proposed by Eliezer: "I cooperate if and only I expect you to cooperate if and only if I cooperate". So far we only knew how to make this strategy work by "reasoning from symmetry", also known as quining. But programs A and B can be very different - a human-created AI versus an extraterrestrial crystalloid AI. Will they cooperate?

I may have a tentative proof that the answer to the first problem is 1, and that in the second problem they will cooperate. But: a) it requires you to understand some logic (the diagonal lemma and Löb's Theorem), b) I'm not sure it's correct because I've only studied the subject for the last four days, c) this margin is too small to contain it. So I put it up here. I submit this post with the hope that, even though the proof is probably wrong or incomplete, the ideas may still be useful to the community, and someone else will discover the correct answers.

Edit: by request from Vladimir Nesov, I reposted the proofs to our wiki under my user page. Many thanks to all those who took the time to parse and check them.

Decision theoryAI

Personal Blog

45

New Comment

Rendering 0/166 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 10:59 PM

Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

Moderation Log

45 AI cooperation in practice

by cousin_it

30th Jul 2010

1 min read

166

45

You know that automated proof verifiers exist, right? And also that programs can know their own source code? Well, here's a puzzle for you:

Wait, that was the easy version. Here's a harder puzzle:

Edit: by request from Vladimir Nesov, I reposted the proofs to our wiki under my user page. Many thanks to all those who took the time to parse and check them.

Decision theoryAI

Personal Blog

45

Mentioned in

122Robust Cooperation in the Prisoner's Dilemma

47I'm open for projects (sort of)

36Decision Theories: A Semi-Formal Analysis, Part III

35Formulas of arithmetic that behave like decision agents

Load More (5/10)

New Comment

Rendering 0/166 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 10:59 PM

Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

Moderation Log

More from cousin_it

Curated and popular this week

166Comments

166

Comment Permalink

cousin_it16y50

The argument is admittedly sketchy, but I don't think it depends much on the implementation of "proves". Of course I'd want the proof system to hold the same properties that are used in the proof of Löb's Theorem, because my argument is almost an exact replica of that; but pretty much all proof systems actually used by people (apart from some extremely weak ones) satisfy these properties. And you're guaranteed not to become worse off if you make your proof system stronger while your opponent stays the same. That's a really big deal, if my argument is indeed correct.

About heat death: even though the two programs say they can iterate up to 3^^^3 or something, the actual time to cooperation will be much less, because the reasoning steps that I outlined are exactly the reasoning steps they'd need to stumble upon. Yes, that would make a proof of length 10000 or so (which would mean 2^10000 proofs tried), but a that's long way from 3^^^3. If the programs use some kind of heuristics to search for working proofs (but still guarantee that they will do the brute force search if heuristics fail), the proofs can be found quicker still.

[anonymous]16y00

It's a nice result, but I doubt it'll ever be practically relevant.

You could implement the algorithm today, although with a limit of less than 3^^^3. Rough guesstimates suggest that about a limit of 50 would be workable at home, 100 for a worldwide distributed computing effort and 500 for a universe-sized computer that does one operation per bit per plank time. In all cases, nowhere near 10000 (let alone 3^^^3).

Heuristics could help a bit, but making the algorithm more complex is likely to also make the proof that it works longer and so harder to find. The fallback to brute-force search is unhelpful because no computer that could actually exist would be able to find the proof that way.

0timtyler16y

Re: "apart from some extremely weak ones" It does depend on the implementation - because it is relatively easy to imagine poor provers that result in a failure to understand and prove anything much. So, you would need to specify some constraints on the prover before you can make statements about what it outputs.

See in context