A model of UDT without proof limits

cousin_it

This post requires some knowledge of decision theory math. Part of the credit goes to Vladimir Nesov.

Let the universe be a computer program U that returns a utility value, and the agent is a subprogram A within U that knows the source code of both A and U. (The same setting was used in the reduction of "could" post.) Here's a very simple decision problem:

def U(): if A() == 1: return 5 else: return 10

The algorithm for A will be as follows:

Search for proofs of statements of the form "A()=a implies U()=u". Upon finding at least one proof for each possible a, go to step 2.
Let L be the maximum length of proofs found on step 1, and let f(L) be some suitably fast-growing function like 10^L. Search for proofs shorter than f(L) of the form "A()≠a". If such a proof is found, return a.
If we're still here, return the best a found on step 1.

The usual problem with such proof-searching agents is that they might stumble upon "spurious" proofs, e.g. a proof that A()==2 implies U()==0. If A finds such a proof and returns 1 as a result, the statement A()==2 becomes false, and thus provably false under any formal system; and a false statement implies anything, making the original "spurious" proof correct. The reason for constructing A this particular way is to have a shot at proving that A won't stumble on a "spurious" proof before finding the "intended" ones. The proof goes as follows:

Assume that A finds a "spurious" proof on step 1, e.g. that A()=2 implies U()=0. We have a lower bound on L, the length of that proof: it's likely larger than the length of U's source code, because a proof needs to at least state what's being proved. Then in this simple case 10^L steps is clearly enough to also find the "intended" proof that A()=2 implies U()=10, which combined with the previous proof leads to a similarly short proof that A()≠2, so the agent returns 2. But that can't happen if A's proof system is sound, therefore A will find only "intended" proofs rather than "spurious" ones in the first place.

Quote from Nesov that explains what's going on:

With this algorithm, you're not just passively gauging the proof length, instead you take the first moral argument you come across, and then actively defend it against any close competition

By analogy we can see that A coded with f(L)=10^L will correctly solve all our simple problems like Newcomb's Problem, the symmetric Prisoner's Dilemma, etc. The proof of correctness will rely on the syntactic form of each problem, so the proof may break when you replace U with a logically equivalent program. But that's okay, because "logically equivalent" for programs simply means "returns the same value", and we don't want all world programs that return the same value to be decision-theoretically equivalent.

A will fail on problems where "spurious" proofs are exponentially shorter than "intended" proofs (or even shorter, if f(L) is chosen to grow faster). We can probably construct malicious examples of decision-determined problems that would make A fail, but I haven't found any yet.

This post requires some knowledge of decision theory math. Part of the credit goes to Vladimir Nesov.

def U(): if A() == 1: return 5 else: return 10

The algorithm for A will be as follows:

Search for proofs of statements of the form "A()=a implies U()=u". Upon finding at least one proof for each possible a, go to step 2.
Let L be the maximum length of proofs found on step 1, and let f(L) be some suitably fast-growing function like 10^L. Search for proofs shorter than f(L) of the form "A()≠a". If such a proof is found, return a.
If we're still here, return the best a found on step 1.

Quote from Nesov that explains what's going on:

With this algorithm, you're not just passively gauging the proof length, instead you take the first moral argument you come across, and then actively defend it against any close competition

In my formulation of ASP, the predictor has a stronger formal system than the agent, so some statements can have short proofs in the predictor's formal system but only long proofs in the agent's formal system. An extreme example is consistency of the agent's formal system, which is unprovable by the agent but an axiom for the predictor.

Hm. So what would you still need to win. You'd still need to prove that A()==1 implies U()==10^6. But if that's true, the only way the agent doesn't prove that A()==2 implies U()==10^6+10^3 is if the predictor is really always right. But that's not provable to the agent, because I'm pretty sure that relies on the consistency of the agent's formal system.

21

A model of UDT without proof limits

21

21

21

A model of UDT without proof limits

21

21