LESSWRONG
LW

All of SamEisenstat's Comments + Replies

Reflective oracles as a solution to the converse Lawvere problem

Yeah, it's a bit weird to call this particular problem the converse Lawvere problem. The name suggests that there is a converse Lawvere problem for any concrete category and any base object $Y$--namely, to find an object $X$ and a map $f: X \to Y^X$ such that every morphism $X \to Y$ has the form $f(x)$ for some $x \in X$. But, among a small group of researchers, we ended up using the name for the case where $Y$ is the unit interval and the category is that of topological spaces, or alternatively a reasonable topological-like category. In this post, I play... (read more)

Shortform

SamEisenstat4mo215

I think a lot of this is factual knowledge. There are five publicly available questions from the FrontierMath dataset. Look at the last of these, which is supposed to be the easiest. The solution given is basically "apply the Weil conjectures". These were long-standing conjectures, a focal point of lots of research in algebraic geometry in the 20th century. I couldn't have solved the problem this way, since I wouldn't have recalled the statement. Many grad students would immediately know what to do, and there are many books discussing this, but there are a... (read more)

8TsviBT4mo

Pulling a quote from the tweet replies (https://x.com/littmath/status/1870560016543138191):

The consistent guessing problem is easier than the halting problem

SamEisenstat11mo10

Well, I guess describing a model of a computably enumerable theory, like PA or ZFC counts. We could also ask for a model of PA that's nonstandard in a particular way that we want, e.g. by asking for a model of $P A + \neg C o n (P A)$ , and that works the same way. Describing a reflective oracle has low solutions too, though this is pretty similar to the consistent guessing problem. Another one, which is really just a restatement of the low basis theorem, but perhaps a more evocative one, is as follows. Suppose some oracle machine $T$ has the property that ... (read more)

The consistent guessing problem is easier than the halting problem

SamEisenstat11mo*30

Yeah, there's a sort of trick here. The natural question is uniform--we want a single reduction that can work from any consistent guessing oracle, and we think it would be cheating to do different things with different oracles. But this doesn't matter for the solution, since we produce a single consistent guessing oracle that can't be reduced to the halting problem.

This reminds me of the theory of enumeration degrees, a generalization of Turing degrees allowing open-set-flavoured behaviour like we see in partial recursive functions; if the answer to an ora... (read more)

The consistent guessing problem is easier than the halting problem

SamEisenstat11mo192

Note that Andy Drucker is not claiming to have discovered this; the paper you link is expository.

Since Drucker doesn't say this in the link, I'll mention that the objects you're discussing are conventionally know as PA degrees. The PA here stands for Peano arithmetic; a Turing degree solves the consistent guessing problem iff it computes some model of PA. This name may be a little misleading, in that PA isn't really special here. A Turing degree computes some model of PA iff it computes some model of ZFC, or more generally any $Σ_{1}$ theory capable o... (read more)

3jessicata11mo

Ah, the low basis theorem does make more sense of Drucker's paper. I thought Turing degrees wouldn't be helpful because there are multiple consistent guessing oracles, but it looks like they are helpful. I hadn't heard of PA degrees, will look into it.

4notfnofn11mo

Are there any other nice decision problems that are low? A quick search only reveals existence theorems. Intuitive guess: Can we get some hierarchy from oracles to increasingly sparse subsets of the digits of Chaitin's constant?

SamEisenstat2y10

I haven't read too closely, but it looks like the equivalence relation that you're talking about in the post sets elements that are scalar multiples of each other in equivalence. This isn't the point of my equivalence; the stuff I wrote is all in terms of vectors, not directions. My other top-level comment discusses this.

SamEisenstat2y10

Yeah, this could be clearer. The point is that 1/(c(v+ + w−))*(v+ + w−) and 1/(c(v+ + w−))*(v- + w+) are formal sums of elements of L. These formal sums have positive coefficients which sum to 1, so they represent convex combinations. But their not equal as formal sums, only the results of applying the convex combination operation of L are equal.

SamEisenstat2y10

We can then quotient $L^{2}$ by this relation to get a vector space $V$

I think you're confusing two different parts here. There's a quotient of a vector space to get a vector space, which is done to embed $\mathcal{L}$ in a vector space. There's also something sort of like a projectivization, which does not produce a vector space. In the method I prefer, there isn't an explicit quotient, but instead just functions on the vector space that satisfy certain properties. (I could see being convinced to prefer the other version if it did improve the pres... (read more)

SamEisenstat2y10

$L^{2}$ of differences of lotteries

Is this supposed to be the square of the space of lotteries? The square would correspond to formal differences, but actual differences would be a different space.

The point of my construction with formal differences is that differences of lotteries are not defined a priori. If we embed $\mathcal{L}$ in a vector space then we have already done what my construction is for. This is all in https://link.springer.com/article/10.1007/BF02413910 in some form, and many other places.

Happy to talk more about this.

2Jesse Richardson2y

Yep I see what you mean, I've changed the setup back to what you wrote with V_1 and V_0. My main concern is the part where we quotient V_1 by an equivalence relation to get V, I found this not super intuitive to follow and I'd ideally love to have a simpler way to express it. The main part I don't get right now: I see that 1/(c(v+ + w−))*(v+ + w−) and 1/(c(v+ + w−))*(v- + w+) are convex combinations of elements of L and are therefore in L, however it seems to me that these two things being the same corresponds to v+ + w- = v- + w+, which is equivalent to v + v- + w- = v- + w + w- which is equivalent to v=w. So doesn't that mean that v is equivalent to w iff v=w so our equivalence relation isn't doing anything? I'm sure I'm misunderstanding something here. Also just wanted to check, v and w should be in V_1 right, rather than V_0? Or is that the source of my confusion. Hopefully I can try and find a way to express it once I fully understand it, but if you think what you wrote on Overleaf is the simplest way to do it then we'll use that.

Concave Utility Question

Answer by SamEisenstatApr 24, 2023Ω130

Q5 is true if (as you assumed), the space of lotteries is the space of distributions over a finite set. (For a general convex set, you can get long-line phenomena.)

First, without proof, I'll state the following generalization.

Theorem 1. Let $⪯$ be a relation on a convex space $L$ satisfying axioms A1, A2, A3, and the following additional continuity axiom. For all $A, B_{1}, B_{2}, C \in L$ , the set

{p \in [0, 1] ∣ A ≺ p B_{1} + (1 - p) B_{2} ≺ C}

is open in $[0, 1]$ . Then, there exists a function $u$ from $L$ to the long line such that $u (A) \leq u (B)$ i... (read more)

Modal Fixpoint Cooperation without Löb's Theorem

SamEisenstat2yΩ790

Nice, I like this proof also. Maybe there's a clearer way to say thing, but your "unrolling one step" corresponds to my going from $u$ to $v$ . We somehow need to "look two possible worlds deep".

Modal Fixpoint Cooperation without Löb's Theorem

SamEisenstat2yΩ162410

Here's a simple Kripke frame proof of Payor's lemma.

Let $⟨ W, R, ⊩ ⟩$ be a Kripke frame over our language, i.e. $W$ is a set of possible worlds, $R$ is an accessibility relation, and $⊩$ judges that a sentence holds in a world. Now, suppose for contradiction that $W ⊩ x \leftrightarrow □ (□ x \to x)$ but that $W ⊮ x$ , i.e. $x$ does not hold in some world $u \in W$ .

A bit of De Morganing tells us that the hypothesis on $x$ is equivalent to $\neg x \leftrightarrow ◊ (□ x \land \neg x)$ , so $u ⊩ ◊ (□ x \land \neg x)$ . So, there is some world $v$ with $u R v$ ... (read more)

James Payor2yΩ7100

Thanks! I also wrote up my proof in another comment, which should help triangulate intuition.

Alignment proposals and complexity classes

SamEisenstat5yΩ5100

Theorem. Weak HCH (and similar proposals) contain EXP.

Proof sketch: I give a strategy that H can follow to determine whether some machine that runs in $O (2^{c n^{k}})$ time accepts. Basically, we need to answer questions of the form "Does cell $x$ have value $y$ at time $t$ ." and "Was the head in position $x$ at time $t$ ?", where $x$ and $t$ are bounded by some function in $O (2^{c n^{k}})$ . Using place-system representations of $x$ and $t$ , these questions have length in $O (n^{k})$ , so they can be asked. Further, each question is a simple function of a constant number of other... (read more)

6evhub5y

Yeah, I think that's absolutely right—I actually already have a version of my market making EXP proof for amplification that I've been working on cleaning up for publishing. But regardless of how you prove it I certainly agree that I understated amplification here and that it can in fact get to EXP.

The Cartoon Guide to Löb's Theorem

SamEisenstat5y70

*I* think that there's a flaw in the argument.

I could elaborate, but maybe you want to think about this more, so for now I'll just address your remark about $\neg □ C \to C$ , where $C$ is refutable. If we assume that $\neg □ C \to C$ , then, since $C$ is false, $\neg □ C$ must be false, so $□ C$ must be true. That is, you have proven that PA proves $□ C$ , that is, since $C$ is contradictory, PA proves its own inconsistency. You're right that this is compatible with PA being consistent--PA may be consistent but prove its... (read more)

2jpulgarin5y

Thanks Sam. I thought about this some more and realized where I went wrong - I was applying the deduction theorem incorrectly (as other comments in this thread have pointed out). By the way, when you say that PA proves its own inconsistency, do you mean that PA⊢□(¬Con(PA)) or that PA⊢¬Con(PA)? From your reasoning I agree that if we assume ¬□C→C then we can arrive at PA⊢□C and PA⊢□(¬C), from which we can conclude that PA⊢□(¬Con(PA)). If you meant that PA⊢¬Con(PA) though, could you explain how you arrived at that?

Adding Up To Normality

SamEisenstat5y70

This reminds me of the Discourse on Method.

[T]here is seldom so much perfection in works composed of many separate parts, upon which different hands had been employed, as in those completed by a single master. Thus it is observable that the buildings which a single architect has planned and executed, are generally more elegant and commodious than those which several have attempted to improve, by making old walls serve for purposes for which they were not originally built. Thus also, those ancient cities which, from being at first only villages, have bec

SamEisenstat7yΩ110

As you say, this isn't a proof, but it wouldn't be too surprising if this were consistent. There is some $k \in N$ such that $□_{n} ϕ \to ϕ$ has a proof of length $n^{k}$ by a result of Pavel Pudlák (On the length of proofs of finitistic consistency statements in first order theories). Here I'm making the dependence on $n$ explicit, but not the dependence on $ϕ$ . I haven't looked at it closely, but the proof strategy in Theorems 5.4 and 5.5 suggests that $k$ will not depend on $ϕ$ , as long as we only ask for the weaker property that $□_{n} ϕ \to ϕ$ will only be provable in length $n^{k}$ for sentenc

... (read more)

0Diffractor7y

I found an improved version by Pavel, that gives a way to construct a proof of ϕ from □nϕ that has a length of O(n). The improved version is here. There are restrictions to this result, though. One is that the C-rule must apply to the logic. This is just the ability to go from ∃x:ϕ(x) to instantiating a c such that ϕ(c). Pretty much all reasonable theorem provers have this. The second restriction is that the theory must be finitely axiomatizable. No axiom schemas allowed. Again, this isn't much of a restriction in practice, because NBG set theory, which proves the consistency of ZFC, is finitely axiomatizable. The proof strategy is basically as follows. It's shown that the shortest proof of a statement with quantifier depth n must have a length of Ω(n2), if the maximum quantifier depth in the proof is 2n or greater. This can be flipped around to conclude that if there's a length-n proof of ϕ, the maximum quantifier depth in the proof can be at most O(√n). The second part of the proof involves constructing a bounded-quantifier version of a truth predicate. By Tarski's undefinability of truth, a full truth predicate cannot be constructed, but it's possible to exhibit a formula for which it's provable that qd(¯¯¯¯ψ)≤n→(Satn(¯¯¯¯ψ,x)↔Σ(Satn,¯¯¯¯ψ,x)) (Σ is the formula laying out Tarski's conditions for something to be a truth predicate). Also, if n≥ quantifier depth of ψ, there's a proof of Satn(¯¯¯¯ψ,x)↔ψ[x] (ψ[x] is the sentence ψ with its free variables substituted for the elements enumerated in the list x) Also, there's a proof that Satn is preserved under inference rules and logical axioms, as long as everything stays below a quantifier depth of n. All these proofs can be done in O(n2) lines. One factor of n comes from the formula abbreviated as Satn(x,y) getting longer at a linear rate, and the other factor comes from having to prove Satn for each n seperately as an ingredient for the next proof. Combining the two parts, the O(√n) bound on the quantifier de

A Loophole for Self-Applicative Soundness

SamEisenstat7yΩ000

I misunderstood your proposal, but you don't need to do this work to get what you want. You can just take each sentence $□_{n} ϕ \to ϕ$ as an axiom, but declare that this axiom takes $n$ symbols to invoke. This could be done by changing the notion of length of a proof, or by taking axioms $ψ_{ϕ, n} \to (□_{n} ϕ \to ϕ)$ and $ψ_{ϕ, n}$ with $ψ_{ϕ, n}$ very long.

Open question: are minimal circuits daemon-free?

SamEisenstat7y10

Yeah, I had something along the lines of what Paul said in mind. I wanted not to require that the circuit implement exactly a given function, so that we could see if daemons show up in the output. It seems easier to define daemons if we can just look at input-output behaviour.

Open question: are minimal circuits daemon-free?

SamEisenstat7yΩ7100

I'm having trouble thinking about what it would mean for a circuit to contain daemons such that we could hope for a proof. It would be nice if we could find a simple such definition, but it seems hard to make this intuition precise.

For example, we might say that a circuit contains daemons if it displays more optimization that necessary to solve a problem. Minimal circuits could have daemons under this definition though. Suppose that some function $f$ describes the behaviour of some powerful agent, a function $~ f$ is like $f$ with noise added, and our proble... (read more),,,,,

6paulfchristiano7y

I don't know what the statement of the theorem would be. I don't really think we'd have a clean definition of "contains daemons" and then have a proof that a particular circuit doesn't contain daemons. Also I expect we're going to have to make some assumption that the problem is "generic" (or else be careful about what daemon means), ruling out problems with the consequentialism embedded in them. (Also, see the comment thread with Wei Dai above, clearly the plausible version of this involves something more specific than daemons.)

1interstice7y

By "predict sufficiently well" do you mean "predict such that we can't distinguish their output"? Unless the noise is of a special form, can't we distinguish $f$ and $tilde{f}$ by how well they do on $f$'s goals? It seems like for this not to be the case, the noise would have to be of the form "occasionally do something weak which looks strong to weaker agents". But then we could get this distribution by using a weak (or intermediate) agent directly, which would probably need less compute.

No Constant Distribution Can be a Logical Inductor

SamEisenstat7yΩ120

Two minor comments. First, the bitstrings that you use do not all correspond to worlds, since, for example, $C o n (P A + C o n (P A))$ implies $C o n (P A)$ , as $P A$ is a subtheory of $P A + C o n (P A)$ . This can be fixed by instead using a tree of sentences that all diagonalize against themselves. Tsvi and I used a construction in this spirit in A limit-computable, self-reflective distribution, for example.

Second, I believe that weakening #2 in this post also cannot be satisfied by any constant distribution. To sketch my reasoning, a trader can try to buy a sequence of sentences $ϕ_{1}$

... (read more)

An Untrollable Mathematician

SamEisenstat7yΩ340

I at first didn't understand your argument for claim (2), so I wrote an alternate proof that's a bit more obvious/careful. I now see why it works, but I'll give my version below for anyone interested. In any case, what you really mean is the probability of deciding a sentence outside of $Φ$ by having it announced by nature; there may be a high probability of sentences being decided indirectly via sentences in $Φ$ .

Instead of choosing $Φ$ as you describe, pick $Φ$ so that the probability $μ (Φ)$ of sampling something in $Φ$ is greater than $1 - μ (ψ) \cdot ε / 2$ . Then, the probabili

SamEisenstat8yΩ000

A few thoughts:

I agree that the LI criterion is "pointwise" in the way that you describe, but I think that this is both pretty good and as much as could actually be asked. A single efficiently computable trader can do a lot. It can enforce coherence on a polynomially growing set of sentences, search for proofs using many different proof strategies, enforce a polynomially growing set of statistical patterns, enforce reflection properties on a polynomially large set of sentences, etc. So, eventually the market will not be exploitable on all these things simu

... (read more)

0IAFF-User-2408y

Your points about the difficulty of getting uniform results in this framework are interesting. My inclination is to regard this as a failure of the framework. The LI paper introduced the idea of "e.c. traders," and the goal of not being exploitable (in some sense) by such traders; these weren't well-established notions which the paper simply proven some new theorems about. So they are up for critique as much as anything else in the paper (indeed, they are the only things up for critique, since I'm not disputing that the theorems themselves follow from the premises). And if our chosen framework only lets us prove something that is too weak, while leaving the most obvious strengthening clearly out of reach, that suggests we are not looking at the problem (the philosophical problem, about how to think about logical induction) at the right level of "resolution." As I said to Vadim earlier, I am not necessarily pessimistic about the performance of some (faster?) version of LIA with a "good" ordering for T^k. But if such a thing were to work, it would be for reasons above and beyond satisfying the LI criterion, and I wouldn't expect the LI criterion to do much work in illuminating its success. (It might serve as a sanity check -- too weak, but its negation would be bad -- but it might not end up being the kind of sanity check we want, i.e. the failures it does not permit might be just those required for good and/or fast finite-time performance. I don't necessarily think this is likely, but I won't know if it's true or not until the hypothetical work on LIA-like algorithms is done.)

Two Questions about Solomonoff Induction

SamEisenstat8yΩ220

Universal Prediction of Selected Bits solves the related question of what happens if the odd bits are adversarial but the even bits copy the preceding odd bits. Basically, the universal semimeasure learns to do the right thing, but the exact sense in which the result is positive is subtle and has to do with the difference between measures and semimeasures. The methods may also be relevant to the questions here, though I don't see a proof for either question yet.

An Approach to Logically Updateless Decisions

SamEisenstat8yΩ000

Yeah, I like tail dependence.

There's this question of whether for logical uncertainty we should think of it more as trying to "un-update" from a more logically informed perspective rather than trying to use some logical prior that exists at the beginning of time. Maybe you've heard such ideas from Scott? I'm not sure if that's the right perspective, but it's what I'm alluding to when I say you're introducing artificial logical uncertainty.

0abramdemski8y

I don't think it's much like un-updating. Un-updating takes a specific fact we'd like to pretend we don't know. Plus, the idea there is to back up the inductor. Here I'm looking at average performance as estimated by the latest stage of the inductor. The artificial uncertainty is more like pretending you don't know which problem in the sequence you're at.

Two Major Obstacles for Logical Inductor Decision Theory

SamEisenstat8yΩ110

Yeah, the 5 and 10 problem in the post actually can be addressed using provability ideas, in a way that fits in pretty natually with logical induction. The motivation here is to work with decision problems where you can't prove statements $A = a \to U = u$ for agent $A$ , utility function $U$ , action $a$ , and utility value $u$ , at least not with the amount of computing power provided, but you want to use inductive generalizations instead. That isn't necessary in this example, so it's more of an illustration.

To say a bit more, if you make logical inductors propositionally con

SamEisenstat8yΩ130

It's hard to analyze the dynamics of logical inductors too precisely, so we often have to do things that feel like worst-case analysis, like considering an adversarial trader with sufficient wealth. I think that problems that show up from this sort of analysis can be expected to correspond to real problems in superintelligent agents, but that is a difficult question. The malignancy of the universal prior is part of the reason.

As to your proposed solution, I don't see how it would work. Scott is proposing that the trader makes conditional contracts, which

SamEisenstat8yΩ220

In counterfactual mugging with a logical coin, AsDT always uses a logical inductor’s best-estimate of the utility it would get right now, so it sees the coin as already determined, and sees no benefit from giving Omega money in the cases where Omega asks for money.

The way I would think about what's going on is that if the coin is already known at the time that the expectations are evaluated, then the problem isn't convergent in the sense of AsDT. The agent that pays up whenever asked has a constant action, but it doesn't receive a constant expected util

... (read more)

2abramdemski8y

I think AsDT has a limited notion of convergent problem. It can only handle situations where the optimal strategy is to make the same move each time. Tail-dependence opens this up, partly by looking at the limit of average payoff rather than the limit of raw payoff. This allows us to deal with problems where the optimal strategy is complicated (and even somewhat dependent on what's done in earlier instances in the sequence). I wasn't thinking of it as introducing artificial logical uncertainty, but I can see it that way.

Generalizing Foundations of Decision Theory II

SamEisenstat8yΩ000

This isn't too related to your main point, but every ordered field can be embedded into a field of Hahn series, which might be simpler to work with than surreals.

That page discusses the basics of Hahn series, but not the embedding theorem. (Ehrlich, 1995) treats things in detail, but is long and introduces a lot of definitions. The embedding theorem is stated on page 23 (24 in the pdf).

The set of Logical Inductors is not Convex

SamEisenstat9yΩ110

Nicely done. I should have spent more time thinking about liar sentences; you really can do a lot with them.

Follow-up question - is the set of limits of logical inductors convex? (Your proof also makes me curious as to whether the set of "expanded limits" ${{liminf}_{n \to \infty} P_{n} (ϕ_{n})}_{¯ ¯ ¯ ϕ efficiently computable}$ is convex, though that question is a bit less nice than the other).

1LawrenceC8y

From conversation with Scott and Michael Dennis: there aren't enough logical inductors to make the set of limits convex, since there are an uncountable number of convex combinations but only a countable number of inductors (and thus limits). The interesting question would be whether any rational/computable convex combination of limits of logical inductors is a logical inductor.

A new proposal for logical counterfactuals

SamEisenstat9yΩ000

I'm copying over some (lightly edited) conversation about this post from Facebook.

Sam Eisenstat: In the original counterexample to the Trolljecture, I guess you're proposing that $C o n (P A)$ be logically prior to $A () = 1$ , so we can go back and replace $A () = 1$ with $A () = 2$ , but still have $C o n (P A)$ to use in order to derive $A () = 2 □ \to U () = 10$ ? Here I'm just using " $□ \to$ " to mean "counterfactually results in" without meaning to imply that this should be defined modally.

I agree that this is intuitively sensible, but it is difficult to give a general definition that agrees with

SamEisenstat9yΩ230

[EDIT: This all deals with measures, not semimeasures; see below.]

For $P$ to dominate $Q$ in the old sense means that its Bayes score can only be some constant worse than $Q$ , no worse, regardless of environment. My definition here implies that, but I think it’s a stricter requirement.

Your definition is equivalent to the standard definition. Li and Vitányi say that $P$ dominates $Q$ iff there is some $α \in R$ such that for any binary string $x$ , we have $α P (x) \geq Q (x)$ . Li and Vitányi's "probability distributions" take a binary string as input while the probability distribu

SamEisenstat10yΩ000

Haskell doesn't actually let you do this as far as I can tell, but the natural computational way to implement a function $B o t - > A$ is with a case expression with no cases. This is sensible because $B o t$ has no constructors, so it should be possible to pattern match it with such an expression. Another way of thinking of this is that we can use pattern matching on sum types and $B o t$ is just the $0$ -ary sum type.

0Quinn10y

Thanks for pointing that out. My feeling is still "well yes, that's technically true, but it still seems unnatural, and explosion is still the odd axiom out". Coq, for example, allows empty case expressions (for empty types), and I expect that other languages which double as proof assistants would support them as well... for the very purpose of satisfying explosion. General purpose languages like Haskell (and I just checked OCaml too) can seemingly overlook explosion/empty cases with few if any practical problems.

Optimal and Causal Counterfactual Worlds

SamEisenstat10yΩ000

Condition 4 in your theorem coincides with Lewis' account of counterfactuals. Pearl cites Lewis, but he also criticizes him on the ground that the ordering on worlds is too arbitrary. In the language of this post, he is saying that condition 2 arises naturally from the structure of the problem and that condition 4 is derives from the deeper structure corresponding to condition 2.

I also noticed that the function $f$ and the partial order $≻$ can be read as "time of first divergence from the real world" and "first diverges before", respectively. This makes the t

... (read more)

0Diffractor7y

Yeah, when I went back and patched up the framework of this post to be less logical-omniscence-y, I was able to get 2→3→4→1, but 2 is a bit too strong to be proved from 1, because my framing of 2 is just about probability disagreements in general, while 1 requires W to assign probability 1 to ϕ.

A Counterexample to an Informal Conjecture on Proof Length and Logical Counterfactuals

SamEisenstat10yΩ000

We can also construct an example where $P A ⊢ ϕ \to ψ$ with a short proof and $P A$ also proves $ϕ \to \neg ψ$ , but any such proof is much longer. We only need to put a bound on the proof length in A's proof search. Then, the argument that $A () = 1 \land U () \neq 0$ proves its own consistency still works, and is rather short: $O (log n)$ as the proof length bound $n$ increases. However, there cannot be a proof of $A () = 1 \to U () = 10$ within $A$ 's proof length bound, since if it found one it would immediately take action 1. In this case $P A$ can still prove that $A () = 1 \to U () = 10$ simply by running the agent, but thi

SamEisenstat10yΩ230

The argument that I had in mind was that if $P A ⊢ A () \neq 1$ , then $P A ⊢ □ ┌ A () \neq 1 ┐$ , so $P A ⊢ A () = 1$ since PA knows how the chicken rule works. This gives us $P A ⊢ ⊥$ , so PA can prove that if $P A ⊢ A () \neq 1$ , then PA is inconsistent. I'll include this argument in my post, since you're right that this was too big a jump.

Edit: We also need to use this argument to show that the modal UDT agent gets to the part where it iterates over utilities, rather than taking an action at the chicken rule step. I didn't mention this explicitly, since I felt like I had seen it before often enough, but

SamEisenstat10yΩ460

I (just yesterday) found a counterexample to this. The universe is a 5-and-10 variant that uses the unprovability of consistency:

def U():
  if A() == 2:
    if PA is consistent:
      return 10
    else:
      return 0
  else:
    return 5

The agent can be taken to be modal UDT, using PA as its theory. (The example will still work for other theories extending PA, we just need the universe's theory to include the agents. Also, to simplify some later arguments we suppose that the agent uses the chicken rule, and that it checks action 1 first, then action 2

... (read more)

2Benya_Fallenstein10y

It seems like this argument needs soundness of PA, not just consistency of PA. Do you see a way to prove in PA that if PA⊢A()≠1, then PA is inconsistent? [edited to add:] However, your idea reminds me of my post on the odd counterfactuals of playing chicken, and I think the example I gave there makes your idea go through: The scenario is that you get 10 if you take action 1 and it's not provable that you don't take action 1; you get 5 if you take action 2; and you get 0 if you take action 1 and it's provable that you don't. Clearly you should take action 1, but I prove that modal UDT actually takes action 2. To do so, I show that PA proves A()=1→¬□┌A()=1┐. (Then from the outside, A()=2 follows from the outside by soundness of PA.) This seems to make your argument go through if we can also show that PA doesn't show A()≠1. But if it did, then modal UDT would take action 1 because this comes first in its proof search, contradiction. Thus, PA proves A()=1→U()=0 (because this follows from A()=1→¬□┌A()=1┐), and also PA doesn't prove A()=1→U()=10. As in your argument, then, the trolljecture implies that we should think "if the agent takes action 1, it gets utility 0" is a good counterfactual, and we don't think that's true. Still interested in whether you can make your argument go through in your case as well, especially if you can use the chicken step in a way I'm not seeing yet. Like Patrick, I'd encourage you to develop this into a post.

0orthonormal10y

Nice! Yes, I encourage you to develop this into a post.