Truly logical counterfactuals only make sense in the context of bounded rationality: that is, cases where a proposition is logically necessary, but the agent cannot determine this within their resource bounds. Essentially no aspect of bounded rationality has a satisfactory treatment yet.
The prisoner's dilemma question does not appear to require dealing with logical counterfactuals. It is not logically contradictory for two agents to make different choices in the same situation, or even for the same agent to make different decisions in the same situation, though the setup of some scenarios may make this very unlikely, or may even direct you to ignore such possibilities.
There is a model of bounded rationality: logical induction.
Can that be used to handle logical counterfactuals?
If two Logical Decision Theory (LDT) agents with perfect knowledge of each other's source code play the prisoner's dilemma, in theory they should cooperate.
LDT uses logical counterfactuals in its decision-making.
If the agents are CDT, then logical counterfactuals are not involved.
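As a concrete (and much weaker) stand-in for the LDT story, here is a toy sketch of the source-code intuition: an agent that simply cooperates when the opponent's source code is byte-for-byte identical to its own. This is not LDT, and the setup and names are just my illustration, but it shows how "perfect knowledge of each other's source code" already buys mutual cooperation between exact copies.

```python
# Toy illustration only (not an LDT agent): cooperate exactly when the opponent's
# source code is byte-for-byte identical to this function's own source.
import inspect

def clique_bot(opponent_source: str) -> str:
    """Return 'C' (cooperate) iff the opponent is an exact textual copy of this bot."""
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

# Two exact copies cooperate with each other; a defector gets defected against.
my_source = inspect.getsource(clique_bot)
print(clique_bot(my_source))                             # -> C
print(clique_bot("def defect_bot(_):\n    return 'D'"))  # -> D
```

Genuine LDT-style cooperation is meant to extend beyond exact textual equality (for example, cooperating with a logically equivalent but differently written agent), and that is exactly where the logical counterfactuals come in.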
This seems like two questions:
Does the identical-twin one-shot prisoner's dilemma only work if you are functionally identical, or can you be a little different? And is there anything meaningful that can be said about this?
I guess it depends on how much the parts that make you "a little different" are involved in your decision making.
If you can put it in numbers, for example: I believe that if I choose to cooperate, my twin will cooperate with probability p; and if I choose to defect, my twin will defect with probability q; also, I care about the well-being of my twin with a coefficient...
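Spelling that calculation out as a sketch (the payoff matrix, the probabilities, and the care coefficient c are all illustrative choices of mine, not anything canonical):

```python
# Sketch of the "put it in numbers" calculation above.
# Assumptions (all illustrative): standard PD payoffs, p = P(twin cooperates | I cooperate),
# q = P(twin defects | I defect), and I weight my twin's payoff by a care coefficient c.

# PAYOFF[(my_move, twin_move)] = (my_payoff, twin_payoff)
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def expected_value(my_move: str, p: float, q: float, c: float) -> float:
    """Expected 'my payoff + c * twin's payoff' under the assumed correlations."""
    prob_twin_cooperates = p if my_move == "C" else (1 - q)
    total = 0.0
    for twin_move, prob in (("C", prob_twin_cooperates), ("D", 1 - prob_twin_cooperates)):
        mine, theirs = PAYOFF[(my_move, twin_move)]
        total += prob * (mine + c * theirs)
    return total

# Example: strong (but imperfect) correlation with the twin, mild altruism.
p, q, c = 0.9, 0.9, 0.2
print("EV(cooperate):", expected_value("C", p, q, c))
print("EV(defect):   ", expected_value("D", p, q, c))
```

With strongly correlated twins and even mild altruism, cooperation comes out ahead; as p and q fall towards 1/2 (i.e. the twin's choice stops tracking yours), defection regains its usual edge.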
Logical counterfactuals are when you ask something like "Suppose π=3; what would that imply?"
They play an important role in logical decision theory.
Suppose you take a false proposition P and then take a logical counterfactual in which P is true. I am imagining this counterfactual as a function f: logical propositions → [0,1] ⊂ ℝ that sends counterfactually true statements to 1 and counterfactually false statements to 0.
Suppose P is "not Fermat's Last Theorem". In the counterfactual where Fermat's Last Theorem is false, I would still expect 2+2=4. Perhaps not with measure 1, but close. So f("2+2=4") > 0.9.
On the other hand, I would expect trivial rephrasings of Fermat's Last Theorem to be false, or at least mostly false.
But does this counterfactual produce a specific counterexample? Does it think that 14³+26³=29³? Or does it do something where the counterfactual insists a counterexample exists, but spreads probability over many possible counterexamples? Or does it act as if there is a non-standard-number counterexample?
How would I compute the value of f in general?
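For what it's worth, here is one naive way to get a concrete f, purely as an illustration: treat the counterfactual as conditioning a non-omniscient prior (for instance, the belief state of an early-stage logical inductor, which still assigns "not Fermat's Last Theorem" nonzero probability) on P, and read f off as the conditional credence. The sentences, weights, and independence assumptions below are all made up; notably, this construction never names a specific counterexample, it just redistributes probability, which may or may not be the behaviour one wants.

```python
# Naive sketch: f as conditioning a prior over "possible worlds"
# (truth assignments to a handful of sentences) on the counterfactual premise P.
# The worlds and weights are invented for illustration; a bounded reasoner's
# prior is not logically omniscient, which is what lets a false-in-fact
# sentence like P keep nonzero weight.
from itertools import product

SENTENCES = ["not FLT", "2+2=4", "FLT rephrased"]

def prior(world: dict) -> float:
    """Illustrative weights: the FLT rephrasing tracks FLT, arithmetic is independent."""
    w = 1.0
    w *= 0.05 if world["not FLT"] else 0.95   # FLT is almost certainly true
    w *= 0.99 if world["2+2=4"] else 0.01     # arithmetic nearly independent of FLT
    # a trivial rephrasing should agree with FLT itself almost perfectly
    agrees = world["FLT rephrased"] == (not world["not FLT"])
    w *= 0.98 if agrees else 0.02
    return w

def f(statement: str, premise: str = "not FLT") -> float:
    """P(statement | premise) under the toy prior, i.e. the conditional credence."""
    num = den = 0.0
    for values in product([True, False], repeat=len(SENTENCES)):
        world = dict(zip(SENTENCES, values))
        if not world[premise]:
            continue
        w = prior(world)
        den += w
        num += w if world[statement] else 0.0
    return num / den

print("f('2+2=4')         =", round(f("2+2=4"), 3))          # stays near 1
print("f('FLT rephrased') =", round(f("FLT rephrased"), 3))  # drops near 0
```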
Suppose you are an LDT agent trying to work out whether to cooperate or defect in a prisoner's dilemma.
What does the defect counterfactual look like? Is it basically the same as reality, except that you in particular defect? (So exact clones of you defect, and any agent that knows your exact source code and is running detailed simulations of you will defect.)
Or is it broader than that: is this a counterfactual world in which all LDT agents defect in prisoner's dilemma situations in general? Is this a counterfactual world in which a bunch of Homo erectus defected on each other, and then all went extinct, leaving a world without humans?
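To make the two readings concrete, here is a toy sketch; the population and categories are entirely invented, and the point is only to show how much more of the world the broad reading drags along with your choice.

```python
# Toy sketch (population and categories invented for illustration) of how much of
# the world co-varies with my choice under the two readings of the defect counterfactual.
POPULATION = {
    "me":                    "runs my exact source",
    "my exact clone":        "runs my exact source",
    "simulator running me":  "runs my exact source",
    "other LDT agent":       "LDT-ish, different source",
    "homo erectus ancestor": "vaguely similar agent",
    "CDT bot":               "unrelated agent",
}

def defect_counterfactual(scope: str) -> dict:
    """Who defects along with me, under the 'narrow' or 'broad' reading."""
    flipped = {}
    for name, kind in POPULATION.items():
        if kind == "runs my exact source":
            flipped[name] = "defects with me"   # both readings: exact copies mirror me
        elif scope == "broad" and kind in ("LDT-ish, different source",
                                           "vaguely similar agent"):
            flipped[name] = "defects with me"   # broad reading only
        else:
            flipped[name] = "unchanged"
    return flipped

for scope in ("narrow", "broad"):
    print(scope, defect_counterfactual(scope))
```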
All of the thought about logical counterfactuals I have seen so far is on toy problems that divide the world into Exact-simulations-of-you and Totally-different-from-you.
I can't see any clear idea about what to do with agents that are vaguely similar to you but not identical.