All of dankane's Comments + Replies

I feel like your discussion of predictors makes a few not-necessarily-warranted assumptions about how the predictor deals with self-reference. Then again, I guess anything that doesn't do this fails as a predictor in a wide range of useful cases. It predicts a massive fire will kill 100 people, and so naturally this prediction is used to invalidate the original prediction.

But there is a simple-ish fix. What if you simply ask it to make predictions about what would happen if it (and say all similar predictors) suddenly stopped functioning immediately before this prediction was returned?

Unless you can explain to me how prediction markets are going to break the pattern that two different shares of the same stock have correlated prices.

I'm actually not sure how prediction markets are supposed to have an effect on this issue. My issue is not that people have too much difficulty recognizing patterns. My issue is that some patterns once recognized do not provide incentives to make that pattern disappear. Unless you can tell me how prediction markets might fix this problem, your response seems like a bit of a non-sequitur.

This seems like too general a principle. I agree that in many circumstances, public knowledge of a pattern in pricing will lead to effects causing that pattern to disappear. However, it is not clear to me that this is always to case, or that the size of the effect will be sufficient to complete cancel out the original observation.

For example, I observe that two different units of Google stock have prices that are highly correlated with each other. I doubt that this observation will cause separate markets to spring up giving wildly divergent prices to diffe... (read more)

1a gently pricked vein
Is it unfair to say that prediction markets will deal with all of these cases? I understand that's like responding to "This is a complicated problem that may remain unsolved, it is not clear that we will be able to invent the appropriate math to deal with this." with "But Church-Turing thesis!". But all I'm saying is that it does apply generally, given the right apparatus.

We probably couldn't even talk ourselves out of this box.

I don't know... That sounds a lot like what an AI trying to talk itself out of a box would say.

Hmm... I would probably explain the threshold for staying in the house not as an implicit expected probability computation, but an evaluation of the price of the discomfort associated with staying in a location that you find spooky. At least for me, I think that the part of my mind that knows that ghosts do not exist would have no trouble controlling whether or not I remain in the house or not. However, it might well decide that it is not worth the $10 that I would receive to spend the entire night in a place where some other piece of my mind is constantly yelling at me to run away screaming.

dankane40

It's just that such self-referential criteria as reflective equilibrium are a necessary condition

Why? The only example of adequately friendly intelligent systems that we have (i.e. us) don't meet this condition. Why should reflective equilibrium be a necessary condition for FAI?

0Stuart_Armstrong
Because FAI's can change themselves very effectively in ways that we can't. It might be that human brain in computer software would have the same issues.
dankane20

That may be true (at least to the degree to which it is sensible to assign a specific cause to a given util). However, it is not very good evidence that investment in first world economies is the most effective way to generate utils in Africa.

dankane30

OK. So suppose that I grant your claim that donations to sub-Saharan Africa will not substantially affect the size of the future economic pie, but that other investments will. I claim that there may still be reason to donate there.

I grant that such a donation will produce fewer dollars of value than investing in capitol infrastructure. On the other hand dollars is not the objective, utils are. We can reasonably assume that marginal utility of an extra dollar for a given person is decreasing as that person's wealth increases. We can reasonably expect that w... (read more)

2drethelin
I think the vast majority of utils created in sub-saharan africa are a byproduct of wealth created elsewhere.
dankane00

[I realize that I missed the train and probably very few people will read this, but here goes]

So in non-iterated prisoner's dilemma, defect is a dominant strategy. No matter what the opponent is doing, defecting will always give you the best possible outcome. In iterated prisoner's dilemma, there is no longer a dominant strategy. If my opponent is playing Tit-for-Tat, I get the best outcome by cooperating in all rounds but the last. If my opponent ignores what I do, I get the best outcome by always defecting. It is true that all defects is the unique Nash ... (read more)

dankane00

I think that the way that humans predict other humans is the wrong way to look at this, and instead consider how humans would reason about the behavior of an AI that they build. I'm not proposing simply "don't use formal systems", or even "don't limit yourself exclusively to a single formal system". I am actually alluding to a far more specific procedure:

  • Come up with a small set of basic assumptions (axioms)
  • Convince yourself that these assumptions accurately describe the system at hand
  • Try to prove that the axioms would imply the des
... (read more)
dankane00

Yes, obviously. We solve the Lobstacle by not ourselves running on formal systems and sometimes accepting axioms that we were not born with (things like PA). Allowing the AI to only do things that it can prove will have good consequences using a specific formal system would make it dumber than us.

4orthonormal
I think, rather, that humans solve decision problems that involve predicting other human deductive processes by means of some evolved heuristics for social reasoning that we don't yet fully understand on a formal level. "Not running on formal systems" isn't a helpful answer for how to make good decisions.
dankane40

Actually, why is it that when the Lobian obstacle is discussed that it seem to always be in reference to an AI trying to determine if a successor AI is safe, and not an AI trying to determine whether or not it, itself, is safe?

3orthonormal
Because we're talking about criteria for action, not epistemology. The heart of the Lobstacle problem is that straightforward ways of evaluating the consequences of actions start to break down when those consequences involve the outcomes of deductive processes equal to or greater than the one brought to bear.
dankane20

Question: If we do manage to build a strong AI, why not just let it figure this problem out on its own when trying to construct a successor? Almost definitionally, it will do a better job of it than we will.

4orthonormal
The biggest problem with deferring the Lobstacle to the AI is that you could have a roughly human-comparable AI which solves the Lobstacle in a hacky way, which changes the value system for the successor AI, which is then intelligent enough to solve the Lobstacle perfectly and preserve that new value system. So now you've got a superintelligent AI locked in on the wrong target.
0hairyfigment
If you want to take that as a definition, then we can't build a strong AI without solving the Lobstacle!
dankane20

Relatedly, with your interview example, I think that perhaps a better model is that whether a person is confident or shy is not depending on whether they believe that they will be bold or not, but upon the degree to which they care about being laughed at. If you are confident, you don't care about being laughed at and might as well be bold. If you are afraid of being laughed at, you already know that you are shy and thus do not gain anything by being bold.

dankane20

I think my bigger point is that you don't seem to make any real argument as to which case we are in. For example, consider the following model of how people's perception of my trustworthiness might be correlated to my actual trustworthiness: There are two causal chains: My values -> Things I say -> Peoples' perceptions My values -> My actions So if I value trustworthiness, I will not, for example talk much about wanting to avoid being sucker (in contexts where it would refer to be doing trustworthy things). This will influence peoples' perceptions... (read more)

dankane150

Newcomblike problems occur whenever knowledge about what decision you will make leaks into the environment. The knowledge doesn't have to be 100% accurate, it just has to be correlated with your eventual actual action.

This is far too general. The way in which information is leaking into the environment is what separates Newcomb's problem from the smoking lesion problem. For your argument to work you need to argue that whatever signals are being picked up on would change if the subject changed their disposition, not merely that these signals are correlated with the disposition.

2dankane
Relatedly, with your interview example, I think that perhaps a better model is that whether a person is confident or shy is not depending on whether they believe that they will be bold or not, but upon the degree to which they care about being laughed at. If you are confident, you don't care about being laughed at and might as well be bold. If you are afraid of being laughed at, you already know that you are shy and thus do not gain anything by being bold.
2dankane
I think my bigger point is that you don't seem to make any real argument as to which case we are in. For example, consider the following model of how people's perception of my trustworthiness might be correlated to my actual trustworthiness: There are two causal chains: My values -> Things I say -> Peoples' perceptions My values -> My actions So if I value trustworthiness, I will not, for example talk much about wanting to avoid being sucker (in contexts where it would refer to be doing trustworthy things). This will influence peoples' perceptions of whether or not I am trustworthy. Furthermore, if I do value trustworthiness, I will want to be trustworthy. This setup makes things look very much like the smoking lesion problem. A CDT agent that values trustworthiness will be trustworthy because they place intrinsic value in it. A CDT agent that does not value trustworthiness will be perceived as being untrustworthy. Simply changing their actions will not alter this perception, and therefore they will fail to be trustworthy in situations where it benefits them, and this is the correct decision. Now you might try to break the causal link: My values -> Things that I say And doing so is certainly possible (I mean you can have spies that successfully pretend to be loyal for extended periods without giving themselves away). On the other hand, it might not happen often for several possible reasons: A) Maintaining a facade at all times is exhausting (and thus imposes high costs) B) Lying consistently is hard (as in too computationally expensive) C) The right way to lie consistently, is to simulate the altered value set, but this may actually lead to changing your values (standard advice for become more confident is pretending to be confident, right?). So yes, in this model an non-trust-valuing and self-modifying CDT agent will self-modify, but it will need to self-modify its values rather than its decision theory. Using a decision theory that is trustworthy despite not in
5So8res
Right you are. Edited for clarity.
dankane00

Sorry. I'm not quite sure what you're saying here. Though, I did ask for a specific example, which I am pretty sure is not contained here.

Though to clarify, by "reading your mind" I refer to any situation in which the scenario you face (including the given description of that scenario) depends directly on which program you are running and not merely upon what that program outputs.

dankane00

Well, yes. Then again, the game was specified as PD against BOT^CDT not as PD against BOT^{you}. It seems pretty clear that for X not equal to CDT that it is not the case that X could achieve the result CC in this game. Are you saying that it is reasonable to say that CDT could achieve a result that no other strategy could just because it's code happens to appear in the opponent's program?

I think that there is perhaps a distinction to be made between things that happen to be simulating your code and this that are causally simulating your code.

dankane00

OK. Fine. Point taken. There is a simple fix though.

MBOT^X(Y) = X'(MBOT^X) where X' is X but with randomized irrelevant experiences.

In order to produce this properly, MBOT only needs to have your prior (or a sufficiently similar probability distribution) over irrelevant experiences hardcoded. And while your actual experiences might be complicated and hard to predict, your priors are not.

dankane00

No. BOT^CDT = DefectBot. It defects against any opponent. CDT could not cause it to cooperate by changing what it does.

If it cooperated, it would get CC instead of DD.

Actually if CDT cooperated against BOT^CDT it would get $3^^^3. You can prove all sorts of wonderful things once you assume a statement that is false.

Depending on the exact setup, "irrelevant details in memory" are actually vital information that allow you to distinguish whether you are "actually playing" or are being simulated in BOT's mind.

OK... So UDT^Red and U... (read more)

0nshepperd
No, because that's a silly thing to do in this scenario. For one thing, UDT will see that they are reasoning the same way (because they are selfish and only consider "my color" vs "other color"), and therefore will both do the same thing. But also, depending on the setup, UDT^Red's prior should give equal probability to being painted red and painted blue anyway, which means trying to make the outcome favour red is silly. Compare to the version of newcomb's where the bot in the room is UDT^Red, while Omega simulates UDT^Blue. UDT can implement the conditional strategy {Red => two-box, Blue => one-box}. This is obviously unlikely, because the point of the Newcomb thought experiment is that Omega simulates (or predicts) you. So he would clearly try to avoid adding such information that "gives the game away". However in this scenario you say that BOT simulates UDT "by coincidence", not by mind reading. So it is far more likely that BOT simulates (the equivalent) of UDT^Blue, while the UDT actually playing is UDT^Red. And you are passed the code of BOT as input, so UDT can simply implement the conditional strategy {cooperate iff the color inside BOT is the same as my color}.
3hairyfigment
Maybe I've misread you, but this sounds like an assertion that your counterfactual question is the right one by definition, rather than a meaningful objection.
dankane10

It's hard to see how this doesn't count as "reading your mind".

So... UDT's source code is some mathematical constant, say 1893463. It turns out that UDT does worse against BOT^1893463. Note that it does worse against BOT^1893463 not BOT^{you}. The universe does not depend on the source code of the person playing the game (as it does in mirror PD). Furthermore, UDT does not control the output of its environment. BOT^1893463 always cooperates. It cooperates against UDT. It cooperates against CDT. It cooperates everything.

But this isn't due to

... (read more)
4nshepperd
I take it back, the scenario isn't that weird. But your argument doesn't prove what you think it does: Consider the analogous scenario, where CDT plays against BOT = CDT(BOT). CDT clearly does the wrong thing here - it defects. If it cooperated, it would get CC instead of DD. Note that if CDT did cooperate, UDT would be able to freeload by defecting (against BOT = CDT(BOT)). But CDT doesn't care about that because the prisoner's dilemma is defined such that we don't care about freeloaders. Nevertheless CDT defects and gets a worse result than it could. CDT does better than UDT against BOT = UDT(BOT) because UDT (correctly) doesn't care that CDT can freeload, and correctly cooperates to gain CC. Depending on the exact setup, "irrelevant details in memory" are actually vital information that allow you to distinguish whether you are "actually playing" or are being simulated in BOT's mind.
dankane00

Actually, I think that you are misunderstanding me. UDT's current epistemic state (at the start of the game) is encoded into BOT^UDT. No mind reading involved. Just a coincidence. [Really, your current epistemic state is part of your program]

Your argument is like saying that UDT usually gets $1001000 in Newcomb's problem because whether or not the box was full depended on whether or not UDT one-boxed when in a different epistemic state.

2nshepperd
Okay, you're saying here that BOT has a perfect copy of the UDT player's mind in its own code (otherwise how could it calculate UDT(BOT) and guarantee that the output is the same?). It's hard to see how this doesn't count as "reading your mind". Yes, sometimes its advantageous to not control the output of computations in the environment. In this case UDT is worse off because it is forced to control both its own decision and BOT's decision; whereas CDT doesn't have to worry about controlling BOT because they use different algorithms. But this isn't due to any intrinsic advantage of CDT's algorithm. It's just because they happen to be numerically inequivalent. An instance of UDT with literally any other epistemic state than the one contained in BOT would do just as well as CDT here.
dankane10

OK. Let me say this another way that involves more equations.

So let's let U(X,Y) be the utility that X gets when it plays prisoner's dilemma against Y. For a program X, let BOT^X be the program where BOT^X(Y) = X(BOT^X). Notice that BOT^X(Y) does not depend on Y. Therefore, depending upon what X is BOT^X is either equivalent CooperateBot or equivalent to DefectBot.

Now, you are claiming that UDT plays optimally against BOT_UDT because for any strategy X U(X, BOT^X) <= U(UDT, BOT^UDT) This is true, because X(BOT^X) = BOT^X(X) by the definition of BOT^X. T... (read more)

2lackofcheese
Actually, you've oversimplified and missed something critical. In reality, the only way you can force BOT^UDT(X) = UDT(BOT^UDT) = C is if the universe does, in fact, read your mind. In general, UDT can map different epistemic states to different actions, so as long as BOT^UDT has no clue about the epistemic state of the UDT agent it has no way of guaranteeing that its output is the same as that of the UDT agent. Consequently, it's possible for the UDT agent to get DC as well. The only way BOT^UDT would be able to guarantee that it gets the same output as a particular UDT agent is if the universe was able to read the UDT agent's mind.
dankane10

No. BOT(X) is cooperate for all X. It behaves in exactly the same way that CooperateBot does, it just runs different though equivalent code.

And my point was that CDT does better against BOT than UDT does. I was asked for an example where CDT does better than UDT where the universe cannot read your mind except via through your actions in counterfactuals. This is an example of such. In fact, in this example, the universe doesn't read your mind at all.

Also your argument that UDT cannot possibly do better against BOT than it does in analogous to the argument t... (read more)

dankane10

It's not UDT. It's the strategy that against any opponent does what UDT would do against it. In particular, it cooperates against any opponent. Therefore it is CooperateBot. It is just coded in a funny way.

To be clear letting Y(X) be what Y does against X we have that BOT(X) = UDT(BOT) = C This is different from UDT. UDT(X) is D for some values of X. The two functions agree when X=UDT and in relatively few other cases.

3lackofcheese
What is your point, exactly? It's clear that UDT can't do better vs "BOT" than by cooperating, because if UDT defects against BOT then BOT defects against UDT. Given that dependency, you clearly can't call it CooperateBot, and it's clear that UDT makes the right decision by cooperating with it because CC is better than DD.
dankane30

I think you mean that rational agents cannot be successfully blackmailed by others agents that for which it is common knowledge that the other agents can simulate them accurately and will only use blackmail if they predict it to be successful. All of this of course in the absence of mitigating circumstances (including for example the theoretical likelihood of other agents that reward you for counterfactualy giving into blackmail under these circumstances).

dankane10

I suppose. On the other hand, is that because other people can read your mind or because you have emotional responses that you cannot suppress and are correlated to what you are thinking? This is actually critical to what counterfactuals you want to construct.

Consider for example the terrorist who would try to bring down an airplane that he is on given the opportunity. Unfortunately, he's an open book and airport security would figure out that he's up to something and prevent him from flying. This is actually inconvenient since it also means he can't use a... (read more)

dankane-10

I'm sure we could think of some

OK. Name one.

3[anonymous]
Correlation-by-congruent-logic can show up in situations that don't necessarily have to do with minds, particularly the agent's mind, but the agent needs to either have an epistemology capable of noticing the correlations and equating them logically within its decision-making procedure -- TDT reaches in that direction.
dankane10

Fine. Your opponent actually simulates what UDT would do if Omega had told it that and returns the appropriate response (i.e. it is CooperateBot, although perhaps your finite prover is unable to verify that).

2nshepperd
Err, that's not CooperateBot, that's UDT. Yes, UDT cooperates with itself. That's the point. (Notice that if UDT defects here, the outcome is DD.)
dankane20

Actually, this is a somewhat general phenomenon. Consider for example, the version of Newcomb's problem where the box is full "if and only if UDT one-boxes in this scenario".

UDT's optimality theorem requires the in the counterfactual where it is replaced by a different decision theory that all of the "you"'s referenced in the scenario remain "you" rather than "UDT". In the latter counterfactual CDT provably wins. The fact that UDT wins these scenarios is an artifact of how you are constructing your scenarios.

dankane10

Or how about this example, that simplifies things even further. The game is PD against CooperateBot, BUT before the game starts Omega announces "your opponent will make the same decision that UDT would if I told them this." This announcement causes UDT to cooperate against CooperateBot. CDT on the other hand, correctly deduces that the opponent will cooperate no matter what it does (actually UDT comes to this conclusion too) and therefore decides to defect.

2nshepperd
No. There is no obligation to do something just because Omega claims that you will. First, if I know that my opponent is CooperateBot, then:- 1. It is known that Omega doesn't lie. 2. Therefore Omega has simulated this situation and predicted that I (UDT) cooperate. 3. Hence, I can either cooperate, and collect the standard reward for CC. 4. Or I can defect, in order to access an alternative branch of the problem (where Omega finds that UDT defects and does "something else"). 5. This alternative branch is unspecified, so the problem is incomplete. UDT cooperates or defects depending on the contents of the alternative branch. If the alternative branch is unknown then it must guess, and most likely cooperates to be on the safe side. Now, the problem is different if a CDT agent is put in my place, because that CDT agent does not control (or only weakly controls) the action of the UDT simulation that Omega ran in order to make the assertion about UDT's decision.
2dankane
Actually, this is a somewhat general phenomenon. Consider for example, the version of Newcomb's problem where the box is full "if and only if UDT one-boxes in this scenario". UDT's optimality theorem requires the in the counterfactual where it is replaced by a different decision theory that all of the "you"'s referenced in the scenario remain "you" rather than "UDT". In the latter counterfactual CDT provably wins. The fact that UDT wins these scenarios is an artifact of how you are constructing your scenarios.
dankane30

The CDT agents here are equivalent to DefectBot

And the UDT agents are equivalent to CooperateBot. What's your point?

dankane20

The CDT agents here win because they do not believe that altering their strategy will change the way that their opponents behave. This is actually true in this case, and even true for the UDT agents depending on how you choose to construct your counterfactuals. If a UDT agent suffered a malfunction and defected, it too would do better. In any case, the theorem that UDT agents perform optimally in universes that can only read your mind by knowing what you would do in hypothetical situations is false as this example shows.

UDT bots win in some scenarios where... (read more)

dankane20

Actually thinking about it this way, I have seen the light. CDT makes the faulty assumption that your initial state in uncorrelated with the universe that you find yourself in (who knows, you might wake up in the middle of Newcomb's problem and find that whether or not you get $1000000 depends on whether or not your code is such that you would one-box in Newcomb's problem). UDT goes some ways to correct this issue, but it doesn't go far enough.

I would like to propose a new, more optimal decision theory. Call it ADT for Anthropic Decision Theory. Actually, ... (read more)

dankane10

Basically this, except there's no need to actually do it beforehand.

Actually, no. To implement things correctly, UDT needs to determine its entire strategy all at once. It cannot decide whether to one-box or two-box in Newcomb just by considering the Newcomb that it is currently dealing with. It must also consider all possible hypothetical scenarios where any other agent's action depends on whether or not UDT one-boxes.

Furthermore, UDT cannot decide what it does in Newcomb independently of what it does in the Counterfactual Mugging, because some hypothe... (read more)

1nshepperd
Conceptually, yes. The point is that you don't need to actually literally explicitly compute your entire strategy at t=-∞. All you have to do is prove a particular property of the strategy (namely, its action in situation Y) at the time when you are asked for a decision. Obviously, like every computational activity ever, you must still make approximations, because it is usually infeasible to make inferences over the entire tegmark-IV multiverse when you need to make a decision. An example of such approximations would be neglecting the measure of "entities that give it rewards based on some combination of [newcomb's and counterfactual mugging]" in many situations because I expect such things to be rare (significantly rarer than newcomb's and counterfactual mugging themselves).
dankane20

Well, perhaps. I think that the bigger problem is that under reasonable priors P(Newcomb) and P(anti-Newcomb) are both so incredibly small that I would have trouble finding a meaningful way to approximate their ratio.

How confident are you that UDT actually one-boxes?

Also yeah, if you want a better scenario where UDT loses see my PD against 99% prob. UDT and 1% prob. CDT example.

dankane10

CDT does not avoid this issue by "setting its priors to the delta function". CDT deals with this issue by being a theory where your course of action only depends on your posterior distribution. You can base your actions only on what the universe actually looks like rather than having to pay attention to all possible universes. Given that it's basically impossible to determine anything about what Kolmogorov priors actually say, being able to totally ignore parts of probability space that you have ruled out is a big deal.

... And this whole issue w... (read more)

2dankane
Actually thinking about it this way, I have seen the light. CDT makes the faulty assumption that your initial state in uncorrelated with the universe that you find yourself in (who knows, you might wake up in the middle of Newcomb's problem and find that whether or not you get $1000000 depends on whether or not your code is such that you would one-box in Newcomb's problem). UDT goes some ways to correct this issue, but it doesn't go far enough. I would like to propose a new, more optimal decision theory. Call it ADT for Anthropic Decision Theory. Actually, it depends on a prior, so assume that you've picked out one of those. Given your prior, ADT is the decision theory D that maximizes the expected (given your prior) lifetime utility of all agents using D as their decision theory. Note how agents using ADT do provably better than agents using any other decision theory. Note that I have absolutely no idea what ADT does in, well, any situation, but that shouldn't stop you from adopting it. It is optimal after all.
dankane20

I guess my point is that it is nonsensical to ask "what does UDT do in situation X" without also specifying the prior over possible universes that this particular UDT is using. Given that this is the case, what exactly do you mean by "losing game X"?

2nshepperd
Well, you can talk about "what does decision theory W do in situation X" without specifying the likelyhood of other situations, by assuming that all agents start with a prior that sets P(X) = 1. In that case UDT clearly wins the anti-newcomb scenario because it knows that actual newcomb's "never happens" and therefore it (counterfactually) two-boxes. The only problem with this treatment is that in real life P(anti-newcomb) = 1 is an unrealistic model of the world, and you really should have a prior for P(anti-newcomb) vs P(newcomb). A decision theory that solves the restricted problem is not necessarily a good one for solving real life problems in general.
dankane40

Actually, here's a better counter-example, one that actually exemplifies some of the claims of CDT optimality. Suppose that the universe consists of a bunch of agents (who do not know each others' identities) playing one-off PDs against each other. Now 99% of these agents are UDT agents and 1% are CDT agents.

The CDT agents defect for the standard reason. The UDT agents reason that my opponent will do the same thing that I do with 99% probability, therefore, I should cooperate.

CDT agents get 99% DC and 1% DD. UDT agents get 99% CC and 1% CD. The CDT agents in this universe do better than the UDT agents, yet they are facing a perfectly symmetrical scenario with no mind reading involved.

2Wei Dai
A version of this problem was discussed here previously. It was also brought up during the decision theory workshop hosted by MIRI in 2013 as an open problem. As far as I know there hasn't been much progress on it since 2009.
2nshepperd
That shouldn't be surprising. The CDT agents here are equivalent to DefectBot, and if they come into existence spontaneously, are no different than natural phenomena like rocks. Notice that the UDT agents in this situation do better than the alternative (if they defected, they would get 100% DD which is a way worse result). They don't care that some DefectBots get to freeload. Of course, if the defectbots are here because someone calculated that UDT agents would cooperate and therefore being defectbot is a good way to get free utilons... then the UDT agents are incentivized to defect, because this is now an ultimatum game. And in the variant where bots do know each other's identities, the UDT bots all get 99% CC / 1% DD and the CDT bots suck it.
1one_forward
This is a good example. Thank you. A population of 100% CDT, though, would get 100% DD, which is terrible. It's a point in UDT's favor that "everyone running UDT" leads to a better outcome for everyone than "everyone running CDT."
dankane10

I think some that favor CDT would claim that you are are phrasing the counterfactual incorrectly. You are phrasing the situation as "you are playing against a copy of yourself" rather than "you are playing against an agent running code X (which just happens to be the same as yours) and thinks you are also running code X". If X=CDT, then TDT and CDT each achieve the result DD. If X=TDT, then TDT achieves CC, but CDT achieves DC.

In other words TDT does beat CDT in the self matchup. But one could argue that self matchup against TDT and self matchup against CDT are different scenarios, and thus should not be compared.

dankane-10

The fact that Newcomblike problems are fairly common in the real world is one facet of that motivation.

I disagree. CDT correctly solves all problems in which other agents cannot read your mind. Real world occurrences of mind reading are actually uncommon.

dankane-10

There's a difference between reasoning about your mind and actually reading your mind. CDT certainly faces situations in which it is advantageous to convince others that it does not follow CDT. On the other hand, this is simply behaving in a way that leads to the desired outcome. This is different from facing situations where you can only convince people of this by actually self-modifying. Those situations only occur when other people can actually read your mind.

2A1987dM
Humans are not perfect deceivers.
dankane10

Actually, I take it back. Depending on how you define things, UDT can still lose. Consider the following game:

I will clone you. One of the clones I paint red and the other I paint blue. The red clone I give $1000000 and the blue clone I fine $1000000. UDT clearly gets expectation 0 out of this. SMCDT however can replace its code with the following: If you are painted blue: wipe your hard drive If you are painted red: change your code back to standard SMCDT

Thus, SMCDT never actually has to play blue in this game, while UDT does.

2one_forward
You seem to be comparing SMCDT to a UDT agent that can't self-modify (or commit suicide). The self-modifying part is the only reason SMCDT wins here. The ability to self-modify is clearly beneficial (if you have correct beliefs and act first), but it seems separate from the question of which decision theory to use.
dankane20

OK. Fine. I will grant you this:

UDT is provably optimal if it has correct priors over possible universes and the universe can read its mind only through determining its behavior in hypothetical situations (because UDT basically is just find the behavior pattern that optimizes expected utility and implement that).

On the other hand, SMCDT is provably optimal in situations where it has an accurate posterior probability distribution, and where the universe can read its mind but not its initial state (because it just instantly self-modifies to the optimally per... (read more)

1dankane
Actually, I take it back. Depending on how you define things, UDT can still lose. Consider the following game: I will clone you. One of the clones I paint red and the other I paint blue. The red clone I give $1000000 and the blue clone I fine $1000000. UDT clearly gets expectation 0 out of this. SMCDT however can replace its code with the following: If you are painted blue: wipe your hard drive If you are painted red: change your code back to standard SMCDT Thus, SMCDT never actually has to play blue in this game, while UDT does.
dankane20

Which is actually one of the annoying things about UDT. Your strategy cannot depend simply on your posterior probability distribution, it has to depend on your prior probability distribution. How you even in practice determine your priors for Newcomb vs. anti-Newcomb is really beyond me.

But in any case, assuming that one is more common, UDT does lose this game.

2nshepperd
No-one said that winning was easy. This problem isn't specific to UDT. It's just that CDT sweeps the problem under the rug by "setting its priors to a delta function" at the point where it gets to decide. CDT can win this scenario if it self-modifies beforehand (knowing the correct frequencies of newcomb vs anti-newcomb, to know how to self-modify) - but SMCDT is not a panacea, simply because you don't necessarily get a chance to self-modify beforehand.
2one_forward
Why does UDT lose this game? If it knows anti-Newcomb is much more likely, it will two-box on Newcomb and do just as well as CDT. If Newcomb is more common, UDT one-boxes and does better than CDT.
dankane10

Yes. And likewise if you put an unconditional extortion-refuser in an environment populated by unconditional extortionists.

dankane20

Fine. How about this: "Have $1000 if you would have two-boxed in Newcomb's problem."

4nshepperd
The optimal solution to that naturally depend on the relative probabilities of that deal being offered vs newcomb's itself.
dankane10

Only if the adversary makes its decision to attempt extortion regardless of the probability of success.

And thereby the extortioner's optimal strategy is to extort independently of the probably of success. Actually, this is probably true is a lot of real cases (say ransomware) where the extortioner cannot actually ascertain the probably of success ahead of time.

2pengvado
That strategy is optimal if and only if the probably of success was reasonably high after all. Otoh, if you put an unconditional extortioner in an environment mostly populated by decision theories that refuse extortion, then the extortioner will start a war and end up on the losing side.
dankane20

Well, if the universe cannot read your source code, both agents are identical and provably optimal. If the universe can read your source code, there are easy scenarios where one or the other does better. For example,

"Here have $1000 if you are a CDT agent" Or "Here have $1000 if you are a UDT agent"

2one_forward
Ok, that example does fit my conditions. What if the universe cannot read your source code, but can simulate you? That is, the universe can predict your choices but it does not know what algorithm produces those choices. This is sufficient for the universe to pose Newcomb's problem, so the two agents are not identical. The UDT agent can always do at least as well as the CDT agent by making the same choices as a CDT would. It will only give a different output if that would lead to a better result.
dankane10

Eliezer thinks his TDT will refuse to give in to blackmail, because outputting another answer would encourage other rational agents to blackmail it.

This just means that TDT loses in honest one-off blackmail situations (in reality, you don't give in to blackmail because it will cause other people to blackmail you whether or not you then self-modify to never give into blackmail again). TDT only does better if the potential blackmailers read your code in order to decide whether or not blackmail will be effective (and then only if your priors say that such ... (read more)

Load More