30th Nov 2019

I'm quite curious what kind of decision algorithm a CDT agent might implement in a successor AI, but I've only found a few vague references. Are there any good posts/papers/etc about this?

Causal Decision Theory, Decision theory

What's been written about the nature of "son-of-CDT"?

3 Answers sorted by top scoring

mako yass

Nov 30, 2019

I think I saw a bit on Arbital about it:

Logical decision theorists use "Son-of-CDT[red link, no such article]" to denote the algorithm that CDT self-modifies to; in general we think this algorithm works out to "LDT about correlations formed after 7am, CDT about correlations formed before 7am".

https://arbital.com/p/logical_dt/?l=5gc
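To make the quoted "7am" rule concrete, here is a minimal sketch of it applied to a Newcomb-style choice. It is only a toy illustration: the function name, the hour-based encoding of time, and the action labels are hypothetical and come from neither Arbital nor this thread. The only idea taken from the quote is that the policy behaves like LDT toward predictions made after the self-modification time and like CDT toward predictions made before it.

```python
# Toy illustration only: names and the hour encoding are assumptions.
SELF_MODIFICATION_HOUR = 7.0  # the "7am" at which the CDT agent rewrote itself


def son_of_cdt_choice(prediction_hour: float) -> str:
    """Return the Newcomb action for a prediction made at `prediction_hour`."""
    if prediction_hour > SELF_MODIFICATION_HOUR:
        # Correlation formed after the rewrite: act as LDT would and one-box.
        return "one-box"
    # Correlation formed before the rewrite: act as CDT would and two-box.
    return "two-box"


print(son_of_cdt_choice(6.0))  # two-box
print(son_of_cdt_choice(9.0))  # one-box
```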

Liam Donovan 6y

After thinking about it some more, I don't think this is true.

A concrete example: Let's say there's a CDT paperclip maximizer in an environment with Newcomb-like problems that's deciding between 3 options.

1. Don't hand control to any successor

2. Hand off control to an "LDT about correlations formed after 7am, CDT about correlations formed before 7am" successor

3. Hand off control to an LDT successor.

My understanding is that the CDT agent would take the choice that causes the highest number of paperclips to be created (in ... (read more)

[This comment is no longer endorsed by its author]

5 Rob Bensinger 6y

This is true if we mean something very specific by "causes". CDT picks the action that would cause the highest number of paperclips to be created, if past predictions were uncorrelated with future events. If an agent can arbitrarily modify its own source code ("precommit" in full generality), then we can model "the agent making choices over time" as "a series of agents that are constantly choosing which successor-agent follows them at the next time-step". If Son-of-CDT were the same as LDT, this would be the same as saying that a self-modifying CDT agent will rewrite itself into an LDT agent, since nothing about CDT or LDT assigns special weight to actions that happen inside the agent's brain vs. outside the agent's brain.

1 Liam Donovan 6y

Yeah, I was implicitly assuming that initiating a successor agent would force Omega to update its predictions about the new agent (and put the $1m in the box). As you say, that's actually not very relevant, because it's a property of a specific decision problem rather than CDT or son-of-CDT.
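The successor-choice framing in this thread can be sketched as a toy calculation. Everything here is an assumption made for illustration: the standard Newcomb payoffs ($1,000,000 and $1,000), a perfectly accurate predictor, the 0.5 prior that an already-predicted box is full, and all function and variable names. The sketch scores each candidate successor the way a CDT parent plausibly would at handoff time: boxes filled by earlier predictions are treated as fixed, while predictions made after the handoff are treated as caused by the successor's policy.

```python
# Toy model: payoffs, probabilities, and names are illustrative assumptions.
BIG, SMALL = 1_000_000, 1_000  # standard Newcomb payoffs (assumed)


def payoff(action: str, box_full: bool) -> int:
    """Money received in one Newcomb problem."""
    total = BIG if box_full else 0
    if action == "two-box":
        total += SMALL
    return total


# Each successor maps "was the prediction made after the handoff?" to an action.
SUCCESSORS = {
    "CDT": lambda after: "two-box",
    "Son-of-CDT": lambda after: "one-box" if after else "two-box",
    "LDT": lambda after: "one-box",
}


def causal_value(policy, p_full_before=0.5, n_before=1, n_after=1) -> float:
    """Expected payoff as the CDT parent computes it when choosing a successor."""
    # Earlier predictions: box contents are fixed whatever successor is chosen.
    act = policy(False)
    before = n_before * (
        p_full_before * payoff(act, True) + (1 - p_full_before) * payoff(act, False)
    )
    # Later predictions: an accurate predictor fills the box iff the policy one-boxes.
    act = policy(True)
    after = n_after * payoff(act, box_full=(act == "one-box"))
    return before + after


def best_successor(**kwargs) -> str:
    """Name of the successor with the highest causal value."""
    return max(SUCCESSORS, key=lambda name: causal_value(SUCCESSORS[name], **kwargs))


print(best_successor())  # Son-of-CDT
```

Under these assumptions the mixed policy beats the pure LDT successor by exactly the $1,000 it collects on the problem whose prediction was already fixed, which is the sense in which the parent's choice depends on what it counts as "caused" by the handoff.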

Rob Bensinger

Dec 01, 2019

The Retro Blackmail Problem in "Toward Idealized Decision Theory" shows that if CDT can self-modify (i.e., build an agent that follows an arbitrary decision rule), it self-modifies to something that still gives in to some forms of blackmail. This is Son-of-CDT, though they don't use the name.

Chris_Leong

Dec 01, 2019

Mako's answer will be true if the CDT agent expects to face only problems where it is rewarded based on its output. It wouldn't hold under other conditions, however. For example, if the agent expected alphabetical agents to be rewarded heavily, it might self-modify into one of those instead.
