I'm not clear on the action-determined vs. decision-determined distinction. Can you give an example of a dilemma that might tempt an AI to self-modify if we didn't build it around TDT?
In general, I'm nervous around arguments that mention self-modification. If self-modification is a risk, then engineering in general is a risk, and self-modification is a special case of engineering. So IMO an argument about Friendliness that mentions self-modification immediately needs to be generalized to talk about engineering instead. Self-modification as a fundamental concept is therefore a useless distraction.
The classic is Parfit's hitch-hiker, where an agent capable of accurately predicting the AI's actions offers to give it something if and only if the AI will perform some specific action in the future. A causal AI might be tempted to modify itself to desire that specific action, since once it has received the benefit it would otherwise see no reason to follow through; a timeless AI will simply do the thing anyway without needing to self-modify.
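Here is a toy sketch of that payoff structure, in case it helps. The numbers are made up, the predictor is assumed to be perfect, and the function names are my own labels rather than anything from a formal TDT write-up:

```python
# Hypothetical payoffs, chosen only for illustration.
RESCUE_VALUE = 1_000_000  # value to the agent of receiving the help
PAYMENT_COST = 100        # cost of performing the promised action later


def outcome(will_pay_later: bool) -> int:
    """Payoff when a perfect predictor helps only agents that will follow through."""
    if not will_pay_later:
        return 0  # the predictor foresees the refusal and never helps
    return RESCUE_VALUE - PAYMENT_COST


def action_level_choice() -> bool:
    """Decide after the help has arrived, treating it as already fixed."""
    pay = RESCUE_VALUE - PAYMENT_COST
    refuse = RESCUE_VALUE
    return pay > refuse  # False: refusing looks better once helped


def policy_level_choice() -> bool:
    """Decide which disposition to have, knowing the predictor reads it."""
    return outcome(True) > outcome(False)  # True: being a payer wins


causal_payoff = outcome(action_level_choice())
policy_payoff = outcome(policy_level_choice())
print(causal_payoff, policy_payoff)
```

The agent that evaluates the action in isolation ends up with nothing, which is exactly the pressure to self-modify; the agent that evaluates its disposition gets the help without rewriting anything.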
As for your second point, Yudkowsky himself explains much better than I could why self-modification is important, in the third question of this interview.
Roughly, the importance is that there's only two ki...
I don't know if this is a little too far afield for even a Discussion post, but people seemed to enjoy my previous articles (Girl Scouts financial filings, video game console insurance, philosophy of identity/abortion, & prediction market fees), so...
I recently wrote up an idea that has been bouncing around my head ever since I watched Death Note years ago - can we quantify Light Yagami's mistakes? Which mistake was the greatest? How could one do better? We can shed some light on the matter by examining DN with... basic information theory.
Presented for LessWrong's consideration: Death Note & Anonymity.
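For anyone unfamiliar with the framing, here is a minimal sketch of the kind of bookkeeping basic information theory allows. The population figures are hypothetical round powers of two picked for clean arithmetic, not numbers taken from the essay, and the function names are my own:

```python
import math


def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one person from a population."""
    return math.log2(population)


def bits_leaked(before: int, after: int) -> float:
    """Bits an observer gains when a clue shrinks the candidate pool."""
    return math.log2(before / after)


# Hypothetical candidate pools (assumptions for illustration only).
everyone = 2**33          # roughly a world-sized population
after_one_clue = 2**27    # pool remaining after some single slip-up

print(bits_to_identify(everyone))             # anonymity at the start
print(bits_leaked(everyone, after_one_clue))  # cost of that one mistake
```

Ranking mistakes then amounts to comparing how many bits each one gives away.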