Scheduling: The remainder of the sequence will be released after some delay.
Exercise: Why does instrumental convergence happen? Would it be coherent to imagine a reality without it?
Notes
- Here, our descriptive theory relies on our ability to have reasonable beliefs about what we'll do, and how things in the world will affect our later decision-making process. No one knows how to formalize that kind of reasoning, so I'm leaving it a black box: we somehow have these reasonable beliefs which are apparently used to calculate AU.
- In technical terms, AU calculated with the "could" criterion would be closer to an optimal value function, while actual AU seems to be an on-policy prediction, whatever that means in the embedded context. Felt impact corresponds to TD error.
- This is one major reason I'm disambiguating between AU and EU; in the non-embedded context of reinforcement learning, AU is a very particular kind of EU: V*(s), the expected return under the optimal policy (see the sketch after these notes).
- Framed as a kind of EU, AU is plausibly what we use to make decisions.
- I'm not claiming normatively that "embedded agentic" EU should be AU; I'm simply using "embedded agentic" as an adjective.
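A minimal sketch of the distinctions drawn in the notes above, under assumptions of my own (the toy three-state MDP, its rewards, the discount factor, and the fixed policy are all made up for illustration): value iteration gives the optimal value function V* (the "could" criterion), policy evaluation gives the on-policy prediction V^pi (actual AU, roughly), and the TD error on a surprising transition plays the role of felt impact.

```python
# Toy illustration only: the MDP, rewards, gamma, and policy are assumptions,
# not anything specified in the post.
import numpy as np

gamma = 0.9
n_states = 3

# P[s, a] -> next state (deterministic), R[s, a] -> reward. Made-up numbers;
# state 2 is absorbing with zero reward.
P = np.array([[1, 2],
              [2, 0],
              [2, 2]])
R = np.array([[0.0, 1.0],
              [5.0, 0.0],
              [0.0, 0.0]])

def value_iteration(P, R, gamma, iters=500):
    """Optimal value function V*: what the agent *could* attain."""
    V = np.zeros(n_states)
    for _ in range(iters):
        V = np.max(R + gamma * V[P], axis=1)
    return V

def policy_evaluation(P, R, pi, gamma, iters=500):
    """On-policy value V^pi: what the agent actually expects to attain."""
    V = np.zeros(n_states)
    idx = np.arange(n_states)
    for _ in range(iters):
        V = R[idx, pi] + gamma * V[P[idx, pi]]
    return V

V_star = value_iteration(P, R, gamma)
pi = np.array([1, 1, 0])                 # a fixed, suboptimal policy
V_pi = policy_evaluation(P, R, pi, gamma)

# "Felt impact" as TD error: in state 0 the policy expects reward 1 and a move
# to state 2, but suppose the world instead hands it reward 0.
s, r_surprise, s_next = 0, 0.0, 2
td_error = r_surprise + gamma * V_pi[s_next] - V_pi[s]

print("V*  :", V_star)   # attainable utility under the "could" criterion
print("V^pi:", V_pi)     # actual (on-policy) attainable utility estimate
print("TD error for the surprising transition:", td_error)
```

In this toy run, V* exceeds V^pi wherever the fixed policy leaves value on the table, and the surprising transition produces a negative TD error, i.e. a negative "felt impact".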
Yes.
Not quite – being confined is objectively impactful, and has some narrow value impact (not being able to see your family, perhaps).
Just because something has a positive objective impact on me doesn't mean I haven't also been positively value impacted. Value/objective is just a way of decomposing the total impact on an agent – they don't trade off against each other. For example, if something really good happens to Mary, she might think: "I got a raise (objective impact!) and Bob told me he likes me (value impact!). Both of these are great", and they are both great (for her)!