Followup to: How to Be Oversurprised A Bayesian update need never lose information. In a dynamic world, though, the update is only half the story. The other half, where the agent takes an action and predicts its result, may indeed "lose" information in some sense. We have a dynamical system...
Followup to: How to Disentangle the Past and the Future Some agents are memoryless, reacting to each new observation as it happens, without generating a persisting internal structure. When an LED observes voltage, it emits light, regardless of whether it did so a second earlier. Other agents have very persistent...
I'm on my way to an important meeting. Am I worried? I'm not worried. The presentation is on my laptop. I distinctly remember putting it there (in the past), so I can safely predict that it's going to be there when I get to the office (in the future) -...
Followup to: The Bayesian Agent This post explains one interesting and influential algorithm by which a Bayesian agent can choose high-utility actions, called Point-Based Value Iteration (original paper). Its main premise resembles some concept of internal availability. A reinforcement-learning agent chooses its actions based on its internal memory...
Edit: Following mixed reception, I decided to split this part out of the latest post in my sequence on reinforcement learning. It wasn't clear enough, and anyway didn't belong there. I'm posting this hopefully better version to Discussion, and welcome further comments on content and style. The availability heuristic seems...
Followup to: Reinforcement Learning: A Non-Standard Introduction, Reinforcement, Preference and Utility A reinforcement-learning agent interacts with its environment through the perception of observations and the performance of actions. A very abstract and non-standard description of such an agent is in two parts. The first part, the inference policy, tells us...
Followup to: Reinforcement Learning: A Non-Standard Introduction A reinforcement-learning agent interacts with its environment through the perception of observations and the performance of actions. We describe the influence of the world on the agent in two steps. The first is the generation of a sensory input Ot based on...