royf

Update Then Forget

Followup to: How to Be Oversurprised. A Bayesian update need never lose information. In a dynamic world, though, the update is only half the story. The other half, where the agent takes an action and predicts its result, may indeed "lose" information in some sense. We have a dynamical system...

Jan 17, 2013 · 15

How to Be Oversurprised

Followup to: How to Disentangle the Past and the Future. Some agents are memoryless, reacting to each new observation as it happens, without generating a persisting internal structure. When an LED observes voltage, it emits light, regardless of whether it did so a second earlier. Other agents have very persistent...

Jan 7, 2013 · 20

How to Disentangle the Past and the Future

I'm on my way to an important meeting. Am I worried? I'm not worried. The presentation is on my laptop. I distinctly remember putting it there (in the past), so I can safely predict that it's going to be there when I get to the office (in the future) -...

Jan 2, 2013 · 22

Point-Based Value Iteration

Followup to: The Bayesian Agent. This post explains one interesting and influential algorithm, called Point-Based Value Iteration (original paper), for choosing high-utility actions for a Bayesian agent. Its main premise resembles some concept of internal availability. A reinforcement-learning agent chooses its actions based on its internal memory...

Oct 8, 2012 · 13

Internal Availability

Edit: Following mixed reception, I decided to split this part out of the latest post in my sequence on reinforcement learning. It wasn't clear enough, and anyway didn't belong there. I'm posting this hopefully better version to Discussion, and welcome further comments on content and style. The availability heuristic seems...

Oct 8, 2012 · 5

The Bayesian Agent

Followup to: Reinforcement Learning: A Non-Standard Introduction; Reinforcement, Preference and Utility. A reinforcement-learning agent interacts with its environment through the perception of observations and the performance of actions. A very abstract and non-standard description of such an agent is in two parts. The first part, the inference policy, tells us...

Sep 18, 2012 · 19

Reinforcement, Preference and Utility

Followup to: Reinforcement Learning: A Non-Standard Introduction. A reinforcement-learning agent interacts with its environment through the perception of observations and the performance of actions. We describe the influence of the world on the agent in two steps. The first is the generation of a sensory input O_t based on...

Aug 8, 2012 · 14

Reinforcement Learning: A Non-Standard Introduction (Part 1)

How to Disentangle the Past and the Future

How to Be Oversurprised

The Bayesian Agent
