
# royf comments on The Bayesian Agent - Less Wrong

18 September 2012 03:23AM


Comment author: 18 September 2012 03:12:21AM, 2 points

If you're a devoted Bayesian, you probably know how to update on evidence, and even how to do so repeatedly on a sequence of observations. What you may not know is how to update in a changing world. Here's how:

$B_{t+1}(W_{t+1})=\Pr(W_{t+1}|O_1,\ldots,O_{t+1})=\frac{\sigma(O_{t+1}|W_{t+1})\cdot\Pr(W_{t+1}|O_1,\ldots,O_t)}{\sum_w\sigma(O_{t+1}|w)\cdot\Pr(w|O_1,\ldots,O_t)}$

As usual with Bayes' theorem, we only need to calculate the numerator for different values of $W_{t+1}$, and the denominator will normalize them to sum to 1, as probabilities do. We know $\sigma$ as part of the dynamics of the system, so we only need $\Pr(W_{t+1}|O_1,\ldots,O_t)$. This can be calculated by introducing the other variables in the process:

$\Pr(W_{t+1}|O_1,\ldots,O_t)=\sum_{W_t,A_t}\Pr(W_t,A_t,W_{t+1}|O_1,\ldots,O_t)$

An important thing to notice is that, given the observable history, the world state $W_t$ and the action $A_t$ are conditionally independent: the agent can't act on information it hasn't seen. We continue:

$=\sum_{W_t,A_t}\Pr(W_t|O_1,\ldots,O_t)\cdot\Pr(A_t|O_1,\ldots,O_t)\cdot p(W_{t+1}|W_t,A_t)$

Recall that the agent's belief $B_t$ is a function of the observable history, and that the action only depends on the observable history through its memory $B_t$. We conclude:

$=\sum_{W_t,A_t}B_t(W_t)\cdot\pi(A_t|B_t)\cdot p(W_{t+1}|W_t,A_t)$
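The whole update can be sketched in code. This is only an illustrative sketch: the dictionary representations of `p` and `sigma`, and the toy two-state world, are my assumptions, not part of the derivation above.

```python
# Sketch of the belief update derived above (representations illustrative):
# B_{t+1} is obtained by pushing B_t through the policy pi and the
# transition kernel p, then conditioning on the new observation via sigma.

def predict(belief, pi, p):
    """Pr(W_{t+1} | O_1..O_t) = sum over w, a of B_t(w) * pi(a|B_t) * p(w'|w,a)."""
    act = pi(belief)                          # act[a] = pi(a | B_t)
    pred = {}
    for w, bw in belief.items():
        for a, pa in act.items():
            for w2, pw2 in p[w][a].items():   # p[w][a][w2] = p(w2 | w, a)
                pred[w2] = pred.get(w2, 0.0) + bw * pa * pw2
    return pred

def belief_update(belief, obs, pi, p, sigma):
    """One full step: predict through the dynamics, then correct on the observation."""
    pred = predict(belief, pi, p)
    unnorm = {w: sigma[w][obs] * q for w, q in pred.items()}  # numerator
    z = sum(unnorm.values())                                  # normalizing denominator
    return {w: q / z for w, q in unnorm.items()}

# Two-state toy world: states tend to persist; observation "o" favors state 0.
pi = lambda b: {"a": 1.0}                                     # single action
p = {0: {"a": {0: 0.9, 1: 0.1}}, 1: {"a": {0: 0.1, 1: 0.9}}}
sigma = {0: {"o": 0.8}, 1: {"o": 0.2}}
b1 = belief_update({0: 0.5, 1: 0.5}, "o", pi, p, sigma)
# b1 == {0: 0.8, 1: 0.2}
```

Note that the prediction step needs only `pi` and `p`, and the correction step only `sigma`, mirroring the two halves of the formula.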

Comment author: 18 September 2012 07:28:22AM, 3 points

I'm not seeing how this lets the agent update itself. The formula requires knowledge of sigma, pi, and p. (BTW, could someone add instructions for embedding LaTeX to the comment help text?) pi is part of the agent, but sigma and p are not. You say

> We know sigma as part of the dynamics of the system

But all the agent knows, as you've described it so far, is the sequence of observations. In fact, it's a stretch to say that even we know sigma or p; we have merely given them names. sigma is a complete description of how the world state determines what the agent senses, and p is a complete description of how the agent's actions affect the world. As the designer of the agent, will you be explicitly providing it with that information in some future instalment?

Comment author: 18 September 2012 06:09:59PM, 3 points

Everything you say is essentially true.

> As the designer of the agent, will you be explicitly providing it with that information in some future instalment?

Technically, we don't need to provide the agent with p and sigma explicitly. We use these parameters when we build the agent's memory update scheme, but the agent is not necessarily "aware" of the values of the parameters from inside the algorithm.

Take, for example, an autonomous rover on Mars. The gravity on Mars is known at the time of design, so the rover's software, and even hardware, is built to operate under these dynamics. The wind velocity at the time and place of landing, on the other hand, is unknown. The rover may need to take measurements to determine this parameter, and encode it in its memory, before it can take it into account in choosing further actions.

But if we are thoroughly Bayesian, then something is known about the wind prior to experience. Is it likely to change every 5 minutes or can the rover wait longer before measuring again? What should be the operational range of the instruments? And so on. In this case we would include this prior in p, while the actual wind velocity is instead hidden in the world state (only to be observed occasionally and partially).
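As a toy illustration of this kind of parameter estimation (every number and name below is made up for the example, not taken from the post): the wind speed is an unknown but unchanging quantity hidden in the world state, and each noisy measurement updates a discrete belief over it by Bayes' theorem.

```python
# Toy wind estimation: w is an unknown but unchanging parameter, and each
# noisy measurement m updates a discrete belief over w via Bayes' theorem.

def update_wind_belief(belief, m, likelihood):
    """belief maps wind value -> prior prob; likelihood(m, w) = Pr(m | w)."""
    unnorm = {w: likelihood(m, w) * p for w, p in belief.items()}
    z = sum(unnorm.values())
    return {w: p / z for w, p in unnorm.items()}

# Uniform prior over three candidate wind speeds; a crude triangular noise model.
belief = {0.0: 1/3, 5.0: 1/3, 10.0: 1/3}
lik = lambda m, w: max(1e-9, 1.0 - abs(m - w) / 20.0)
for m in [4.2, 5.8, 5.1]:
    belief = update_wind_belief(belief, m, lik)
# The belief now concentrates on w = 5.0.
```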

Ultimately, we could include all of physics in our belief - there's always some Einstein to tell us that Newtonian physics is wrong. The problem is that a large belief space makes learning harder. This is why most humans struggle with intuitive understanding of relativity or quantum mechanics - our brains are not made to represent this part of the belief space.

This is also why reinforcement learning gives special treatment to the case where there are unknown but unchanging parameters of the world dynamics: the "unknown" part makes the belief space large enough to make special algorithms necessary, while the "unchanging" part makes these algorithms possible.

For LaTeX instructions, click "Show help" and then "More Help".