Reinforcement Learning: A Non-Standard Introduction (Part 1)
Imagine that the world is divided into two parts: one we shall call the agent, and the rest its environment. Imagine you could describe in full detail the state of both the agent and the environment. The state of the agent is denoted M: it could stand for a Mind if you're a philosopher, a Machine if you're researching machine learning, or a Monkey if you're a neuroscientist. In any case, it is just the Memory of the agent. The state of the rest of the world (or just the World, for short) is denoted W.

These states change over time. In general, when describing the dynamics of a system, we specify how each state is determined by the previous states. So we have probability distributions for the states $W_t$ and $M_t$ of the world and the agent at time $t$:

$$p(W_t \mid W_{t-1}, M_{t-1})$$
$$q(M_t \mid W_{t-1}, M_{t-1})$$

These give us the probability that the world is currently in state $W_t$, and the agent in state $M_t$, given that they were previously in states $W_{t-1}$ and $M_{t-1}$. This can be illustrated in the following Bayesian network:

[Figure: Bayesian network with arrows from each of $W_{t-1}$ and $M_{t-1}$ into both $W_t$ and $M_t$]

Bayesian networks look like they represent causation: that the current state is "caused" by the immediately previous state. But what they really represent is statistical independence: the current joint state $(W_t, M_t)$ depends only on the immediately previous joint state $(W_{t-1}, M_{t-1})$, and not on any earlier state. So the power of Bayesian networks is in what they don't show; in this case, there is no arrow from, say, $W_{t-2}$ to $W_t$. The current joint state of the world and the agent represents everything we need to know in order to continue the dynamics forward: given this state, the past is independent of the future. This property is so important that it has a name, borrowed from one of its earliest researchers, Markov.

The Markov property is not enough for our purposes. We are going to make a further assumption: the states of the world and the agent don't both change together. Rather, they take turns changing, and while one changes the other remains the same. This gives the dynamics an alternating, turn-based structure, sketched in the example below.
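To make the two transition distributions concrete, here is a minimal Python sketch. The state spaces, transition tables, and names (`p_world`, `q_agent`, `step`, `step_alternating`) are illustrative assumptions, not anything from the text above; the only point being modeled is that each update depends on the previous joint state $(W_{t-1}, M_{t-1})$ and on nothing earlier, and that in the turn-taking variant only one of the two parts changes per tick.

```python
import random

# Illustrative finite state spaces (assumed for this sketch, not from the post).
WORLD_STATES = ["sunny", "rainy"]
AGENT_STATES = ["explore", "rest"]

def sample(dist):
    """Draw an outcome from a {outcome: probability} dictionary."""
    r = random.random()
    total = 0.0
    for outcome, prob in dist.items():
        total += prob
        if r < total:
            return outcome
    return outcome  # guard against floating-point rounding

def p_world(w_prev, m_prev):
    """p(W_t | W_{t-1}, M_{t-1}): distribution over the next world state."""
    if w_prev == "sunny":
        return {"sunny": 0.8, "rainy": 0.2}
    return {"sunny": 0.4, "rainy": 0.6}

def q_agent(w_prev, m_prev):
    """q(M_t | W_{t-1}, M_{t-1}): distribution over the next agent state."""
    if w_prev == "sunny":
        return {"explore": 0.9, "rest": 0.1}
    return {"explore": 0.2, "rest": 0.8}

def step(w, m):
    """One joint update: both W_t and M_t are sampled from the *previous*
    joint state (W_{t-1}, M_{t-1}) and nothing earlier -- the Markov property."""
    return sample(p_world(w, m)), sample(q_agent(w, m))

def step_alternating(w, m, world_turn):
    """Turn-taking variant: on each tick only one of the two parts changes,
    while the other remains the same."""
    if world_turn:
        return sample(p_world(w, m)), m
    return w, sample(q_agent(w, m))

w, m = "sunny", "explore"
for t in range(5):
    w, m = step(w, m)
    print(t, w, m)
```

Notice that `step` takes only the current pair `(w, m)` and no history: that is the Markov property expressed as a function signature. `step_alternating` is the further turn-taking assumption, with `world_turn` deciding whose move it is.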