Would readers be interested in a sequence of posts offering an intuitive explanation of my in-progress thesis on the application of information theory to reinforcement learning? Please also feel free to comment on the quality of my presentation.
In this first post I offer a high-level description of the Perception-Action Cycle as an intuitive explanation of reinforcement learning.
Imagine that the world is divided into two parts: one we shall call the agent and the rest - its environment. Imagine that the two interact in turns. One moment the agent receives information from its environment in the form of an observation. Then the next moment the agent sends out information to its environment in the form of an action. Then it makes another observation, then another action, and so on.
To break down the cycle, we start with the agent having a belief about the state of its environment. This is actually the technical term: the belief is the probability that the agent assigns, implicitly, to each possible state of the environment. The cycle then proceeds in four phases.
In the first phase, the agent makes an observation. Since the observation conveys information about the environment, the agent needs to update its belief, ideally using Bayes' theorem. The agent now has more information about the environment.
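To make the update concrete, here is a minimal sketch in Python. The two-state "door" environment, the observation names and the likelihood numbers are invented purely for illustration; nothing hinges on them.

```python
# A belief assigns a probability to each possible state of the environment.
# Hypothetical two-state environment: a door that is either open or closed.
belief = {"door_open": 0.5, "door_closed": 0.5}  # prior belief

# Hypothetical observation model: P(observation | state).
likelihood = {
    "see_light": {"door_open": 0.9, "door_closed": 0.2},
}

def bayes_update(belief, observation, likelihood):
    """Posterior belief after an observation, via Bayes' theorem."""
    unnormalized = {s: likelihood[observation][s] * p for s, p in belief.items()}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}

posterior = bayes_update(belief, "see_light", likelihood)
# posterior ≈ {"door_open": 0.82, "door_closed": 0.18}: sharper than the 50/50 prior.
```

The same structure carries over to any finite set of states; all the agent needs is a likelihood model P(observation | state).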
In the second phase, the agent uses this new information to update its plan. Note the crucial underlying principle that information about the environment is useful in making better plans. This gives the desired fusion of Bayesian updating and decision making.
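As a toy illustration of that principle, here is a sketch, continuing the door example above, of how a sharper belief changes which plan looks best. The reward table is made up for illustration.

```python
# Hypothetical reward for taking an action in a given state.
reward = {
    "walk_through": {"door_open": 1.0, "door_closed": -2.0},
    "wait":         {"door_open": 0.0, "door_closed": 0.0},
}

def best_action(belief, reward):
    """Pick the action with the highest expected reward under the current belief."""
    expected = {
        action: sum(belief[s] * r for s, r in r_by_state.items())
        for action, r_by_state in reward.items()
    }
    return max(expected, key=expected.get)

print(best_action({"door_open": 0.5, "door_closed": 0.5}, reward))    # wait
print(best_action({"door_open": 0.82, "door_closed": 0.18}, reward))  # walk_through
```

With the 50/50 prior the cautious action looks best; with the sharper posterior the agent can commit to the better plan.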
In the third phase, the agent executes a step of its plan - a single action. This changes the environment. Some of the things that the agent knew about the previous state of the environment may no longer be true, and the agent is back to having less information.
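Here is a sketch of this loss of information, again with a made-up transition model for the door example: propagating the belief through the action spreads it out, and its Shannon entropy grows.

```python
import math

def entropy_bits(belief):
    """Shannon entropy of a belief, in bits."""
    return sum(-p * math.log2(p) for p in belief.values() if p > 0)

# Hypothetical transition model for the action "push": P(next state | state).
transition = {
    "door_open":   {"door_open": 0.7, "door_closed": 0.3},
    "door_closed": {"door_open": 0.4, "door_closed": 0.6},
}

def predict_state(belief, transition):
    """Belief over the next state after acting."""
    next_belief = {}
    for s, p in belief.items():
        for s_next, p_next in transition[s].items():
            next_belief[s_next] = next_belief.get(s_next, 0.0) + p * p_next
    return next_belief

before = {"door_open": 0.82, "door_closed": 0.18}  # posterior from the update above
after = predict_state(before, transition)
# entropy_bits(before) ≈ 0.68 bits, entropy_bits(after) ≈ 0.94 bits:
# the agent is back to being less certain about the state.
```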
In the fourth phase, the agent makes a prediction about future observations. The importance of making a prediction before a scientific experiment is well understood by philosophers of science. But the importance of constantly making predictions about all of our sensory inputs, as a functional part of our cognition, is only now dawning on neuroscientists and machine learning researchers.
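In the toy example above, such a prediction is simply the belief over the next state pushed through the observation model (numbers, as before, invented for illustration):

```python
# Hypothetical observation model: P(observation | state).
obs_likelihood = {
    "see_light":    {"door_open": 0.9, "door_closed": 0.2},
    "see_darkness": {"door_open": 0.1, "door_closed": 0.8},
}

def predict_observation(belief, obs_likelihood):
    """P(next observation) = sum over states of P(observation | state) * belief(state)."""
    return {
        obs: sum(p_by_state[s] * belief[s] for s in belief)
        for obs, p_by_state in obs_likelihood.items()
    }

prediction = predict_observation({"door_open": 0.65, "door_closed": 0.35}, obs_likelihood)
# prediction ≈ {"see_light": 0.66, "see_darkness": 0.34}
# When the actual observation arrives, the cycle returns to the first phase: a Bayesian update.
```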
The Perception-Action Cycle is an intuitive explanation of the technical setting of reinforcement learning. Reinforcement learning is a powerful model of machine learning, in which decision making, learning and evaluation occur simultaneously and somewhat implicitly while a learner interacts with its environment. It can describe a wide variety of real-life scenarios, including both biological and artificial agents. It is so general, in fact, that our work is still ahead of us if we want it to have any explanatory power, and solving it in its most general form is a computationally hard problem.
But the Perception-Action Cycle still offers symmetries to explore, analogies to physics to draw, and practical learning algorithms to develop, all of which improve its Occam's razor prior score as a good model of intelligence. And to use it to actually explain things, we can narrow it down further. Not everything that it makes possible is equally probable. By applying information theory, a collection of statistical concepts, theorems and methods implied by strong Bayesianism, we can get a better picture of what intelligence is and isn't.
What is 'amount of information'? If I do not know whether the coin is heads or tails, then I have 0 bits of information about the state of the coin; if I open my eyes and see it is heads, I have 1 bit. The information is in the narrowing of the possibilities. That is the conventional meaning. edit: though of course the information is not increased until the next perception - perhaps that is what you meant? edit: still, there is a counterexample - you can have an axially magnetized coin, and an electromagnet that can make the coin flip to heads up when it's powered. You initially don't know which way the coin is up, but if the action is to power the electromagnet, you will have the coin end up heads up. (Of course the overall entropy of the world still went up, but mostly in the form of heat.) One could say that it doesn't increase knowledge of the environment, but decreases the entropy in the environment.
You are expressing a number of misconceptions here. I may address some in future posts, but in short:
By information I mean the Shannon information (see also links in OP). Your example is correct.
By the action of powering the electromagnet you are not increasing your information about the state of the world. You are increasing your information about the state of the coin, but only by making it dependent on the state of the electromagnet, which you already knew. This point is clearly worth a future post.
There is no "entropy in the environment". Entropy is subjective to the observer - as the sketch below illustrates.
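To make the last two points concrete, here is a minimal sketch of the Shannon entropy of the coin for two observers; the names and numbers are only illustrative, and the point is that the entropy depends on what each observer knows, not on the coin itself.

```python
import math

def entropy_bits(belief):
    """Shannon entropy of a belief, in bits."""
    return sum(-p * math.log2(p) for p in belief.values() if p > 0)

# The same coin, two observers.
# A does not know whether the electromagnet was powered: maximal uncertainty.
# B knows it was powered, and that powering it forces the coin heads up.
belief_A = {"heads": 0.5, "tails": 0.5}
belief_B = {"heads": 1.0, "tails": 0.0}

print(entropy_bits(belief_A))  # 1.0 bit of uncertainty for A
print(entropy_bits(belief_B))  # 0.0 bits for B - the coin's state is no mystery to B
```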