Reinforcement Learning: A Non-Standard Introduction (Part 1)
Imagine that the world is divided into two parts: one we shall call the agent, and the rest its environment. Imagine you could describe in full detail the state of both the agent and the environment. The state of the agent is denoted M: it could stand for a Mind if you're a philosopher, a Machine if you're researching machine learning, or a Monkey if you're a neuroscientist. In any case, it is just the Memory of the agent. The state of the rest of the world (or just the World, for short) is denoted W.

These states change over time. In general, when describing the dynamics of a system, we specify how each state is determined by the previous states. So we have probability distributions for the states $W_t$ and $M_t$ of the world and the agent at time $t$:

$$p(W_t \mid W_{t-1}, M_{t-1})$$
$$q(M_t \mid W_{t-1}, M_{t-1})$$

These give us the probability that the world is currently in state $W_t$, and the agent in state $M_t$, given that they were previously in states $W_{t-1}$ and $M_{t-1}$. This can be illustrated in the following Bayesian network:

[Figure: Bayesian network with arrows from each of $W_{t-1}$ and $M_{t-1}$ into both $W_t$ and $M_t$]

Bayesian networks look like they represent causation: that the current state is "caused" by the immediately previous state. But what they really represent is statistical independence: the current joint state $(W_t, M_t)$ depends only on the immediately previous joint state $(W_{t-1}, M_{t-1})$, and not on any earlier state. So the power of Bayesian networks is in what they don't show; in this case, there is no arrow from, say, $W_{t-2}$ to $W_t$. The current joint state of the world and the agent represents everything we need to know in order to continue the dynamics forward: given this state, the past is independent of the future. This property is so important that it has a name, borrowed from one of its earliest researchers, Markov.

The Markov property is not enough for our purposes. We are going to make a further assumption: the states of the world and the agent don't both change together. Rather, they take turns changing, and while one changes the other remains the same. This gives the dynamics an alternating, turn-based structure, sketched in the example below.
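To make the two transition distributions concrete, here is a minimal Python sketch. The state spaces, transition tables, and names (`p_world`, `q_agent`, `step`, `step_alternating`) are illustrative assumptions, not anything from the text above; the only point being modeled is that each update depends on the previous joint state $(W_{t-1}, M_{t-1})$ and on nothing earlier, and that in the turn-taking variant only one of the two parts changes per tick.

```python
import random

# Illustrative finite state spaces (assumed for this sketch, not from the post).
WORLD_STATES = ["sunny", "rainy"]
AGENT_STATES = ["explore", "rest"]

def sample(dist):
    """Draw an outcome from a {outcome: probability} dictionary."""
    r = random.random()
    total = 0.0
    for outcome, prob in dist.items():
        total += prob
        if r < total:
            return outcome
    return outcome  # guard against floating-point rounding

def p_world(w_prev, m_prev):
    """p(W_t | W_{t-1}, M_{t-1}): distribution over the next world state."""
    if w_prev == "sunny":
        return {"sunny": 0.8, "rainy": 0.2}
    return {"sunny": 0.4, "rainy": 0.6}

def q_agent(w_prev, m_prev):
    """q(M_t | W_{t-1}, M_{t-1}): distribution over the next agent state."""
    if w_prev == "sunny":
        return {"explore": 0.9, "rest": 0.1}
    return {"explore": 0.2, "rest": 0.8}

def step(w, m):
    """One joint update: both W_t and M_t are sampled from the *previous*
    joint state (W_{t-1}, M_{t-1}) and nothing earlier -- the Markov property."""
    return sample(p_world(w, m)), sample(q_agent(w, m))

def step_alternating(w, m, world_turn):
    """Turn-taking variant: on each tick only one of the two parts changes,
    while the other remains the same."""
    if world_turn:
        return sample(p_world(w, m)), m
    return w, sample(q_agent(w, m))

w, m = "sunny", "explore"
for t in range(5):
    w, m = step(w, m)
    print(t, w, m)
```

Notice that `step` takes only the current pair `(w, m)` and no history: that is the Markov property expressed as a function signature. `step_alternating` is the further turn-taking assumption, with `world_turn` deciding whose move it is.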