
Comment author: royf 17 July 2015 11:23:19AM *  2 points [-]

It seems that your research is coming around to some concepts that are at the basis of mine. Namely, that noise in an optimization process is a constraint on the process, and that the resulting constrained optimization process avoids the nasty properties you describe.

Feel free to contact me if you'd like to discuss this further.

Comment author: royf 27 March 2015 05:59:01PM *  2 points [-]

This is not unlike Neyman-Pearson theory. Surely this will run into the same trouble with more than 2 possible actions.

Comment author: royf 29 April 2013 06:28:57PM 0 points [-]

Our research group and collaborators, foremost Daniel Polani, have been studying this for many years now. Polani calls an essentially identical concept empowerment. These guys are welcome to the party, and as former outsiders it's understandable (if not totally acceptable) that they wouldn't know about these piles of prior work.

Comment author: royf 03 February 2013 03:16:09PM *  0 points [-]

You have a good and correct point, but it has nothing to do with your question.

a machine can never halt after achieving its goal because it cannot know with full certainty whether it has achieved its goal

This is a misunderstanding of how such a machine might work.

To verify that it completed the task, the machine must match the current state to the desired state. The desired state is any state where the machine has "made 32 paperclips". Now what's a paperclip?

For quite some time we've had the technology to identify a paperclip in an image, if one exists. One lesson we've learned pretty well is this: don't overfit. The paperclip you're going to be tested on is probably not one you've seen before. You'll need to know what features are common in paperclips (and less common in other objects) and how much variability they present. Tolerance to this variability will be necessary for generalization, and this means you can never be sure if you're seeing a paperclip. In this sense there's a limit to how well the user can specify the goal.

So after taking a few images of the paperclips it's made, the machine's major source of (unavoidable) uncertainty will be "is this what the user meant?", not "am I really getting a good image of what's on the table?". Any half-decent implementation will move on to other things (such as asking the user).
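A toy sketch of that tolerance trade-off (entirely hypothetical features and numbers, not drawn from any real recognition system): score "paperclip-ness" by how close an object's features fall to an assumed paperclip average. Tolerance to variability makes the score graded rather than certain.

```python
from math import exp

# Hypothetical feature averages for paperclips: (length_cm, wire_gauge_mm, loop_count)
PAPERCLIP_MEAN = (3.2, 0.9, 2.0)

def paperclip_score(features, tolerance):
    """Return a graded confidence in (0, 1] -- a degree of belief, not a verdict."""
    dist2 = sum((f - m) ** 2 for f, m in zip(features, PAPERCLIP_MEAN))
    return exp(-dist2 / (2 * tolerance ** 2))

# A typical-looking paperclip scores high; an odd one scores lower --
# but any real measurement leaves the machine short of full certainty.
typical = paperclip_score((3.1, 0.9, 2.0), tolerance=0.5)
odd     = paperclip_score((4.0, 1.2, 2.0), tolerance=0.5)
assert 0 < odd < typical < 1
```

Widening `tolerance` raises the score of odd objects too, which is exactly the overfitting/generalization trade-off: you cannot buy robustness to variability without giving up certainty.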

Comment author: TheOtherDave 01 February 2013 09:45:16PM 0 points [-]

the world is in one of two states: "heads" and "tails". [...] The information state (C) of knowing for sure that the result is heads is the distribution p("heads") = 1, p("tails") = 0.

Sure. And (C) is unachievable in practice if one is updating one's information state sensibly from sensible priors.

Alternatively, we can say that the world is in one of these two states: "almost surely heads" and "almost surely tails". Now information state (A) is a uniform distribution over these states

I am uncertain what you mean to convey in this example by the difference between a "world state" (e.g., ASH or AST) and an "information state" (e.g. p("ASH")=0.668).

The "world state" of ASH is in fact an "information state" of p("heads")>SOME_THRESHOLD, which is fine if you mean those terms to be denotatively synonymous but connotatively different, but problematic if you mean them to be denotatively different.

...but (C) is impossible. (C), if I'm following you, maps roughly to the English phrase "I know for absolutely certain that the coin is almost surely heads".

Yes, agreed that this is strictly speaking unachievable, just as "I know for absolutely certain that the coin is heads" was.

That said, I'm not sure what it means for a human brain to have "I know for absolutely certain that the coin is almost surely heads" as a distinct state from "I am almost sure the coin is heads," and the latter is achievable.

So we can agree to just always use the first kind of model, and avoid all this silly complication.

Works for me.

But then there are cases where there are real (physical) reasons why not every information state is possible.

And now you've lost me again. Of course there are real physical reasons why certain information states are not possible... e.g., my brain is incapable of representing certain thoughts. But I suspect that's not what you mean here.

Can you give me some examples of the kinds of cases you have in mind?

Comment author: royf 01 February 2013 10:55:43PM 0 points [-]

The "world state" of ASH is in fact an "information state" of p("heads")>SOME_THRESHOLD

Actually, I meant p("heads") = 0.999 or something.

(C), if I'm following you, maps roughly to the English phrase "I know for absolutely certain that the coin is almost surely heads".

No, I meant: "I know for absolutely certain that the coin is heads". We agree that this much you can never know. As for getting close to this, for example having the information state (D) where p("heads") = 0.999999: if the world is in the state "heads", (D) is (theoretically) possible; if the world is in the state "ASH", (D) is impossible.

Can you give me some examples of the kinds of cases you have in mind?

Mundane examples may not be as clear, so: suppose we send a coin-flipping machine deep into intergalactic space. After a few billion years it flies permanently beyond our light cone, and then flips the coin.

Now any information state about the coin, other than complete ignorance, is physically impossible. We can still say that the coin is in one of the two states "heads" and "tails", only unknown to us. Alternatively we can say that the coin is in a state of superposition. These two models are epistemologically equivalent.

I prefer the latter, and think many people in this community should agree, based on the spirit of other things they believe: the former model is ontologically more complicated. It's saying more about reality than can be known. It sets the state of the coin as a free-floating property of the world, with nothing to entangle with.

Comment author: TheOtherDave 01 February 2013 08:01:23PM *  0 points [-]

since Omega is never a useful frame of reference, I'm not constraining reality to be consistent with it. In this sense, some probabilities are in the territory.

I thought I was following you, but you lost me there.

I certainly agree that if I want to evaluate various aspects of your cognitive abilities based on your predictions, I should look at different aspects of your predictions depending on what abilities I care about, as you describe, and that often the accuracy of your prediction is not the most useful aspect to look at. And of course I agree that expecting perfect knowledge is unreasonable.

But what that has to do with Omega, and what the uselessness of Omega as a frame of reference has to do with constraints on reality, I don't follow.

Comment author: royf 01 February 2013 09:22:13PM 1 point [-]

I probably need to write a top-level post to explain this adequately, but in a nutshell:

I've tossed a coin. Now we can say that the world is in one of two states: "heads" and "tails". This view is consistent with any information state. The information state (A) of maximal ignorance is a uniform distribution over the two states. The information state (B) where heads is twice as likely as tails is the distribution p("heads") = 2/3, p("tails") = 1/3. The information state (C) of knowing for sure that the result is heads is the distribution p("heads") = 1, p("tails") = 0.

Alternatively, we can say that the world is in one of these two states: "almost surely heads" and "almost surely tails". Now information state (A) is a uniform distribution over these states; (B) is perhaps the distribution p("ASH") = 0.668, p("AST") = 0.332; but (C) is impossible, and so is any information state that is more certain than reality in this strange model.

Now, in many cases we can theoretically have information states arbitrarily close to complete certainty. In such cases we must use the first kind of model. So we can agree to just always use the first kind of model, and avoid all this silly complication.

But then there are cases where there are real (physical) reasons why not every information state is possible. In these cases reality is not constrained to be of the first kind, and it could be of the second kind. As a matter of fact, to say that reality is of the first kind - and that probability is only in the mind - is to say more about reality than can possibly be known. This goes against Jaynesianism.

So I completely agree that not knowing something is a property of the map rather than the territory. But an impossibility of any map to know something is a property of the territory.

Comment author: TheOtherDave 01 February 2013 05:58:43AM 0 points [-]

Does it help to reread royf's footnote?

[1] Where likelihood is measured either given what I know, or what I could know, or what anybody could know - depending on why we're asking the question in the first place.

That is, he's not talking about some thing out there in the world which is independent of our minds, nor am I when I adopt his terminology. The likelihood of a prediction, like all probability judgments, exists in the mind and is a function of how evidence is being evaluated. Indeed, any relationship between a prediction and a state of events in the world exists solely in the mind to begin with.

Comment author: royf 01 February 2013 06:04:13PM 1 point [-]

To clarify further: likelihood is a relative quantity, like speed - it only has meaning relative to a specific frame of reference.

If you're judging my calibration, the proper frame of reference is what I knew at the time of prediction. I didn't know what the result of the fencing match would be, but I had some evidence for who is more likely to win. The (objective) probability distribution given that (subjective) information state is what I should've used for prediction.

If you're judging my diligence as an evidence seeker, the proper frame of reference is what I would've known after reasonable information gathering. I could've taken some actions to put myself in a different information state, and then my prediction could've been better.

But it's unreasonable to expect me to know the result beyond any doubt. Even if Omega is in an information state of perfectly predicting the future, this is never a proper frame of reference by which to judge bounded agents.

And this is the major point on which I'm non-Yudkowskian: since Omega is never a useful frame of reference, I'm not constraining reality to be consistent with it. In this sense, some probabilities are in the territory.
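The calibration frame described above can be made concrete with a toy check (entirely made-up numbers): a forecaster is judged by whether events given 70% come true about 70% of the time, not by whether any single outcome was called correctly.

```python
# Hypothetical record: (stated probability of a win, actual outcome).
predictions = [
    (0.7, True), (0.7, True), (0.7, False), (0.7, True), (0.7, False),
    (0.7, True), (0.7, True), (0.7, False), (0.7, True), (0.7, True),
]

stated = predictions[0][0]
observed = sum(outcome for _, outcome in predictions) / len(predictions)

# Well calibrated: the empirical frequency matches the stated probability,
# even though 3 of the 10 predictions "failed" individually.
assert abs(observed - stated) < 0.05
```

This is why the proper frame of reference is the information state at prediction time: a calibrated 70% is a correct use of that information state regardless of how any one match turned out.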

Comment author: CronoDAS 24 January 2013 04:06:17AM 1 point [-]

Not really.

Let me elaborate:

In a book of his, Daniel Dennett appropriates the word "actualism" to mean "the belief that only things that have actually happened, or will happen, are possible." In other words, all statements that are false are not only false, but also impossible: If the coin flip comes up heads, it was never possible for the coin flip to have come up tails. He considers this rather silly, says there are good reasons for dismissing it that aren't relevant to the current discussion, and proceeds as though the matter is solved. This strikes me as one of those philosophical positions that seem obviously absurd but very difficult to refute in practice. (It also strikes me as splitting hairs over words, so maybe it's just a wrong question in the first place?)

Comment author: royf 24 January 2013 01:06:15PM *  0 points [-]

This is perhaps not the best description of actualism, but I see your point. Actualists would disagree with this part of my comment:

If I believed that "you will win" (no probability qualifier), then in the many universes where you didn't I'm in Bayes Hell.

on the grounds that those other universes don't exist.

But that was just a figure of speech. I don't actually need those other universes to argue against 0 and 1 as probabilities. And if Frequentists disbelieve in that, there's no place in Bayes Heaven for them.

Comment author: CronoDAS 24 January 2013 02:20:45AM *  0 points [-]

That suggests a question.

If I flip a fair coin, and it comes up heads, what is the probability of that coin flip, which I already made, having instead been tails? (Approximately) 0, because we've already seen that the coin didn't come up tails, or (approximately) 50%, because it's a fair coin and we have no way of knowing the outcome in advance?

Comment author: royf 24 January 2013 02:52:33AM 0 points [-]

we've already seen [...] or [...] in advance

Does this answer your question?

Comment author: royf 24 January 2013 01:22:54AM *  12 points [-]

Predictions are justified not by becoming a reality, but by the likelihood of their becoming a reality [1]. When this likelihood is hard to estimate, we can take their becoming a reality as weak evidence that the likelihood is high. But in the end, after counting all the evidence, it's really only the likelihood itself that matters.

If I predict [...] that I will win [...] and I in fact lose fourteen touches in a row, only to win by forfeit

If I place a bet on you to win and this happens, I'll happily collect my prize, but still feel that I put my money on the wrong athlete. My prior and the signal are rich enough for me to deduce that your victory, although factual, was unlikely. If I believed that you're likely to win, then my belief wasn't "true for the wrong reasons", it was simply false. If I believed that "you will win" (no probability qualifier), then in the many universes where you didn't I'm in Bayes Hell.

Conversely in the other example, your winning itself is again not the best evidence for its own likelihood. Your scoring 14 touches is. My belief that you're likely to win is true and justified for the right reasons: you're clearly the better athlete.

[1] Where likelihood is measured either given what I know, or what I could know, or what anybody could know - depending on why we're asking the question in the first place.
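The "Bayes Hell" remark above can be sketched with the logarithmic scoring rule (one standard way to formalize the penalty; the rule itself is my choice of illustration, not something the comment specifies): a belief held with no probability qualifier, i.e. p = 1, incurs an infinite penalty the moment it turns out false, while any hedged belief incurs a finite one.

```python
import math

def log_loss(p_believed, happened):
    """Penalty for a stated belief under the logarithmic scoring rule."""
    p = p_believed if happened else 1 - p_believed
    return math.inf if p == 0 else -math.log(p)

# "You're likely to win" (p = 0.9) and you lose: a finite penalty.
assert log_loss(0.9, happened=False) < math.inf

# "You will win" (p = 1, no qualifier) and you lose: Bayes Hell.
assert log_loss(1.0, happened=False) == math.inf
```

The asymmetry is the whole argument against probabilities of 0 and 1: no finite amount of being right can ever repay a single unqualified belief that turns out false.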
