Comment author:ialdabaoth
23 May 2013 11:35:34AM
*
2 points

So, I've been on this site for a while. When I first came here, I had never had a formal introduction to Bayes' theorem, but it sounded a lot like ideas that I had independently worked out in my high school and college days (I was something of an amateur mathematician and game theorist).

A few days ago I was reading through one of your articles - I don't remember which one - and it suddenly struck me that I may not actually understand priors as well as I think I do.

After re-reading some of the series, and then working through the math, I'm now reasonably convinced that I don't properly understand priors at all - at least, not intuitively, which seems to be an important aspect for actually using them.

I have a few weird questions that I'm hoping someone can answer, that will help point me back towards the correct quadrant of domain space. I'll start with a single question, and then see if I can claw my way towards understanding from there based on the answers:

Imagine there is a rational, Bayesian AI named B9 which has been programmed to visually identify and manipulate geometric objects. B9's favorite object is a blue ball, but B9 has no idea that it is blue: B9 sees the world through a black and white camera, and has always seen the world through a black and white camera. Until now, B9 has never heard of "colors" - no one has mentioned "colors" to B9, and B9 has certainly never experienced them. Today, unbeknownst to B9, B9's creator is going to upgrade its camera to a full-color system, and see how long it takes B9 to adapt to the new inputs.

The camera gets switched in 5 seconds. Before the camera gets switched, what prior probability does B9 assign to the possibility that its favorite ball is blue?

Comment author:MugaSofer
23 May 2013 01:34:09PM
-2 points

Well, without a sense that can detect color, it would just be an arbitrary undetectable property something might have, right? So it would be ... dependent on what other objects B9 is aware of, I think. The precise hypothesis of "all [objects that we know are blue] share a common property I cannot perceive with this camera" should be highly conjunctive, and therefore low, unless B9 has observed humans reacting to them because of their coloration. And even then, "blue" would be defined only in terms of what other objects have it, not a specific input type from the camera.

I suspect I'm missing the point of this question, somehow.

Comment author:wuncidunci
23 May 2013 03:48:21PM
1 point

Your question is not well specified. Even though you might think that the proposition "its favorite ball is blue" has a clear meaning, it depends heavily on the precision with which B9 will be able to see colours, how wide the interval defined as blue is, and how it treats multicoloured objects. If we suppose it would categorise the observed wavelength into one of 27 possible colours (one of those being blue), and further suppose that it knew the ball to be of a single colour and not patterned, and that it had no background information about the relative frequencies of different colours of balls or other useful prior knowledge, the prior probability would be 1/27. If we suppose that it had access to the internet and had read this discussion on LW about the colourblind AI, it would increase its probability by doing an update based on the probability of this affecting the colour of its own ball.

I don't claim to be any kind of Bayesian expert here, but, well, I seem to be replying anyway. Don't take my reply too seriously.

B9 has never heard of "colors". I take that to mean, not only that nobody has used that particular word to B9, but that B9 has been exposed to no inputs that significantly depend on it... e.g., nobody has talked about whether their shirts match their pants, nobody has talked about spectroscopic analysis of starlight or about the mechanism of action of chlorophyll, etc... that B9 has no evidentiary basis from which to draw conclusions about color. (That is, B9 is the anti-Mary.)

Given those assumptions, a universal prior is appropriate... 50% chance that "My ball is blue" is true, 50% chance that it's false.

If those assumptions aren't quite true, and B9 has some information that usefully pertains, however indirectly, to the color of the ball, then insofar as that information is evidence one way or another, B9 ideally updates that probability accordingly.

Comment author:Kawoomba
24 May 2013 01:59:38PM
*
2 points

Given those assumptions, a universal prior is appropriate... 50% chance that "My ball is blue" is true, 50% chance that it's false.

You and Kindly both? Very surprising.

Consider you as B9, reading on the internet about some new and independent property of items, "bamboozle-ness". Should you now believe that P("My monitor is bamboozled") = 0.5? That it is as likely that your monitor is bamboozled as that it's not bamboozled?

If I offered you a bet of 100 big currency units, if it turns out your monitor was bamboozled, you'd win triple! Or 50x! Wouldn't you accept, based on your "well, 50% chance of winning" assessment?

Am I bamboozled? Are you bamboozled?

Notice that B9 has even less reason to believe in colors than you in the example above - it hasn't even read about them on the internet.

Instead of assigning 50-50 odds, you'd have to take that part of the probability space which represents "my belief in other models than my main model", identify the minuscule prior for that specific model containing "colors", or "bamboozleness", then calculate from assuming that model the odds of blue versus not-blue, then weigh back in the uncertainty from such an arbitrary model being true in lieu of your standard model.

That it is as likely that your monitor is bamboozled as that it's not bamboozled?

Given the following propositions:
(P1) "My monitor is bamboozled."
(P2) "My monitor is not bamboozled."
(P3) "'My monitor is bamboozled' is not the sort of statement that has a binary truth value; monitors are neither bamboozled nor non-bamboozled."

...and knowing nothing at all about bamboozledness, never even having heard the word before, it seems I ought to assign high probability to P3 (since it's true of most statements that it's possible to construct) and consequently low probabilities to P1 and P2.

But when I read about bamboozledness on the Internet (or am asked whether my ball is blue), my confidence in P3 seems to go up [EDIT: I mean down] pretty quickly, based on my experience with people talking about stuff. (Which among other things suggests that my prior for P3 wasn't all that low [EDIT: I mean high].)

Having become convinced of NOT(P3) (despite still knowing nothing much about bamboozledness other than it's the sort of thing people talk about on the Internet), if I have very low confidence in P1, I have very high confidence in P2. If I have very low confidence in P2, I have very high confidence in P1. Very high confidence in either proposition seems unjustifiable... indeed, a lower probability for P1 than P2 or vice-versa seems unjustifiable... so I conclude 50%.

If I'm wrong to do so, it seems I'm wrong to reduce my confidence in P3 in the first place.
Which I guess is possible, though I do seem to do it quite naturally.
But given NOT(P3), I genuinely don't see why I should believe P(P2) > P(P1).

If I offered you a bet of 100 big currency units, if it turns out your monitor was bamboozled, you'd win triple! Or 50x! Wouldn't you accept, based on your "well, 50% chance of winning" assessment?

Just to be clear: you're offering me (300BCUs if P1, -100BCUs if P2)?
And you're suggesting I shouldn't take that bet, because P(P2) >> P(P1)?

It seems to follow from that reasoning that I ought to take (300BCUs if P2, -100BCUs if P1).
Would you suggest I take that bet?

Anyway, to answer your question: I wouldn't take either bet if offered, because of game-theoretical considerations... that is, the moment you offer me the bet, that's evidence that you expect to gain by the bet, which given my ignorance is enough to make me confident I'll lose by accepting it. But if I eliminate those concerns, and I am confident in NOT(P3), then I'll take either bet if offered. (Better yet, I'll take both bets, and walk away with 200 BCUs.)
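
The arithmetic behind the two mirrored bets can be sketched in a few lines. This is only an illustration: it assumes exactly one of P1/P2 is true and uses the 50% figure argued for above; the names `BET_A`, `BET_B`, `expected_value` and `combined_payoff` are made up for the example.

```python
# Payoffs in BCUs, keyed by which proposition turns out to be true.
# Bet A pays 300 if P1 (bamboozled) and loses 100 if P2 (not bamboozled).
# Bet B is the mirror image.
BET_A = {"P1": 300, "P2": -100}
BET_B = {"P1": -100, "P2": 300}


def expected_value(bet, p_p1):
    """Expected payoff of a bet, assuming exactly one of P1/P2 is true."""
    return p_p1 * bet["P1"] + (1 - p_p1) * bet["P2"]


def combined_payoff(outcome):
    """Payoff from holding both bets at once when `outcome` is the true one."""
    return BET_A[outcome] + BET_B[outcome]


if __name__ == "__main__":
    print(expected_value(BET_A, 0.5))  # 100.0
    print(expected_value(BET_B, 0.5))  # 100.0
    print(combined_payoff("P1"), combined_payoff("P2"))  # 200 200
```

Holding both bets pays 200 BCUs whichever proposition is true, so the combined position does not depend on the 50% assumption at all; only the single-bet expected values do.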

Comment author:Vaniver
24 May 2013 06:40:58PM
0 points

I ought to assign high probability to P3 (since it's true of most statements that it's possible to construct) and consequently low probabilities to P1 and P2.

I don't think the logic in this part follows. Some of it looks like precision: it's not clear to me that P1, P2, and P3 are mutually exclusive. What about cases where 'my monitor is bamboozled' and 'my monitor is not bamboozled' are both true, like sets that are both closed and open? Later, it looks like you want P3 to be the reverse of what you have it written as; there it looks like you want P3 to be the proposition that it is a well-formed statement with a binary truth value.

Blech; you're right, I incompletely transitioned from an earlier formulation and didn't shift signs all the way through. I think I fixed it now.

Your larger point about (p1 and p2) being just as plausible a priori is certainly true, and you're right that makes "and consequently low probabilities to P1 and P2" not follow from a properly constructed version of P3.

I'm not sure that makes a difference, though perhaps it does. It still seems that P(P1) > P(P2) is no more likely, given complete ignorance of the referent for "bamboozle", than P(P1) < P(P2)... and it still seems that knowing that otherwise sane people talk about whether monitors are bamboozled or not quickly makes me confident that P(P1 XOR P2) >> P((P1 AND P2) OR NOT(P1 OR P2))... though perhaps it ought not do so.

Comment author:Kawoomba
24 May 2013 08:25:57PM
*
0 points

Let's lift the veil: "bamboozledness" is a placeholder for ... phlogiston (a la "contains more than 30ppm phlogiston" = "bamboozled").

Looks like you now assign a probability of 0.5 to phlogiston, in your monitor, no less. (No fair? It could also have been something meaningful, but in the 'blue balls' scenario we're asking for the prior of a concept which you've never even seen mentioned as such (and hopefully never experienced), what are the chances that a randomly picked concept is a sensible addition to your current world view.)

That's the missing ingredient, the improbability of a hitherto unknown concept belonging to a sensible model of reality:

P("Monitor contains phlogiston" | "phlogiston is the correct theory" ∧ "I have no clue about the theory other than it being correct and wouldn't know the first thing of how to guess what contains phlogiston") could be around 0.5 (although not necessarily exactly 0.5 based on complexity considerations).

However, what you're faced with isn't "... given that colors exist", "... given that bamboozledness exists", "... given that phlogiston exists" (in each case, 'that the model which contains concepts corresponding to the aforementioned corresponds to reality'), it is simply "what is the chance that there is phlogiston in your computer?" (Wait, now it's in my computer too! Not only my monitor?)

Since you have no (little - 'read about it on the internet') reason to assume that phlogiston / blue is anything meaningful, and especially given that in the scenario you aren't even asked about the color of a ball, but simply the prior which relies upon the unknown concept of 'blue' which corresponds to some physical property which isn't a part of your current model, any option which contains "phlogiston is nonsense"/"blue is nonsense", in the form of "monitor does not contain phlogiston", "ball is not blue", is vastly favored.

I posed the bet to show that you wouldn't actually assign a 0.5 probability to a randomly picked concept being part of your standard model. Heads says this concept called "blue" exists, tails it doesn't. Since you like memes. Maybe it helps not to think about the ball, but to think about what it would mean for the ball to be "blue". Instead of "Is the ball blue?", think "does blue extend my current model of reality in a meaningful way", then replace blue with bamboozled.

But I guess I do see where you're coming from, more so than I did before. The all important question is, "does that new attribute you know nothing about have to correspond to any physically existing quantity, can you assume that it extends/replaces your current model of the world, and do you thus need to factor in the improbability of invalidating your current model into assigning the probabilities of the new attribute". Would that be accurate?

Anyway, to answer your question: I wouldn't take either bet if offered, because of game-theoretical considerations...

Enter Psi, Omega's retarded, ahem, special little brother. It just goes around offering random bets, with no background knowledge whatsoever, so you're free to disregard the "why is he offering a bet in the first place" reservations.

you have no (little - 'read about it on the internet') reason to assume that phlogiston / blue is anything meaningful,

Well, meaningfulness is the crux, yes.

As I said initially, when I read about bamboozledness on the Internet (or am asked whether my ball is blue), my confidence seems to grow pretty quickly that the word isn't just gibberish... that there is some attribute to which the word refers, such that (P1 XOR P2) is true. When I listen to a conversation about bamboozled computers, I seem to generally accept the premise that bamboozled computers are possible pretty quickly, even if I haven't the foggiest clue what a bamboozled computer (or monitor, or ball, or hot thing, or whatever) is. It would surprise me if this were uncommon.

And, sure, perhaps I ought to be more skeptical about the premise that people are talking about anything meaningful at all. (I'm not certain of this, but there's certainly precedent for it.)

any option which contains "phlogiston is nonsense"/"blue is nonsense", in the form of "monitor does not contain phlogiston", "ball is not blue"

Here's where you lose me. I don't see how an option can contain "X is nonsense" in the form of "monitor does not contain X". If X is nonsense, "monitor does not contain X" isn't true. "monitor contains X" isn't true either. That's kind of what it means for X to be nonsense.

The all important question is, "does that new attribute you know nothing about have to correspond to any physically existing quantity, can you assume that it extends/replaces your current model of the world, and do you thus need to factor in the improbability of invalidating your current model into assigning the probabilities of the new attribute". Would that be accurate?

I'm not sure. The question that seems important here is "how confident am I, about that new attribute X, that a system either has X or lacks X but doesn't do both or neither?" Which seems to map pretty closely to "how confident am I that 'X' is meaningful?" Which may be equivalent to your formulation, but if so I don't follow the equivalence.

Enter Psi, Omega's retarded, ahem, special little brother.

(nods) As I said in the first place, if I eliminate the game-theoretical concerns, and I am confident that "bamboozled" isn't just meaningless gibberish, then I'll take either bet if offered.

Comment author:Kawoomba
24 May 2013 09:27:18PM
*
0 points

You're just trying to find out whether X is binary, then - if it is binary - you'd assign even odds, in the absence of any other information.

However, it's not enough for "blue" - "not blue" to be established as a binary attribute, we also need to weigh in the chances of the semantic content (the definition of 'blue', unknown to us at that time) corresponding to any physical attributes.

Binarity isn't the same as "describes a concept which translates to reality". When you say meaningful, you (I think) refer to the former, while I refer to the latter. With 'nonsense' I didn't mean 'non-binary', but instead 'if you had the actual definition of the color attribute, you'd find that it probably doesn't correspond to any meaningful property of the world, and as such that not having the property is vastly more likely, which would be "ball isn't blue (because nothing is blue, blue is e.g. about having blue-quarks, which don't model reality)"'.

Binarity isn't the same as "describes a concept which translates to reality".

I'll accept that in general.

When you say meaningful, you (I think) refer to the former, while I refer to the latter.

In this context, I fail to understand what is entailed by that supposed difference.

Put another way: I fail to understand how "X"/"not X" can be a binary attribute of a physical system (a ball, a monitor, whatever) if X doesn't correspond to a physical attribute, or a "concept which translates to reality". Can you give me an example of such an X?

Put yet another way: if there's no translation of X to reality, if there's no physical attribute to which X corresponds, then it seems to me neither "X" nor "not X" can be true or meaningful. What in the world could they possibly mean? What evidence would compel confidence in one proposition or the other?

Looked at yet a different way...

case 1: I am confident phlogiston doesn't exist.

I am confident of this because of evidence related to how friction works, how combustion works, because burning things can cause their mass to increase, for various other reasons. (P1) "My stove has phlogiston" is meaningful -- for example, I know what it would be to test for its truth or falsehood -- and based on other evidence I am confident it's false. (P2) "My stove has no phlogiston" is meaningful, and based on other evidence I am confident it's true.

If you remove all my evidence for the truth or falsehood of P1/P2, but somehow preserve my confidence in the meaningfulness of "phlogiston", you seem to be saying that my P(P1) << P(P2).

case 2: I am confident photons exist. Similarly to P1/P2, I'm confident that P3 ("My lightbulb generates photons") is true, and P4 ("My lightbulb generates no photons") is false, and "photon" is meaningful. Remove my evidence for P3/P4 but preserve my confidence in the meaningfulness of "photon", should my P(P3) << P(P4)? Or should my P(P3) >> P(P4)?

I don't see any grounds for justifying either. Do you?

Comment author:Kawoomba
25 May 2013 06:19:29AM
*
0 points

I don't see any grounds for justifying either. Do you?

Yes. P1 also entails that phlogiston theory is an accurate descriptor of reality - after all, it is saying your stove has phlogiston. P2 does not entail that phlogiston theory is an accurate descriptor of reality. Rejecting that your stove contains phlogiston can be done on the basis of "chances are nothing contains phlogiston, not knowing anything about phlogiston theory, it's probably not real, duh", which is why P(P2)>>P(P1).

The same applies to case 2, knowing nothing about photons, you should always go with the proposition (in this case P4) which is also supported by "photons are an imaginary concept with no equivalent in reality". For P3 to be correct, photons must have some physical equivalent on the territory level, so that anything (e.g. your lightbulb) can produce photons in the first place. For a randomly picked concept (not picked out of a physics textbook), the chances of that are negligible.

Take some random concept, such as "there are 17 kinds of quark, if something contains the 13th quark - the blue quark - we call it 'blue'". Then affirming it is blue entails affirming the 17-kinds-of-quark theory (quite the burden, knowing nothing about its veracity), while saying "it is not blue = it does not contain the 13th quark, because the 17-kinds-of-quark theory does not describe our reality" is the much favored default case.

A not-yet-considered randomly chosen concept (phlogiston, photons) does not have 50-50 odds of accurately describing reality, its odds of doing so - given no evidence - are vanishingly small. That translates to

P("stove contains phlogiston") being much smaller than P("stove does not contain phlogiston"). Reason (rephrasing the above argument): rejecting phlogiston theory as an accurate map of the territory strengthens your "stove does not contain phlogiston (... because phlogiston theory is probably not an accurate map, knowing nothing about it)"

even if

P("stove contains phlogiston given phlogiston theory describes reality") = P("stove does not contain phlogiston given phlogiston theory describes reality") = 0.5
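
The argument above is the law of total probability taken over "the theory is real" versus "the theory is not real". A minimal sketch follows, assuming that "contains X" simply counts as false whenever the theory fails (which is the contested step in this exchange). The prior of 0.001 is an invented number for illustration, and the function names are hypothetical.

```python
def p_contains(p_theory, p_contains_given_theory=0.5):
    """P("stove contains X"), treating "contains X" as false if the theory is wrong.

    p_theory is the prior that the theory introducing X describes reality.
    It is an illustrative input, not a value taken from the discussion.
    """
    return p_theory * p_contains_given_theory


def p_not_contains(p_theory, p_contains_given_theory=0.5):
    """P("stove does not contain X") under the same assumption."""
    return 1.0 - p_contains(p_theory, p_contains_given_theory)


if __name__ == "__main__":
    p_theory = 0.001  # hypothetical prior for an arbitrary, unevidenced theory
    print(p_contains(p_theory))
    print(p_not_contains(p_theory))
    # With p_theory = 1.0 the 50/50 split is recovered.
    print(p_contains(1.0), p_not_contains(1.0))  # 0.5 0.5
```

Under this reading the 50/50 answer and the "vastly favored" answer differ only in the value plugged in for `p_theory`.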

I agree that if "my stove does not contain X" is a meaningful and accurate thing to say even when X has no extension into the real world at all, then P("my stove does not contain X") >>> P("my stove contains X") for an arbitrarily selected concept X, since most arbitrarily selected concepts have no extension into the real world.

I am not nearly as convinced as you sound that "my stove does not contain X" is a meaningful and accurate thing to say even when X has no extension into the real world at all, but I'm not sure there's anything more to say about that than we've already said.

Also, thinking about it, I suspect I'm overly prone to assuming that X has some extension into the real world when I hear people talking about X.

That depends on the knowledge that the AI has. If B9 had deduced the existence of different light wavelengths, and knew how blue corresponded to a particular range, and how human eyes see stuff, the probability would be something close to the range of colors that would be considered blue divided by the range of all possible colors. If B9 has no idea what blue is, then it would depend on priors for how often statements end up being true when B9 doesn't know their meaning.

Without knowing what B9's knowledge is, the problem is under-defined.
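
For the first case, the "range of blue divided by range of all colors" estimate can be sketched as below. The band edges are approximate conventional figures supplied here as assumptions (they are not from the comment), and spreading probability uniformly over wavelength is itself an assumption.

```python
# Approximate wavelength bands in nanometres. Both are assumptions for
# illustration; where "blue" starts and stops is a matter of convention.
VISIBLE_NM = (380.0, 750.0)
BLUE_NM = (450.0, 495.0)


def band_fraction(band, full_range):
    """Fraction of `full_range` covered by `band`, i.e. a uniform prior over wavelength."""
    low = max(band[0], full_range[0])
    high = min(band[1], full_range[1])
    width = max(0.0, high - low)
    return width / (full_range[1] - full_range[0])


if __name__ == "__main__":
    print(round(band_fraction(BLUE_NM, VISIBLE_NM), 3))  # 0.122
```

A different choice of band edges, or a prior uniform in frequency rather than wavelength, would give a different number, which is one way of seeing why the problem is under-defined.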

Very low, because B9 has to hypothesize a causal framework involving colors without any way of observing anything but quantitatively varying luminosities. In other words, they must guess that they're looking at the average of three variables instead of at one variable. This may sound simple but there are many other hypotheses that could also be true, like two variables, four variables, or most likely of all, one variable. B9 will be surprised. This is right and proper. Most physics theories you make up with no evidence behind them will be wrong.

I think I'm confused. We're talking about something that's never even heard of colors, so there shouldn't be anything in the mind of the robot related to "blue" in any way. This ought to be like the prior probability from your perspective that zorgumphs are wogle. Now that I've said the words, I suppose there's some very low probability that zorgumphs are wogle, since there's a probability that "zorgumph" refers to "cats" and "wogle" to "furry". But when you didn't even have those words in your head anywhere, how could there have been a prior? How could B9's prior be "very low" instead of "nonexistent"?

Eliezer seems to be substituting the actual meaning of "blue". Now, if we present the AI with the English statement and ask it to assign a probability...my first impulse is to say it should use a complexity/simplicity prior based on length. This might actually be correct, if shorter message-length corresponds to greater frequency of use. (ETA that you might not be able to distinguish words within the sentence, if faced with a claim in a totally alien language.)

Well, if nothing else, when I ask B9 "is your ball blue?", I'm only providing a finite amount of evidence thereby that "blue" refers to a property that balls can have or not have. So if B9's priors on "blue" referring to anything at all are vastly low, then B9 will continue to believe, even after being asked the question, that "blue" doesn't refer to anything. Which doesn't seem like terribly sensible behavior. That sets a floor on how low the prior on "'blue' is meaningful" can be.
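
The "floor" can be made concrete with the odds form of Bayes' rule: posterior odds are prior odds times the likelihood ratio, so evidence of finite strength can only lift a prior so far. In the sketch below the likelihood ratio of 1000 is an invented figure and the function names are hypothetical.

```python
def posterior(prior, likelihood_ratio):
    """Update a prior probability by a likelihood ratio (odds form of Bayes' rule)."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def min_prior_for(target_posterior, likelihood_ratio):
    """Smallest prior that reaches `target_posterior` after one such update."""
    target_odds = target_posterior / (1.0 - target_posterior)
    prior_odds = target_odds / likelihood_ratio
    return prior_odds / (1.0 + prior_odds)


if __name__ == "__main__":
    lr = 1000.0  # hypothetical strength of "someone asked me if my ball is blue"
    # A one-in-a-billion prior stays negligible after the question is asked.
    print(posterior(1e-9, lr))
    # To end up at least 50% confident that "blue" is meaningful,
    # the prior must be at least about 1/1001.
    print(min_prior_for(0.5, lr))
```

If B9 ought to end up fairly confident after being asked, the second function gives the lowest prior compatible with that, for whatever likelihood ratio one thinks the question carries.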

Comment author:ialdabaoth
24 May 2013 05:29:18AM
*
1 point

Thank you! This helps me home in on a point that I am sorely confused on, which BrienneStrohl just illustrated nicely:

You're stating that B9's prior that "the ball is blue" is 'very low', as opposed to {Null / NaN}. And that likewise, my prior that "zorgumphs are wogle" is 'very low', as opposed to {Null / NaN}.

Does this mean that my belief system actually contains an uncountable infinitude of priors, one for each possible framing of each possible cluster of facts?

Or, to put my first question more succinctly, what priors should I assign potential facts that my current gestalt assigns no semantic meaning to whatsoever?

"The ball is blue" only gets assigned a probability by your prior when "blue" is interpreted, not as a word that you don't understand, but as a causal hypothesis about previously unknown laws of physics allowing light to have two numbers assigned to it that you didn't previously know about, plus the one number you do know about. It's like imagining that there's a fifth force appearing in quark-quark interactions a la the "Alderson Drive". You don't need to have seen the fifth force for the hypothesis to be meaningful, so long as the hypothesis specifies how the causal force interacts with you.

If you restrain yourself to only finite sets of physical laws of this sort, your prior will be over countably many causal models.

There are only so many distinct states of experience, so yes, causal models are countable. The set of all causal models is a set of functions that map K n-valued past experiential states into L n-valued future experiential states.

This is a monstrously huge number of functions in the set, but still countable, so long as K and L are at most countably infinite.

Note that this assumes that states of experience with zero discernible difference between them are the same thing - eg, if you come up with the same predictions using the first million digits of sqrt(2) and the irrational number sqrt(2), then they're the same model.

But the set of causal models is not the set of experience mappings. The model where things disappear after they cross the cosmological horizon is a different model than standard physics, even though they predict the same experiences. We can differentiate between them because Occam's Razor favors one over the other, and our experiences give us ample cause to trust Occam's Razor.

At first glance, it seems this gives us enough to diagonalize models--1 meter outside the horizon is different from model one, two meters is different from model two...

There might be a way to constrain this based on the models we can assign different probabilities to, given our knowledge and experience, which might get it down to countable numbers, but how to do it is not obvious to me.

Er, now I see that Eliezer's post is discussing finite sets of physical laws, which rules out the cosmological horizon diagonalization. But, I think this causal models as function mapping fails in another way: we can't predict the n in n-valued future experiential states. Before the camera was switched, B9 would assign low probability to these high n-valued experiences. If B9 can get a camera that allows it to perceive color, it could also get an attachment that allows it to calculate the permittivity constant to arbitrary precision. Since it can't put a bound on the number of values in the L states, the set is uncountable and so is the set of functions.

we can't predict the n in n-valued future experiential states.

What? Of course we can - it's much simpler with a computer program, of course. Suppose you have M bits of state data. There are 2^M possible states of experience. What I mean by n-valued is that there are a certain discrete set of possible experiences.

If B9 can get a camera that allows it to perceive color, it could also get an attachment that allows it to calculate the permittivity constant to arbitrary precision.

Arbitrary, yes. Unbounded, no. It's still bounded by the amount of physical memory it can use to represent state.
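
The counting claim can be checked for small M. The sketch below simplifies a "causal model" to a deterministic map from one state to the next state, which is an assumption made here for illustration (the comment above allows K past and L future states); the function names are hypothetical.

```python
def num_states(m_bits):
    """Number of distinct states representable in m_bits of memory."""
    return 2 ** m_bits


def num_deterministic_models(m_bits):
    """Number of functions from the state set to itself (one next state per state)."""
    n = num_states(m_bits)
    return n ** n


if __name__ == "__main__":
    for m in (1, 2, 3):
        print(m, num_states(m), num_deterministic_models(m))
    # 1 2 4
    # 2 4 256
    # 3 8 16777216
```

The count grows extremely fast but is finite for every fixed M; the disagreement in this exchange is over whether M itself can be bounded in advance.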

In order to bound the states at a number n, it would need to assign probability zero to ever getting an upgrade allowing it to access log n bytes of memory. I don't know how this zero-probability assignment would be justified for any n--there's a non-zero probability that one's model of physics is completely wrong, and once that's gone, there's not much left to make something impossible.

Comment author:Vaniver
24 May 2013 07:26:10PM
2 points

"The ball is blue" only gets assigned a probability by your prior when "blue" is interpreted, not as a word that you don't understand, but as a causal hypothesis about previously unknown laws of physics allowing light to have two numbers assigned to it that you didn't previously know about, plus the one number you do know about.

Note that a conversant AI will likely have a causal model of conversations, and so there are two distinct things going on here: both "what are my beliefs about words that I don't understand used in a sentence" and "what are my beliefs about physics I don't understand yet." This split is a potential source of confusion, and the conversational model is one reason why the betting argument for quantifying uncertainties meets serious resistance.

To me the conversational part of this seems way less complicated/interesting than the unknown causal models part - if I have any 'philosophical' confusion about how to treat unknown strings of English letters it is not obvious to me what it is.

Comment author:Kawoomba
24 May 2013 02:03:47PM
2 points

You can reserve some slice of your probability space for "here be dragons", the (1 - P("my current gestalt is correct")). Your countably many priors may fight over that real estate.

Also, if you demand your models to be computable (a good assumption, because if they aren't we're eff'ed anyways), there'll never be an uncountable infinitude of priors.

Comment author:CCC
24 May 2013 10:39:19AM
0 points

Before the camera gets switched, what prior probability does B9 assign to the possibility that its favorite ball is blue?

I'd imagine something like <error in world model: concept 'blue': no definition found>. It would be like asking whether or not the ball is supercalifragilisticexpialidocious.

If B9 has recently been informed that 'blue' is a property, then the prior would be very low. Can balls even be blue? If balls can be blue, then what percentage of balls are blue? There is also a possibility that, if some balls can be blue, all balls are blue; so the probability distribution would have a very low mean but a very high standard deviation.

Any further refinement requires B9 to obtain additional information; if informed that balls can be blue, the odds go up; if informed that some balls are blue, the odds go up further; if further informed that not all balls are blue, the standard deviation drops somewhat. If presented with the luminance formula, the odds may go up significantly (it can't be used to prove blueness, but it can be used to limit the number of possible colours the ball can be, based on the output of the black-and-white camera).

I'd go down a level of abstraction about the camera in order to answer this question. You have a list of numbers, and you're told that five seconds from now this list of numbers is going to be replaced with a list of triplets, with the property that the average of each triplet is the same as the corresponding number in the list.

What is the probability you assign to "one of these triplets is within a certain range of RGB values?"
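
One way to put a number on this, under loudly stated assumptions: treat every integer triplet consistent with the known average as equally likely, and pick some box in RGB space to call "blue". Both the uniform weighting and the thresholds in `is_blue` are invented for illustration.

```python
from fractions import Fraction


def consistent_triplets(gray, max_value=255):
    """Yield all integer (r, g, b) triplets whose average equals `gray`."""
    total = 3 * gray
    for r in range(max_value + 1):
        for g in range(max_value + 1):
            b = total - r - g
            if 0 <= b <= max_value:
                yield (r, g, b)


def is_blue(rgb):
    """Hypothetical 'blue' box in RGB space; the thresholds are arbitrary."""
    r, g, b = rgb
    return b >= 150 and r <= 100 and g <= 100


def prob_blue(gray):
    """P(triplet is in the blue box | average == gray), uniform over consistent triplets."""
    hits = 0
    count = 0
    for rgb in consistent_triplets(gray):
        count += 1
        if is_blue(rgb):
            hits += 1
    return Fraction(hits, count)


if __name__ == "__main__":
    print(prob_blue(100))  # some value strictly between 0 and 1
    print(prob_blue(200))  # 0
```

With these thresholds a pixel whose known average is 200 cannot fall in the blue box at all, since that would need r + g of at least 345 while the box caps it at 200. The old grayscale value already constrains the answer before any color is seen.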

Comment author:JeffJo
15 June 2013 07:29:25PM
0 points

Since this discussion was reopened, I've spent some time - mostly while jogging - pondering and refining my stance on the points expressed. I just got around to writing them down. Since there is no other way to do it, I'll present them boldly, apologizing in advance if I seem overly harsh. There is no such intention.

1) "Accursed Frequentists" and "Self-righteous Bayesians" alike are right, and wrong. Probability is in your knowledge - or rather, the lack thereof - of what is in the environment. Specifically, it is the measure of the ambiguity in the situation.

2) Nothing is truly random. If you know the exact shape of a coin, its exact weight distribution, exactly how it is held before flipping, exactly what forces are applied to flip it, the exact properties of the air and air currents it tumbles through, and exactly how long it is in the air before being caught in your open palm, then you can calculate - not predict - whether it will show Heads or Tails. Any lack in this knowledge leaves multiple possibilities open, which is the ambiguity.

3) Saying "the coin is biased" is saying that there is an inherent property, over all of the ambiguous ways you could hold the coin, the ambiguous forces you could use to flip it, the ambiguous air properties, and the ambiguous tumbling times, for it to land one way or another. (Its shape and weight are fixed, so they are unambiguous even if they are not known, and probably the source of this "inherent property.")

4) Your state of mind defines probability only in how you use it to define the ambiguities you are accounting for. Eliezer's frequentist is perfectly correct to say he needs to know the bias of this coin, since in his state of mind the ambiguity is what this biased coin will do. And Eliezer is also perfectly correct to say the actual bias is unimportant. His answer is 50%, since in his mind the ambiguity is what any biased coin will do. They are addressing different questions.

5) A simple change to the coin question puts Eliezer in the same "need the environment" situation he claims belongs only to the frequentist: Flip his coin twice. What probability are you willing to assign to getting the same result on both flips?

6) The problem with the "B9" question discussed recently, is that there is no framework to place the ambiguity within. No environmental circumstances that you can use to assess the probability.

7) The propensity for some frequentists to want probability to be "in the environment" is just a side effect of practical application. Say you want to evaluate a statistical question, such as the effectiveness of a drug. Drug effectiveness can vary with gender, age, race, and probably many other factors that are easily identified; that is, it is indeed "in the environment." You could ignore those possible differences, and get an answer that applies to a generic person just as Eliezer's answer applies to a generic biased coin. But it behooves you to eliminate whatever sources of ambiguity you easily can.

8) In geometry, "point" and "line" are undefined concepts. But we all have a pretty good idea what they are supposed to mean, and this meaning is fairly universal.

"Length" and "angle" are undefined measurements of what separates two different instances of "point" and "line," respectively. But again, we have a pretty clear idea of what is intended.

In probability, "outcome" is an undefined concept. But unlike geometry, where the presumed meaning is universal, a meaning for "outcome" is different for each ambiguous situation. But an "event" is defined - as a set of outcomes.

"Relative likelihood" is an undefined measurement of what separates two different instances of "event." And just like "length," we have a pretty clear idea of what it is supposed to mean. It expresses the relative chances that either event will occur in any expression of the ambiguities we consider.

9) "Probability" is just the likelihood relative to everything. As such, it represents the fractional chances of an event's occurrence. So if we can repeat the same ambiguities exactly, we expect the frequency to approach the probability. But note: this is not a definition of probability, as Bayesians insist frequentists think. It is a side effect of what we want "likelihood" to mean.

10) Eliezer misstated the "classic" two-child problem. The problem he stated is the one that corresponds to the usual solution, but oddly enough the usual solution is wrong for the question that is usually asked. And here I'm referring to, among others, Martin Gardner's version and Marilyn vos Savant's more famous version. The difference is that Eliezer asks the parent if there is a boy, but the classic version simply states that one child is a boy. Gardner changed his answer to 1/2 because, when the reason we have this information is not known, you can't implicitly assume that you will always know about the boy in a boy+girl family.

And the reason I bring this up, is because the "brain-teasing ability" of the problem derives more from effects of this implied assumption, than from any "tendency to think of probabilities as inherent properties of objects." This can be seen by restating the problem as a variation of Bertrand's Box Paradox:

The probability that, in a family of two children, both have the same gender is 1/2. But suppose you learn that one child is in scouts - but you don’t know if it is Boy Scouts or Girl Scouts. If it is Boy Scouts, those who answer the actual "classic" problem as Eliezer answered his variation will say the probability of two boys is 1/3. They'd say the same thing, about two girls, if it is Girl Scouts. So it appears you don’t even need to know what branch of Scouting it is to change the answer to 1/3.

The fallacy in this logic is the same as the reason Eliezer reformulated the problem: the answer is 1/3 only if you ask a question equivalent to "is at least one a boy," not if you merely learn that fact. And the "brain-teaser ability" is because people sense, correctly, that they have no new information in the "classic" version of the problem which would allow the change from 1/2 to 1/3. But they are told, incorrectly, that the answer does change.
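Points 5 and 10 can both be checked numerically. The sketch below is illustrative only: the function names are made up, and the "merely learn" protocol is modelled by assuming you are told the gender of a randomly chosen child, which is one reading of "the reason we have this information is not known", not the only one.

```python
import random


def same_result_probability(p_heads):
    """P(two flips of one coin match) for a coin with P(Heads) = p_heads."""
    return p_heads ** 2 + (1.0 - p_heads) ** 2


def two_child_experiment(trials=200_000, seed=0):
    """Compare asking 'is at least one a boy?' with merely learning of a boy."""
    rng = random.Random(seed)
    asked = asked_two_boys = 0
    learned = learned_two_boys = 0
    for _ in range(trials):
        kids = [rng.choice("BG"), rng.choice("BG")]
        both_boys = kids == ["B", "B"]
        # Protocol A: the parent answers "yes" to "is at least one a boy?"
        if "B" in kids:
            asked += 1
            asked_two_boys += both_boys
        # Protocol B (assumed): you learn the gender of one child picked at random
        if rng.choice(kids) == "B":
            learned += 1
            learned_two_boys += both_boys
    return asked_two_boys / asked, learned_two_boys / learned


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9):
        print(p, same_result_probability(p))
    print(two_child_experiment())
```

For the coin, the chance of matching flips is 1/2 only when the coin is fair and rises toward 1 as the bias grows, so the answer to point 5 does depend on the bias. For the two children, Protocol A comes out near 1/3 and Protocol B near 1/2.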

## Comments (188)

So, I've been on this site for a while. When I first came here, I had never had a formal introduction to Bayes' theorem, but it sounded a lot like ideas that I had independently worked out in my high school and college days (I was something of an amateur mathematician and game theorist).

A few days ago I was reading through one of your articles - I don't remember which one - and it suddenly struck me that I may not actually understand priors as well as I think I do.

After re-reading some of the series, and then working through the math, I'm now reasonably convinced that I don't properly understand priors at all - at least, not intuitively, which seems to be an important aspect for actually using them.

I have a few weird questions that I'm hoping someone can answer, that will help point me back towards the correct quadrant of domain space. I'll start with a single question, and then see if I can claw my way towards understanding from there based on the answers:

Imagine there is a rational, Bayesian AI named B9 which has been programmed to visually identify and manipulate geometric objects. B9's favorite object is a blue ball, but B9 has no idea that it is blue: B9 sees the world through a black and white camera, and has always seen the world through a black and white camera. Until now, B9 has never heard of "colors" - no one has mentioned "colors" to B9, and B9 has certainly never experienced them. Today, unbeknownst to B9, B9's creator is going to upgrade its camera to a full-color system, and see how long it takes B9 to adapt to the new inputs.

The camera gets switched in 5 seconds. Before the camera gets switched, what prior probability does B9 assign to the possibility that its favorite ball is blue?

Well, without a sense that can detect color, it would just be an arbitrary undetectable property something might have, right? So it would be ... dependent on what other objects B9 is aware of, I think. The precise hypothesis of "all [objects that we know are blue] share a common property I cannot perceive with this camera" should be highly conjunctive, and therefore low, unless B9 has observed humans reacting to them because of their coloration. And even then, "blue" would be defined only in terms of what other objects have it, not a specific input type from the camera.

I suspect I'm missing the point of this question, somehow.

Without knowing anything about what "blue" is? I'd say 1/2.

Your question is not well specified. Even though you might think that the proposition "its favorite ball is blue" is something that has a clear meaning, it is highly dependent on the precision with which it will be able to see colours, how wide the interval defined as blue is, and how it considers multicoloured objects. If we suppose it would categorise the observed wavelength into one of 27 possible colours (one of those being blue), and further suppose that it knew the ball to be of a single colour and not patterned, and further that it did not have any background information about the relative frequencies of different colours of balls or other useful prior knowledge, the prior probability would be 1/27. If we suppose that it had access to the internet and had read this discussion on LW about the colourblind AI, it would increase its probability by doing an update based on the probability of this affecting the colour of its own ball.

I don't claim to be any kind of Bayesian expert here, but, well, I seem to be replying anyway. Don't take my reply too seriously.

B9 has never heard of "colors". I take that to mean, not only that nobody has used that particular word to B9, but that B9 has been exposed to no inputs that significantly depend on it... e.g., nobody has talked about whether their shirts match their pants, nobody has talked about spectroscopic analysis of starlight or about the mechanism of action of chlorophyll or etc... that B9 has no evidentiary basis from which to draw conclusions about color. (That is, B9 is the anti-Mary.)

Given those assumptions, a universal prior is appropriate... 50% chance that "My ball is blue" is true, 50% chance that it's false.

If those assumptions aren't quite true, and B9 has some information that usefully pertains, however indirectly, to the color of the ball, then insofar as that information is evidence one way or another, B9 ideally updates that probability accordingly.

You and Kindly both? Very surprising.

Consider you as B9, reading on the internet about some new and independent property of items, "bamboozle-ness". Should you now believe that P("My monitor is bamboozled") = 0.5? That it is as likely that your monitor is bamboozled as that it's not bamboozled?

If I offered you a bet of 100 big currency units, if it turns out your monitor was bamboozled, you'd win triple! Or 50x! Wouldn't you accept, based on your "well, 50% chance of winning" assessment?

Am I bamboozled? Are you bamboozled?

Notice that B9 has even less reason to believe in colors than you in the example above - it hasn't even read about them on the internet.

Instead of assigning 50-50 odds, you'd have to take that part of the probability space which represents "my belief in other models than my main model", identify the minuscule prior for that specific model containing "colors", or "bamboozleness", then calculate from assuming that model the odds of blue versus not-blue, then weigh back in the uncertainty from such an arbitrary model being true in lieu of your standard model.

Given the following propositions:

(P1) "My monitor is bamboozled."

(P2) "My monitor is not bamboozled."

(P3) "'My monitor is bamboozled' is not the sort of statement that has a binary truth value; monitors are neither bamboozled nor non-bamboozled."

...and knowing nothing at all about bamboozledness, never even having heard the word before, it seems I ought to assign high probability to P3 (since it's true of most statements that it's possible to construct) and consequently low probabilities to P1 and P2.

But when I read about bamboozledness on the Internet (or am asked whether my ball is blue), my confidence in P3 seems to go up [EDIT: I mean down] pretty quickly, based on my experience with people talking about stuff. (Which among other things suggests that my prior for P3 wasn't all that low [EDIT: I mean high].)

Having become convinced of NOT(P3) (despite still knowing nothing much about bamboozledness other than it's the sort of thing people talk about on the Internet), if I have very low confidence in P1, I have very high confidence in P2. If I have very low confidence in P2, I have very high confidence in P1. Very high confidence in either proposition seems unjustifiable... indeed, a lower probability for P1 than P2 or vice-versa seems unjustifiable... so I conclude 50%.

If I'm wrong to do so, it seems I'm wrong to reduce my confidence in P3 in the first place.

Which I guess is possible, though I do seem to do it quite naturally.

But given NOT(P3), I genuinely don't see why I should believe P(P2) > P(P1).

Just to be clear: you're offering me (300BCUs if P1, -100BCUs if P2)?

And you're suggesting I shouldn't take that bet, because P(P2) >> P(P1)?

It seems to follow from that reasoning that I ought to take (300BCUs if P2, -100BCUs if P1).

Would you suggest I take that bet?

Anyway, to answer your question: I wouldn't take either bet if offered, because of game-theoretical considerations... that is, the moment you offer me the bet, that's evidence that you expect to gain by the bet, which given my ignorance is enough to make me confident I'll lose by accepting it. But if I eliminate those concerns, and I am confident in P3, then I'll take either bet if offered. (Better yet, I'll take both bets, and walk away with 200 BCUs.)

I don't think the logic in this part follows. Some of it looks like precision: it's not clear to me that P1, P2, and P3 are mutually exclusive. What about cases where 'my monitor is bamboozled' and 'my monitor is not bamboozled' are both true, like sets that are both closed and open? Later, it looks like you want P3 to be the reverse of what you have it written as; there it looks like you want P3 to be the proposition that it is a well-formed statement with a binary truth value.

Blech; you're right, I incompletely transitioned from an earlier formulation and didn't shift signs all the way through. I think I fixed it now.

Your larger point about (p1 and p2) being just as plausible a priori is certainly true, and you're right that makes "and consequently low probabilities to P1 and P2" not follow from a properly constructed version of P3.

I'm not sure that makes a difference, though perhaps it does. It still seems that P(P1) > P(P2) is no more likely, given complete ignorance of the referent for "bamboozle", than P(P1) < P(P2)... and it still seems that knowing that otherwise sane people talk about whether monitors are bamboozled or not quickly makes me confident that P(P1 XOR P2) >> P((P1 AND P2) OR NOT(P1 OR P2))... though perhaps it ought not do so.

Let's lift the veil: "bamboozledness" is a placeholder for ... phlogiston (a la "contains more than 30ppm phlogiston" = "bamboozled").

Looks like you now assign a probability of 0.5 to phlogiston, in your monitor, no less. (No fair? It could also have been something meaningful, but in the 'blue balls' scenario we're asking for the prior of a concept which you've never even seen mentioned as such (and hopefully never experienced), what are the chances that a randomly picked concept is a sensible addition to your current world view.)

That's the missing ingredient, the improbability of a hitherto unknown concept belonging to a sensible model of reality:

P("Monitor contains phlogiston" | "phlogiston is the correct theory" Λ "I have no clue about the theory other than it being correct and wouldn't know the first thing of how to guess what contains phlogiston") could be around 0.5 (although not necessarily exactly 0.5 based on complexity considerations).

However, what you're faced with isn't "... given that colors exist", "... given that bamboozledness exists", "... given that phlogiston exists" (in each case, 'that the model which contains concepts corresponding to the aforementioned corresponds to reality'), it is simply "what is the chance that there is phlogiston in your computer?" (Wait, now it's in my computer too! Not only my monitor?)

Since you have no (little - 'read about it on the internet') reason to assume that phlogiston / blue is anything meaningful, and especially given that in the scenario you aren't even asked about the color of a ball, but simply the prior which relies upon the unknown concept of 'blue' which corresponds to some physical property which isn't a part of your current model, any option which contains "phlogiston is nonsense"/"blue is nonsense", in the form of "monitor does not contain phlogiston", "ball is not blue", is vastly favored.

I posed the bet to show that you wouldn't actually assign a 0.5 probability to a randomly picked concept being part of your standard model. Heads says this concept called "blue" exists, tails it doesn't. Since you like memes. Maybe it helps not to think about the ball, but to think about what it would mean for the ball to be "blue". Instead of "Is the ball blue?", think "does blue extend my current model of reality in a meaningful way", then replace blue with bamboozled.

But I guess I do see where you're coming from, more so than I did before. The all important question is, "does that new attribute you know nothing about have to correspond to any physically existing quantity, can you assume that it extends/replaces your current model of the world, and do you thus need to factor in the improbability of invalidating your current model into assigning the probabilities of the new attribute". Would that be accurate?

Enter Psi, Omega's retarded, ahem, special little brother. It just goes around offering random bets, with no background knowledge whatsoever, so you're free to disregard the "why is he offering a bet in the first place" reservations.

Well, meaningfulness is the crux, yes.

As I said initially, when I read about bamboozledness on the Internet (or am asked whether my ball is blue), my confidence seems to grow pretty quickly that the word isn't just gibberish... that there is some attribute to which the word refers, such that (P1 XOR P2) is true. When I listen to a conversation about bamboozled computers, I seem to generally accept the premise that bamboozled computers are possible pretty quickly, even if I haven't the foggiest clue what a bamboozled computer (or monitor, or ball, or hot thing, or whatever) is. It would surprise me if this were uncommon.

And, sure, perhaps I ought to be more skeptical about the premise that people are talking about anything meaningful at all. (I'm not certain of this, but there's certainly precedent for it.)

Here's where you lose me. I don't see how an option can contain "X is nonsense" in the form of "monitor does not contain X". If X is nonsense, "monitor does not contain X" isn't true. "monitor contains X" isn't true either. That's kind of what it means for X to be nonsense.

I'm not sure. The question that seems important here is "how confident am I, about that new attribute X, that a system either has X or lacks X but doesn't do both or neither?" Which seems to map pretty closely to "how confident am I that 'X' is meaningful?" Which may be equivalent to your formulation, but if so I don't follow the equivalence.

(nods) As I said in the first place, if I eliminate the game-theoretical concerns, and I am confident that "bamboozled" isn't just meaningless gibberish, then I'll take either bet if offered.

You're just trying to find out whether X is binary, then - if it is binary - you'd assign even odds, in the absence of any other information.

However, it's not enough for "blue" - "not blue" to be established as a binary attribute, we also need to weigh in the chances of the semantic content (the definition of 'blue', unknown to us at that time) corresponding to any physical attributes.

Binarity isn't the same as "describes a concept which translates to reality". When you say meaningful, you (I think) refer to the former, while I refer to the latter. With 'nonsense' I didn't mean 'non-binary', but instead 'if you had the actual definition of the color attribute, you'd find that it probably doesn't correspond to any meaningful property of the world', and as such that not having the property is vastly more likely, which would be "ball isn't blue (because nothing is blue, blue is e.g. about having blue-quarks, which don't model reality)".

I'll accept that in general.

In this context, I fail to understand what is entailed by that supposed difference.

Put another way: I fail to understand how "X"/"not X" can be a binary attribute of a physical system (a ball, a monitor, whatever) if X doesn't correspond to a physical attribute, or a "concept which translates to reality". Can you give me an example of such an X?

Put yet another way: if there's no translation of X to reality, if there's no physical attribute to which X corresponds, then it seems to me neither "X" nor "not X" can be true or meaningful. What in the world could they possibly mean? What evidence would compel confidence in one proposition or the other?

Looked at yet a different way...

case 1: I am confident phlogiston doesn't exist.

I am confident of this because of evidence related to how friction works, how combustion works, because burning things can cause their mass to increase, for various other reasons. (P1) "My stove has phlogiston" is meaningful -- for example, I know what it would be to test for its truth or falsehood -- and based on other evidence I am confident it's false. (P2) "My stove has no phlogiston" is meaningful, and based on other evidence I am confident it's true.

If you remove all my evidence for the truth or falsehood of P1/P2, but somehow preserve my confidence in the meaningfulness of "phlogiston", you seem to be saying that my P(P1) << P(P2).

case 2: I am confident photons exist. Similarly to P1/P2, I'm confident that P3 ("My lightbulb generates photons") is true, and P4 ("My lightbulb generates no photons") is false, and "photon" is meaningful. Remove my evidence for P3/P4 but preserve my confidence in the meaningfulness of "photon", should my P(P3) << P(P4)? Or should my P(P3) >> P(P4)?

I don't see any grounds for justifying either. Do you?

Yes. P1 also entails that phlogiston theory is an accurate descriptor of reality - after all, it is saying your stove has phlogiston. P2 does not entail that phlogiston theory is an accurate descriptor of reality. Rejecting that your stove contains phlogiston can be done on the basis of "chances are nothing contains phlogiston, not knowing anything about phlogiston theory, it's probably not real, duh", which is why P(P2)>>P(P1).

The same applies to case 2, knowing nothing about photons, you should always go with the proposition (in this case P4) which is also supported by "photons are an imaginary concept with no equivalent in reality". For P3 to be correct, photons must have some physical equivalent on the territory level, so that anything (e.g. your lightbulb) can produce photons in the first place. For a randomly picked concept (not picked out of a physics textbook), the chances of that are negligible.

Take some random concept, such as "there are 17 kinds of quark, if something contains the 13th quark - the blue quark - we call it 'blue'". Then affirming it is blue entails affirming the 17-kinds-of-quark theory (quite the burden, knowing nothing about its veracity), while saying "it is not blue = it does not contain the 13th quark, because the 17-kinds-of-quark theory does not describe our reality" is the much favored default case.

A not-yet-considered randomly chosen concept (phlogiston, photons) does not have 50-50 odds of accurately describing reality, its odds of doing so - given no evidence - are vanishingly small. That translates to

P("stove contains phlogiston") being much smaller than P("stove does not contain phlogiston"). Reason (rephrasing the above argument): rejecting phlogiston theory as an accurate map of the territory strengthens your "stove does not contain phlogiston (... because phlogiston theory is probably not an accurate map, knowing nothing about it)"

even if

P("stove contains phlogiston given phlogiston theory describes reality") = P("stove does not contain phlogiston given phlogiston theory describes reality") = 0.5
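The arithmetic behind this argument can be sketched in a few lines of Python. This follows the reading argued in this comment, where "does not contain phlogiston" counts as true whenever the theory fails to describe reality (a reading disputed elsewhere in the thread). The value `p_theory = 0.001` is an arbitrary illustrative number, and `p_contains` is a made-up name.

```python
def p_contains(p_theory, p_contains_given_theory=0.5):
    """P(stove contains X) by total probability.

    Assumes that if the theory positing X is false, nothing contains X,
    so the 'theory false' branch contributes zero.
    """
    return p_theory * p_contains_given_theory + (1.0 - p_theory) * 0.0


if __name__ == "__main__":
    p_theory = 0.001  # assumed prior that a randomly picked theory describes reality
    p_yes = p_contains(p_theory)
    p_no = 1.0 - p_yes
    print(p_yes, p_no)
```

Even with a 50-50 split inside the theory, the unconditional probability of "contains" is half the prior on the theory itself, so it is tiny whenever that prior is tiny.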

I agree that if "my stove does not contain X" is a meaningful and accurate thing to say even when X has no extension into the real world at all, then P("my stove does not contain X") >>> P("my stove contains X") for an arbitrarily selected concept X, since most arbitrarily selected concepts have no extension into the real world.

I am not nearly as convinced as you sound that "my stove does not contain X" is a meaningful and accurate thing to say even when X has no extension into the real world at all, but I'm not sure there's anything more to say about that than we've already said.

Also, thinking about it, I suspect I'm overly prone to assuming that X has some extension into the real world when I hear people talking about X.

That depends on the knowledge that the AI has. If B9 had deduced the existence of different light wavelengths, and knew how blue corresponded to a particular range, and how human eyes see stuff, the probability would be something close to the range of colors that would be considered blue divided by the range of all possible colors. If B9 has no idea what blue is, then it would depend on priors for how often statements end up being true when B9 doesn't know their meaning.

Without knowing what B9's knowledge is, the problem is under-defined.
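For the first case, the "range of blue divided by range of all colors" estimate is easy to write down. The wavelength bounds below are rough conventional figures that I am assuming for illustration; the thread does not specify them, and treating wavelength as uniformly distributed is itself an assumption.

```python
# Rough conventional bounds in nanometres (assumed, not from the discussion).
VISIBLE_NM = (380.0, 750.0)
BLUE_NM = (450.0, 495.0)


def range_fraction(sub_range, full_range):
    """Fraction of full_range covered by sub_range, assuming a uniform prior."""
    sub_width = sub_range[1] - sub_range[0]
    full_width = full_range[1] - full_range[0]
    return sub_width / full_width


if __name__ == "__main__":
    print(range_fraction(BLUE_NM, VISIBLE_NM))
```

With these bounds the prior comes out at about 0.12. Different cut-offs for "blue", or a prior weighted by how often each colour actually turns up on balls, would move that figure.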

Very low, because B9 has to hypothesize a causal framework involving colors without any way of observing anything but quantitatively varying luminosities. In other words, they must guess that they're looking at the average of three variables instead of at one variable. This may sound simple but there are many other hypotheses that could also be true, like two variables, four variables, or most likely of all, one variable. B9 will be surprised. This is right and proper. Most physics theories you make up with no evidence behind them will be wrong.

I think I'm confused. We're talking about something that's never even heard of colors, so there shouldn't be anything in the mind of the robot related to "blue" in any way. This ought to be like the prior probability from your perspective that zorgumphs are wogle. Now that I've said the words, I suppose there's some very low probability that zorgumphs are wogle, since there's a probability that "zorgumph" refers to "cats" and "wogle" to "furry". But when you didn't even have those words in your head anywhere, how could there have been a prior? How could B9's prior be "very low" instead of "nonexistent"?

Eliezer seems to be substituting the actual meaning of "blue". Now, if we present the AI with the English statement and ask it to assign a probability...my first impulse is to say it should use a complexity/simplicity prior based on length. This might actually be correct, if shorter message-length corresponds to greater frequency of use. (ETA that you might not be able to distinguish words within the sentence, if faced with a claim in a totally alien language.)

Well, if nothing else, when I ask B9 "is your ball blue?", I'm only providing a finite amount of evidence thereby that "blue" refers to a property that balls can have or not have. So if B9's priors on "blue" referring to anything at all are vastly low, then B9 will continue to believe, even after being asked the question, that "blue" doesn't refer to anything. Which doesn't seem like terribly sensible behavior. That sets a floor on how low the prior on "'blue' is meaningful" can be.

Thank you! This helps me hone in on a point that I am sorely confused on, which BrienneStrohl just illustrated nicely:

You're stating that B9's prior that "the ball is blue" is 'very low', as opposed to {Null / NaN}. And that likewise, my prior that "zorgumphs are wogle" is 'very low', as opposed to {Null / NaN}.

Does this mean that my belief system actually contains an uncountable infinitude of priors, one for each possible framing of each possible cluster of facts?

Or, to put my first question more succinctly, what priors should I assign potential facts that my current gestalt assigns no semantic meaning to whatsoever?

"The ball is blue" only gets assigned a probability by your prior when "blue" is interpreted, not as a word that you don't understand, but as a causal hypothesis about previously unknown laws of physics allowing light to have two numbers assigned to it that you didn't previously know about, plus the one number you do know about. It's like imagining that there's a fifth force appearing in quark-quark interactions a la the "Alderson Drive". You don't need to have seen the fifth force for the hypothesis to be meaningful, so long as the hypothesis specifies how the causal force interacts with you.

If you restrain yourself to only finite sets of physical laws of this sort, your prior will be over countably many causal models.

Causal models are countable? Are irrational constants not part of causal models?

There are only so many distinct states of experience, so yes, causal models are countable. The set of all causal models is a set of functions that map K n-valued past experiential states into L n-valued future experiential states.

This is a monstrously huge number of functions in the set, but still countable, so long as K and L are at most countably infinite.

Note that this assumes that states of experience with zero discernible difference between them are the same thing - eg, if you come up with the same predictions using the first million digits of sqrt(2) and the irrational number sqrt(2), then they're the same model.

But the set of causal models is not the set of experience mappings. The model where things disappear after they cross the cosmological horizon is a different model than standard physics, even though they predict the same experiences. We can differentiate between them because Occam's Razor favors one over the other, and our experiences give us ample cause to trust Occam's Razor.

At first glance, it seems this gives us enough to diagonalize models--1 meter outside the horizon is different from model one, two meters is different from model two...

There might be a way to constrain this based on the models we can assign different probabilities to, given our knowledge and experience, which might get it down to countable numbers, but how to do it is not obvious to me.

Er, now I see that Eliezer's post is discussing finite sets of physical laws, which rules out the cosmological horizon diagonalization. But, I think this causal models as function mapping fails in another way: we can't predict the n in n-valued future experiential states. Before the camera was switched, B9 would assign low probability to these high n-valued experiences. If B9 can get a camera that allows it to perceive color, it could also get an attachment that allows it to calculate the permittivity constant to arbitrary precision. Since it can't put a bound on the number of values in the L states, the set is uncountable and so is the set of functions.

What? Of course we can - it's much simpler with a computer program, of course. Suppose you have M bits of state data. There are 2^M possible states of experience. What I mean by n-valued is that there is a certain discrete set of possible experiences.

Arbitrary, yes. Unbounded, no. It's still bounded by the amount of physical memory it can use to represent state.
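The counting argument can be made concrete with a short Python sketch. It simplifies the earlier description: "causal model" is taken here to mean a deterministic map from one M-bit state to the next, which is my assumption for illustration, and the function names are made up.

```python
def count_states(m_bits):
    """Number of distinct experiential states representable in m_bits bits."""
    return 2 ** m_bits


def count_transition_maps(m_bits):
    """Number of deterministic functions from states to states."""
    n = count_states(m_bits)
    return n ** n  # each of the n states can map to any of the n states


if __name__ == "__main__":
    for m in (1, 2, 3):
        print(m, count_states(m), count_transition_maps(m))
```

For any fixed M both counts are finite, however large. Taking the union over every finite M gives a countable union of finite sets, which is still countable. The objection about unbounded upgrades is about whether M itself can be capped, not about the count for a given M.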

In order to bound the states at a number n, it would need to assign probability zero to ever getting an upgrade allowing it to access log n bytes of memory. I don't know how this zero-probability assignment would be justified for any n--there's a non-zero probability that one's model of physics is completely wrong, and once that's gone, there's not much left to make something impossible.

Note that a conversant AI will likely have a causal model of conversations, and so there are two distinct things going on here- both "what are my beliefs about words that I don't understand used in a sentence" and "what are my beliefs about physics I don't understand yet." This split is a potential source of confusion, and the conversational model is one reason why the betting argument for quantifying uncertainties meets serious resistance.

To me the conversational part of this seems way less complicated/interesting than the unknown causal models part - if I have any 'philosophical' confusion about how to treat unknown strings of English letters it is not obvious to me what it is.

You can reserve some slice of your probability space for "here be dragons", the (1 - P("my current gestalt is correct")). Your countably many priors may fight over that real estate.

Also, if you demand your models to be computable (a good assumption, because if they aren't we're eff'ed anyways), there'll never be an uncountable infinitude of priors.

I'd imagine something like <error in world model: concept 'blue': no definition found>. It would be like asking whether or not the ball is supercalifragilisticexpialidocious.

If B9 has recently been informed that 'blue' is a property, then the prior would be very low. Can balls even be blue? If balls can be blue, then what percentage of balls are blue? There is also a possibility that, if some balls can be blue, all balls are blue; so the probability distribution would have a very low mean but a very high standard deviation.

Any further refinement requires B9 to obtain additional information; if informed that balls can be blue, the odds go up; if informed that some balls are blue, the odds go up further; if further informed that not all balls are blue, the standard deviation drops somewhat. If presented with the luminance formula, the odds may go up significantly (it can't be used to prove blueness, but it can be used to limit the number of possible colours the ball can be, based on the output of the black-and-white camera).

I'd go down a level of abstraction about the camera in order to answer this question. You have a list of numbers, and you're told that five seconds from now this list of numbers is going to be replaced with a list of triplets, with the property that the average of each triplet is the same as the corresponding number in the list.

What is the probability you assign to "one of these triplets is within a certain range of RGB values?"
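One rough way to make that question concrete is to count. The sketch below assumes things the question does not say: that each channel is an 8-bit integer (0 to 255), that the grey value is the exact mean of the three channels, and that every triplet consistent with the grey value is equally likely beforehand. The `looks_blue` test is a purely hypothetical stand-in for "within a certain range of RGB values"; any other predicate can be swapped in.

```python
from fractions import Fraction


def prob_triplet_matches(gray, predicate, max_value=255):
    """Fraction of (r, g, b) triplets with mean == gray that satisfy predicate.

    Assumes a uniform prior over all integer triplets in [0, max_value]
    whose channel sum equals 3 * gray. That prior is an assumption of this
    sketch, not something given in the question.
    """
    target = 3 * gray
    total = 0
    hits = 0
    for r in range(max_value + 1):
        for g in range(max_value + 1):
            b = target - r - g  # b is forced once r and g are chosen
            if 0 <= b <= max_value:
                total += 1
                if predicate(r, g, b):
                    hits += 1
    return Fraction(hits, total)


def looks_blue(r, g, b):
    # Hypothetical definition: blue channel more than 1.5x each other channel.
    return 2 * b > 3 * r and 2 * b > 3 * g


if __name__ == "__main__":
    for gray in (0, 64, 128, 255):
        print(gray, float(prob_triplet_matches(gray, looks_blue)))
```

Under these assumptions the answer depends on the grey value itself: a pure black or pure white pixel leaves only one possible triplet, so the camera's old output already constrains which colours are possible.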

Since this discussion was reopened, I've spent some time - mostly while jogging - pondering and refining my stance on the points expressed. I just got around to writing them down. Since there is no other way to do it, I'll present them boldly, apologizing in advance if I seem overly harsh. There is no such intention.

1) "Accursed Frequentists" and "Self-righteous Bayesians" alike are right, and wrong. Probability is in your knowledge - or rather, the lack thereof - of what is in the environment. Specifically, it is the measure of the ambiguity in the situation.

2) Nothing is truly random. If you know the exact shape of a coin, its exact weight distribution, exactly how it is held before flipping, exactly what forces are applied to flip it, the exact properties of the air and air currents it tumbles through, and exactly how long it is in the air before being caught in your open palm, then you can calculate - not predict - whether it will show Heads or Tails. Any lack in this knowledge leaves multiple possibilities open, which is the ambiguity.

3) Saying "the coin is biased" is saying that there is an inherent property, over all of the ambiguous ways you could hold the coin, the ambiguous forces you could use to flip it, the ambiguous air properties, and the ambiguous tumbling times, for it to land one way or another. (Its shape and weight are fixed, so they are unambiguous even if they are not known, and probably the source of this "inherent property.")

4) Your state of mind defines probability only in how you use it to define the ambiguities you are accounting for. Eliezer's frequentist is perfectly correct to say he needs to know the bias of this coin, since in his state of mind the ambiguity is what this biased coin will do. And Eliezer is also perfectly correct to say the actual bias is unimportant. His answer is 50%, since in his mind the ambiguity is what any biased coin will do. They are addressing different questions.

5) A simple change to the coin question puts Eliezer in the same "need the environment" situation he claims belongs only to the frequentist: Flip his coin twice. What probability are you willing to assign to getting the same result on both flips?
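A small sketch of why the two-flip question behaves differently. For a coin with heads-probability p, the chance of the same result twice is p^2 + (1 - p)^2, which is 1/2 only for a fair coin. The uniform prior over p used below is an assumption added here for illustration; the point is only that the one-flip answer survives any symmetric prior, while the two-flip answer depends on how the bias is distributed.

```python
from fractions import Fraction


def p_same_twice(p):
    """Probability that two independent flips agree, given heads-probability p."""
    return p * p + (1 - p) * (1 - p)


def average_over_uniform_bias(f, steps=1000):
    """Average f(p) over an evenly spaced grid of biases p in [0, 1].

    The evenly spaced grid approximates a uniform prior over the bias,
    which is an assumption of this sketch.
    """
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    return sum(f(p) for p in grid) / len(grid)


if __name__ == "__main__":
    # One flip: symmetry gives exactly 1/2 without knowing the bias.
    print(float(average_over_uniform_bias(lambda p: p)))
    # Two flips: the answer moves away from 1/2 (close to 2/3 here).
    print(float(average_over_uniform_bias(p_same_twice)))
```

With a different assumed spread of biases (say, coins that are only slightly unfair) the two-flip figure would sit much closer to 1/2, so some knowledge of "the environment" is needed to answer it.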

6) The problem with the "B9" question discussed recently is that there is no framework to place the ambiguity within: no environmental circumstances that you can use to assess the probability.

7) The propensity for some frequentists to want probability to be "in the environment" is just a side effect of practical application. Say you want to evaluate a statistical question, such as the effectiveness of a drug. Drug effectiveness can vary with gender, age, race, and probably many other factors that are easily identified; that is, it is indeed "in the environment." You could ignore those possible differences, and get an answer that applies to a generic person just as Eliezer's answer applies to a generic biased coin. But it behooves you to eliminate whatever sources of ambiguity you easily can.

8) In geometry, "point" and "line" are undefined concepts. But we all have a pretty good idea what they are supposed to mean, and this meaning is fairly universal.

"Length" and "angle" are undefined measurements of what separates two different instances of "point" and "line," respectively. But again, we have a pretty clear idea of what is intended.

In probability, "outcome" is an undefined concept. But unlike geometry, where the presumed meaning is universal, a meaning for "outcome" is different for each ambiguous situation. But an "event" is defined - as a set of outcomes.

"Relative likelihood" is an undefined measurement of what separates two different instances of "event." And just like "length," we have a pretty clear idea of what it is supposed to mean. It expresses the relative chances that either event will occur in any expression of the ambiguities we consider.

9) "Probability" is just the likelihood relative to everything. As such, it represents the fractional chances of an event's occurrence. So if we can repeat the same ambiguities exactly, we expect the frequency to approach the probability. But note: this is not a definition of probability, as Bayesians insist frequentists think. It is a side effect of what we want "likelihood" to mean.

10) Eliezer misstated the "classic" two-child problem. The problem he stated is the one that corresponds to the usual solution, but oddly enough the usual solution is wrong for the question that is usually asked. And here I'm referring to, among others, Martin Gardner's version and Marilyn vos Savant's more famous version. The difference is that Eliezer asks the parent if there is a boy, but the classic version simply states that one child is a boy. Gardner changed his answer to 1/2 because, when the reason we have this information is not known, you can't implicitly assume that you will always know about the boy in a boy+girl family.

I bring this up because the "brain-teasing ability" of the problem derives more from the effects of this implied assumption than from any "tendency to think of probabilities as inherent properties of objects." This can be seen by restating the problem as a variation of Bertrand's Box Paradox:

The probability that, in a family of two children, both have the same gender is 1/2. But suppose you learn that one child is in scouts - but you don’t know if it is Boy Scouts or Girl Scouts. If it is Boy Scouts, those who answer the actual "classic" problem as Eliezer answered his variation will say the probability of two boys is 1/3. They'd say the same thing, about two girls, if it is Girl Scouts. So it appears you don’t even need to know what branch of Scouting it is to change the answer to 1/3.

The fallacy in this logic is the same as the reason Eliezer reformulated the problem: the answer is 1/3 only if you ask a question equivalent to "is at least one a boy," not if you merely learn that fact. And the "brain-teaser ability" is because people sense, correctly, that they have no new information in the "classic" version of the problem which would allow the change from 1/2 to 1/3. But they are told, incorrectly, that the answer does change.
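The difference between asking and merely learning can be checked by enumeration. This is a minimal sketch under two assumptions that are not spelled out in the text: the four birth orders (BB, BG, GB, GG) are equally likely, and in the "merely learn" case the child you hear about is equally likely to be either of the two. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Four equally likely two-child families: BB, BG, GB, GG.
FAMILIES = list(product("BG", repeat=2))
TWO_BOYS = ("B", "B")


def p_two_boys_if_asked():
    """You ask 'is at least one a boy?' and the answer is yes."""
    yes = [f for f in FAMILIES if "B" in f]
    return Fraction(sum(f == TWO_BOYS for f in yes), len(yes))


def p_two_boys_if_learned():
    """You learn the gender of one child, chosen at random, and it is a boy."""
    weight_boy = Fraction(0)
    weight_two_boys = Fraction(0)
    for family in FAMILIES:
        for child in family:
            # Each (family, mentioned child) pair is equally likely.
            w = Fraction(1, len(FAMILIES) * len(family))
            if child == "B":
                weight_boy += w
                if family == TWO_BOYS:
                    weight_two_boys += w
    return weight_two_boys / weight_boy


if __name__ == "__main__":
    print(p_two_boys_if_asked())    # 1/3
    print(p_two_boys_if_learned())  # 1/2
```

The same family data gives 1/3 under the first protocol and 1/2 under the second, which matches the claim that the answer changes only when the question asked is equivalent to "is at least one a boy."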