Bayes Slays Goodman's Grue

This is a first stab at solving Goodman's famous grue problem. I haven't seen a post on LW about the grue paradox, and this surprised me, since I had figured that if any argument would be raised against Bayesian LW doctrine, it would be the grue problem. I haven't looked at many proposed solutions to this paradox, besides some of the basic ones in "The New Problem of Induction". So I apologize now if my solution is wildly unoriginal. I am willing to put you through this, dear reader, because:

I wanted to see how I would fare against this still largely open, devastating, and classic problem, using only the arsenal provided to me by my minimal Bayesian training, and my regular LW reading.
I wanted the first LW article about the grue problem to attack it from a distinctly Lesswrongian approach, without the benefit of hindsight knowledge of the solutions of non-LW philosophy.
And lastly, because, even if this solution has been found before, if it is the right solution, it is to LW's credit that its students can solve the grue problem with only the use of LW skills and cognitive tools.

I would also like to warn the savvy subjective Bayesian that just because I think that probabilities model frequencies, and that I require frequencies out there in the world, does not mean that I am a frequentist or a realist about probability. I am a formalist with a grain of salt. There are no probabilities anywhere in my view, not even in minds; but the theorems of probability theory, when interpreted, share a fundamental contour with many important tools of the inquiring mind, including both the nature of frequency and the set of rational subjective belief systems. There is nothing more to probability than that system which produces its theorems.

Lastly, I would like to say that even if I have not succeeded here (which I think I have), there is likely something valuable that can be made from the leftovers of my solution after the onslaught of penetrating critiques that I expect from this community. Solving this problem is essential to LW's methods, and our arsenal is fit to handle it. If we are going to be taken seriously in the philosophical community as a new movement, we must solve serious problems from academic philosophy, and we must do it in distinctly Lesswrongian ways.

"The first emerald ever observed was green.
The second emerald ever observed was green.
The third emerald ever observed was green.
… etc.
The nth emerald ever observed was green.
(conclusion):
There is a very high probability that a never before observed emerald will be green."

That is the inference that the grue problem threatens, courtesy of Nelson Goodman. The grue problem starts by defining "grue":

"An object is grue iff it is first observed before time T, and it is green, or it is first observed after time T, and it is blue."

So you see that before time T, from the list of premises:

"The first emerald ever observed was green.
The second emerald ever observed was green.
The third emerald ever observed was green.
… etc.
The nth emerald ever observed was green."
(we will call these the green premises)

it follows that:

"The first emerald ever observed was grue.
The second emerald ever observed was grue.
The third emerald ever observed was grue.
… etc.
The nth emerald ever observed was grue."
(we will call these the grue premises)

The proposer of the grue problem asks at this point: "So if the green premises are evidence that the next emerald will be green, why aren't the grue premises evidence for the next emerald being grue?" If an emerald is grue after time T, it is not green. Let's say that the green premises bring the probability of "A new unobserved emerald is green." to 99%. On the skeptic's hypothesis, by symmetry they should also bring the probability of "A new unobserved emerald is grue." to 99%. But of course after time T, this would mean that the probability of observing a green emerald is 99%, and the probability of not observing a green emerald is at least 99%. Since these sentences have no intersection, i.e., they cannot happen together, we find the probability of their disjunction by simply adding their individual probabilities. This must give us a number at least as big as 198%, which of course contradicts the Kolmogorov axioms: we should not be able to form a statement with a probability greater than one.
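Spelled out, the arithmetic behind the contradiction looks roughly like this. The symbols are labels introduced here for illustration only: G stands for "the next emerald first observed after T is green" and R for "that emerald is grue".

```latex
\begin{align*}
P(G) &= 0.99, \qquad P(R) = 0.99, \\
P(\neg G) &\ge P(R) = 0.99 \qquad \text{(after } T \text{, grue implies not green)}, \\
P(G \lor \neg G) &= P(G) + P(\neg G) \ge 0.99 + 0.99 = 1.98 > 1.
\end{align*}
```

The last line is where additivity for mutually exclusive statements does the work, and it is the step that collides with the requirement that no probability exceed one.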

This threatens the whole of science, because you cannot simply keep this isolated to emeralds and color. We may think of the emeralds as trials, and green as the value of a random variable. Ultimately, every result of a scientific instrument is a random variable, with a very particular and useful distribution over its values. If we can't justify inferring probability distributions over random variables based on their previous results, we cannot justify a single bit of natural science. This, of course, says nothing about how it works in practice. We all know it works in practice. "A philosopher is someone who says, 'I know it works in practice, I'm trying to see if it works in principle.'" - Dan Dennett

We may look at an analogous problem. Let's suppose that there is a table, that balls are being dropped on this table, and that an infinitely thin line is drawn perpendicular to the edge of the table at some position unknown to us. The problem is to figure out the probability of the next ball landing right of the line given the previous results. Our first prediction should be that there is a 50% chance of the ball landing right of the line, by symmetry. If we get the result that one ball landed right of the line, by Laplace's rule of succession we infer that there is a 2/3 chance that the next ball will land right of the line. After n trials, if every trial gives a positive result, the probability we should assign to the next trial being positive as well is (n+1)/(n+2).
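As a quick check on those numbers, here is a minimal sketch of Laplace's rule of succession using exact fractions. The function name `rule_of_succession` is made up for this illustration.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's rule: P(next trial succeeds) = (successes + 1) / (trials + 2)."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


# No balls dropped yet: 1/2, matching the symmetry argument.
prior = rule_of_succession(0, 0)

# One ball dropped, and it landed right of the line: 2/3.
after_one_right = rule_of_succession(1, 1)

# Ten balls dropped, all right of the line: (10 + 1) / (10 + 2).
after_ten_rights = rule_of_succession(10, 10)
```

With every trial a success, the estimate climbs toward 1 but never reaches it, which is the behavior the (n+1)/(n+2) formula describes.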

If this line was placed 2/3 of the way down the table, we should expect the ratio of rights to lefts to approach 2:1. This gives us a 2/3 chance of the next ball being a right, and the fraction of rights out of all trials approaches 2/3 ever more closely as more trials are performed.

Now let us suppose a grue skeptic approaching this situation. He might make up two terms "reft" and "light". Defined as you would expect, but just in case:

"A ball is reft of the line iff it is right of it before time T when it lands, or if it is left of it after time T when it lands.
A ball is light of the line iff it is left of the line before time T when it lands, or if it is right of the line after time T when it first lands."
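To make the two definitions concrete, here is a small sketch of them as functions. Everything named here is an assumption for illustration: positions are measured from the left edge of the table, `T_SWITCH` is an arbitrary value for time T, and a landing exactly at time T is treated as "after".

```python
T_SWITCH = 100.0  # hypothetical value for time T, in arbitrary units


def plain_side(position: float, line: float) -> str:
    """The ordinary test: which side of the line did the ball land on?"""
    return "right" if position > line else "left"


def gruified_side(position: float, line: float, time: float) -> str:
    """The skeptic's vocabulary: the same test, plus a look at the clock."""
    side = plain_side(position, line)
    if time < T_SWITCH:
        # Before T: reft coincides with right, light with left.
        return "reft" if side == "right" else "light"
    # After T: reft coincides with left, light with right.
    return "reft" if side == "left" else "light"
```

Written this way, `gruified_side` needs one more input than `plain_side`. The skeptic's reply, which comes up below, is that this asymmetry only reflects which vocabulary was taken as primitive.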

The skeptic would continue:

"Why should we treat the observation of several occurrences of Right, as evidence for 'The next ball will land on the right.' and not as evidence for 'The next ball will land reft of the line.'?"

Things for some reason become perfectly clear at this point for the defender of Bayesian inference, because now we have an easy-to-imagine model. Of course, if a ball landing right of the line is evidence for Right, then it cannot possibly be evidence for ~Right; to be evidence for Reft, after time T, is to be evidence for ~Right, because after time T, Reft is logically identical to ~Right; hence it is not evidence for Reft, after time T, for the same reasons it is not evidence for ~Right. Of course, before time T, any evidence for Reft is evidence for Right for analogous reasons.

But now the grue skeptic can say something brilliant, that stops much of what the Bayesian has proposed dead in its tracks:

"Why can't I just repeat that paragraph back to you and swap every occurrence of 'right' with 'reft' and 'left' with 'light', and vice versa? They are perfectly symmetrical in terms of their logical realtions to one another.
If we take 'reft' and 'light' as primitives, then we have to define 'right' and 'left' in terms of 'reft' and 'light' with the use of time intervals."

What can we possibly reply to this? Can he/she not do this with every argument we propose then? Certainly, the skeptic admits that Bayes, and the contradiction in Right & Reft, after time T, prohibits previous Rights from being evidence of both Right and Reft after time T; where he is challenging us is in choosing Right as the result which it is evidence for, even though "Reft" and "Right" have a completely symmetrical syntactical relationship. There is nothing about the definitions of reft and right which distinguishes them from each other, except their spelling. So is that it? No, this simply means we have to propose an argument that doesn't rely on purely syntactical reasoning. So that if the skeptic performs the swap on our argument, the resulting argument is no longer sound.

What would happen in this scenario if it were actually set up? I know that seems like a strangely concrete question for a philosophy text, but its answer is a helpful hint. What would happen is that after time T, the behavior of the ratio 'Rights:Lefts' as more trials were added would proceed as expected, and the ratio 'Refts:Lights' would approach the reciprocal of the ratio 'Rights:Lefts'. The only way for this not to happen is for us to have been calling the right side of the table "reft", or for the line to have moved. We can only figure out where the line is by knowing where the balls landed relative to it; anything we can figure out about where the line is from knowing which balls landed Reft and which ones landed Light, we can only figure out because in knowing this and the time, we can know if the ball landed left or right of the line.
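A rough simulation of that thought experiment, under assumptions that are mine rather than part of the setup above: balls land uniformly at random across a table of width 1, the line sits one third of the way from the left edge (so two thirds of the table is right of it), and every ball counted here lands after time T.

```python
import random
from fractions import Fraction


def count_after_switch(n_balls: int, line: float, seed: int = 0):
    """Drop n_balls after time T and tally them in both vocabularies."""
    rng = random.Random(seed)
    rights = lefts = refts = lights = 0
    for _ in range(n_balls):
        position = rng.random()  # distance from the left edge, in [0, 1)
        if position > line:
            rights += 1
            lights += 1  # after T, "light" picks out the right side
        else:
            lefts += 1
            refts += 1   # after T, "reft" picks out the left side
    return rights, lefts, refts, lights


rights, lefts, refts, lights = count_after_switch(100_000, line=1 / 3)
rights_to_lefts = Fraction(rights, lefts)
refts_to_lights = Fraction(refts, lights)
```

With the line held fixed, `rights_to_lefts` should come out close to 2 and `refts_to_lights` is its exact reciprocal, since after T the two vocabularies simply relabel the same two tallies.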

To this I know of no reply which the grue skeptic can make. If he/she says the paragraph back to me with the proper words swapped, it is not true, because in the hypothetical where we have a table, a line, and we are calling one side right and the other side left, the only way for Refts:Lights to behave as expected as more trials are added is to move the line (if even that); otherwise the ratio of Refts to Lights will approach the reciprocal of Rights to Lefts.

This thin line is analogous to the frequency of emeralds that turn out green out of all the emeralds that get made. This is why we can assume that the line will not move, because that frequency has one precise value, which never changes. Its other important feature is reminding us that even if two terms are syntactically symmetrical, they may have semantic conditions for application which are ignored by the syntactical model, e.g., checking to see which side of the line the ball landed on.

In conclusion:

Every random variable has as a part of it, stored in its definition/code, a frequency distribution over its values. By the fact that some things happen sometimes, and others happen other times, we know that the world contains random variables, even if they are never fundamental in the source code. Note that "frequency" is not used as a state of partial knowledge; it is a fact about a set and one of its subsets.

The reason that:

"The first emerald ever observed was green.
The second emerald ever observed was green.
The third emerald ever observed was green.
… etc.
The nth emerald ever observed was green.
(conclusion):
There is a very high probability that a never before observed emerald will be green."

is a valid inference, but the grue equivalent isn't, is that grue is not a property that the emerald construction sites of our universe deal with. They are blind to the grueness of their emeralds; they only say anything about whether or not the next emerald will be green. It may be that the rule that the emerald construction sites use to get either a green or non-green emerald changes at time T, but the frequency of some particular result out of all trials will never change; the line will not move.

As long as we know what symbols we are using for what values, observing many green emeralds is evidence that the next one will be grue as long as it is before time T; every record of an observation of a green emerald is evidence against a grue one after time T. "Grue" changes meaning from green to blue at time T; the meaning of "green" stays the same, since we are using the same physical test to determine green-hood as before, just as we use the same test to tell whether the ball landed right or left.

There is no reft in the universe's source code, and there is no grue. Green is not fundamental in the source code, but green can be reduced to some particular range of quanta states; if you had the universe's source code, you couldn't write grue without first writing green, while writing green without knowing a thing about grue would be just as hard as writing it while knowing grue. Having a physical test, or primary condition for applicability, is what privileges green over grue after time T; to have a consistent physical test is the same as to reduce to a specifiable range of physical parameters; the existence of such a test is what prevents the skeptic from performing his/her swaps on our arguments.

Take this more as a brainstorm than as a final solution. It wasn't originally, but it should have been. I'll write something more organized and concise after I think about the comments more, and make some graphics I've designed that make my argument much clearer, even to myself. But keep those comments coming, and tell me if you want specific credit for anything you may have added to my grue toolkit in the comments.

"An object is grue iff it is first observed before time T, and it is green, or it is first observed after time T, and it is blue."

I don't see any reason such an object is likely to eat me when I'm walking around in the dark.

1antigonus14y

I don't see the relevance. Goodman's problem is about the general validity of inductive inference. Do you have a solution that doesn't depend upon inductive inferences?

0dlthomas14y

Was this meant to be a response to my other comment? If not, I think one of us is missing the other's joke, but have no idea which one.

1antigonus14y

No, it was supposed to be a response to its actual parent. I assumed that you were (somewhat but not entirely) humorously suggesting that the problem can somehow be solved by some appeal to natural selection or the like.

8dlthomas14y

Ah, no, I was simply making reference to the fact that the Zork games from way back when (the first was apparently late-1970's) would warn you, ... when you wandered into an unlit area.

I have seen something like

It is dark, and after the year 2000. If you proceed, you are likely to be eaten by a bleen.

in someone's email signature, and been delighted by it. (Though I worry that part of my delight derives from smugness about getting the joke.)

0[anonymous]14y

Nice!

1antigonus14y

Sorry, my mistake!

2dlthomas14y

No worries at all; I just didn't want to invest the time trying to figure out how it related to my serious comment if it turned out to be a joke I didn't get, or vice-versa.

The problem seems trivially easy.

Each observed emerald is evidence for both "the emerald is green" and "the emerald is grue." The first is preferred because it is vastly simpler (and picking any particular T, of course, is hugely privileging the hypothesis!) Evidence that is equally strong for two propositions doesn't change their relative likelihoods - so it starts out more likely that the emeralds are green than grue, and it ends more likely that the emeralds are green than grue, but both are quickly more likely than the proposition that emeralds are uniformly red.
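Here is a small sketch of that bookkeeping. The priors and the likelihood given to the "uniformly red" hypothesis are invented numbers chosen only to show the pattern; the hypothesis names and the helper `update` are likewise made up for this example.

```python
from fractions import Fraction


def update(prior: dict, likelihood: dict) -> dict:
    """One Bayesian update: multiply by the likelihood, then renormalize."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical priors: "green" is favored for simplicity, "grue" is penalized.
prior = {
    "green": Fraction(90, 100),
    "grue": Fraction(1, 100),
    "red": Fraction(9, 100),
}

# Probability of seeing a green emerald before T under each hypothesis.
# Green and grue predict it equally well; red only via (assumed) observation error.
likelihood = {"green": Fraction(1), "grue": Fraction(1), "red": Fraction(1, 10)}

posterior = prior
for _ in range(5):  # five green emeralds observed before T
    posterior = update(posterior, likelihood)

odds_before = prior["green"] / prior["grue"]
odds_after = posterior["green"] / posterior["grue"]
```

The green-to-grue odds are untouched by the evidence, so whatever separates the two has to come from the prior, while "red" starts out ahead of "grue" in this toy setup and is overtaken within a few observations.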

What's weird about this?

To clarify what potato said:

If someone was brought up from birth with the words "grue" and "bleen," how would they say something was "green," in their language? Well, they'd have to say that something was grue before, say, 2050, but bleen after. Something that changes from grue to bleen is clearly more complicated to write down than something that just stays grue all the time.

And this is just hiding the complexity, not making it simpler. Complexity isn't a function of how many words you use, cf. "The lady down the street is a witch; she did it." If we are writing a program that emits actual features of reality, rather than socially defined labels, the simplest program for green is simpler than the simplest program for grue or bleen. That you can also produce more complex programs that give the same results (defining green in terms of bleen and grue is only one such example) is both trivially true and irrelevant.

3Ronny Fernandez10y

Wait, actually, I'd like to come back to this. What programming language are we using? If it's one where either grue is primitive, or one where there are primitives that make grue easier to write than green, then grue seems simpler than green. How do we pick which language we use?

1Logos0114y

This is a trick of definition only, however. Changing the definition does not cause those things affected by the old definition to conform to the new one.

0nshepperd14y

Obviously they'd have to invent a new word for an object that emits light that causes a certain kind of qualia.

5Ronny Fernandez14y

What's weird, is that without a premise about what "green" and "blue" stand for semantically, the skeptic can just repeat that paragraph back to you, but switch all the occurrences of "grue" and "green", since "grue" and "green" are logically symmetrical.

3Vaniver14y

They can claim that the grue hypothesis is simpler than the green hypothesis?

6Ronny Fernandez14y

If we take "green" and "bleen" as primitives, then it is the definition of "green" which requires the time interval, not grue.

8Matt_Simpson14y

But if we go down to the level of photons, "green" and "blue" don't require a time interval in their definitions, yet "grue" and "bleen" do.

7Vaniver14y

What do you mean by "primitives"? It seems to me that the only sensible primitives are photons, which have particular energies. A perception system that has two sets of mappings from energies to names and a clock is necessarily less simple than a perception system that has one mapping from energies to names.
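A toy version of the two perception systems, to show where the extra machinery goes. The 2.5 eV boundary between "green" and "blue" is a rough illustrative cutoff, and the function names are invented for this sketch.

```python
def color_name(energy_ev: float) -> str:
    """One mapping from photon energy to a name; no clock needed."""
    # Assumed cutoff: roughly 2.5 eV separates green from blue light.
    return "blue" if energy_ev >= 2.5 else "green"


def gruified_name(energy_ev: float, first_observed: float, t_switch: float) -> str:
    """Energy-to-name mapping that also has to consult a clock."""
    plain = color_name(energy_ev)
    before = first_observed < t_switch
    # Grue: green and first observed before T, or blue and first observed after T.
    if (plain == "green") == before:
        return "grue"
    return "bleen"
```

Starting from photon energies, the second function takes strictly more inputs than the first, which is the sense of "less simple" being claimed here.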

-1Ronny Fernandez14y

Logical primitives. Look up logical atomism, and take it with a grain of salt.

3Vaniver14y

(from wikipedia) For "green" to be atomic, that suggests it cannot be broken down. Are you suggesting that "green" cannot be broken down to statements about energies of photons?

5Ronny Fernandez14y

No, I just mean that (or Goodman just means that) if we assume the meanings of grue and bleen, then we have to define green in terms of grue and bleen and a time interval.

But where can I find grue and bleen? If knowledge of them were deleted from my memory, would I reform those concepts?

If you deleted my knowledge of color, but left me my eyes, I could still distinguish between photons of 2.75 eV and photons of 2.3 eV. That's a difference you can find outside you and that persists.

2Ronny Fernandez14y

Right, that's the point: to solve the problem, you have to move into semantics.

-3[anonymous]14y

If you were a confused philosopher then yes, you probably would! It's definitely part of thought-space that I expect people to rush to fill once they are spending their time thinking of pointless stuff. Hopefully if it was you you would proceed straight to dissolving the question!

0[anonymous]14y

You mean grue and bleen? But... why would we be allowed to do that?

The skeptic would continue:

"Why should we treat the observation of several occurrences of Right, as evidence for 'The next ball will land on the right.' and not as evidence for 'The next ball will land reft of the line.'?"

It's evidence for both.

The solution to the grue problem is a combination of biting the bullet and Occam's razor

Bayes Slays Goodman's Grue

You don't need Bayes to solve 'grue' problems. Merely reductionism.

7Ronny Fernandez14y

Explain please

5wedrifid14y

"Goodman's Grue" just doesn't seem to be a problem at all. It can only seem like a problem if you forget that 'grue' is a name given to a somewhat complex sequence of events (relative to a thing just being a color) and start making mistakes when manipulating the symbol. There just isn't any reason to suppose there is any 'threat to the whole of science' in the first place.

4Ronny Fernandez14y

I agree. You are essentially saying that the grue problem only looks like a threat if you forget that green and blue are not simply syntactical binary predicates from first-order logic; if you remember that they are semantic concepts, then it is clear that the grue problem is not at all a threat to science. But this is no trivial result: it means that there is a part of the application of Bayes, i.e., induction, which requires the acquisition of semantic concepts. If you fed evidence statements into a Bayesian program, it would have to have an understanding of the semantic application of terms like green and grue. So you are right: reducing "green" and "grue" to their semantic/physical tests is the key in my proposed solution. Bayes can't be enough, obviously, since Bayes is a syntactical and axiomatic system. I guess what seemed Bayesian to me about the whole thing was the analogy to Bayes's table problem, which was the main intuition pump I used to solve the problem. I'll edit the article to reflect this. Thanks.

1dlthomas14y

I think this is incorrect. The actual application of Bayes' theorem works the same way for each of your theories. What differs is your priors, and that difference sticks around until you have some evidence that's more likely for one theory than another. If your priors are screwy, then yes, you'll hold wrong beliefs until you're given evidence that lets you distinguish between the correct and incorrect beliefs.

0wedrifid14y

Ahh, that makes sense!

-3Logos0114y

The first thing that struck me was the inherently self-contradictory nature of the grue definition. For physical properties to be retroactively alterable seems to contradict fundamental principles of causality and matter. Am I simply not understanding the topic? Or is my intuitive-conceptualization too influenced by "timeless physics"? (The notion that all moments in time can be stated to 'exist'). The most you get with statements about "grue-ness" is that some objects which we observed to be "green" were in fact green but after a specific time (T) all changed to another color. This does not change the fact that they were green in the past. Science seems perfectly-well suited to handling things that change from one state to another. Radioactive decay, for example. If this is some extra-material transition that occurs... well, I just don't see how that's an actually available physical phenomenon. If you change the definition of the term, you are now discussing a new thing.

2Ronny Fernandez14y

You are missing the point. Nothing changes color, and no definitions are changed, only meanings.

0Logos0114y

... but meanings are definitions. You can't change one without changing the other. The terms are synonyms. Time-based definitions just mean you use one definition before time T and another definition after time T. I am lost as to what the paradox here is.

2Ronny Fernandez14y

Definitions point to meanings, but the meaning of a term can only be found by looking at the cognitive machines that use the term, and in that specific context as well.

-2Logos0114y

... definition: 1. A statement of the exact meaning of a word, esp. in a dictionary. 2. An exact statement or description of the nature, scope, or meaning of something. meaning: 1. What is meant by a word, text, concept, or action. Definitions are meanings. And meanings are definitions. A ⊃ B & B ⊃ A ⊨ A = B. I remain lost as to where the paradox is supposed to be.

2Larks14y

The quotes you give suggest definitions are statements of meaning, not meanings.

-2Logos0114y

... I am not especially aware of there being a functional difference between a "statement of meaning" and the meaning itself when we're discussing what terms mean. Anything that is applicable to a definition is applicable to the meaning itself. Any adjustment of the meaning adjusts the definition. Any adjustment of the definition adjusts the meaning. When you have a direct correlation with bi-directional causality, that is mutual identity.

2Ronny Fernandez14y

Have you read the cluster structure of thing space? Or the exponential concept space article? I recommend them.

0Logos0114y

Yes I have read them, and they are not relevant to the topic of mutual identity between 'definition' and 'meaning*'. *: s/magic/meaning/. Thanks, Swype!

0TheOtherDave14y

Would you similarly say that "mortal" is a term with a self-contradictory definition?

2Vaniver14y

wedrifid is pointing to the destination of the road that I'm going down in the comment branch over here, I think.

The Wikipedia page for this problem is here: http://en.wikipedia.org/wiki/Grue_and_bleen

Nitpick: Emeralds are a bad example. An "emerald" is just green beryl - a blue instance of the same mineral is just a blue piece of beryl. They exist, but they aren't emeralds.

Philosophy of Science textbooks mention that fact. Goodman chose a bad example and now we must all pay the price.

The original problem, as stated, is "valid": a mind with a "grue"-like prior would make the grue prediction, while normal human minds (with a "green"-like prior, mostly as a result of our evolution around colors) would make the "green" prediction. If we want a more neutral prior, we go with "minimum message length", and "what are colors". Grue and green are words in a dictionary, so they do not count for math -- only Turing machines do. It's simpler to write a Turing machine which puts out "l...

6JoshuaZ14y

This seems problematic because it implies that humans would be perfectly fine with accepting grue over blue if they didn't know about the nature of light.

5Manfred14y

Fortunately, the reason this helps is deeper than counting the number of hertz. When you want to determine the complexity of a term, you have to specify what language to use to write the term. The reason grue seems complicated to us evolved animals is because it has higher complexity in the language of our observations - the language of what neurons we feel light up when we look at the rock.

2JoshuaZ14y

So does that mean that if an entity had a neuronal structure that intuited grue and bleen it would be justified in treating the hypothesis that way? I'd be willing to bite that bullet I think.

6moshez14y

It means that that entity's evolved instincts would be out-of-whack with the MML, so if that entity also got to the point where it invented Turing machines, it would see the flaw in its reasoning. This is no different than realizing that Maxwell's equations, though they look more complicated than "anger" to a human, are actually simpler. Sometimes, the intuition is wrong. In the blue/grue case, human intuition happens to not be wrong, but a hypothetical entity is -- and both humans and the entity, after understanding math and computer science, would agree that humans are wrong about anger, and hypothetical entities are wrong about grue. Why is that a problem?

2thomblake14y

Right, they would, if for weird historical reasons they also thought of "grue" and "bleen" as reasonable linguistic primitives. So the human scientists would be surprised when the next emerald turned out to be bleen rather than grue, and they'd be able to observe that the shift happened at time T, and thus observe that green is a natural property. So this isn't really much of a problem.

0JoshuaZ14y

That's not completely satisfying, in that one wants an induction scheme that assigns priors independent of linguistic accident. If one tries to make a hypothesis's simplicity depend on language, then one quickly gets very complicated hypotheses being labeled as simple (e.g. "God").

0Larks14y

Well, it is if you use hz. However, I prefer hz'. hz' are just like hz until time T, but then different in the appropriate way after time T.

0Plasmon14y

If grue-people expect the green emeralds to spontaneously change into blue emeralds, why shouldn't they also expect a simple green-detecting Turing machine to spontaneously change into a blue-detecting Turing machine and vice versa? Yes, a Turing machine is a mathematical construction; it does not spontaneously change. But they, using "grue" as a basic concept, would expect everything that even remotely depends on colours to change at a certain time, including physical approximations to Turing machines.

"To this I know of no reply which the grue skeptic can make, if he/she say's the paragraph back to me with the proper words swapped, it is not true, because In the hypothetical where we have a table, a line, and we are calling one side right and another side left, the only way for Refts:Lefts behave as expected as more trials are added is to move the line (if even that), otherwise the ratio of Refts to Lights will approach the reciprocal of Rights to Lefts. "

He can simply define the term "line" to imply that it flips directions at time...

1wedrifid14y

"Oh yeah? Well I'm going to go hang out in the dark while doomed. You'll see!"

I recommend editing this post to have shorter paragraphs.

0Ronny Fernandez14y

This seems hard to me, but I agree. Do you have any pointers? (edit): I tried.

It may be that the rule that the emerald construction sites use to get either a green or non-green emerald change at time T, but there is no reason to believe that the rule will change if there has never been any change demonstrated in the position of the line before

There's your error! You think that the line is in the middle of the table through the entire experiment, but actually it's in the riddle of the table, where "riddle" means "in the middle of the table before time T and on the right side of the table afterward." All of our experience before time T has confirmed this.

1damang14y

He never said where it was, the problem was to find where the line was on the table.

3endoself14y

A better objection would be to ask whether the line chooved, where 'chooved' means 'stayed in the same place before time T and moved to the opposite location afterward'.

0antigonus14y

To be honest, I don't have a clear sense of what he's saying. However, from a snippet like this: it sounds like he's trying to draw some conclusion from an assumption ("the line doesn't/won't move") that ultimately rests on inductive support. Is that not the case? If so, how does that supposed support not fall victim to the new problem of induction?

0Ronny Fernandez14y

It doesn't seem to me like the problem is to justify induction, but to justify the induction on green over grue after time T.

0antigonus14y

The problem is to justify any inductively-obtained statement vulnerable to a grue-like variant. "X will remain in the same place" is one such statement. (Namely: Any evidence that X will remain in a given place is prima facie evidence that it will stay in the same place', where place' refers to its current location at T and some other location afterward.) Grue is just an example.

4Ronny Fernandez14y

That the line will stay in the same place is not something I induce; it is a premise in the hypothetical. The line, or really the area right of the line on the table, represents the actual frequency with which an emerald turns out green, out of all the cases where an emerald is observed. This is certainly a non-moving line, since there is one and only one answer to that question.

3antigonus14y

But that's question-begging. Let me put this another way. Define the function reft-distance(x) = x's distance to the rightmost edge of the table before time T, or the distance to the leftmost edge of the table after time T. (Then "x is reft of y" is definable as reft-distance(x) < reft-distance(y). Similarly for the function light-distance(x).) Assuming the line doesn't move is equivalent to assuming that the line's right-distance remains constant, but that its reft-distance changes after T. But that's not a fair assumption, the skeptic will insist: he prefers to assume the line doesn't "anti-move," which means its reft-distance remains constant but its right-distance changes. If we're simply stipulating that your assumption (that the line doesn't move) is correct and the skeptic's assumption (that the line doesn't anti-move) is incorrect, that's not very useful. We might as well just stipulate that emeralds remain green for all time or whatever.
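The functions described in this comment can be sketched as follows. The table width, the value chosen for T, and the convention that positions are measured from the left edge are all assumptions made for the sake of the example.

```python
TABLE_WIDTH = 1.0  # assumed width of the table
T_SWITCH = 100.0   # assumed value for time T


def right_distance(x: float) -> float:
    """Distance from position x to the rightmost edge of the table."""
    return TABLE_WIDTH - x


def left_distance(x: float) -> float:
    """Distance from position x to the leftmost edge of the table."""
    return x


def reft_distance(x: float, time: float) -> float:
    """Right-distance before T, left-distance after T."""
    return right_distance(x) if time < T_SWITCH else left_distance(x)


def is_reft_of(x: float, y: float, time: float) -> bool:
    """x is reft of y iff reft_distance(x) < reft_distance(y)."""
    return reft_distance(x, time) < reft_distance(y, time)


line = 0.25  # a line that "doesn't move" in the ordinary sense
```

For this fixed `line`, `right_distance(line)` is the same at every time, while `reft_distance(line, time)` jumps from 0.75 to 0.25 at T. A line that did not "anti-move" would show the opposite pattern, and that symmetry is the point being pressed here.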

0Ronny Fernandez14y

You forgot to address this part: The line is constant because the area to its right represents the frequency with which a certain result is observed out of the number of trials. What the skeptic would have to be assuming is that the first 98 balls just happened to fall on the first 100th of the table by chance.

1antigonus14y

Assuming that the line is constant is analogous to assuming that emeralds' color won't change after T, correct? The skeptic will refuse to do either of these, preferring instead to assume that the line is anti-constant and that emeralds' anti-color won't change after T.

0Ronny Fernandez14y

No, that's a common misunderstanding. No emerald ever has to change color for the grue hypothesis to be true. It is analogous to assuming that there is a definite frequency of green emeralds out of emeralds ever made.

0antigonus14y

Well, O.K. "The next observed emerald is green if before T and blue otherwise" doesn't entail any change of color. I suppose I should have said, "Analogous to assuming that the emeralds' color (as opposed to anti-color) distribution doesn't vary before and after T." I'm really not seeing that analogy. It seems more analogous to assuming there's a single, time-independent probability of observing a green emerald. (Holding the line fixed means there's a single, time-independent probability of landing right of the line.) Which is again an assumption the skeptic would deny, preferring instead the existence of a single, time-invariant probability of observing a grue emerald.

0Ronny Fernandez14y

Correct, but my solution rests around there being a semantic method for testing greenness. This is what breaks the symmetry which the skeptic was abusing. Because the test stays the same the meaning of green stays the same.

0antigonus14y

I don't think I really understand what this means. Could you give more detail?

0Ronny Fernandez14y

Read my conclusion over; I made some edits. If you still don't understand, comment and I'll explain.

3antigonus14y

I'm not sure I've understood that very well, either. From what I can gather, it seems like you're arguing that 1. the meaning and physical tests for grue change over time, and consequently 2. grue is a more complicated property than green is, so we're justified in privileging the green hypothesis. If that's so, then I no longer see what role the reft/light example plays in your argument. You could've just started and finished with that.

0Ronny Fernandez14y

Yeah, the reft/light argument is just what made it obvious to me; I thought it might help my readers too.

1antigonus14y

All right. Regarding the idea that the meaning of "grue" changes over time - how do you take this to be the case? What do you mean by "meaning" here? Intension, extension or what?

3Ronny Fernandez14y

The common physical test of using your eyes. The results from your eyes, and from instruments which pick up the same sort of optical information as your eyes, are the test for the application of green. This is how we learn green. This definition of green is semantic. These instruments' results are the primary meaning of green, how your brain decides whether to use the term. They are semantic because their usage must refer to the outside world.

3alex_zag_al14y

It seems that the assumption in your hypothetical is of an unchanging process producing the random variable, about which we have partial knowledge. In the case of the ball, we know of the unmoving invisible line, the throws uniformly distributed over the table, and whatever mechanism it is that lets us know whether the ball has fallen to the left or the right of the line. However, we don't know enough to know exactly where the balls will land. In the case of the emeralds, we know enough about the emerald construction sites to know that they are grue-blind, and that they will stay grue-blind no matter how many emeralds they produce. In both cases, we know something of the mechanism behind the random variable, and that it will not change. Is that correct? You talk of a threat to the whole of science. How does your answer to this hypothetical respond to that threat? Do scientists ever have the knowledge assumed in your hypotheticals? How can scientists gain that knowledge in the first place without getting grued up, if they need that knowledge to stay gruefree? It reminds me of Bugs Bunny pulling himself out of a magician's hat by holding his ears.

2Ronny Fernandez14y

It is not. My assumption is of a definite frequency with which some result comes, out of trials. When you realize that the reason you don't determine the meaning of green using grue and bleen is that there is a physical test which has higher authority in defining greenhood, the threat dissolves.

0alex_zag_al14y

By “frequency” I suppose you mean the fraction of balls dropped on the right out of all ball drops, past and future? And with emeralds... I guess you mean the fraction of green emeralds out of all emeralds that have been or will be observed? I suppose the physical test in the ball problem is the ball landing on one side or the other of the line. In the emerald problem, the physical test is, what is it?

Solomonoff Induction is a formalized answer to problems of inference which also applies to the grue problem. It basically just says to weigh all possible explanations that fit your data by their complexity, but it is specified mathematically. Since grue is more complex than green, it weighs green much higher until reason to believe in grue shows up.

This is slightly off topic though, because the key is reducing the items you're talking about to what they are made up of so that you can properly encode them in order to compare the complexity. As said here, it... (read more)

0DanielLC14y

Solomonoff induction involves defining complexity. Green and blue aren't the most basic possible things, so you can't straight up stick in grue and bleen, but you still can come up with some language where grue and bleen are just as easy to define as blue and green are in whatever we'd be likely to use. All Solomonoff induction can really do is specify that the probabilities must add up to 100%.
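The language-dependence can be illustrated with a toy sketch. This is not Solomonoff induction itself: token counts stand in, very crudely, for description length (true Kolmogorov complexity is uncomputable), and the two "languages" and their token lists are assumptions made up for illustration.

```python
# Toy description lengths (in tokens) of each predicate in two languages.
DEFINITIONS = {
    "green-blue language": {
        "green": ["green"],
        "grue": ["if", "before_T", "green", "else", "blue"],
    },
    "grue-bleen language": {
        "grue": ["grue"],
        "green": ["if", "before_T", "grue", "else", "bleen"],
    },
}

def normalized_priors(language: str) -> dict:
    """Weight each predicate by 2 ** -(token count), then normalize to sum to 1."""
    weights = {
        name: 2.0 ** -len(tokens)
        for name, tokens in DEFINITIONS[language].items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

for language in DEFINITIONS:
    print(language, normalized_priors(language))
```

In the green-blue language the prior favors green by 16 to 1; in the grue-bleen language it favors grue by the same margin. The weighting rule is identical in both cases, so whatever breaks the tie has to come from the choice of primitives.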

Let's say that the green premises bring the probability of "A new unobserved emerald is green." to 99%. In the skeptic's hypothesis, by symmetry it should also bring the probability of "A new unobserved emerald is grue." to 99%. But of course after time T, this would mean that the probability of observing a green emerald is 99%, and the probability of not observing a green emerald is at least 99%, since these sentences have no intersection, i.e., they cannot happen together, to find the probability of their disjunction we just add the

... (read more)

Note that this question was first put forward in 1955, so that it was a purely hypothetical question until 1 January 2000, when sapphires were discovered to be grue. (Before and after images of the same gem.)

The case makes an interesting parallel to the term "black swan", another famous philosophical thought experiment that received unexpected data.

0Normal_Anomaly14y

What changed? Are you telling a joke and those are pictures of different gems? Or is one in a different kind of light, or at a different angle? I don't get it. Edit: Or are you talking about what Alicorn was talking about farther down?

2Pavitra14y

I'm telling a joke.

One would suspect that the emerald-producing locations in our universe do not behave quite as cleanly and mathematically as you describe them. Instead, fuzziness and messiness creep in. Maybe such sites degrade over time, causing the emeralds to be slightly bluer. Maybe not.

Broad principles like "green earlier implies green now" are approximations that allow us to simplify the complexity of actual, extremely difficult Bayesian inference.

So... your Bayesian answer to the grue problem is to become a frequentist? You're doing it wrong.

As has been pointed out to you, "grue" is a description of a perfectly consistent prior on observations. The reason that "green" is preferable is its simplicity (in terms of basic predictions of physical events) and specificity (i.e. if T is unspecified, then the "green" hypothesis makes more specific predictions than "grue", while if it is specified, then the complexity of the number T comes into play).

I still think that these "devastating" problems have been solved in the first chapters of Jaynes' book.

"So if the green premises are evidence that the next emerald will be green, why aren't the grue premises evidence for the next emerald being grue?"

Because the first green emeralds are no evidence that the next will be green.

Let's translate the problem differently: I write a program that shows colored dots on the screen. The first n dots are green. What is the probability that the next dot will be green? If those are your only informa... (read more)

'I haven't seen a post on LW about the grue paradox, and this surprised me since I had figured that if any arguments would be raised against Bayesian LW doctrine, it would be the grue problem.':

If of relevance, note http://lesswrong.com/lw/q8/many_worlds_one_best_guess/ .

I think I came up with a solution:

To date, the vast majority of grue-like hypotheses (hypotheses that suggest new items that have always been grue before time t will continue to be found grue after time t) have failed. Inductive logic, then, doesn't suggest that because emeralds have been grue to date, they will continue to be grue after time t. So far, after every time t, that's not been the case.

If it's unclear what I mean when I say grue-like hypotheses have failed, let me word it better: if time t was 1975, then the hypothesis that emeralds found after ... (read more)

Let’s forget, for a moment, that the position of the invisible line reflects the long-run frequency of “right” and “left” results. (you say that it reflects the proportion of green emeralds among existing emeralds, and results of “right” are analogous to results of “green”, so.)

In the ball problem, there is an invisible line on a table. More balls falling on the right implies that the area on the right side of the line is larger, and thus that future ball drops are more likely to fall on the right side.

Or maybe it’s evidence that they’ll fall on the reft s... (read more)

When we evaluate a term's complexity, we must use some language to evaluate it in. If we use standard English, green is simpler, while if we use grue English, grue is simpler. But is there a unique choice of language?

Well, if we want to describe the world, the symmetry is broken by the fact that we can observe the world - our unique "language" is our observations of the natural world - which color is simpler when describing the neurons in our visual cortex, if you will.

When we describe reality in terms of the language of our

0torekp14y

That is the fundamental point, I agree. The overall description of our world includes our description of ourselves. It is here that the simplicity advantage of blue / green resides.

What does 'first observed' mean? It seems like the sort of thing that someone with a passing knowledge of quantum mechanics would make up, giving a privileged status to conscious observers.

Apart from this objection, I see both in the post and in some of the comments a confusion about the meaning of 'grue'. Take again the definition:

An object is grue iff it is first observed before time T, and it is green, or it is first observed after time T, and it is blue.
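As a sketch, the definition can be written as a predicate. The class name, the numeric value of `T`, and the use of integers for observation times are illustrative assumptions; the definition leaves an object first observed exactly at T unspecified, and this sketch treats it as "after T".

```python
from dataclasses import dataclass

T = 2000  # hypothetical cutoff time

@dataclass(frozen=True)  # frozen: an object's color never changes
class Emerald:
    color: str
    first_observed: int

def is_grue(obj: Emerald, t: int = T) -> bool:
    """Grue iff (first observed before t and green) or (first observed at/after t and blue)."""
    if obj.first_observed < t:
        return obj.color == "green"
    return obj.color == "blue"

print(is_grue(Emerald("green", 1990)))  # True
print(is_grue(Emerald("green", 2010)))  # False
print(is_grue(Emerald("blue", 2010)))   # True
```

The predicate depends only on an object's fixed color and the time it was first observed.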

Notice that no object ever changes colour. A green object, first observed before time T, is sti... (read more)

0alex_zag_al14y

I didn't mean to sound harsh at all btw; I wouldn't want to discourage anyone from making mistakes publicly on LW, which is a great place to have mistakes corrected.

0alex_zag_al14y

A successful prediction does not weaken a hypothesis. Also, your argument works just as well for G as for g; therefore, a green emerald is evidence against emeralds being green and against emeralds being grue. You made an arithmetic mistake. I figured you might want to try and find it yourself, and reasoned: if you do want to be told, you can just ask, but if I had assumed you wanted to be told and was wrong, I couldn't untell you. The assumption that P(O) is P(G) + P(g) is also incorrect; there is also the hypothesis that half the emeralds are green, for example. But either way you shouldn't end up with P(g|O) < P(g).

2wedrifid14y

It can weaken it relative to a competing hypothesis.

Actually it is unsolvable in Bayesian framework, and the only honest answer would be to admit it.

Bayesianism gives you consistency, but it doesn't anchor you to reality in any way. Assignment of probabilities that prefers green, and assignment of probabilities that prefers grue are both equally consistent.

Many people on lesswrong have been trying to handwave the problem away with Kolmogorov Complexity, but if you check real math, then you'll see that for any finite amount of data it solves exactly nothing - two different computational models have finite di... (read more)

4lessdazed14y

"If you're insane enough, and have unreasonable enough priors, even Bayesianism won't save you," is an argument against insanity and unreasonableness, not against Bayesianism.

0taw14y

Bayesianism only attempts to give you consistency, different grue-Bayesians would see green-Bayesian as "insane and unreasonable", just as green-Bayesians would see grue-Bayesians. They're both just as consistent, and nothing about their systems of beliefs is internally different. If you want to solve green/grue problem, Bayesianism won't hurt your attempts but neither will it help you in any way.
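A small sketch of why the update itself does not separate the two, under the simplifying assumption that only two deterministic hypotheses are in play ("all emeralds are green" and "all emeralds are grue"). The hypothesis names, the prior, and the value of `T` are illustrative.

```python
from fractions import Fraction

def likelihood(hypothesis: str, color: str, first_observed: int, T: int) -> Fraction:
    """P(observed color | hypothesis) for two deterministic hypotheses."""
    if hypothesis == "all_green":
        predicted = "green"
    elif hypothesis == "all_grue":
        predicted = "green" if first_observed < T else "blue"
    else:
        raise ValueError(f"unknown hypothesis: {hypothesis}")
    return Fraction(1) if color == predicted else Fraction(0)

def posterior(prior: dict, observations: list, T: int) -> dict:
    """Bayes' rule: multiply the prior by each likelihood, then normalize."""
    unnormalized = dict(prior)
    for color, first_observed in observations:
        for h in unnormalized:
            unnormalized[h] *= likelihood(h, color, first_observed, T)
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observations rule out every hypothesis")
    return {h: w / total for h, w in unnormalized.items()}

T = 2000
prior = {"all_green": Fraction(9, 10), "all_grue": Fraction(1, 10)}
before_T = [("green", year) for year in range(1900, 1998)]  # 98 green emeralds

print(posterior(prior, before_T, T))                      # same as the prior
print(posterior(prior, before_T + [("green", 2001)], T))  # all_grue drops to 0
```

Every observation made before T has the same likelihood under both hypotheses, so the posterior equals whatever prior was chosen. Only an observation after T can tell them apart.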

2Ronny Fernandez14y

Yeah, what really helped about the Bayesian analogy to the table, line, ball thing was remembering that there was a physical basis for right, but that reft did not have a physical basis in the same way. The same goes for grue. I completely agree that if you want to understand the reason for the use of grue over the use of green in the conclusion, you need to use more than the syntactical definitions of the terms. Bayes is of course syntactical. You have to look at the semantic meanings of the terms, their tests for applicability.

3taw14y

How does the property "it is grue until some point, then it becomes bleen" have more physical basis than the property "it's grue all along"? What you're saying makes no sense (...to a grue-ist).

2Ronny Fernandez14y

If I wrote a program to find things that were green before time t, and things that were blue after time t, I would not save any time on the programming by making it just look for grue. Grue could not be coherently defined without committing to observers, but green could be defined (even if very complicatedly) without reference to observers, and thus we can be realists about it. I am a realist about green, and not about grue. This makes sense since grue requires observers in its definition.

1taw14y

Meanwhile, in a parallel universe, grue-potato wrote this, and grue-taw is trying to make him see that green is just as consistent.

What do you mean by "primitives"?

It seems to me that the only sensible primitives are photons, which have particular energies. A perception system that has two sets of mappings from energies to names and a clock is necessarily less simple than a perception system that has one mapping from energies to names.

Logical primitives. Look up logical atomism, and take it with a grain of salt.
