"An object is grue iff it is first observed before time T, and it is green, or it is first observed after time T, and it is blue."
I don't see any reason such an object is likely to eat me when I'm walking around in the dark.
I have seen something like
It is dark, and after the year 2000. If you proceed, you are likely to be eaten by a bleen.
in someone's email signature, and been delighted by it. (Though I worry that part of my delight derives from smugness about getting the joke.)
The problem seems trivially easy.
Each observed emerald is evidence for both "the emerald is green" and "the emerald is grue." The first is preferred because it is vastly simpler (and picking any particular T, of course, is hugely privileging the hypothesis!). Evidence that is equally strong for two propositions doesn't change their relative probabilities, so it starts out more likely that the emeralds are green than grue, and it ends more likely that the emeralds are green than grue, but both quickly become more likely than the proposition that emeralds are uniformly red.
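A minimal sketch of that point, assuming made-up prior probabilities (the numbers and hypothesis names are hypothetical, chosen only for illustration). Every green emerald observed before T has likelihood 1 under both "green" and "grue", so repeated updates leave their ratio untouched, while hypotheses that predicted green less strongly lose ground.

```python
from fractions import Fraction

# Hypothetical priors, chosen only for illustration. "grue" starts low
# because it must also specify a particular switch time T.
PRIORS = {
    "green": Fraction(50, 100),
    "grue": Fraction(1, 100),
    "red": Fraction(24, 100),     # emeralds are uniformly red
    "random": Fraction(25, 100),  # each emerald is red, green, or blue with equal chance
}

# P(observe a green emerald before time T | hypothesis)
LIKELIHOOD_GREEN_BEFORE_T = {
    "green": Fraction(1),
    "grue": Fraction(1),  # before T, a grue emerald is green
    "red": Fraction(0),
    "random": Fraction(1, 3),
}


def update(beliefs):
    """One step of Bayes' rule after observing a green emerald before T."""
    unnormalized = {h: p * LIKELIHOOD_GREEN_BEFORE_T[h] for h, p in beliefs.items()}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


beliefs = dict(PRIORS)
for _ in range(10):  # ten green emeralds, all observed before T
    beliefs = update(beliefs)

print(beliefs["green"] / beliefs["grue"])  # 50: the prior ratio is unchanged
print(beliefs["red"])                      # 0: ruled out by the first observation
```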
What's weird about this?
To clarify what potato said:
If someone were brought up from birth with the words "grue" and "bleen," how would they say something was "green," in their language? Well, they'd have to say that something was grue before, say, 2050, but bleen after. Something that changes from grue to bleen is clearly more complicated to write down than something that just stays grue all the time.
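The translation can be written out directly. In this sketch the switch year T = 2050 is the example value from the comment, and the function names are hypothetical. It shows that "green", spelled in grue-language, needs a time test, just as "grue" does when spelled in green-language; which direction counts as simpler is exactly what the next paragraph argues about.

```python
T = 2050  # example switch year, as in the comment


def is_grue(color, year_first_observed):
    """Grue: green if first observed before T, blue if first observed after."""
    return color == "green" if year_first_observed < T else color == "blue"


def is_bleen(color, year_first_observed):
    """Bleen: blue if first observed before T, green if first observed after."""
    return color == "blue" if year_first_observed < T else color == "green"


def is_green_in_grue_language(grue, bleen, year_first_observed):
    """'Green' for a speaker whose basic words are grue and bleen:
    grue before T, bleen after."""
    return grue if year_first_observed < T else bleen


# The translation agrees with ordinary "green" on every case.
for color in ("green", "blue"):
    for year in (2000, 2100):
        translated = is_green_in_grue_language(
            is_grue(color, year), is_bleen(color, year), year
        )
        assert translated == (color == "green")
```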
And this is just hiding the complexity, not making it simpler. Complexity isn't a function of how many words you use, cf. "The lady down the street is a witch; she did it." If we are writing a program that emits actual features of reality, rather than socially defined labels, the simplest program for green is simpler than the simplest program for grue or bleen. That you can also produce more complex programs that give the same results (defining green in terms of bleen and grue is only one such example) is both trivially true and irrelevant.
But where can I find grue and bleen? If knowledge of them were deleted from my memory, would I reform those concepts?
If you deleted my knowledge of color, but left me my eyes, I could still distinguish between photons of 2.75 eV and photons of 2.3 eV. That's a difference you can find outside you and that persists.
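The two photon energies convert to wavelengths through λ = hc/E. This sketch uses hc ≈ 1239.84 eV·nm; reading roughly 451 nm as blue and roughly 539 nm as green is the conventional mapping of the visible spectrum, assumed here rather than stated in the comment.

```python
HC_EV_NM = 1239.84198  # Planck constant times speed of light, in eV*nm


def wavelength_nm(photon_energy_ev):
    """Wavelength of a photon with the given energy: lambda = h*c / E."""
    return HC_EV_NM / photon_energy_ev


print(round(wavelength_nm(2.75), 1))  # 450.9 (nm), blue
print(round(wavelength_nm(2.3), 1))   # 539.1 (nm), green
```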
The skeptic would continue:
"Why should we treat the observation of several occurrences of Right, as evidence for 'The next ball will land on the right.' and not as evidence for 'The next ball will land reft of the line.'?"
It's evidence for both.
The solution to the grue problem is a combination of biting the bullet and Occam's razor.
Bayes Slays Goodman's Grue
You don't need Bayes to solve 'grue' problems. Merely reductionism.
Nitpick: Emeralds are a bad example. An "emerald" is just green beryl - a blue instance of the same mineral is just a blue piece of beryl. They exist, but they aren't emeralds.
Philosophy of Science textbooks mention that fact. Goodman chose a bad example and now we must all pay the price.
The original problem, as stated, is "valid": a mind with a "grue"-like prior would make the grue prediction, while normal human minds (with a "green"-like prior, mostly as a result of our evolution around colors) would make the "green" prediction. If we want a more neutral prior, we go with "minimum message length", and "what are colors". Grue and green are words in a dictionary, so they do not count for math -- only Turing machines do. It's simpler to write a Turing machine which puts out "l...
"To this I know of no reply which the grue skeptic can make, if he/she say's the paragraph back to me with the proper words swapped, it is not true, because In the hypothetical where we have a table, a line, and we are calling one side right and another side left, the only way for Refts:Lefts behave as expected as more trials are added is to move the line (if even that), otherwise the ratio of Refts to Lights will approach the reciprocal of Rights to Lefts. "
He can simply define the term "line" to imply that it flips directions at time...
It may be that the rule that the emerald construction sites use to get either a green or non-green emerald change at time T, but there is no reason to believe that the rule will change if there has never been any change demonstrated in the position of the line before
There's your error! You think that the line is in the middle of the table through the entire experiment, but actually it's in the riddle of the table, where "riddle" means "in the middle of the table before time T and on the right side of the table afterward." All of our experience before time T has confirmed this.
Solomonoff Induction is a formalized answer to problems of inference which also applies to the grue problem. It basically just says to weigh all possible explanations that fit your data by their complexity, but it is specified mathematically. Since grue is more complex than green, it weighs green much higher until reason to believe in grue shows up.
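A toy version of that weighting, assuming made-up description lengths (the bit counts below are hypothetical, not measured from any real Turing machine). Each hypothesis gets prior weight 2^-length, and a grue hypothesis must additionally encode its switch time T, here with an Elias gamma code so that the encodings of different T stay prefix-free. Even summed over every T below 2^12, the whole grue family weighs about a thousandth as much as green.

```python
from fractions import Fraction

# Hypothetical description lengths, in bits, for illustration only.
GREEN_BITS = 20            # "every emerald is green"
SWITCH_OVERHEAD_BITS = 10  # extra code for "compare time with T, then switch to blue"


def elias_gamma_bits(n):
    """Length of the Elias gamma code for a positive integer n."""
    return 2 * n.bit_length() - 1


def prior_weight(bits):
    """Solomonoff-style prior: a hypothesis described in `bits` bits gets 2**-bits."""
    return Fraction(1, 2 ** bits)


def grue_bits(T):
    """A grue program is the green program plus the switch logic plus T itself."""
    return GREEN_BITS + SWITCH_OVERHEAD_BITS + elias_gamma_bits(T)


green_weight = prior_weight(GREEN_BITS)
grue_family_weight = sum(prior_weight(grue_bits(T)) for T in range(1, 2 ** 12))
ratio = green_weight / grue_family_weight

print(float(ratio))  # roughly 1024
```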
This is slightly off topic though, because the key is reducing the items you're talking about to what they are made up of so that you can properly encode them in order to compare the complexity. As said here, it...
...Let's say that the green premises brings the probability of "A new unobserved emerald is green." to 99%. In the skeptic's hypothesis, by symmetry it should also bring the probability of "A new unobserved emerald is grue." to 99%. But of course after time T, this would mean that the probability of observing a green emerald is 99%, and the probability of not observing a green emerald is at least 99%, since these sentences have no intersection, i.e., they cannot happen together, to find the probability of their disjunction we just add the
Note that this question was first put forward in 1955, so that it was a purely hypothetical question until 1 January 2000, when sapphires were discovered to be grue. (Before and after images of the same gem.)
The case makes an interesting parallel to the term "black swan", another famous philosophical thought experiment that received unexpected data.
One would suspect that the emerald-producing locations in our universe do not behave quite as cleanly and mathematically as you describe them. Instead, fuzziness and messiness creep in. Maybe such sites degrade over time, causing the emeralds to be slightly bluer. Maybe not.
Broad principles like "green earlier implies green now" are approximations that allow us to simplify the complexity of actual, extremely difficult Bayesian inference.
So... your Bayesian answer to the grue problem is to become a frequentist? You're doing it wrong.
As has been pointed out to you, "grue" is a description of a perfectly consistent prior on observations. The reason that "green" is preferable is its simplicity (in terms of basic predictions of physical events) and specificity (i.e. if T is unspecified, then the "green" hypothesis makes more specific predictions than "grue", while if it is specified, then the complexity of the number T comes into play).
I still think that these "devastating" problems have been solved in the first chapters of Jaynes' book.
"So if the green premises are evidence that the next emerald will be green, why aren't the grue premises evidence for the next emerald being grue?"
Because the first green emeralds are no evidence that the next will be green.
Let's translate the problem differently: I write a program that shows a colored dot on the screen. The first n dots are green. What is the probability that the next dot will be green? If those are your only informa...
'I haven't seen a post on LW about the grue paradox, and this surprised me since I had figured that if any arguments would be raised against Bayesian LW doctrine, it would be the grue problem.':
If of relevance, note http://lesswrong.com/lw/q8/many_worlds_one_best_guess/ .
I think I came up with a solution:
To date, the vast majority of grue-like hypotheses (hypotheses that suggest new items that have always been grue before time t will continue to be found grue after time t) have failed. Inductive logic, then, doesn't suggest that because emeralds have been grue to date, they will continue to be grue after time t. So far, after every time t, that has not been the case.
If it's unclear what I mean when I say grue-like hypotheses have failed, let me word it better: if time t was 1975, then the hypothesis that emeralds found after ...
Let’s forget, for a moment, that the position of the invisible line reflects the long-run frequency of “right” and “left” results. (You say that it reflects the proportion of green emeralds among existing emeralds, and results of “right” are analogous to results of “green”.)
In the ball problem, there is an invisible line on a table. More balls falling on the right implies that the area on the right side of the line is larger, and thus that future ball drops are more likely to fall on the right side.
Or maybe it’s evidence that they’ll fall on the reft s...
When we evaluate a term's complexity, we must use some language to evaluate it in. If we use standard English, green is simpler, while if we use grue English, grue is simpler. But is there a unique choice of language?
Well, if we want to describe the world, the symmetry is broken by the fact that we can observe the world - our unique "language" is our observations of the natural world - which color is simpler when describing the neurons in our visual cortex, if you will.
When we describe reality in terms of the language of our
What does 'first observed' mean? It seems like the sort of thing that someone with a passing knowledge of quantum mechanics would make up, giving a privileged status to conscious observers.
Apart from this objection, I see both in the post and in some of the comments a confusion about the meaning of 'grue'. Take again the definition:
An object is grue iff it is first observed before time T, and it is green, or it is first observed after time T, and it is blue.
Notice that no object ever changes colour. A green object, first observed before time T, is sti...
Actually it is unsolvable in the Bayesian framework, and the only honest answer would be to admit it.
Bayesianism gives you consistency, but it doesn't anchor you to reality in any way. Assignment of probabilities that prefers green, and assignment of probabilities that prefers grue are both equally consistent.
Many people on LessWrong have been trying to handwave the problem away with Kolmogorov Complexity, but if you check the actual math, you'll see that for any finite amount of data it solves exactly nothing - two different computational models have finite di...
This is a first stab at solving Goodman's famous grue problem. I haven't seen a post on LW about the grue paradox, and this surprised me, since I had figured that if any arguments would be raised against Bayesian LW doctrine, it would be the grue problem. I haven't looked at many proposed solutions to this paradox, besides some of the basic ones in "The New Problem of Induction". So, I apologize now if my solution is wildly unoriginal. I am willing to put you through this, dear reader, because:
I would also like to warn the savvy subjective Bayesian that just because I think that probabilities model frequencies, and that I require frequencies out there in the world, does not mean that I am a frequentist or a realist about probability. I am a formalist with a grain of salt. There are no probabilities anywhere in my view, not even in minds; but the theorems of probability theory, when interpreted, share a fundamental contour with many important tools of the inquiring mind, including both the nature of frequency and the set of rational subjective belief systems. There is nothing more to probability than that system which produces its theorems.
Lastly, I would like to say that even if I have not succeeded here (which I think I have), there is likely something valuable that can be made from the leftovers of my solution after the onslaught of penetrating critiques that I expect from this community. Solving this problem is essential to LW's methods, and our arsenal is fit to handle it. If we are going to be taken seriously in the philosophical community as a new movement, we must solve serious problems from academic philosophy, and we must do it in distinctly Lesswrongian ways.
That is the inference that the grue problem threatens, courtesy of Nelson Goodman. The grue problem starts by defining "grue":
So you see that before time T, from the list of premises:
it follows that:
The proposer of the grue problem asks at this point: "So if the green premises are evidence that the next emerald will be green, why aren't the grue premises evidence for the next emerald being grue?" If an emerald is grue after time T, it is not green. Let's say that the green premises bring the probability of "A new unobserved emerald is green." to 99%. In the skeptic's hypothesis, by symmetry, the grue premises should also bring the probability of "A new unobserved emerald is grue." to 99%. But after time T, this would mean that the probability of observing a green emerald is 99% and the probability of not observing a green emerald is at least 99%. Since these two sentences have no intersection, i.e., they cannot both be true, the probability of their disjunction is just the sum of their individual probabilities. This gives a number at least as big as 198%, which of course contradicts the Kolmogorov axioms: we should not be able to form a statement with a probability greater than one.
This threatens the whole of science, because you cannot simply keep this isolated to emeralds and color. We may think of the emeralds as trials, and green as the value of a random variable. Ultimately, every result of a scientific instrument is a random variable, with a very particular and useful distribution over its values. If we can't justify inferring probability distributions over random variables based on their previous results, we cannot justify a single bit of natural science. This, of course, says nothing about how it works in practice. We all know it works in practice. "A philosopher is someone who says, 'I know it works in practice, I'm trying to see if it works in principle.'" - Dan Dennett
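The contradiction can be checked mechanically. This sketch (the function name is hypothetical) treats "the next emerald is green" and "the next emerald is grue" as events over the next emerald's color and asks whether any probability distribution can give both of them 99%. Before T the two events coincide; after T they are disjoint, and additivity forces their probabilities to sum to at most 1.

```python
from fractions import Fraction


def coherent(p_green, p_grue, after_T):
    """Is there a probability distribution over the next emerald's color
    that gives P(green) = p_green and P(grue) = p_grue?"""
    if not (0 <= p_green <= 1 and 0 <= p_grue <= 1):
        return False
    if after_T:
        # After T, grue means blue, so the events are disjoint and
        # P(green or grue) = p_green + p_grue must not exceed 1.
        return p_green + p_grue <= 1
    # Before T, grue means green: the two sentences describe the same event.
    return p_green == p_grue


p = Fraction(99, 100)
print(coherent(p, p, after_T=False))  # True
print(coherent(p, p, after_T=True))   # False: 99% + 99% = 198% > 100%
```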
We may look at an analogous problem. Let's suppose that there is a table, that balls are being dropped on this table, and that there is an infinitely thin line drawn perpendicular to the edge of the table somewhere, at a position unknown to us. The problem is to figure out the probability of the next ball landing right of the line, given the previous results. Our first prediction should be that there is a 50% chance of the ball landing right of the line, by symmetry. If we get the result that one ball landed right of the line, by Laplace's rule of succession we infer that there is a 2/3 chance that the next ball will land right of the line. After n trials, if every trial gives a positive result, the probability we should assign to the next trial being positive as well is (n+1)/(n+2).
If the line were placed so that two-thirds of the table lay to its right, we should expect the ratio of Rights to Lefts to approach 2:1. This gives us a 2/3 chance of the next ball being a Right, and the fraction of Rights out of all trials approaches 2/3 ever more closely as more trials are performed.
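Laplace's rule of succession fits in a few lines. The general form, with k Rights out of n drops, is (k + 1)/(n + 2); the case where every trial is positive is k = n. The function name is hypothetical.

```python
from fractions import Fraction


def rule_of_succession(rights, trials):
    """Laplace: P(next ball lands right | `rights` rights in `trials` drops),
    starting from a uniform prior over where the line is."""
    return Fraction(rights + 1, trials + 2)


print(rule_of_succession(0, 0))  # 1/2, by symmetry
print(rule_of_succession(1, 1))  # 2/3
print(rule_of_succession(5, 5))  # 6/7, i.e. (n + 1)/(n + 2) with n = 5
```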
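A quick simulation of that claim, assuming balls land uniformly across a table of width 1 and placing the line at one third of the width, so that two-thirds of the area lies to its right (the seed and trial count are arbitrary). The observed fraction of Rights settles near 2/3, and so does Laplace's estimate.

```python
import random


def simulate_drops(line_position, trials, seed=0):
    """Drop `trials` balls uniformly on a table of width 1 and count
    how many land right of the line at `line_position`."""
    rng = random.Random(seed)
    return sum(rng.random() > line_position for _ in range(trials))


TRIALS = 100_000
rights = simulate_drops(line_position=1 / 3, trials=TRIALS)
print(rights / TRIALS)              # close to 2/3
print((rights + 1) / (TRIALS + 2))  # Laplace's estimate, also close to 2/3
```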
Now let us suppose a grue skeptic approaches this situation. He might make up two terms, "reft" and "light", defined as you would expect, but just in case:
The skeptic would continue:
Things for some reason become perfectly clear at this point for the defender of Bayesian inference, because now we have an easily imaginable model. Of course, if a ball landing right of the line is evidence for Right, then it cannot possibly be evidence for ~Right. After time T, Reft is logically identical to ~Right, so to be evidence for Reft after time T is to be evidence for ~Right; hence it is not evidence for Reft after time T, for the same reasons it is not evidence for ~Right. Before time T, of course, any evidence for Reft is evidence for Right, for analogous reasons.
But now the grue skeptic can say something brilliant that stops much of what the Bayesian has proposed dead in its tracks:
What can we possibly reply to this? Can he/she not do this with every argument we propose? Certainly, the skeptic admits that Bayes, together with the contradiction in Right & Reft after time T, prohibits previous Rights from being evidence for both Right and Reft after time T; where he is challenging us is in choosing Right as the result for which it is evidence, even though "Reft" and "Right" have a completely symmetrical syntactical relationship. There is nothing about the definitions of reft and right which distinguishes them from each other, except their spelling. So is that it? No; this simply means we have to propose an argument that doesn't rely on purely syntactical reasoning, so that if the skeptic performs the swap on our argument, the resulting argument is no longer sound.
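The definitions of reft and light are not reproduced in this text. This sketch assumes they mirror grue and bleen (reft: right of the line before T, left of it after; light: the reverse), which matches the statement that after time T, Reft is logically identical to ~Right. It checks that identity, and its pre-T counterpart, for both sides of the line. T and the function names are hypothetical.

```python
T = 100  # hypothetical switch time, in arbitrary units


def is_reft(landed_right, time):
    """Assumed definition: right of the line before T, left of it from T on."""
    return landed_right if time < T else not landed_right


def is_light(landed_right, time):
    """Assumed definition: left of the line before T, right of it from T on."""
    return not landed_right if time < T else landed_right


for landed_right in (True, False):
    # Before T, Reft coincides with Right ...
    assert is_reft(landed_right, time=T - 1) == landed_right
    # ... and after T it coincides with ~Right, so a ball that lands right
    # after T is a Right and a Light, never a Reft.
    assert is_reft(landed_right, time=T + 1) == (not landed_right)
    # Reft and Light always split the outcomes, just as Right and Left do.
    assert is_reft(landed_right, T + 1) != is_light(landed_right, T + 1)
```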
What would happen in this scenario if it were actually set up? I know that seems like a strangely concrete question for a philosophy text, but its answer is a helpful hint. What would happen is that after time T, the ratio 'Rights:Lefts' would behave as expected as more trials were added, while the ratio 'Refts:Lights' would approach the reciprocal of 'Rights:Lefts'. The only way for this not to happen is for us to have been calling the right side of the table "reft", or for the line to have moved. We can only figure out where the line is by knowing where the balls landed relative to it; anything we can figure out about where the line is from knowing which balls landed Reft and which ones landed Light, we can only figure out because, knowing this and the time, we can know whether the ball landed left or right of the line.
To this I know of no reply which the grue skeptic can make. If he/she says the paragraph back to me with the proper words swapped, it is not true: in the hypothetical where we have a table and a line, and we call one side right and the other side left, the only way for Refts:Lights to behave as expected as more trials are added is for the line to move (if even that); otherwise the ratio of Refts to Lights will approach the reciprocal of Rights to Lefts.
This thin line is analogous to the frequency of emeralds that turn out green out of all the emeralds that get made. This is why we can assume that the line will not move: that frequency has one precise value, which never changes. Its other important feature is reminding us that even if two terms are syntactically symmetrical, they may have semantic conditions for application which are ignored by the syntactical model, e.g., checking to see which side of the line the ball landed on.
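Simulating the setup makes this concrete. The sketch assumes that reft means right of the line before a hypothetical switch time T and left of it afterward (light being the reverse), and places the line so that two-thirds of the table lies to its right; the seed and counts are arbitrary. Counting only drops after T, Rights:Lefts settles near 2 while Refts:Lights settles near 1/2, because every post-T Reft is a Left and every post-T Light is a Right.

```python
import random

T = 1000  # hypothetical switch time, counted in drops


def is_reft(landed_right, time):
    """Assumed definition: right of the line before T, left of it from T on."""
    return landed_right if time < T else not landed_right


rng = random.Random(1)
line_position = 1 / 3  # two-thirds of the table lies right of the line
rights = lefts = refts = lights = 0
for time in range(T, T + 100_000):  # only drops after time T
    landed_right = rng.random() > line_position
    rights += landed_right
    lefts += not landed_right
    if is_reft(landed_right, time):
        refts += 1
    else:
        lights += 1

print(rights / lefts)  # close to 2
print(refts / lights)  # close to 1/2, the reciprocal
```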
In conclusion:
Every random variable has as a part of it, stored in its definition/code, a frequency distribution over its values. By the fact that some things happen sometimes, and others happen other times, we know that the world contains random variables, even if they are never fundamental in the source code. Note that "frequency" is not used here as a state of partial knowledge; it is a fact about a set and one of its subsets.
The reason that:
is a valid inference, but the grue equivalent isn't, is that grue is not a property that the emerald construction sites of our universe deal with. They are blind to the grueness of their emeralds; they only determine whether or not the next emerald will be green. It may be that the rule the emerald construction sites use to produce either a green or a non-green emerald changes at time T, but the frequency of some particular result out of all trials will never change; the line will not move. As long as we know which symbols we are using for which values, observing many green emeralds is evidence that the next one will be grue as long as it is observed before time T, and every record of an observation of a green emerald is evidence against a grue one after time T. "Grue" changes meaning from green to blue at time T, while the meaning of "green" stays the same, since we use the same physical test to determine green-hood as before, just as we use the same test to tell whether the ball landed right or left. There is no reft in the universe's source code, and there is no grue. Green is not fundamental in the source code, but it can be reduced to some particular range of quantum states; if you had the universe's source code, you couldn't write grue without first writing green, whereas writing green without knowing a thing about grue would be just as hard as writing it while knowing grue. Having a physical test, or primary condition for applicability, is what privileges green over grue after time T. To have a consistent physical test is the same as to reduce to a specifiable range of physical parameters, and the existence of such a test is what prevents the skeptic from performing his/her swaps on our arguments.
Take this more as a brainstorm than as a final solution. It wasn't originally, but it should have been. I'll write something more organized and concise after I think about the comments more, and make some graphics I've designed that make my argument much clearer, even to myself. But keep those comments coming, and tell me if you want specific credit for anything you may have added to my grue toolkit in the comments.
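The reduction claim can be sketched in code. Here the "physical test" is a wavelength range; the band boundaries (roughly 495 to 570 nm for green and 450 to 495 nm for blue) are conventional approximations assumed for illustration, and T and all names are hypothetical. Green needs only the fixed physical test; grue cannot be written without first writing both green and blue, plus a clock.

```python
T = 2050  # hypothetical switch year

# Approximate, conventional wavelength bands (nm), assumed for illustration.
GREEN_BAND = (495.0, 570.0)
BLUE_BAND = (450.0, 495.0)


def is_green(wavelength_nm):
    """A fixed physical test: the same check before and after T."""
    low, high = GREEN_BAND
    return low <= wavelength_nm < high


def is_blue(wavelength_nm):
    low, high = BLUE_BAND
    return low <= wavelength_nm < high


def is_grue(wavelength_nm, year_first_observed):
    """Grue has to be built out of green, blue, and a clock."""
    if year_first_observed < T:
        return is_green(wavelength_nm)
    return is_blue(wavelength_nm)


emerald = 530.0  # nm, hypothetical reflected wavelength
print(is_green(emerald), is_grue(emerald, 2000), is_grue(emerald, 2100))
# True True False
```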