Does Evidence Have To Be Certain?

0 potato 30 March 2016 10:32AM

It seems like in order to go from P(H) to P(H|E) you have to become certain that E. Am I wrong about that? 

Say you have the following joint distribution:

P(H&E) = a
P(~H&E) = b
P(H&~E) = c
P(~H&~E) = d

Where a, b, c, and d are each larger than 0.

So P(H|E) = a/(a+b). It seems like what we're doing is going from assigning ~E some positive probability to assigning it a 0 probability. Is there another way to think about it? Is there something special about evidential statements that justifies changing their probabilities without having updated on something else? 
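To make the question concrete, here is a minimal sketch of what strict conditioning on E does to the joint table above. The function name and the numbers plugged in for a, b, c, and d are illustrative assumptions, not part of the original question.

```python
from fractions import Fraction

def condition_on_e(a, b, c, d):
    """Condition the joint distribution over (H, E) on E being true.

    a = P(H&E), b = P(~H&E), c = P(H&~E), d = P(~H&~E).
    The ~E cells (c and d) are discarded and the E cells are renormalized.
    """
    p_e = a + b
    return {
        ("H", "E"): a / p_e,
        ("~H", "E"): b / p_e,
        ("H", "~E"): Fraction(0),
        ("~H", "~E"): Fraction(0),
    }

# Illustrative values only (assumed for this sketch); they sum to 1.
a, b, c, d = Fraction(3, 10), Fraction(1, 10), Fraction(2, 10), Fraction(4, 10)

posterior = condition_on_e(a, b, c, d)
# New marginal probability of H, which equals a/(a+b).
p_h_given_e = posterior[("H", "E")] + posterior[("H", "~E")]
# New marginal probability of ~E, which is forced to zero.
p_not_e_after = posterior[("H", "~E")] + posterior[("~H", "~E")]
```

In this sketch the updated table gives ~E probability 0, which is exactly the move the question is asking about.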

Computable Universal Prior

0 potato 11 December 2015 09:54AM

Suppose instead of using 2^-K(H) we just use 2^-length(H), does this do something obviously stupid? 

Here's what I'm proposing:

Take a programming language with two characters. Assign each program a prior of 2^-length(program). If the program outputs some string, then P(string | program) = 1, else it equals 0. I figure there must be some reason people don't do this already, or else there's a bunch of people doing it. I'd be real happy to find out about either. 

Clearly, it isn't a probability distribution, but we can still use it, no? 
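A rough sketch of why the weights fail to be a probability distribution, assuming (the post does not say) that every binary string counts as a valid program, so the set of programs is not prefix-free. The function name is made up for illustration.

```python
from itertools import product

def total_weight(max_length):
    """Sum 2^-length(p) over every binary string p with 1 <= length(p) <= max_length.

    Assumption for this sketch: every binary string is a program.
    """
    total = 0.0
    for length in range(1, max_length + 1):
        # There are 2^length strings of this length, each weighted 2^-length.
        for _ in product("01", repeat=length):
            total += 2.0 ** -length
    return total

weight_up_to_5 = total_weight(5)
weight_up_to_10 = total_weight(10)
```

Under that assumption each length contributes a total weight of 1, so the sum grows without bound as longer programs are admitted and cannot be normalized over all programs.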

 

 

Does Probability Theory Require Deductive or Merely Boolean Omniscience?

4 potato 03 August 2015 06:54AM

It is often said that a Bayesian agent has to assign probability 1 to all tautologies, and probability 0 to all contradictions. My question is... exactly what sort of tautologies are we talking about here? Does that include all mathematical theorems? Does that include assigning 1 to "Every bachelor is an unmarried male"?[1] Perhaps the only tautologies that need to be assigned probability 1 are those that are Boolean theorems implied by atomic sentences that appear in the prior distribution, such as: "S or ~ S".

It seems that I do not need to assign probability 1 to Fermat's last conjecture in order to use probability theory when I play poker, or try to predict the color of the next ball to come from an urn. I must assign a probability of 1 to "The next ball will be white or it will not be white", but Fermat's last theorem seems to be quite irrelevant. Perhaps that's because these specialized puzzles do not require sufficiently general probability distributions; perhaps, when I try to build a general Bayesian reasoner, it will turn out that it must assign 1 to Fermat's last theorem. 

Imagine a (completely impractical, ideal, and esoteric) first order language, whose particular subjects were discrete point-like regions of space-time. There can be an arbitrarily large number of points, but it must be a finite number. This language also contains a long list of predicates like: is blue, is within the volume of a carbon atom, is within the volume of an elephant, etc. and generally any predicate type you'd like (including n place predicates).[2] The atomic propositions in this language might look something like: "5, 0.487, -7098.6, 6000s is Blue" or "(1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant." The first of these propositions says that a certain point in space-time is blue; the second says that there is an elephant between two points at one second after the universe starts. Presumably, at least the denotational content of most English propositions could be expressed in such a language (I think, mathematical claims aside).

Now imagine that we collect all of the atomic propositions in this language, and assign a joint distribution over them. Maybe we choose max entropy, doesn't matter. Would doing so really require us to assign 1 to every mathematical theorem? I can see why it would require us to assign 1 to every tautological Boolean combination of atomic propositions [for instance: "(1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant OR ~((1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant)"], but that would follow naturally as a consequence of filling out the joint distribution. Similarly, all the Boolean contradictions would be assigned zero, just as a consequence of filling out the joint distribution table with a set of reals that sum to 1. 
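Here is a toy sketch of that claim, with only two atomic propositions standing in for the whole language. The atom names, the helper function, and the choice of a uniform (max entropy) table are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical stand-ins for two atomic propositions of the language.
ATOMS = ["elephant", "blue"]

def probability(formula, joint):
    """Add up the probability of every row of the joint table where formula holds."""
    return sum(
        p for world, p in joint.items() if formula(dict(zip(ATOMS, world)))
    )

# One row per truth assignment; uniform weights are the max entropy choice.
worlds = list(product([True, False], repeat=len(ATOMS)))
joint = {world: Fraction(1, len(worlds)) for world in worlds}

def tautology(v):
    return v["elephant"] or not v["elephant"]

def contradiction(v):
    return v["elephant"] and not v["elephant"]

def contingent(v):
    return v["elephant"] and v["blue"]

p_tautology = probability(tautology, joint)
p_contradiction = probability(contradiction, joint)
p_contingent = probability(contingent, joint)
```

The tautology gets 1 and the contradiction gets 0 purely because of how the rows are summed. Nothing in the table construction mentions arithmetic or any other mathematical theorem.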

A similar argument could be made using intuitions from algorithmic probability theory. Imagine that we know that some data was produced by a distribution which is output by a program of length n in a binary programming language. We want to figure out which distribution it is. So, we assign each binary string a prior probability of 2^-n. If the language allows for comments, then simpler distributions will be output by more programs, and we will add the probability of all programs that print that distribution.[3] Sure, we might need an oracle to figure out if a given program outputs anything at all, but we would not need to assign a probability of 1 to Fermat's last theorem (or at least I can't figure out why we would). The data might be all of your sensory inputs, and n might be Graham's number; still, there's no reason such a distribution would need to assign 1 to every mathematical theorem. 

Conclusion

A Bayesian agent does not require mathematical omniscience, or logical (if that means anything more than Boolean) omniscience, but merely Boolean omniscience. All that Boolean omniscience means is that for whatever atomic propositions appear in the language (e.g., the language that forms the set of propositions that constitute the domain of the probability function) of the agent, any tautological Boolean combination of those propositions must be assigned a probability of 1, and any contradictory Boolean combination of those propositions must be assigned 0. As far as I can tell, the whole notion that Bayesian agents must assign 1 to tautologies and 0 to contradictions comes from the fact that when you fill out a table of joint distributions (or follow the Kolmogorov axioms in some other way) all of the Boolean theorems get a probability of 1. This does not imply that you need to assign 1 to Fermat's last theorem, even if you are reasoning probabilistically in a language that is very expressive.[4] 

Some Ways To Prove This Wrong:

Show that a really expressive semantic language, like the one I gave above, implies PA if you allow Boolean operations on its atomic propositions. Alternatively, you could show that Solomonoff induction can express PA theorems as propositions with probabilities, and that it assigns them 1. This is what I tried to do, but I failed on both occasions, which is why I wrote this. 


[1] There are also interesting questions about the role of tautologies that rely on synonymy in probability theory, and whether they must be assigned a probability of 1, but I decided to keep it to mathematics for the sake of this post. 

[2] I think this language is ridiculous, and openly admit it has next to no real world application. I stole the idea for the language from Carnap.

[3] This is a sloppily presented approximation to Solomonoff induction as n goes to infinity. 

[4] The argument above is not a mathematical proof, and I am not sure that it is airtight. I am posting this to the discussion board instead of a full-blown post because I want feedback and criticism. !!!HOWEVER!!! if I am right, it does seem that folks on here, at MIRI, and in the Bayesian world at large, should start being more careful when they think or write about logical omniscience. 

 

 

Meetup : Umbc meetup

1 potato 25 February 2015 04:00AM

Discussion article for the meetup : Umbc meetup

WHEN: 24 March 2015 04:56:00PM (-0500)

WHERE: 1000 Hilltop cir Catonsville

We will meet in the westhill community center, if you don't know where that is, feel free to call or text 2014066972.


Humans Shouldn't make Themselves Smarter?

-2 potato 11 December 2011 12:00PM

Just thought you guys should know about this. Some work that argues that humans should not enhance their intelligence with technology, and that super intelligence probably never evolves.

LW Philosophers versus Analytics

38 potato 28 November 2011 03:40PM

By and large, I would bet money that the devoted, experienced, and properly sequenced LWer is a better philosopher than the average current philosophy major concentrating in the analytic tradition. I say this because I have regular philosophical conversations with both populations, and notice many philosophical desiderata lacking in my conversations with my classmates, from my school and others, that I find abundantly on this website. Those desiderata are roughly the twelve virtues. I find that though my classmates have healthy doses of curiosity, empiricism and even scholarship, they are lacking in evenness, lightness, relinquishment, precision, perfectionism and true humility.

How could that be? LW has built a huge positivized reductionist metaphysics, and a Bayesian epistemology which can almost be read as a self improvement manual. These are unprecedented, and in some circles, outrageous truths. This is not to mention the original work that has been done in LW posts and comment trees on meta-ethics, ethics, biases, mathematics, rationality, quantum physics, economics, self-hack, etc. We have here a self-updating, reliably transmittable, well-oiled machine, the likes of which philosophy has only rarely seen.

What is even more impressive to me about LW as a philosophical movement, is that it seems to be nearly self contained when it comes to philosophy. I mean most experienced LWers probably really haven't read very much Kant, maybe some Wittgenstein or Quine; but LWers can still somehow solve the problems philosophers spend their lives solving by building disconnected and competing philosophical systems specifically designed for each task, by the use of roughly one rather generally successful epistemology and metaphysics, which can be called together LWism.

So if you agree that LW does better philosophy than analytic philosophers, let's put our money where our mouths are, as our own philosophy suggests we should. I will post a series of discussion posts each concentrating only on one currentish question from academic philosophy. In each post, I will cover the essentials of the problem, as well as provide external resources on the problem. Each post will also include a list of posts from the sequences which are recommended before participation. Each question will be one on which professional philosophers have a consensus of less than 2 to 1 odds, i.e., if more than 2/3 of professional philosophers agree, we won't bother, so as to not waste our time with small fish.

You guys will then in turn cooperate in comment trees to find solutions and decide amongst them. Then I'll compare the LW solutions to the solutions given by a random sampling of vaguely successful analytic philosophers (I will use a university search for my sampling). I will compare the ratio of types of solutions of the two populations, and look for solutions that occur in one population but not in the other; then I'll post the results, hopefully the next week. (edit): This process of comparison will be the hardest part of this project for me, and if anyone with training or experience in statistics wants to help me with this, please let me know, and we can work on the comparison and the report thereof together. My prediction is that we will be able to quickly reach a high consensus on many issues that analytics have not internally resolved.

The series will be called: the "Enthusiastic Youngsters Formally Tackle Analytic Problems Test" or "the Eyftapt series" [pronounced: afe-taped]. Alternatively Eyftapt could stand for the "Eliezer Yudkowsky and Friends Train Amazing Philosophers Test." Besides shedding moderate light on our philosophical-competence/toolbox juxtaposed to analytic philosophical-competence/toolbox, I'd also like to learn what LW training offers that analytics are currently missing. So that we can focus in on that kind of training for our own benefit, and so that we can offer some advice to the analytics. That is, assuming my prediction that we'll do better is correct. This will not be as easy as comparing solutions, and I may need much more data than what I'll get out of this series, but it couldn't hurt to have a bunch of LWers doing difficult philosophy added to the available data.

What do you guys and gals think, might you be interested in something like this? Mind you it would be in discussion posts, since the main point is to discuss an issue.

(I know some of you cats don't like "philosophy", just call it "arguing about systems and elucidating messy language and thought in order to answer questions" instead. That is what I think we do better.)

BTW, if you have some problem you think we should work on, or if you think we would be really good at solving some problem or really bad at it compared to non-LW philosophy, message me or comment below, and I'll give you credit for the suggestion. These are the topics I have already decided on: universals/nominalism, correspondence/deflation/coherency, grue/induction, science realism/constructivism, what is math?, scientific underdetermination, a priori knowledge?, radical translation, analytic synthetic division, proper name/description, deduction induction division, modality and possible worlds, what does it mean for a grammatical sentence to be meaningless and how do you tell?, meta-philosophy, i.e., questions about philosophy, and finally, personal identity, roughly to be posted in that order.

 


(edited after first posting, I just realized it may be worth mention that):

I was not happy about coming to this view. I have always thought of myself as an aspiring analytic philosopher, and even got attached to the aesthetics of analytic philosophy. I thought of analytic philosophy as the new science of philosophy that finally got it right. It bothered me to no end that I had been led to have more faith in the philosophical maturity/competence of a bunch of amateurs on a blog than in the experts and students of the field that I planned to spend the rest of my life on. I had committed myself to the methods of academic-analytic philosophy publicly in speeches and to my closest friends, colleagues, and family; to turn around in under a year and say that that was all naive enthusiasm, and that there's this blog of college kids that do it better, made me look very stupid in the eyes of more than one person I cared and care about. More than once, I have dissolved a question in my philosophy and cog-sci classes into an obvious cognitive error, explained why we are built to make this error, and left the class with little to do. Professors have praised me for this, and have even started approaching me outside of class to ask me about where I got my analysis from; their faces often show sincere awe when I tell them: "I made it up myself, but all the methods I used are neatly organized, generalized, and exemplified in this text called the 'sequences' on this blog of youngsters called 'Less Wrong'. It's only a few hundred pages, kinda reads like G.E.B."

One day, a few months back, one of my professors who I am on a particularly friendly basis with asked me: "Every time we are in class and there is a question, you use this blog of yours, and it seems it gives you an answer for everything, so why are you still studying the analytics, instead of just studying your blog?" I think he meant to ask this question sardonically, but that is not how I took it. I took it as a serious question about how to optimize my time if my goal is to do good philosophy. Not having a good answer to this question, and craving one, probably more than anything, is what prompted me to think of doing this series.

I may be wrong, and it may be that LW has just as hard of a time forming consensus on the issues that analytics have a hard time with, though I doubt it. But I am much more confident, that for some reason, even though I have had very good training, have a very high GPA, have read every classic philosophy text I could get my hands on, and had been reading several modern philosophy journals, all before I even knew about LW, LW has done more for my philosophical maturity, competence, and persuasiveness, than the entirety of the rest of my training, and I wouldn't doubt that many others have had similar thoughts.

 

 

(Subjective Bayesianism vs. Frequentism) VS. Formalism

27 potato 26 November 2011 05:05AM

One of the core aims of the philosophy of probability is to explain the relationship between frequency and probability. The frequentist proposes identity as the relationship. This use of identity is highly dubious. We know how to check for identity between numbers, or even how to check for the weaker copula relation between particular objects; but how would we test the identity of frequency and probability? It is not immediately obvious that there is some simple value out there which is modeled by probability, like position and mass are values that are modeled by Newton's Principia. You can actually check if density * volume = mass, by taking separate measurements of mass, density and volume, but what would you measure to check a frequency against a probability?

There are certain appeals to frequentist philosophy: we would like to say that if a bag has 100 balls in it, only 1 of which is white, then the probability of drawing the white ball is 1/100, and that if we take a non-white ball out, the probability of drawing the white ball is now 1/99. Frequentism would make the philosophical justification of that inference trivial. But of course, anything a frequentist can do, a Bayesian can do (better). I mean that literally: it's the stronger magic.

A Subjective Bayesian, more or less, says that the reason frequencies are related to probabilities is because when you learn a frequency you thereby learn a fact about the world, and one must update one's degrees of belief on every available fact. The subjective Bayesian actually uses the copula in another strange way:

Probability is subjective degree of belief.

and subjective Bayesians also claim:

Probabilities are not in the world, they are in your mind.

These two statements are brilliantly championed in Probability is Subjectively Objective. But ultimately, the formalism which I would like to suggest denies both of these statements. Formalists do not ontologically commit themselves to probabilities, just as they do not say that numbers exist; hence we don't allocate probabilities in the mind or anywhere else; we only commit ourselves to number theory, and probability theory. Mathematical theories are simply repeatable processes which construct certain sequences of squiggles called "theorems", by changing the squiggles of other theorems, according to certain rules called "inferences". Inferences always take as input certain sequences of squiggles called premises, and output a sequence of squiggles called the conclusion. The only thing an inference ever does is add squiggles to a theorem, take away squiggles from a theorem, or both. It turns out that these squiggle sequences mixed with inferences can talk about almost anything, certainly any computable thing. The formalist does not need to ontologically commit to numbers to assert that "There is a prime greater than 10000.", even though "There is x such that" is a flat assertion of existence; because for the formalist "There is a prime greater than 10000." simply means that number theory contains a theorem which is interpreted as "there is a prime greater than 10000." When you say a mathematical fact in English, you are interpreting a theorem from a formal theory. If under your suggested interpretation, all of the theorems of the theory are true, then whatever system/mechanism your interpretation of the theory talks about, is said to be modeled by the theory.

So, what is the relation between frequency and probability proposed by formalism? Theorems of probability may be interpreted as true statements about frequencies, when you assign certain squiggles certain words and claim the resulting natural language sentence. Or for short we can say: "Probability theory models frequency." It is trivial to show that Kolmogorov's theory models frequency, since it also models fractions; it is an algebra after all. More interestingly, probability theory models rational distributions of subjective degree of belief, and the optimal updating of degree of belief given new information. This is somewhat harder to show; Dutch-book arguments do nicely to at least provide some intuitive understanding of the relation between degree of belief, betting, and probability, but there is still work to be done here. If Bayesian probability theory really does model rational belief, which many believe it does, then that is likely the most interesting thing we are ever going to be able to model with probability. But probability theory also models spatial measurement. Why not add the position that probability is volume to the debating lines of the philosophy of probability?

Why are frequentism's and subjective Bayesianism's misuses of the copula not as obvious as volumeism's? This is because what the Bayesian and frequentist are really arguing about is statistical methodology; they've just disguised the argument as an argument about what probability is. Your interpretation of probability theory will determine how you model uncertainty, and hence determine your statistical methodology. Volumeism cannot handle uncertainty in any obvious way; however, the Bayesian and frequentist interpretations of probability theory imply two radically different ways of handling uncertainty.

The easiest way to understand the philosophical dispute between the frequentist and the subjective Bayesian is to look at the classic biased coin:

A subjective Bayesian and a frequentist are at a bar, and the bartender (being rather bored) tells the two that he has a biased coin, and asks them "what is the probability that the coin will come up heads on the first flip?" The frequentist says that for the coin to be biased means for it not to have a 50% chance of coming up heads, so all we know is that it has a probability that is not equal to 50%. The Bayesian says that any evidence I have for it coming up heads is also evidence for it coming up tails, since I know nothing about one outcome that doesn't also hold for its negation, and the only value which represents that symmetry is 50%.

I ask you. What is the difference between these two, and the poor souls engaged in endless debate over realism about sound in the beginning of Making Beliefs Pay Rent?

If a tree falls in a forest and no one hears it, does it make a sound? One says, "Yes it does, for it makes vibrations in the air." Another says, "No it does not, for there is no auditory processing in any brain."

One is being asked: "Are there pressure waves in the air if we aren't around?"; the other is being asked: "Are there auditory experiences if we are not around?" The problem is that "sound" is being used to stand for both "auditory experience" and "pressure waves through air". They are both giving the right answers to these respective questions. But they are failing to Replace the Symbol with the Substance and they're using one word with two different meanings in different places. In the exact same way, "probability" is being used to stand for both "frequency of occurrence" and "rational degree of belief" in the dispute between the Bayesian and the frequentist. The correct answer to the question: "If the coin is flipped an infinite number of times, how frequently would we expect it to land on heads?" is "All we know is that it wouldn't be 50%." because that is what it means for the coin to be biased. The correct answer to the question: "What is the optimal degree of belief that we should assign to the first trial being heads?" is "Precisely 50%.", because of the symmetrical evidential support the results get from our background information. How we should actually model the situation as statisticians depends on our goal. But remember that Bayesianism is the stronger magic, and the only contender for perfection in the competition.
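As a hedged illustration of how both answers can hold at once, here is a small sketch. It assumes a discrete prior over the coin's long-run frequency of heads that is symmetric around 1/2 and excludes 1/2 itself; the grid of candidate biases is an arbitrary choice made for this example.

```python
from fractions import Fraction

# Assumed prior for this sketch: equal weight on biases 0, 0.1, ..., 1.0,
# with 0.5 left out because the coin is known to be biased.
biases = [Fraction(k, 10) for k in range(11) if k != 5]
prior = {bias: Fraction(1, len(biases)) for bias in biases}

# Degree of belief that the first flip lands heads: the prior-weighted bias.
p_heads_first_flip = sum(bias * weight for bias, weight in prior.items())

# Prior weight on the long-run frequency being exactly 50%.
p_frequency_is_half = prior.get(Fraction(1, 2), Fraction(0))
```

With this prior, the degree of belief in heads on the first flip is exactly 1/2 even though no weight at all sits on a long-run frequency of 1/2, so the two answers are replies to two different questions.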

For us formalists, probabilities are not anywhere. We do not even believe in probability technically, we only believe in probability theory. The only coherent uses of "probability" in natural language are purely syncategorematic. We should be very careful when we colloquially use "probability" as a noun or verb, and be very careful and clear about what we mean by this word play. Probability theory models many things, including degree of belief, and frequency. Whatever we may learn about rationality, frequency, measure, or any of the other mechanisms that probability models, through the interpretation of probability theorems, we learn because probability theory is isomorphic to those mechanisms. When you use the copula like the frequentist or the subjective Bayesian, it makes it hard to notice that probability theory modeling both frequency and degree of belief, is not a contradiction. If we use "is" instead of "model", it is clear that frequency is not degree of belief, so if probability is belief, then it is not frequency.  Though frequency is not degree of belief, frequency does model degree of belief, so if probability models frequency, it must also model degree of belief.

Bayes Slays Goodman's Grue

0 potato 17 November 2011 10:45AM

This is a first stab at solving Goodman's famous grue problem. I haven't seen a post on LW about the grue paradox, and this surprised me since I had figured that if any arguments would be raised against Bayesian LW doctrine, it would be the grue problem. I haven't looked at many proposed solutions to this paradox, besides some of the basic ones in "The New Problem of Induction". So, I apologize now if my solution is wildly unoriginal. I am willing to put you through this dear reader because:

  1. I wanted to see how I would fare against this still largely open, devastating, and classic problem, using only the arsenal provided to me by my minimal Bayesian training, and my regular LW reading.
  2. I wanted the first LW article about the grue problem to attack it from a distinctly Lesswrongian approach without the benefit of hindsight knowledge of the solutions of non-LW philosophy. 
  3. And lastly, because, even if this solution has been found before, if it is the right solution, it is to LW's credit that its students can solve the grue problem with only the use of LW skills and cognitive tools.

I would also like to warn the savvy subjective Bayesian that just because I think that probabilities model frequencies, and that I require frequencies out there in the world, does not mean that I am a frequentist or a realist about probability. I am a formalist with a grain of salt. There are no probabilities anywhere in my view, not even in minds; but the theorems of probability theory when interpreted share a fundamental contour with many important tools of the inquiring mind, including both the nature of frequency and the set of rational subjective belief systems. There is nothing more to probability than that system which produces its theorems. 

Lastly, I would like to say that even if I have not succeeded here (which I think I have), there is likely something valuable that can be made from the leftovers of my solution after the onslaught of penetrating critiques that I expect from this community. Solving this problem is essential to LW's methods, and our arsenal is fit to handle it. If we are going to be taken seriously in the philosophical community as a new movement, we must solve serious problems from academic philosophy, and we must do it in distinctly Lesswrongian ways.

 


 

"The first emerald ever observed was green.
The second emerald ever observed was green.
The third emerald ever observed was green.
… etc.
The nth emerald ever observed was green.
(conclusion):
There is a very high probability that a never before observed emerald will be green."

That is the inference that the grue problem threatens, courtesy of Nelson Goodman.  The grue problem starts by defining "grue":

"An object is grue iff it is first observed before time T, and it is green, or it is first observed after time T, and it is blue."

So you see that before time T, from the list of premises:

"The first emerald ever observed was green.
 The second emerald ever observed was green.
 The third emerald ever observed was green.
 … etc.
 The nth emerald ever observed was green."
 (we will call these the green premises)

it follows that:

"The first emerald ever observed was grue.
The second emerald ever observed was grue.
The third emerald ever observed was grue.
… etc.
The nth emerald ever observed was grue."
(we will call these the grue premises)

The proposer of the grue problem asks at this point: "So if the green premises are evidence that the next emerald will be green, why aren't the grue premises evidence for the next emerald being grue?" If an emerald is grue after time T, it is not green. Let's say that the green premises bring the probability of "A new unobserved emerald is green." to 99%. On the skeptic's hypothesis, by symmetry they should also bring the probability of "A new unobserved emerald is grue." to 99%. But of course after time T, this would mean that the probability of observing a green emerald is 99%, and the probability of not observing a green emerald is at least 99%. Since these sentences have no intersection, i.e., they cannot happen together, to find the probability of their disjunction we just add their individual probabilities. This must give us a number at least as big as 198%, which is, of course, a contradiction of the Kolmogorov axioms. We should not be able to form a statement with a probability greater than one.
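Spelled out as a worked line, using the illustrative 99% figures from the paragraph above and writing "green" and "grue" for the two claims about a new emerald first observed after time T (the notation is mine, added for this sketch):

```latex
\begin{align*}
P(\text{green} \lor \text{grue})
  &= P(\text{green}) + P(\text{grue})
     && \text{(the two claims are mutually exclusive after } T\text{)} \\
  &= 0.99 + 0.99 \\
  &= 1.98 > 1 .
\end{align*}
```

Since no event can have probability greater than 1, at least one of the two 99% assignments has to be given up.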

This threatens the whole of science, because you cannot simply keep this isolated to emeralds and color. We may think of the emeralds as trials, and green as the value of a random variable. Ultimately, every result of a scientific instrument is a random variable, with a very particular and useful distribution over its values. If we can't justify inferring probability distributions over random variables based on their previous results, we cannot justify a single bit of natural science. This, of course, says nothing about how it works in practice. We all know it works in practice. "A philosopher is someone who says, 'I know it works in practice, I'm trying to see if it works in principle.'" - Dan Dennett

We may look at an analogous problem. Let's suppose that there is a table and that there are balls being dropped on this table, and that there is an infinitely thin line drawn perpendicular to the edge of the table somewhere which we are unaware of. The problem is to figure out the probability of the next ball being right of the line given the last results. Our first prediction should be that there is a 50% chance of the ball being right of the line, by symmetry. If we get the result that one ball landed right of the line, by Laplace's rule of succession we infer that there is a 2/3 chance that the next ball will be right of the line. After n trials, if every trial gives a positive result, the probability we should assign to the next trial being positive as well is (n+1)/(n+2).
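A minimal sketch of Laplace's rule of succession as it is used in the paragraph above. The function name is an assumption for illustration; the formula is the standard (successes + 1) / (trials + 2).

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's rule: probability that the next trial is a success,
    given `successes` successes observed in `trials` trials."""
    return Fraction(successes + 1, trials + 2)

# Before any ball is dropped: 1/2, by symmetry.
p_before_any_data = rule_of_succession(0, 0)
# After one ball lands right of the line: 2/3.
p_after_one_right = rule_of_succession(1, 1)
# After n rights in a row the value is (n+1)/(n+2); here n = 10.
p_after_ten_rights = rule_of_succession(10, 10)
```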

If this line was placed 2/3 of the way down the table, we should expect that the ratio of rights to lefts should approach 2:1. This gives us a 2/3 chance of the next ball being a right, and the fraction of rights out of trials approaches 2/3 ever more closely as more trials are performed.

Now let us suppose a grue skeptic approaching this situation. He might make up two terms "reft" and "light". Defined as you would expect, but just in case:

"A ball is reft of the line iff it is right of it before time T when it lands, or if it is left of it after time T when it lands.
 A ball is light of the line iff it is left of the line before time T when it lands, or if it is right of the line after time T when it first lands."

The skeptic would continue:

"Why should we treat the observation of several occurrences of Right, as evidence for 'The next ball will land on the right.' and not as evidence for 'The next ball will land reft of the line.'?"

Things for some reason become perfectly clear at this point for the defender of Bayesian inference, because now we have an easy-to-imagine model. Of course, if a ball landing right of the line is evidence for Right, then it cannot possibly be evidence for ~Right. After time T, Reft is logically identical to ~Right, so to be evidence for Reft after time T is to be evidence for ~Right; hence a Right is not evidence for Reft after time T, for the same reasons it is not evidence for ~Right. Of course, before time T, any evidence for Reft is evidence for Right, for analogous reasons.

But now the grue skeptic can say something brilliant that stops much of what the Bayesian has proposed dead in its tracks:

"Why can't I just repeat that paragraph back to you and swap every occurrence of 'right' with 'reft' and 'left' with 'light', and vice versa? They are perfectly symmetrical in terms of their logical relations to one another.
If we take 'reft' and 'light' as primitives, then we have to define 'right' and 'left' in terms of 'reft' and 'light' with the use of time intervals."

What can we possibly reply to this? Can he/she not do this with every argument we propose, then? Certainly, the skeptic admits that Bayes, together with the contradiction between Right and Reft after time T, prohibits previous Rights from being evidence for both Right and Reft after time T; where he is challenging us is in choosing Right as the result which they are evidence for, even though "Reft" and "Right" have a completely symmetrical syntactical relationship. There is nothing about the definitions of reft and right which distinguishes them from each other, except their spelling. So is that it? No, this simply means we have to propose an argument that doesn't rely on purely syntactical reasoning, so that if the skeptic performs the swap on our argument, the resulting argument is no longer sound.

What would happen in this scenario if it were actually set up? I know that seems like a strangely concrete question for a philosophy text, but its answer is a helpful hint. What would happen is that after time T, the ratio Rights:Lefts would continue to behave as expected as more trials were added, and the ratio Refts:Lights would approach the reciprocal of Rights:Lefts. The only way for this not to happen is for us to have been calling the right side of the table "reft", or for the line to have moved. We can only figure out where the line is by knowing where the balls landed relative to it; anything we can figure out about where the line is from knowing which balls landed Reft and which landed Light, we can figure out only because, knowing this and the time, we can know whether each ball landed left or right of the line.

To this I know of no reply which the grue skeptic can make. If he/she says the paragraph back to me with the proper words swapped, it is not true, because in the hypothetical where we have a table, a line, and we are calling one side right and the other side left, the only way for Refts:Lights to behave as expected as more trials are added is to move the line (if even that); otherwise the ratio of Refts to Lights will approach the reciprocal of Rights to Lefts.
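The thought experiment can actually be run. The sketch below is only an illustration under assumptions of my own: balls land uniformly on a table of width 1, the line sits at 1/3 so that 2/3 of the table is to its right, trial indices stand in for time, and a ball landing exactly at time T counts as "after T" (the definitions above leave that case open). All function names are hypothetical.

```python
import random
from collections import Counter


def side_of_line(x: float, line: float) -> str:
    """The physical test: which side of the line did the ball land on?"""
    return "right" if x > line else "left"


def reft_or_light(side: str, t: int, T: int) -> str:
    """Before T, reft means right and light means left; from T on, swapped."""
    if t < T:
        return "reft" if side == "right" else "light"
    return "reft" if side == "left" else "light"


def simulate(n_before: int, n_after: int, line: float = 1 / 3, seed: int = 0):
    """Drop balls uniformly on [0, 1) and tally both vocabularies."""
    rng = random.Random(seed)
    T = n_before  # the switch happens after n_before trials
    counts = {"before": Counter(), "after": Counter()}
    for t in range(n_before + n_after):
        side = side_of_line(rng.random(), line)
        period = "before" if t < T else "after"
        counts[period][side] += 1
        counts[period][reft_or_light(side, t, T)] += 1
    return counts


counts = simulate(n_before=10_000, n_after=10_000)
for period, c in counts.items():
    print(period, "Rights:Lefts =", c["right"], ":", c["left"],
          "| Refts:Lights =", c["reft"], ":", c["light"])
```

Note that `reft_or_light` cannot be evaluated without first calling `side_of_line` and knowing the time, while `side_of_line` needs neither T nor the reft/light vocabulary. Since the line never moves, the Refts and Lights after T are just the Lefts and Rights under other names, which is why Refts:Lights flips to the reciprocal of Rights:Lefts.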

This thin line is analogous to the frequency of emeralds that turn out green out of all the emeralds that get made. This is why we can assume that the line will not move, because that frequency has one precise value, which never changes. Its other important feature is reminding us that even if two terms are syntactically symmetrical, they may have semantic conditions for application which are ignored by the syntactical model, e.g., checking to see which side of the line the ball landed on.

In conclusion:

Every random variable has, as a part of it, stored in its definition/code, a frequency distribution over its values. From the fact that some things happen sometimes, and others happen other times, we know that the world contains random variables, even if they are never fundamental in the source code. Note that "frequency" is not used here as a state of partial knowledge; it is a fact about a set and one of its subsets.

The reason that:

"The first emerald ever observed was green.
The second emerald ever observed was green.
The third emerald ever observed was green.
… etc.
The nth emerald ever observed was green.
(conclusion):
There is a very high probability that a never before observed emerald will be green."

is a valid inference, but the grue equivalent isn't, is that grue is not a property that the emerald construction sites of our universe deal with. They are blind to the grueness of their emeralds; they only say anything about whether or not the next emerald will be green. It may be that the rule the emerald construction sites use to produce either a green or a non-green emerald changes at time T, but the frequency of some particular result out of all trials will never change; the line will not move. As long as we know which symbols we are using for which values, observing many green emeralds is evidence that the next one will be grue as long as it is observed before time T, and every record of an observation of a green emerald is evidence against a grue one after time T. "Grue" changes meaning from green to blue at time T; "green" keeps its meaning, since we are using the same physical test to determine green-hood as before, just as we use the same test to tell whether the ball landed right or left. There is no reft in the universe's source code, and there is no grue. Green is not fundamental in the source code, but green can be reduced to some particular range of quanta states; if you had the universe's source code, you couldn't write grue without first writing green, while writing green without knowing a thing about grue would be just as hard as writing it while knowing grue. Having a physical test, or primary condition for applicability, is what privileges green over grue after time T; to have a consistent physical test is the same as to reduce to a specifiable range of physical parameters; and the existence of such a test is what prevents the skeptic from performing his/her swaps on our arguments.


Take this more as a brainstorm than as a final solution. It wasn't originally meant as one, but it should have been. I'll write something more organized and concise after I think about the comments more, and after I make some graphics I've designed that make my argument much clearer, even to myself. But keep those comments coming, and tell me if you want specific credit for anything you may have added to my grue toolkit in the comments.

Naming the Highest Virtue of Epistemic Rationality

-3 potato 24 October 2011 11:00PM

Edit: Looking back at this a few years later. It is pretty embarrassing, but I'm going to leave it up. 

Why don't we start treating the log2 of the probability you assign to the great conjunction (conditional on every available piece of information) as the best measure of your epistemic success? Let's call log_2(P(the great conjunction | your available information)) your "Bayesian competence". It is a deductive fact that no other proper scoring rule (up to the choice of base) could possibly give Score(P(A|B)) + Score(P(B)) = Score(P(A&B)), and obviously you should get the same score for assigning P(A|B) to A after observing B, and P(B) to B a priori, as you would get for assigning P(A&B) to A&B a priori. The great conjunction is the conjunction of all true statements expressible in your idiolect. Your available information may be treated as the ordered set of your retained stimuli.
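The additivity property is easy to check numerically. The following is a small sketch with made-up probabilities; `log_score` and `quadratic_score` are names I chose for illustration, and the quadratic score (a Brier-style rule applied to the single true statement) is included only as a contrast, not as anything the argument above relies on.

```python
import math


def log_score(p: float) -> float:
    """Base-2 log of the probability assigned to a statement that is true."""
    return math.log2(p)


def quadratic_score(p: float) -> float:
    """Brier-style score for a true statement; shown only for contrast."""
    return -((1.0 - p) ** 2)


# Hypothetical probabilities, chosen so the logs come out exact.
p_b = 0.25                      # P(B)
p_a_given_b = 0.5               # P(A|B)
p_a_and_b = p_b * p_a_given_b   # P(A&B) by the product rule

for name, score in (("log", log_score), ("quadratic", quadratic_score)):
    stepwise = score(p_a_given_b) + score(p_b)
    at_once = score(p_a_and_b)
    print(name, "stepwise:", stepwise, "at once:", at_once)
```

With the log score, scoring B and then A given B adds up to the score for the conjunction; with the quadratic score the two totals differ, so the order in which you happened to learn things would change your score.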

If this doesn't make sense, or you aren't familiar with these ideas, check out Technical Explanation after checking out Intuitive Explanation.

It is standard LW doctrine that we should not name the highest value of rationality, and it is often defended quite brilliantly:

You may try to name the highest principle with names such as “the map that reflects the territory” or “experience of success and failure” or “Bayesian decision theory”. But perhaps you describe incorrectly the nameless virtue. How will you discover your mistake? Not by comparing your description to itself, but by comparing it to that which you did not name.

and of course also:

How can you improve your conception of rationality? Not by saying to yourself, “It is my duty to be rational.” By this you only enshrine your mistaken conception. Perhaps your conception of rationality is that it is rational to believe the words of the Great Teacher, and the Great Teacher says, “The sky is green,” and you look up at the sky and see blue. If you think: “It may look like the sky is blue, but rationality is to believe the words of the Great Teacher,” you lose a chance to discover your mistake. 

These quotes are from the end of Twelve Virtues.

Should we really be wondering if there's a virtue higher than Bayesian competence? Is there really a probability worth worrying about that the description of Bayesian competence above is misunderstood? Is the description not simple enough to be mathematical? What mistake might I discover in my understanding of Bayesian competence by comparing it to that which I did not name, after I've already given a proof that Bayesian competence is proper, and that the two restrictions, score(P(B)*P(A|B)) = score(P(B)) + score(P(A|B)) and being a proper scoring rule, uniquely specify log_b?

I really want answers to these questions. I am still undecided about them, and I change my mind about them far too often.

Of course, your Bayesian competence is ridiculously difficult to compute. But I am not proposing the measure for practical reasons. I am proposing the measure to demonstrate that degree of rationality is an objective quantity that you could compute given the source code to the universe, even though there are likely no variables in the source that ever take on this value. This may be of little to no value to the most obsessively pragmatic practitioners of rationality. But it would be a very interesting result to philosophers of science and rationality.

Updated to better express the author's view and to take feedback into account. Apologies to any commenter whose comment may have been nullified.

The comment below:

The general reason Eliezer advocates not naming the highest virtue (as I understand it) is that there may be some type of problem for which bayesian updating (and the scoring rule referred to) yields the wrong answer. This idea sounds rather improbable to me, but there is a non-negligible probability that bayes will yield a wrong answer on some question. Not naming the virtue is supposed to be a reminder that if bayes ever gives the wrong answer, we go with the right answer, not bayes.

has changed my mind about the openness of the questions I asked.

Can't Pursue the Art for its Own Sake? Really?

0 potato 20 September 2011 02:09AM

Can anyone tell me why it is that if I use my rationality exclusively to improve my conception of rationality I fall into an infinite recursion? EY says this in The Twelve Virtues and in Something to Protect, but I don't know what his argument is. He goes as far as to say that you must subordinate rationality to a higher value.

I understand that by committing yourself to your rationality you lose out on the chance to notice if your conception of rationality is wrong. But what if I use the reliability of win that a given conception of rationality offers me as the only guide to how correct that conception is? I can test reliability of win by taking a bunch of different problems whose answers are known, but not to me, solving them using my current conception of rationality and again using the alternative conception I want to test, then checking the answers I arrived at with each conception against the right answers. I could also take a bunch of unsolved problems and attack them from both conceptions of rationality, and see which one gets me the most solutions. If the set of problems I solve with one isn't a subset of the set of problems I solve with the other, then I'll see if I can somehow take the union of the two conceptions. And, though I'm still not sure enough about this method to use it, I suppose I could also figure out the relative reliability of two conceptions by making general arguments about the structures of those conceptions; if one conception is "do that which the great teacher says" and the other is "do that which has maximal expected utility", I would probably not have to solve problems using both conceptions to see which one most reliably leads to win.

And what if my goal is to become as epistemically rational as possible? Then I would just be looking for the conception of rationality that leads to truth most reliably, testing truth by predictive power.

And if being rational for its own sake just doesn't seem like it's valuable enough to motivate me to do all the hard work it requires, let's assume that I really really care about picking the best conception of rationality I know of, much more than I care about my own life.

It seems to me that if this is how I do rationality for its own sake (always looking for the conception of goal-oriented rationality which leads to win most reliably, and the conception of epistemic rationality which leads to truth most reliably), then I'll always switch to any conception I find that is less mistaken than mine, and stick with mine when presented with a conception that is more mistaken, provided I am careful enough about my testing. And if that means I practice rationality for its own sake, so what? I practice music for its own sake too. I don't think that's the only or best reason to pursue rationality; certainly some other good and common reasons are that you wanna figure something out or win. And when I do eventually find something I wanna win or figure out that no one else has (no shortage of those), if I can't, I'll know that my current conception isn't good enough. I'll be able to correct my conception by winning or figuring it out, and then thinking about what was missing from my view of rationality that wouldn't let me do that before. But that wouldn't mean that I care more about winning or figuring out some special fact than I do about being as rational as possible; it would just mean that I consider my ability to solve problems a judge of my rationality.

I don't understand what I lose out on if I pursue the Art for its own sake in the way described above. If you do know of something I would lose out on, or if you know Yudkowsky's original argument showing the infinite recursion when you motivate yourself to be rational by your love of rationality, then please comment and help me out. Thanks ahead of time.
