Does Probability Theory Require Deductive or Merely Boolean Omniscience?
It is often said that a Bayesian agent has to assign probability 1 to all tautologies, and probability 0 to all contradictions. My question is... exactly what sort of tautologies are we talking about here? Does that include all mathematical theorems? Does that include assigning 1 to "Every bachelor is an unmarried male"?[1] Perhaps the only tautologies that need to be assigned probability 1 are those that are Boolean theorems implied by atomic sentences that appear in the prior distribution, such as: "S or ~ S".
It seems that I do not need to assign probability 1 to Fermat's last theorem in order to use probability theory when I play poker, or try to predict the color of the next ball to come from an urn. I must assign a probability of 1 to "The next ball will be white or it will not be white", but Fermat's last theorem seems to be quite irrelevant. Perhaps that's because these specialized puzzles do not require sufficiently general probability distributions; perhaps, when I try to build a general Bayesian reasoner, it will turn out that it must assign 1 to Fermat's last theorem.
Imagine a (completely impractical, ideal, and esoteric) first order language, whose particular subjects were discrete point-like regions of space-time. There can be an arbitrarily large number of points, but it must be a finite number. This language also contains a long list of predicates like: is blue, is within the volume of a carbon atom, is within the volume of an elephant, etc. and generally any predicate type you'd like (including n place predicates).[2] The atomic propositions in this language might look something like: "5, 0.487, -7098.6, 6000s is Blue" or "(1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant." The first of these propositions says that a certain point in space-time is blue; the second says that there is an elephant between two points at one second after the universe starts. Presumably, at least the denotational content of most English propositions could be expressed in such a language (I think, mathematical claims aside).
Now imagine that we collect all of the atomic propositions in this language, and assign a joint distribution over them. Maybe we choose max entropy, doesn't matter. Would doing so really require us to assign 1 to every mathematical theorem? I can see why it would require us to assign 1 to every tautological Boolean combination of atomic propositions [for instance: "(1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant OR ~((1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant)"], but that would follow naturally as a consequence of filling out the joint distribution. Similarly, all the Boolean contradictions would be assigned zero, just as a consequence of filling out the joint distribution table with a set of reals that sum to 1.
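To make the "filling out the table" point concrete, here is a minimal sketch with just two atomic propositions. The proposition names and the random choice of joint distribution are my own assumptions for illustration; any table of nonnegative reals summing to 1 would behave the same way.

```python
import itertools
import random

# Hypothetical atomic propositions; the names are illustrative only.
ATOMS = ["elephant_between_p_and_q", "point_r_is_blue"]

def random_joint(atoms, rng):
    """Give every truth assignment a positive weight, then normalize to sum to 1."""
    worlds = list(itertools.product([False, True], repeat=len(atoms)))
    weights = [rng.random() + 1e-9 for _ in worlds]
    total = sum(weights)
    return {world: w / total for world, w in zip(worlds, weights)}

def prob(formula, joint, atoms):
    """Probability of a Boolean formula: total mass of the rows where it is true."""
    return sum(p for world, p in joint.items()
               if formula(dict(zip(atoms, world))))

joint = random_joint(ATOMS, random.Random(0))
e = "elephant_between_p_and_q"
tautology = prob(lambda v: v[e] or not v[e], joint, ATOMS)
contradiction = prob(lambda v: v[e] and not v[e], joint, ATOMS)
contingent = prob(lambda v: v[e], joint, ATOMS)
print(tautology, contradiction, contingent)
```

The tautology picks up every row of the table and the contradiction picks up none, whatever numbers went into the table. Nothing in the construction mentions arithmetic at all.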
A similar argument could be made using intuitions from algorithmic probability theory. Imagine that we know that some data was produced by a distribution which is output by a program of length n in a binary programming language. We want to figure out which distribution it is. So, we assign each binary string a prior probability of 2^-n. If the language allows for comments, then simpler distributions will be output by more programs, and we will add the probability of all programs that print that distribution.[3] Sure, we might need an oracle to figure out if a given program outputs anything at all, but we would not need to assign a probability of 1 to Fermat's last theorem (or at least I can't figure out why we would). The data might be all of your sensory inputs, and n might be Graham's number; still, there's no reason such a distribution would need to assign 1 to every mathematical theorem.
Conclusion:
A Bayesian agent does not require mathematical omniscience, or logical (if that means anything more than Boolean) omniscience, but merely Boolean omniscience. All that Boolean omniscience means is that for whatever atomic propositions appear in the language (e.g., the language that forms the set of propositions that constitute the domain of the probability function) of the agent, any tautological Boolean combination of those propositions must be assigned a probability of 1, and any contradictory Boolean combination of those propositions must be assigned 0. As far as I can tell, the whole notion that Bayesian agents must assign 1 to tautologies and 0 to contradictions comes from the fact that when you fill out a table of joint distributions (or follow the Kolmogorov axioms in some other way) all of the Boolean theorems get a probability of 1. This does not imply that you need to assign 1 to Fermat's last theorem, even if you are reasoning probabilistically in a language that is very expressive.[4]
Some Ways To Prove This Wrong:
Show that a really expressive semantic language, like the one I gave above, implies PA if you allow Boolean operations on its atomic propositions. Alternatively, you could show that Solomonoff induction can express PA theorems as propositions with probabilities, and that it assigns them 1. This is what I tried to do, but I failed on both occasions, which is why I wrote this.
[1] There are also interesting questions about the role of tautologies that rely on synonymy in probability theory, and whether they must be assigned a probability of 1, but I decided to keep it to mathematics for the sake of this post.
[2] I think this language is ridiculous, and openly admit it has next to no real world application. I stole the idea for the language from Carnap.
[3] This is a sloppily presented approximation to Solomonoff induction as n goes to infinity.
[4] The argument above is not a mathematical proof, and I am not sure that it is airtight. I am posting this to the discussion board instead of a full-blown post because I want feedback and criticism. !!!HOWEVER!!! if I am right, it does seem that folks on here, at MIRI, and in the Bayesian world at large, should start being more careful when they think or write about logical omniscience.
Musings on the LSAT: "Reasoning Training" and Neuroplasticity
The purpose of this post is to provide basic information about the LSAT including the format of the test and a few sample questions. I also wanted to bring light to some research that has found LSAT preparation to alter brain structure in ways that strengthen hypothesized "reasoning pathways". These studies have not been discussed here before; I thought they were interesting and really just wanted to call your collective attention to them.
I really like taking tests; I get energized by intense race-against-the-clock problem solving and, for better or worse, I relish getting to see my standing relative to others when the dust settles. I like the purity of the testing situation --how conditions are standardized in theory and more or less the same for all comers. This guilty pleasure has played no small part in the course my life has taken: I worked as a test prep tutor for 3 years and loved every minute of it, I met my wife through academic competitions in high school, and I am currently a graduate student doing lots of coursework in psychometrics.
Well, my brother-in-law is a lawyer, and when we chat the topic of the LSAT has served as some conversational common ground. Since I like taking tests for fun, he suggested I give it a whirl because he thought it was interesting and felt like it was a fair assessment of one's logical reasoning ability. So I did: I took a practice test cold a couple of Saturdays ago and I was very impressed. Here is the one I took. (This is a full practice exam provided by the test-makers; it's also like the top Google result for "LSAT practice test".) I wanted to post here about it because the LSAT hasn't been discussed very much on this site and I thought that some of you might find it useful to know about.
A brief run-down of the LSAT:
The test has four parts: two Logical Reasoning sections, a Critical Reading section (akin to SAT et al.), and an Analytical Reasoning, or "logic games", section. Usually when people talk about the LSAT, the logic games get emphasized because they are unusual and can be pretty challenging (the only questions I missed were of this type; I missed a few and I ran out of time). Essentially, you get a premise and a bunch of conditions from which you are required to draw conclusions. Here's an example:
A cruise line is scheduling seven week-long voyages for the ship Freedom.
Each voyage will occur in exactly one of the first seven weeks of the season: weeks 1 through 7.
Each voyage will be to exactly one of four destinations: Guadeloupe, Jamaica, Martinique, or Trinidad.
Each destination will be scheduled for at least one of the weeks.
The following conditions apply:
Jamaica will not be its destination in week 4.
Trinidad will be its destination in week 7.
Freedom will make exactly two voyages to Martinique, and at least one voyage to Guadeloupe will occur in some week between those two voyages.
Guadeloupe will be its destination in the week preceding any voyage it makes to Jamaica.
No destination will be scheduled for consecutive weeks.
11. Which of the following is an acceptable schedule of destinations in order from week 1 through week 7?
(A) Guadeloupe, Jamaica, Martinique, Trinidad, Guadeloupe, Martinique, Trinidad
(B) Guadeloupe, Martinique, Trinidad, Martinique, Guadeloupe, Jamaica, Trinidad
(C) Jamaica, Martinique, Guadeloupe, Martinique, Guadeloupe, Jamaica, Trinidad
(D) Martinique, Trinidad, Guadeloupe, Jamaica, Martinique, Guadeloupe, Trinidad
(E) Martinique, Trinidad, Guadeloupe, Trinidad, Guadeloupe, Jamaica, Martinique
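For anyone who would rather check the conditions mechanically than hold them in working memory, here is a rough sketch that encodes each schedule as a seven-letter string (G, J, M, T) and tests the stated conditions one by one. The encoding and the function name `acceptable` are my own; this is not how the test-makers present or score the question.

```python
DESTINATIONS = {"G", "J", "M", "T"}

def acceptable(schedule):
    """Check one 7-week schedule (a string over G, J, M, T) against the stated conditions."""
    # Seven weeks, and every destination is used at least once.
    if len(schedule) != 7 or set(schedule) != DESTINATIONS:
        return False
    # Jamaica is not the destination in week 4.
    if schedule[3] == "J":
        return False
    # Trinidad is the destination in week 7.
    if schedule[6] != "T":
        return False
    # Exactly two Martinique voyages, with a Guadeloupe voyage somewhere between them.
    m_weeks = [i for i, d in enumerate(schedule) if d == "M"]
    if len(m_weeks) != 2 or "G" not in schedule[m_weeks[0] + 1:m_weeks[1]]:
        return False
    # Guadeloupe is the destination in the week preceding any Jamaica voyage.
    for i, d in enumerate(schedule):
        if d == "J" and (i == 0 or schedule[i - 1] != "G"):
            return False
    # No destination is scheduled for consecutive weeks.
    return all(a != b for a, b in zip(schedule, schedule[1:]))

choices = {
    "A": "GJMTGMT",
    "B": "GMTMGJT",
    "C": "JMGMGJT",
    "D": "MTGJMGT",
    "E": "MTGTGJM",
}
passing = [label for label, s in choices.items() if acceptable(s)]
print(passing)  # under this encoding, only "A" passes
```

Of course, on the real test you have to do this by hand in a couple of minutes, which is the whole point.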
Clearly, this section places a huge burden on working memory and is probably the most g-loaded of the four. I'd guess that most LSAT test prep is about strategies for dumping this burden into some kind of written scheme that makes it all more manageable. But I just wanted to show you the logic games for completeness; what I was really excited by were the Logical Reasoning questions (sections II and III). You are presented with some scenario containing a claim, an argument, or a set of facts, and then asked to analyze, critique, or to draw correct conclusions. Here are most of the question stems used in these sections:
Which one of the following most accurately expresses the main conclusion of the economist’s argument?
Which one of the following uses flawed reasoning that most closely resembles the flawed reasoning in the argument?
Which one of the following most logically completes the argument?
The reasoning in the consumer’s argument is most vulnerable to criticism on the grounds that the argument...
The argument’s conclusion follows logically if which one of the following is assumed?
Which one of the following is an assumption required by the argument?
Heyo! This is exactly the kind of stuff I would like to become better at! Most of the questions were pretty straightforward, but the LSAT is known to be a tough test (score range: 120-180, 95th %ile: ~167, 99th %ile: ~172) and these practice questions probably aren't representative. What a cool test though! Here's a whole question from this section, superficially about utilitarianism:
3. Philosopher: An action is morally right if it would be reasonably expected
to increase the aggregate well-being of the people affected by it. An action
is morally wrong if and only if it would be reasonably expected to reduce the
aggregate well-being of the people affected by it. Thus, actions that would
be reasonably expected to leave unchanged the aggregate well-being of the
people affected by them are also right.
The philosopher’s conclusion follows logically if which one of the following is assumed?
(A) Only wrong actions would be reasonably expected to reduce the aggregate
well-being of the people affected by them.
(B) No action is both right and wrong.
(C) Any action that is not morally wrong is morally right.
(D) There are actions that would be reasonably expected to leave unchanged the
aggregate well-being of the people affected by them.
(E) Only right actions have good consequences.
Also, the LSAT is a good test, in that it measures well one's ability to succeed in law school. Validity studies boast that “LSAT score alone continues to be a better predictor of law school performance than UGPA [undergraduate GPA] alone.” Of course, the outcome variable can be regressed on both predictors and account for more of the variance than either one taken singly, but it is uncommon for a standardized test to beat prior GPA in predicting a student's future GPA.
Intensive LSAT preparation and neuroplasticity:
In two recent studies (same research team), learning to reason in the logically formal way required by the LSAT was found to alter brain structure in ways consistent with literature reviews of the neural correlates of logical reasoning. Note: my reading of these articles was pretty surface-level; I do not intend to provide a thorough review, only to bring them to your attention.
These researchers recruited pre-law students enrolling in an LSAT course and imaged their brains at rest using fMRI both before and after 3 months of this "reasoning training". As controls, they included age- and IQ-matched pre-law students intending to take the LSAT in the future but not actively preparing for it.
The LSAT-prep group was found to have significantly increased connectivity between parietal and prefrontal cortices and the striatum, both within the left hemisphere and across hemispheres. In the first study, the authors note that
These experience-dependent changes fall into tracts that would be predicted by prior work showing that reasoning relies on an interhemispheric frontoparietal network (for review, see Prado et al., 2011). Our findings are also consistent with the view that reasoning is largely left-hemisphere dominant (e.g., Krawczyk, 2012), but that homologous cortex in the right hemisphere can be recruited as needed to support complex reasoning. Perhaps learning to reason more efficiently involves recruiting compensatory neural circuitry more consistently.
And in the second study, they conclude
An analysis of pairwise correlations between brain regions implicated in reasoning showed that fronto-parietal connections were strengthened, along with parietal-striatal connections. These findings provide strong evidence for neural plasticity at the level of large-scale networks supporting high-level cognition.
I think this hypothesized fronto-parietal reasoning network is supposed to go something like this:
The LSAT requires a lot of relational reasoning, the ability to compare and combine mental representations. The parietal cortex holds individual relationships between these mental representations (A->B, B->C), and the prefrontal cortex integrates this information to draw conclusions (A->B->C, therefore A->C). The striatum's role in this network would be to monitor the success/failure of reward predictions and encourage flexible problem solving. Unfortunately, my understanding here is very limited. Here are several reviews of this reasoning network stuff (I have not read any; just wanted to share them): Hampshire et al. (2011), Prado et al. (2011), Krawczyk (2012).
I hope this was useful information! According to the 2013 survey, only 2.2% of you are in law-related professions, but I was wondering (1) if anyone has personal experience studying for this exam, (2) if they felt like it improved their logical reasoning skills, and (3) if they felt that these effects were long-lasting. Studying for this test seems to have the potential to inculcate rationalist habits-of-mind; I know it's just self-report, but for those who went on to law school, did you feel like you benefited from the experience studying for the LSAT? I only ask because the Law School Admission Council, a non-profit organization made up of 200+ law schools, seems to actively encourage preparation for the exam, member schools say it is a major factor in admissions, preparation tends to increase performance, and LSAT performance is correlated moderately-to-strongly with first year law school GPA (r= ~0.4).
Proposal: Use logical depth relative to human history as objective function for superintelligence
I attended Nick Bostrom's talk at UC Berkeley last Friday and got intrigued by these problems again. I wanted to pitch an idea here, with the question: Have any of you seen work along these lines before? Can you recommend any papers or posts? Are you interested in collaborating on this angle in further depth?
The problem I'm thinking about (surely naively, relative to y'all) is: What would you want to program an omnipotent machine to optimize?
For the sake of avoiding some baggage, I'm not going to assume this machine is "superintelligent" or an AGI. Rather, I'm going to call it a supercontroller, just something omnipotently effective at optimizing some function of what it perceives in its environment.
As has been noted in other arguments, a supercontroller that optimizes the number of paperclips in the universe would be a disaster. Maybe any supercontroller that was insensitive to human values would be a disaster. What constitutes a disaster? An end of human history. If we're all killed and our memories wiped out to make more efficient paperclip-making machines, then it's as if we never existed. That is existential risk.
The challenge is: how can one formulate an abstract objective function that would preserve human history and its evolving continuity?
I'd like to propose an answer that depends on the notion of logical depth as proposed by C.H. Bennett and outlined in section 7.7 of Li and Vitanyi's An Introduction to Kolmogorov Complexity and Its Applications which I'm sure many of you have handy. Logical depth is a super fascinating complexity measure that Li and Vitanyi summarize thusly:
Logical depth is the necessary number of steps in the deductive or causal path connecting an object with its plausible origin. Formally, it is the time required by a universal computer to compute the object from its compressed original description.
The mathematics is fascinating and better read in the original Bennett paper than here. Suffice it presently to summarize some of its interesting properties, for the sake of intuition.
- "Plausible origins" here are incompressible, i.e. algorithmically random.
- As a first pass, the depth D(x) of a string x is the least amount of time it takes to output the string from an incompressible program.
- There's a free parameter that has to do with precision that I won't get into here.
- A string of length n composed entirely of 1's and a string of length n of independent random bits are both shallow. The first is shallow because it can be produced by a constant-sized program in time n. The second is shallow because there exists an incompressible program that is the output string plus a constant sized print function that produces the output in time n.
- An example of a deeper string is the string of length n that for each digit i encodes the answer to the ith enumerated satisfiability problem. Very deep strings can involve diagonalization.
- Like Kolmogorov complexity, there is an absolute and a relative version. Let D(x/w) be the least time it takes to output x from a program that is incompressible relative to w,
- It can be updated with observed progress in human history at time t' by replacing h_t with h_t'. You could imagine generalizing this to something that dynamically updated in real time.
- This is a quite conservative function, in that it severely punishes computation that does not depend on human history for its input. It is so conservative that it might result in, just to throw it out there, unnecessary militancy against extra-terrestrial life.
- There are lots of devils in the details. The precision parameter I glossed over. The problem of representing human history and the state of the universe. The incomputability of logical depth (of course it's incomputable!). My purpose here is to contribute to the formal framework for modeling these kinds of problems. The difficult work, like in most machine learning problems, becomes feature representation, sensing, and efficient convergence on the objective.
Thought experiments on simplicity in logical probability
A common feature of many proposed logical priors is a preference for simple sentences over complex ones. This is sort of like an extension of Occam's razor into math. Simple things are more likely to be true. So, as it is said, "why not?"
Well, the analogy has some wrinkles - unlike hypothetical rules for the world, logical sentences do not form a mutually exclusive set. Instead, for every sentence A there is a sentence not-A with pretty much the same complexity, and probability 1-P(A). So you can't make the probability smaller for all complex sentences, because their negations are also complex sentences! If you don't have any information that discriminates between them, A and not-A will both get probability 1/2 no matter how complex they get.
But if our agent knows something that breaks the symmetry between A and not-A, like that A belongs to a mutually exclusive and exhaustive set of sentences with differing complexities, then it can assign higher probabilities to simpler sentences in this set without breaking the rules of probability. Except, perhaps, the rule about not making up information.
The question: is the simpler answer really more likely to be true than the more complicated answer, or is this just a delusion? If so, is it for some ontologically basic reason, or for a contingent and explainable reason?
There are two complications to draw your attention to. The first is in what we mean by complexity. Although it would be nice to use the Kolmogorov complexity of any sentence, which is the length of the shortest program that prints the sentence, such a thing is uncomputable by the kind of agent we want to build in the real world. The only thing our real-world agent is assured of seeing is the length of the sentence as-is. We can also find something in between Kolmogorov complexity and length by doing a brief search for short programs that print the sentence - this meaning is what is usually meant in this article, and I'll call it "apparent complexity."
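As a rough illustration of something "in between" Kolmogorov complexity and raw length, here is a sketch that uses an off-the-shelf compressor as a stand-in for the brief search over short programs. Using zlib is purely my assumption for the example; a compressor is a much narrower search than "short programs that print the sentence", so treat the function `apparent_complexity` as a hypothetical proxy and not a definition.

```python
import zlib

def apparent_complexity(sentence: str) -> int:
    """Bytes in the shorter of the raw sentence and its zlib-compressed form.

    Assumption: compression stands in for a brief search for short programs.
    The raw length is always available as a fallback, so the result never
    exceeds the length of the sentence as-is.
    """
    raw = sentence.encode("utf-8")
    return min(len(raw), len(zlib.compress(raw, 9)))

# A long but very regular sentence looks much simpler than its length suggests.
regular = "0" * 1000
print(len(regular), apparent_complexity(regular))

# A short sentence gains nothing from the search and keeps its raw length.
short = "1+1=2"
print(len(short), apparent_complexity(short))
```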
The second complication is in what exactly a simplicity prior is supposed to look like. In the case of Solomonoff induction the shape is exponential - more complicated hypotheses are exponentially less likely. But why not a power law? Why not even a Poisson distribution? Does the difficulty of answering this question mean that thinking that simpler sentences are more likely is a delusion after all?
Thought experiments:
1: Suppose our agent knew from a trusted source that some extremely complicated sum could only be equal to A, or to B, or to C, which are three expressions of differing complexity. What are the probabilities?
Commentary: This is the most sparse form of the question. Not very helpful regarding the "why," but handy to stake out the "what." Do the probabilities follow a nice exponential curve? A power law? Or, since there are just the three known options, do they get equal consideration?
This is all based off intuition, of course. What does intuition say when various knobs of this situation are tweaked - if the sum is of unknown complexity, or of complexity about that of C? If there are a hundred options, or countably many? Intuitively speaking, does it seem like favoring simpler sentences is an ontologically basic part of your logical prior?
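To see how much hangs on the shape of the prior, here is a small sketch comparing the three candidate answers to this thought experiment. The complexities assigned to A, B, and C, the base 2, and the power-law exponent are all made-up numbers for illustration.

```python
def normalize(weights):
    """Rescale nonnegative weights so they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def exponential_prior(complexity):
    """Each extra unit of complexity halves the weight (Solomonoff-like shape)."""
    return normalize({k: 2.0 ** -c for k, c in complexity.items()})

def power_law_prior(complexity, exponent=2.0):
    """Weight falls off as complexity ** -exponent; the exponent is arbitrary here."""
    return normalize({k: c ** -exponent for k, c in complexity.items()})

def uniform_prior(complexity):
    """Equal consideration for every option, ignoring complexity entirely."""
    return normalize({k: 1.0 for k in complexity})

# Hypothetical apparent complexities for the three candidate expressions.
complexity = {"A": 5, "B": 10, "C": 20}

for prior in (exponential_prior, power_law_prior, uniform_prior):
    print(prior.__name__, prior(complexity))
```

All three are perfectly legal probability assignments over a mutually exclusive and exhaustive set, so the rules of probability alone will not pick between them.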
2: Consider subsequences of the digits of pi. If I give you a pair (n,m), you can tell me the m digits following the nth digit of pi. So if I start a sentence like "the subsequence of digits of pi (10^100, 10^2) = ", do you expect to see simpler strings of digits on the right side? Is this a testable prediction about the properties of pi?
Commentary: We know that there is always a short-ish program to produce the sequences, which is just to compute the relevant digits of pi. This sets a hard upper bound on the possible Kolmogorov complexity of sequences of pi (that grows logarithmically as you increase m and n), and past a certain m this will genuinely start restricting complicated sequences, and thus favoring "all zeros" - or does it?
After all, this is weak tea compared to an exponential simplicity prior, for which the all-zero sequence would be hojillions of times more likely than a messy one. On the other hand, an exponential curve allows sequences with higher Kolmogorov complexity than the computation of the digits of pi.
Does the low-level view outlined in the first paragraph above demonstrate that the exponential prior is bunk? Or can you derive one from the other with appropriate simplifications (keeping in mind Kolmogorov complexity vs. apparent complexity)? Does pi really contain more long simple strings than expected, and if not what's going on with our prior?
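The "short-ish program" in the commentary can be sketched directly. This version computes decimal digits of pi with Machin's formula in integer arithmetic; the choice of formula, the guard-digit count, and the convention that digit 1 is the first digit after the decimal point are my assumptions. It is only practical for small n, nowhere near n = 10^100.

```python
def arctan_inverse(x, unity):
    """Integer approximation of unity * arctan(1/x) from the alternating Taylor series."""
    term = unity // x
    total = term
    x_squared = x * x
    n, sign = 3, -1
    while term:
        term //= x_squared
        total += sign * (term // n)
        sign, n = -sign, n + 2
    return total

def pi_decimals(count, guard=10):
    """First `count` decimal digits of pi, using pi = 16*arctan(1/5) - 4*arctan(1/239)."""
    unity = 10 ** (count + guard)
    scaled_pi = 4 * (4 * arctan_inverse(5, unity) - arctan_inverse(239, unity))
    # Drop the guard digits, then drop the leading "3".
    return str(scaled_pi // 10 ** guard)[1:]

def subsequence(n, m):
    """The m decimal digits of pi that follow the nth decimal digit."""
    return pi_decimals(n + m)[n:n + m]

print(subsequence(0, 5))   # the first five decimals
print(subsequence(5, 5))   # the next five
```

The program text is fixed, so the only part of the description that grows is the pair (n, m) itself, which takes about log n + log m bits to write down.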
3: Suppose I am writing an expression that I want to equal some number you know - that is, the sentence "my expression = your number" should be true. If I tell you the complexity of my expression, what can you infer about the likelihood of the above sentence?
Commentary: If we had access to the Kolmogorov complexity of your number, then we could completely rule out answers that were too K-simple to work. With only an approximation, it seems like we can still say that simple answers are less likely up to a point. Then as my expression gets more and more complicated, there are more and more available wrong answers (and, outside of the system a bit, it becomes less and less likely that I know what I'm doing), and so probability goes down.
In the limit that my expression is much more complex than your number, does an elegant exponential distribution emerge from underlying considerations?
Top-Down and Bottom-Up Logical Probabilities
I.
I don't know very much model theory, and thus I don't fully understand Hutter et al.'s logical prior, detailed here, but nonetheless I can tell you that it uses a very top-down approach. About 60% of what I mean is that the prior is presented as a completed object with few moving parts, which fits the authors' mathematical tastes and proposed abstract properties the function should have. And for another thing, it uses model theory - a dead giveaway.
There are plenty of reasons to take a top-down approach. Yes, Hutter et al.'s function isn't computable, but sometimes the properties you want require uncomputability. And it's easier to come up with something vaguely satisfactory if you don't have to have many moving parts. This can range from "the prior is defined as a thing that fulfills the properties I want" on the lawful good side of the spectrum, to "clearly the right answer is just the exponential of the negative complexity of the statement, duh".
Probably the best reason to use a top-down approach to logical uncertainty is so you can do math to it. When you have some elegant description of global properties, it's a lot easier to prove that your logical probability function has nice properties, or to use it in abstract proofs. Hence why model theory is a dead giveaway.
There's one other advantage to designing a logical prior from the top down, which is that you can insert useful stuff like a complexity penalty without worrying too much. After all, you're basically making it up as you go anyhow; you don't have to worry about where it comes from like you would if you were going from the bottom up.
A bottom-up approach, by contrast, starts with an imagined agent with some state of information and asks what the right probabilities to assign are. Rather than pursuing mathematical elegance, you'll see a lot of comparisons to what humans do when reasoning through similar problems, and demands for computability from the outset.
For me, a big opportunity of the bottom-up approach is to use desiderata that look like principles of reasoning. This leads to more moving parts, but also outlaws some global properties that don't have very compelling reasons behind them.
II.
Before we get to the similarities, rather than the differences, we'll have to impose the condition of limited computational resources. A common playing field, as it were. It would probably serve just as well to extend bottom-up approaches to uncomputable heights, but I am the author here, and I happen to be biased towards the limited-resources case.
The part of top-down assignment using limited resources will be played by a skeletonized pastiche of Paul Christiano's recent report:
i. No matter what, with limited resources we can only assign probabilities to a limited pool of statements. Accordingly, step one is to use some process to choose the set S0 of statements (and their negations) to assign probabilities.
ii. Then we use something like a weakened consistency condition (that can be decided between pairs of sentences in polynomial time) to set constraints on the probability function over S0. For example, sentences that are identical except for a double-negation have to be given the same probability.
iii. Christiano constructs a description-length-based "pre-prior" function that is bigger for shorter sentences. There are lots of options for different pre-priors, and I think this is a pretty good one.
iv. Finally, assign a logical probability function over S0 that is as similar as possible to the pre-prior while fulfilling the consistency condition. Christiano measures similarity using cross-entropy between the two functions, so that the problem is one of minimizing cross-entropy subject to a finite list of constraints. (Even if the pre-prior decreases exponentially, this doesn't mean that complicated statements will have exponentially low logical probability, because of the condition from step two that P(a statement) + P(its negation) = 1 - in a state of ignorance, everything still gets probability 1/2. The pre-prior only kicks in when there are more options with different description lengths.)
Next, let's look at the totally different world of a bottom-up assignment of logical probabilities, played here by a mildly rephrased version of my past proposal.
i. Pick a set of sentences S1 to try and figure out the logical probabilities of.
ii. Prove the truth or falsity of a bunch of statements in the closure of S1 under conjunction and negation (i.e. if sentences a and b are in S1, a&b is in the closure of S1).
iii. Assign a logical probability function over the closure of S1 under conjunction with maximum entropy, subject to the constraints proved in part two, plus the constraints that each sentence & its negation has probability 0.
These turn out to be really similar! Look in step three of my bottom-up example - there's even a sneakily-inserted top-down condition about going through every single statement and checking an aspect of consistency. In the top-down approach, every theorem of a certain sort is proved, while in the bottom-up approach there are allowed to be lots of gaps - but the same sorts of theorems are proved. I've portrayed one as using proofs only about sentences in S0, and the other as using proofs in the entire closure of S1 under conjunction, but those are just points on an available continuum (for more discussion, see Christiano's section on positive semidefinite methods).
The biggest difference is this "pre-prior" thing. On the one hand, it's essential for giving us guarantees about inductive learning. On the other hand, what piece of information do we have that tells us that longer sentences really are less likely? I have unresolved reservations, despite the practical advantages.
III.
A minor confession - my choice of Christiano's report was not coincidental at all. The causal structure went like this:
Last week - Notice dramatic similarities in what gets proved and how it gets used between my bottom-up proposal and Christiano's top-down proposal.
Now - Write post talking about generalities of top-down and bottom-up approaches to logical probability, and then find as a startling conclusion the thing that motivated me to write the post in the first place.
The teeensy bit of selection bias here means that though these similarities are cool, it's hard to draw general conclusions.
So let's look at one more proposal, this one due to Abram Demski, modified to use limited resources.
i. Pick a set of sentences S2 to care about.
ii. Construct a function on sentences in S2 that is big for short sentences and small for long sentences.
iii. Start with the set of sentences that are axioms - we'll shortly add new sentences to the set.
iv. Draw a sentence from S2 with probability proportional to the function from step two.
v. Do a short consistency check (can use a weakened consistency condition, or just limited time) between this sentence and the sentences already in the set. If it's passed, add the sentence to the set.
vi. Keep doing steps four and five until you've either added or ruled out all the sentences in S2.
vii. The logical probability of a sentence is defined as the probability that it ends up in our set after going through this process. We can find this probability using Monte Carlo by just running the process a bunch of times and counting up what portion of the time each sentence is in the set by the end.
Okay, so this one looks pretty different. But let's look for the similarities. The exact same kinds of things get proved again - weakened or scattershot consistency checks between different sentences. If all you have in S2 are three mutually exclusive and exhaustive sentences, the one that's picked first wins - meaning that the probability function over what sentence gets picked first is acting like our pre-prior.
So even though the method is completely different, what's really going on is that sentences are being given measure that looks like the pre-prior, subject to the constraints of weakened consistency (via rejection sampling) and normalization (keep repeating until all statements are checked).
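Here is a rough Monte Carlo sketch of the sampling process, mostly to make the "first pick acts like the pre-prior" point concrete. Everything named here is a stand-in: `weight` plays the role of the function from step two (I use 2 to the minus sentence length, which is just one possible choice), `consistent` is whatever cheap consistency check you can afford, and drawing without replacement from the not-yet-decided sentences is my simplification of "keep drawing until everything is added or ruled out".

```python
import random


def sample_set(sentences, weight, consistent, axioms, rng):
    """One run of steps three to six: returns the final set of sentences."""
    accepted = list(axioms)
    remaining = list(sentences)
    while remaining:
        # Step four: draw with probability proportional to the weight.
        weights = [weight(s) for s in remaining]
        s = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(s)
        # Step five: cheap consistency check against the current set.
        if consistent(s, accepted):
            accepted.append(s)
    return accepted


def logical_probabilities(sentences, weight, consistent,
                          axioms=(), runs=10000, seed=0):
    """Step seven: fraction of runs in which each sentence ends up in the set."""
    rng = random.Random(seed)
    counts = {s: 0 for s in sentences}
    for _ in range(runs):
        for s in sample_set(sentences, weight, consistent, axioms, rng):
            if s in counts:
                counts[s] += 1
    return {s: counts[s] / runs for s in sentences}


# Toy case: three mutually exclusive and exhaustive sentences whose
# "lengths" are 1, 2 and 3. The names are placeholders, not real sentences.
S2 = ["A", "AA", "AAA"]


def weight(sentence):
    return 2.0 ** -len(sentence)


def consistent(sentence, accepted):
    # Mutual exclusivity: a sentence is rejected once any other is in the set.
    return not any(other in S2 for other in accepted)


probs = logical_probabilities(S2, weight, consistent)
```

In this toy case whichever sentence is drawn first is accepted and the other two are rejected, so the estimates should land near the normalized weights 4/7, 2/7 and 1/7, which is the pre-prior showing through.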
In conclusion: not everything is like everything else, but some things are like some other things.
Logical fallacy poster
http://www.yourlogicalfallacyis.com

Just printed an A3 of this.
See now http://lesswrong.com/lw/c9u/logical_fallacies_poster_a_lesswrong_adaptation/
States of knowledge as amplitude configurations
I am reading through the sequence on quantum physics and have had some questions which I am sure have been thought about by far more qualified people. If you have any useful comments or links about these ideas, please share.
Most of the strongest resistance to ideas about rationalism that I encounter comes not from people with religious beliefs per se, but usually from mathematicians or philosophers who want to assert arguments about the limits of knowledge, the fidelity of sensory perception as a means for gaining knowledge, and various (what I consider to be) pathological examples (such as the zombie example). Among other things, people tend to reduce the argument to the existence of proper names a la Wittgenstein and then go on to assert that the meaning of mathematics or mathematical proofs constitutes something which is fundamentally not part of the physical world.
As I am reading the quantum physics sequence (keep in mind that I am not a physicist; I am an applied mathematician and statistician and so the mathematical framework of Hilbert spaces and amplitude configurations makes vastly more sense to me than billiard balls or waves, yet connecting it to reality is still very hard for me) I am struck by the thought that all thoughts are themselves fundamentally just amplitude configurations, and by extension, all claims about knowledge about things are also statements about amplitude configurations. For example, my view is that the color red does not exist in and of itself but rather that the experience of the color red is a statement about common configurations of particle amplitudes. When I say "that sign is red", one could unpack this into a detailed statement about statistical properties of configurations of particles in my brain.
The same reasoning seems to apply just as well to something like group theory. States of knowledge about the Sylow theorems, just as an example, would be properties of particle amplitude configurations in a brain. The Sylow theorems are not separately existing entities which are of themselves "true" in any sense.
Perhaps I am way off base in thinking this way. Can any philosophers of the mind point me in the right direction to read more about this?