Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

## [Link] Putanumonit - Bayesian inference vs. null hypothesis testing

5 22 January 2017 02:31PM

## Expect to know better when you know more

3 21 April 2016 03:47PM

A seemingly trivial result, that I haven't seen posted anywhere in this form, that I could find. It simply shows that we expect evidence to increase the posterior probability of the true hypothesis.

Let H be the true hypothesis/model/environment/distribution, and ~H its negation. Let e be evidence we receive, taking values e1, e2, ... en. Let pi=P(e=ei|H) and qi=P(E=ei|~H).

The expected posterior weighting of H, P(e|H), is Σpipi while the expected posterior weighting of ~H, P(e|~H), is Σqipi. Then since the pi and qi both sum to 1, Cauchy–Schwarz implies that

• E(P(e|H)) ≥ E(P(e|~H)).

Thus, in expectation, the probability of the evidence given the true hypothesis, is higher than or equal to the probability of the evidence given its negation.

This, however, doesn't mean that the Bayes factor - P(e|H)/P(e|~H) - must have expectation greater than one, since ratios of expectation are not the same as expectations of ratio. The Bayes factor given e=ei is (pi/qi). Thus the expected Bayes factor is Σ(pi/qi)pi. The negative logarithm is a convex function; hence by Jensen's inequality, -log[E(P(e|H)/P(e|~H))] ≤ -E[log(P(e|H)/P(e|~H))]. That last expectation is Σ(log(pi/qi))pi. This is the Kullback–Leibler divergence of P(e|~H) from P(e|H), and hence is non-negative. Thus log[E(P(e|H)/P(e|~H))] ≥ 0, and hence

• E(P(e|H)/P(e|~H)) ≥ 1.

Thus, in expectation, the Bayes factor, for the true hypothesis versus its negation, is greater than or equal to one.

Note that this is not true for the inverse. Indeed E(P(e|~H)/P(e|H)) = Σ(qi/pi)pi = Σqi = 1.

In the preceding proofs, ~H played no specific role, and hence

• For all K,    E(P(e|H)) ≥ E(P(e|K))    and    E(P(e|H)/P(e|K)) ≥ 1    (and E(P(e|K)/P(e|H)) = 1).

Thus, in expectation, the probability of the true hypothesis versus anything, is greater or equal in both absolute value and ratio.

Now we can turn to the posterior probability P(H|e). For e=ei, this is P(H)*P(e=ei|H)/P(e=ei). We can compute the expectation of P(e|H)/P(e) as above, using the non-negative Kullback–Leibler divergence of P(e) from P(e|H), and thus showing it has an expectation greater than or equal to 1. Hence:

• E(P(H|e)) ≥ P(H).

Thus, in expectation, the posterior probability of the true hypothesis is greater than or equal to its prior probability.

## Does Probability Theory Require Deductive or Merely Boolean Omniscience?

4 03 August 2015 06:54AM

It is often said that a Bayesian agent has to assign probability 1 to all tautologies, and probability 0 to all contradictions. My question is... exactly what sort of tautologies are we talking about here? Does that include all mathematical theorems? Does that include assigning 1 to "Every bachelor is an unmarried male"?1 Perhaps the only tautologies that need to be assigned probability 1 are those that are Boolean theorems implied by atomic sentences that appear in the prior distribution, such as: "S or ~ S".

It seems that I do not need to assign probability 1 to Fermat's last conjecture in order to use probability theory when I play poker, or try to predict the color of the next ball to come from an urn. I must assign a probability of 1 to "The next ball will be white or it will not be white", but Fermat's last theorem seems to be quite irrelevant. Perhaps that's because these specialized puzzles do not require sufficiently general probability distributions; perhaps, when I try to build a general Bayesian reasoner, it will turn out that it must assign 1 to Fermat's last theorem.

Imagine a (completely impractical, ideal, and esoteric) first order language, who's particular subjects were discrete point-like regions of space-time. There can be an arbitrarily large number of points, but it must be a finite number. This language also contains a long list of predicates like: is blue, is within the volume of a carbon atom, is within the volume of an elephant, etc. and generally any predicate type you'd like (including n place predicates).2 The atomic propositions in this language might look something like: "5, 0.487, -7098.6, 6000s is Blue" or "(1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant." The first of these propositions says that a certain point in space-time is blue; the second says that there is an elephant between two points at one second after the universe starts. Presumably, at least the denotational content of most english propositions could be expressed in such a language (I think, mathematical claims aside).

Now imagine that we collect all of the atomic propositions in this language, and assign a joint distribution over them. Maybe we choose max entropy, doesn't matter. Would doing so really require us to assign 1 to every mathematical theorem? I can see why it would require us to assign 1 to every tautological Boolean combination of atomic propositions [for instance: "(1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant OR ~((1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant)], but that would follow naturally as a consequence of filling out the joint distribution. Similarly, all the Boolean contradictions would be assigned zero, just as a consequence of filling out the joint distribution table with a set of reals that sum to 1.

A similar argument could be made using intuitions from algorithmic probability theory. Imagine that we know that some data was produced by a distribution which is output by a program of length n in a binary programming language. We want to figure out which distribution it is. So, we assign each binary string a prior probability of 2^-n. If the language allows for comments, then simpler distributions will be output by more programs, and we will add the probability of all programs that print that distribution.3 Sure, we might need an oracle to figure out if a given program outputs anything at all, but we would not need to assign a probability of 1 to Fermat's last theorem (or at least I can't figure out why we would). The data might be all of your sensory inputs, and n might be Graham's number; still, there's no reason such a distribution would need to assign 1 to every mathematical theorem.

Conclusion

A Bayesian agent does not require mathematical omniscience, or logical (if that means anything more than Boolean) omniscience, but merely Boolean omniscience. All that Boolean omniscience means is that for whatever atomic propositions appear in the language (e.g., the language that forms the set of propositions that constitute the domain of the probability function) of the agent, any tautological Boolean combination of those propositions must be assigned a probability of 1, and any contradictory Boolean combination of those propositions must be assigned 0. As far as I can tell, the whole notion that Bayesian agents must assign 1 to tautologies and 0 to contradictions comes from the fact that when you fill out a table of joint distributions (or follow the Komolgorov axioms in some other way) all of the Boolean theorems get a probability of 1. This does not imply that you need to assign 1 to Fermat's last theorem, even if you are reasoning probabilistically in a language that is very expressive.4

Some Ways To Prove This Wrong:

Show that a really expressive semantic language, like the one I gave above, implies PA if you allow Boolean operations on its atomic propositions. Alternatively, you could show that Solomonoff induction can express PA theorems as propositions with probabilities, and that it assigns them 1. This is what I tried to do, but I failed on both occasions, which is why I wrote this.

[1] There are also interesting questions about the role of tautologies that rely on synonymy in probability theory, and whether they must be assigned a probability of 1, but I decided to keep it to mathematics for the sake of this post.

[2] I think this language is ridiculous, and openly admit it has next to no real world application. I stole the idea for the language from Carnap.

[3] This is a sloppily presented approximation to Solomonoff induction as n goes to infinity.

[4] The argument above is not a mathematical proof, and I am not sure that it is airtight. I am posting this to the discussion board instead of a full-blown post because I want feedback and criticism. !!!HOWEVER!!! if I am right, it does seem that folks on here, at MIRI, and in the Bayesian world at large, should start being more careful when they think or write about logical omniscience.

## Intuitive cooperation

16 25 July 2014 01:48AM

This is an exposition of some of the main ideas in the paper Robust Cooperation. My goal is to make the ideas and proofs seem natural and intuitive - instead of some mysterious thing where we invoke Löb's theorem at the right place and the agents magically cooperate. Also I hope it is accessible to people without a math or CS background. Be warned, it is pretty cheesy ok.

In a small quirky town, far away from other cities or towns, the most exciting event is a game called (for historical reasons) The Prisoner's Dilemma. Everyone comes out to watch the big tournament at the end of Summer, and you (Alice) are especially excited because this year it will be your first time playing in the tournament! So you've been thinking of ways to make sure that you can do well.

The way the game works is this: Each player can choose to cooperate or defect with the other player. If you both cooperate, then you get two points each. If one of you defects, then that player will get three points, and the other player won't get any points. But if you both defect, then you each get only one point. You have to make your decisions separately, without communicating with each other - however, everyone is required to register the algorithm they will be using before the tournament, and you can look at the other player's algorithm if you want to. You also are allowed to use some outside help in your algorithm.

Now if you were a newcomer, you might think that no matter what the other player does, you can always do better by defecting. So the best strategy must be to always defect! Of course, you know better, if everyone tried that strategy, then they would end up defecting against each other, which is a shame since they would both be better off if they had just cooperated.

But how can you do better? You have to be able to describe your algorithm in order to play. You have a few ideas, and you'll be playing some practice rounds with your friend Bob soon, so you can try them out before the actual tournament.

I'll cooperate with Bob if I can tell from his algorithm that he'll cooperate with me. Otherwise I'll defect.

For your first try, you'll just run Bob's algorithm and see if he cooperates. But there's a problem - if Bob tries the same strategy, he'll have to run your algorithm, which will run his algorithm again, and so on into an infinite loop!

So you'll have to be a bit more clever than that... luckily you know a guy, Shady, who is good at these kinds of problems.

You call up Shady, and while you are waiting for him to come over, you remember some advice your dad Löb gave you.

(Löb's theorem) "If someone says you can trust them on X, well then they'll just tell you X."

If $S \vdash \Box X \rightarrow X$ (someone tells you If [I tell you] X, then X is true)

Then $S \vdash X$ (someone tells you X is true)

(See The Cartoon Guide to Löb's Theorem[pdf] for a nice proof of this)

Here's an example:

Sketchy watch salesman: Hey, if I tell you these watches are genuine then they are genuine!

You: Ok... so are these watches genuine?

Sketchy watch salesman: Of course!

It's a good thing to remember when you might have to trust someone. If someone you already trust tells you you can trust them on something, then you know that something must be true.

On the other hand, if someone says you can always trust them, well that's pretty suspicious... If they say you can trust them on everything, that means that they will never tell you a lie - which is logically equivalent to them saying that if they were to tell you a lie, then that lie must be true. So by Löb's theorem, they will lie to you. (Gödel's second incompleteness theorem)

Despite his name, you actually trust Shady quite a bit. He's never told you or anyone else anything that didn't end up being true. And he's careful not to make any suspiciously strong claims about his honesty.

So your new plan is to ask Shady if Bob will cooperate with you. If so, then you will cooperate. Otherwise, defect. (FairBot)

It's game time! You look at Bob's algorithm, and it turns out he picked the exact same algorithm! He's going to ask Shady if you will cooperate with him. Well, the first step is to ask Shady, "will Bob cooperate with me?"

Shady looks at Bob's algorithm and sees that if Shady says you cooperate, then Bob cooperates. He looks at your algorithm and sees that if Shady says Bob cooperates, then you cooperate. Combining these, he sees that if he says you both cooperate, then both of you will cooperate. So he tells you that you will both cooperate (your dad was right!)

Let A stand for "Alice cooperates with Bob" and B stand for "Bob cooperates with Alice".

From looking at the algorithms, $S\vdash \Box A \rightarrow B$ and $S\vdash \Box B \rightarrow A$

So combining these, $S\vdash \Box (A \& B) \rightarrow (A \& B)$.

Then by Löb's theorem, $S\vdash A \& B$.

Since that means that Bob will cooperate, you decide to actually cooperate.

Bob goes through an analagous thought process, and also decides to cooperate. So you cooperate with each other on the prisoner's dilemma! Yay!

That night, you go home and remark, "it's really lucky we both ended up using Shady to help us, otherwise that wouldn't have worked..."

Your dad interjects, "Actually, it doesn't matter - as long as they were both smart enough to count, it would work. This $\Box X$ doesn't just say 'I tell you X', it's stronger than that - it actually says 'Anyone who knows basic arithmetic will tell you X'. So as long as they both know a little arithmetic, it will still work - even if one of them is pro-axiom-of-choice, and the other is pro-axiom-of-life. The cooperation is robust." That's really cool!

But there's another issue you think of. Sometimes, just to be tricky, the tournament organizers will set up a game where you have to play against a rock. Yes, literally just a rock that holding the cooperate button down. If you played against a rock with your current algorithm, well you start by asking Shady if the rock will cooperate with you. Shady is like, "well yeah, duh." So then you cooperate too. But you could have gotten three points by defecting! You're missing out on a totally free point!

You think that it would be a good idea to make sure the other player isn't a complete idiot before you cooperate with them. How can you check? Well, let's see if they would cooperate with a rock placed on the defect button (affectionately known as 'DefectRock'). If they know better than that, and they will cooperate with you, then you will cooperate with them.

The next morning, you excitedly tell Shady about your new plan. "It will be like before, except this time, I also ask you if the other player will cooperate with DefectRock! If they are dumb enough to do that, then I'll just defect. That way, I can still cooperate with other people who use algorithms like this one, or the one from before, but I can also defect and get that extra point when there's just a rock on cooperate."

Shady get's an awkward look on his face, "Sorry, but I can't do that... or at least it wouldn't work out the way you're thinking. Let's say you're playing against Bob, who is still using the old algorithm. You want to know if Bob will cooperate with DefectRock, so I have to check and see if I'll tell Bob that DefectRock will cooperate with him. I would have say I would never tell Bob that DefectRock will cooperate with him. But by Löb's theorem, that means I would tell you this obvious lie! So that isn't gonna work."

Notation, $X(Y)=C$ if X cooperates with Y in the prisoner's dilemma (or = D if not).

You ask Shady, does $B(DR)=D$?

Bob's algorithm: $B(DR)=C$ only if $S\vdash DR(B)=C$.

So to say $B(DR)=D$, we would need $S\vdash \neg\Box (DR(B)=C)$

This is equivalent to $S\vdash \Box (DR(B)=C) \rightarrow (DR(B)=C)$, since $DR(B)=C$ is an obvious lie.

By Löb's theorem, $S\vdash DR(B)=C$, which is a lie.

<Extra credit: does the fact that Shady is the one explaining this mean you can't trust him?>

<Extra extra credit: find and fix the minor technical error in the above argument.>

Shady sees the dismayed look on your face and adds, "...but, I know a guy who can vouch for me, and I think maybe that could make your new algorithm work."

So Shady calls his friend T over, and you work out the new details. You ask Shady if Bob will cooperate with you, and you ask T if Bob will cooperate with DefectRock. So T looks at Bob's algorithm, which asks Shady if DefectRock will cooperate with him. Shady, of course, says no. So T sees that Bob will defect against DefectRock, and lets you know. Like before, Shady tells you Bob will cooperate with you, and thus you decide to cooperate! And like before, Bob decides to cooperate with you, so you both cooperate! Awesome! (PrudentBot)

If Bob is using your new algorithm, you can see that the same argument goes through mostly unchanged, and that you will still cooperate! And against a rock on cooperate, T will tell you that it will cooperate with DefectRock, so you can defect and get that extra point! This is really great!!

(ok now it's time for the really cheesy ending)

It's finally time for the tournament. You have a really good feeling about your algorithm, and you do really well! Your dad is in the audience cheering for you, with a really proud look on his face. You tell your friend Bob about your new algorithm so that he can also get that extra point sometimes, and you end up tying for first place with him!

A few weeks later, Bob asks you out, and you two start dating. Being able to cooperate with each other robustly is a good start to a healthy relationship, and you live happily ever after!

The End.

## What should a Bayesian do given probability of proving X vs. of disproving X?

0 07 June 2014 06:40PM

Consider some disputed proposition X. Suppose there appeared to be a limited number of ways of proving and of disproving X. No one has yet constructed a proof or disproof, but you have a feeling for how likely it is that someone will.

For instance, take Fermat's Last Theorem or the 4-color problem. For each of them, at one point in time, there was no proof, but people had some sense of the prior probability of observing the lack of a counterexample given the space searched so far. They could use that to assign a probability of there being a counterexample (and hence a disproof) [1]. Later, there was an alleged proof, and people could estimate the probability that the proof was correct based on the reputation of the prover and the approach used. At that point, people could assign values to both P(will_be_proven(X)) and P(will_be_disproven(X)).

Is it reasonable to assign P(X) = P(will_be_proven(X)) / (P(will_be_proven(X)) + P(will_be_disproven(X))) ?

If so, consider X = "free will exists". One could argue that the term "free will" is defined such that it is impossible to detect it, or to prove that it exists. But if one could prove that the many worlds interpretation of quantum mechanics is correct, that would constitute a disproof of X. Then P(will_be_proven(X)) / (P(will_be_proven(X)) + P(will_be_disproven(X))) = 0.

Is it possible for this to happen when you know that X is not undecidable? If so, what do you do then?

1. The computation is not as simple as it might appear, because you need to adjust for the selection effect of mathematicians being interested only in conjectures with no counterexamples.

## [LINK] Cantor's theorem, the prisoner's dilemma, and the halting problem

13 30 June 2013 08:26PM

I wouldn't normally link to a post from my math blog here, but it concerns a cute interpretation of Cantor's theorem that showed up when I was thinking about program equilibria at the April MIRI workshop, so I thought it might be of interest here (e.g. if you're trying to persuade a mathematically inclined friend of yours to attend a future workshop). A short proof of the undecidability of the halting problem falls out as a bonus.

## LW Study Group Facebook Page

16 08 April 2013 09:15PM

Update: There is now an online sign up to groups with workflowy, based on subject and current ability. You do not have to be signed up to Facebook to join a group, but do add an email address so that the group can contact you:  https://workflowy.com/shared/cf1fd9ca-885f-c1b9-c2e8-e3a315f70138/

The recent Main article, searching for interest in LWers studying maths together, had many comments showing enthusiasm, but nothing really happened.

On an aside, I think that on LessWrong, we tend not to work together all that well. The wiki isn't kept bright and shiny, and most of the ideas we search for are in loose blog posts that often take a while to find. However, I think having a single place in which to work together on a specific topic, might encourage effect groups. Especially if it's in a place that you get fairly regular reminders from.

So, here's a Less Wrong Study Group Facebook Page: https://www.facebook.com/groups/131607983690959/

Rixie suggested that we could split into smaller groups, based on age. I was thinking perhaps ability. Maybe even a group leader. However, before sitting and pondering this for eternity (just until we have a perfect structure), perhaps we should 'just try it'.

So, who exactly do I think should join the group?

Well, if you're interested in learning maths, and think that being surrounded by LWers might enhance your learning, this group is intended for you. If you're interested in learning maths, but you think that reading a textbook on your own is daunting, or you've tried and had difficulty previously, then this group is intended for you.

Also, if you're interested in learning other LessWrongy subjects (perhaps some cognitive science, or more economics-y stuffs) then this group could do that. If ten people join who want a basic idea economics, then they can work together. This isn't specificly maths, it's whatever we make it.

Personally, when I read a textbook, there's often a paragraph describing a key idea, and the author's words fly right over my head. I've often thought the best thing for me, would to have someone else who I could talk through that bit with. Maybe he or she would see it easily. Maybe I'd see something they wouldn't get.

I also wouldn't worry about level of prior knowledge. Mainly, because mine is zilch :)

So, what are you waiting for?

(No seriously. Just try it.)

Edit: It is true that anonymity is difficult to preserve on Facebook. I am entirely unfamiliar with google, and I certainly would have to make that regular effort to check it there too. If you do wish to join but have issues with public knowledge, please PM me, and I'll keep in contact with you through email (or other if you prefer). I will discuss with you there how to best take part in a study group.

## [Link] You May Already Be Aware of Your Cognitive Biases

6 21 February 2013 01:23PM

From the article:

Using an adaptation of the standard 'bat-and-ball' problem, the researchers explored this phenomenon. The typical 'bat-and-ball' problem is as follows: a bat and ball together cost \$1.10. The bat costs \$1 more than the ball. How much does the ball cost? The intuitive answer that immediately springs to mind is 10 cents. However, the correct response is 5 cents.

The authors developed a control version of this problem, without the relative statement that triggers the substitution of a hard question for an easier one: A magazine and a banana together cost \$2.90. The magazine costs \$2. How much does the banana cost?

A total of 248 French university students were asked to solve each version of the problem. Once they had written down their answers, they were asked to indicate how confident they were that their answer was correct.

Only 21 percent of the participants managed to solve the standard problem (bat/ball) correctly. In contrast, the control version (magazine/banana) was solved correctly by 98 percent of the participants. In addition, those who gave the wrong answer to the standard problem were much less confident of their answer to the standard problem than they were of their answer to the control version. In other words, they were not completely oblivious to the questionable nature of their wrong answer.

Article in Science Daily: http://www.sciencedaily.com/releases/2013/02/130219102202.htm

Original abstract (the rest is paywalled): http://link.springer.com/article/10.3758/s13423-013-0384-5

## The Emergence of Math

1 02 November 2012 01:08AM

In his latest post, Logical Pinpointing, Eliezer talks about the nature of math. I have my own views on the subject, but when writing about them, it became too long for a comment, so I'm posting it here in discussion.

I think it's important to clarify that I am not posting this under the guise that I am correct. I'm just posting my current view of things, which may or may not be correct, and which has not been assessed by others yet. So though I have pushed myself to write in a confident (and clear) tone, I am not actually confident in my views ... well, okay, I'm lying. I am confident in my views, but I know I shouldn't be, so I'm trying to pretend to myself that I'm posting this to have my mind changed, when in reality I'm posting it with the stupid expectation that you'll all magically be convinced. Thankfully, despite all that, I don't have a tendency to cling to beliefs which I have seen proven wrong, so I will change my mind when someone kicks my belief's ass. I won't go down without a fight, but once I'm dead, I'm dead.

Before I can share my views, I need to clarify a few concepts, and then I'll go about showing what I believe and why.

Abstractions

Abstract models are models in which some information is ignored. Take, for example, the abstract concept of the ball. I don't care if the ball is made of rubber or steel, nor do I care if it has a radius of 6.7cm or a metre, a ball is a ball. I couldn't care less about the underlying configuration of quarks in the ball, as long as it's a sphere.

Numbers are also abstractions. If I've got some apples, when I abstract this into a number, I don't care that they are apples. All I care about is how many there are. I can conveniently forget about the fact that it's apples, and about the concept of time, and the hole in my bag through which the apples might fall.

Axiomatic Systems (for example, Peano Arithmetic)

Edit: clarified this paragraph a bit.

I don't have much to say about these for now, except that they can all be reduced to physics. Given that mathematicians store axiomatic systems in their minds, and use them to prove things, they cannot help but be reducible to physical things, unless the mind itself is not physical. I think most LessWrongers, being reductionists, believe this, so I won't go into much more detail. I'll just say that this will be important later on.

I Can Prove Things About Numbers Using Apples

Let's say I have 2 apples. Then someone gives me 2 more. I now have 4. I have just shown that 2+2=4, assuming that apples behave like natural numbers (which they do).

But let's continue this hypothetical. As I put my 2 delicious new apples into my bag, one falls out through a hole in my bag. So if I count how many I have, I see 3. I had 2 to begin with, and 2 more were given to me. It seems I have just shown 2+2=3, if I can prove things about numbers using apples.

The problem lies in the information that I abstracted away during the conversion from apples to numbers. Because my conversion from apples to numbers failed to include information about the hole, my abstract model gave incorrect results. Like my predictions about balls might end up being incorrect if I don't take into account every quark that composes it. Upon observing that the apple had fallen through the hole, I would realize that an event which rendered my model erroneous had occurred, so I would abstract this new event into -1, which would fix the error: 2+2-1=3.

To summarize this section, I can "prove" things about numbers using apples, but because apples are not simple numbers (they have many properties which numbers don't), when I fail to take into account certain apple properties which will affect the number of apples I have, I will get incorrect results about the numbers.

Apples vs Peano Arithmetic

We know that Peano Arithmetic describes numbers very well. Numbers emerge in PA; we designed PA to do so. If PA described unicorns instead, it wouldn't be very useful. And if PA emerges from the laws of physics (we can see PA emerge in mathematicians' minds, and even on pieces of paper in the form of writing), then the numbers which emerge from PA emerge from the laws of physics. So there is nothing magical about PA. It's just a system of "rules" (physical processes) from which numbers emerge, like apples (I patched up the hole in my bag ;) ).

Of course, PA is much more convenient for proving things about numbers than apples. But they are inherently just physical processes from which I have decided to ignore most details, to focus only on the numbers. In my bag of 3 apples, if I ignore that it's apples there, I get the number 3. In SSS0, if I forget about the whole physical process giving emergence to PA, I am just left with 3.

So I can go from 3 apples to the number 3 by removing details, and from the number 3 to PA by adding in a couple of details. I can likewise go from PA to numbers, and then to apples.

To Conclude, Predictions are "Proofs"

From all this, I conclude that numbers are simply an abstraction which emerges in many places thanks to our uniform laws of physics, much like the abstract concept "ball". I also conclude that what we classify as a "prediction" is in fact a "proof". It's simply using the rules to find other truths about the object. If I predict the trajectory of a ball, I am using the rules behind balls to get more information about the ball. If I use PA or apples to prove something about numbers, I am using the rules behind PA (or apples) to prove something about the numbers which emerge from PA (or apples). Of course, the proof with PA (or apples) is much more general than the "proof" about the ball's trajectory, because numbers are much more abstract than balls, and so they emerge in more places.

So my response to this part of Eliezer's post:

Never mind the question of why the laws of physics are stable - why is logic stable?

Logic is stable for the same reasons the laws of physics are stable. Logic emerges from the laws of physics, and the laws of physics themselves are stable (or so it seems). In this way, I dissolve the question and mix it with the question why the laws of physics are stable -- a question which I don't know enough to attempt to answer.

Edit: I'm going to retract this and try to write a clearer post. I still have not seen arguments which have fully convinced me I am wrong, though I still have a bit to digest.

## A follow-up probability question: Data samples with different priors

3 25 October 2012 08:07PM

(Rewritten entirely after seeing pragmatist's answer.)

In this post, helpful people including DanielLC gave me the multiply-odds-ratios method for combining probability estimates given by independent experts with a constant prior, with many comments about what to do when they aren't independent.  (DanielLC's method turns out to be identical to summing up the bits of information for and against the hypothesis, which is what I'd expected to be correct.)

I ran into problems applying this, because sometimes the prior isn't constant across samples.  Right now I'm combining different sources of information to choose the correct transcription start site for a gene.  These bacterial genes typically have from 1 to 20 possible start sites.  The prior is 1 / (number of possible sites).

Suppose I want to figure out the correct likelihood multiplier for the information that a start site overlaps the stop of the previous gene, which I will call property Q.  Assume this multiplier, lm, is constant, regardless of the prior.  This is reasonable, since we always factor out the prior.  Some function of the prior gives me the posterior probability that a site s is the correct start (Q(s) is true), given that O(s).  That's P(Q(s) | prior=1/numStarts, O(s)).

Suppose I look just at those cases where numStarts = 4, I find that P(Q(s) | numStarts=4, O(s)) = .9.

9:1 / 1:3 = 27:1

Or I can look at the cases where numStarts=2, and find that in these cases, P(Q(s) | numStarts=2, O(s)) = .95:

19:1 / 1:1 = 19:1

I want to take one pass through the data and come up with a single likelihood multiplier, rather than binning all the data into different groups by numStarts.  I think I can just compute it as

(sum of numerator : sum of denominator) over all cases s_i where O(s_i) is true, where

numerator = (numStarts_i-1) * Q(s_i)

denominator = (1-Q(s_i))

Is this correct?

## A probability question

6 19 October 2012 10:34PM

Suppose you have a property Q which certain objects may or may not have.  You've seen many of these objects; you know the prior probability P(Q) that an object has this property.

You have 2 independent measurements of object O, which each assign a probability that Q(O) (O has property Q).  Call these two independent probabilities A and B.

What is P(Q(O) | A, B, P(Q))?

To put it another way, expert A has opinion O(A) = A, which asserts P(Q(O)) = A = .7, and expert B says P(Q(O)) = B = .8, and the prior P(Q) = .4, so what is P(Q(O))?  The correlation between the opinions of the experts is unknown, but probably small.  (They aren't human experts.)  I face this problem all the time at work.

You can see that the problem isn't solvable without the prior P(Q), because if the prior P(Q) = .9, then two experts assigning P(Q(O)) < .9 should result in a probability lower than the lowest opinion of those experts.  But if P(Q) = .1, then the same estimates by the two experts should result in a probability higher than either of their estimates.  But is it solvable or at least well-defined even with the prior?

The experts both know the prior, so if you just had expert A saying P(Q(O)) = .7, the answer must be .7 .  Expert B's opinion B must revise the probability upwards if B > P(Q), and downwards if B < P(Q).

When expert A says O(A) = A, she probably means, "If I consider all the n objects I've seen that looked like this one, nA of them had property Q."

One approach is to add up the bits of information each expert gives, with positive bits for indications that Q(O) and negative bits that not(Q(O)).

## A Mathematical Explanation of Why Charity Donations Shouldn't Be Diversified

2 20 September 2012 11:03AM

There is a standard argument against diversification of donations, popularly explained by Steven Landsburg in the essay Giving Your All. This post is an attempt to communicate a narrow special case of that argument in a form that resists misinterpretation better, for the benefit of people with a bit of mathematical training. Understanding this special case in detail might be useful as a stepping stone to the understanding of the more general argument. (If you already agree that one should donate only to the charity that provides the greatest marginal value, and that it makes sense to talk about the comparison of marginal value of different charities, there is probably no point in reading this post.)1

Suppose you are considering two charities, one that accomplishes the saving of antelopes, and the other the saving of babies. Depending on how much funding these charities secure, they are able to save respectively A antelopes and B babies, so the outcome can be described by a point (A,B) that specifies both pieces of data.

Let's say you have a complete transitive preference over possible values of (A,B), that is you can make a comparison between any two points, and if you prefer (A1,B1) over (A2,B2) and also (A2,B2) over (A3,B3), then you prefer (A1,B1) over (A3,B3). Let's further suppose that this preference can be represented by a sufficiently smooth real-valued function U(A,B), such that U(A1,B1)>U(A2,B2) precisely when you prefer (A1,B1) to (A2,B2). U doesn't need to be a utility function in the standard sense, since we won't be considering uncertainty, it only needs to represent ordering over individual points, so let's call it "preference level".

Let A(Ma) be the dependence of the number of antelopes saved by the Antelopes charity if it attains the level of funding Ma, and B(Mb) the corresponding function for the Babies charity. (For simplicity, let's work with U, A, B, Ma and Mb as variables that depend on each other in specified ways.)

You are considering a decision to donate, and at the moment the charities have already secured Ma and Mb amounts of money, sufficient to save A antelopes and B babies, which would result in your preference level U. You have a relatively small amount of money dM that you want to distribute between these charities. dM is such that it's small compared to Ma and Mb, and if donated to either charity, it will result in changes of A and B that are small compared to A and B, and in a change of U that is small compared to U.

## [call-to-arms] Computer-based Math Education

-3 25 February 2012 07:18AM

TL;DR= There doesn't exist a course/curriculum/general textbook based on Conrad Wolfram's "Computer-Based Math Education" idea. Let's create an open-content one! .... if we can

By computer-based math, I don't mean "math as usual, now taught through a computer!" (a la Khan Academy) I mean "math where we let computers do the calculation drudge-work, while we do the interesting parts."

Or, paraphrasing Conrad Wolfram: "stop teaching kids how to take derivatives; that's what MathematicaTM is for. Just teach them what a derivative is, so we can move on to more interesting problems. Like, you know, the ones in the real world." (Here's Wolfram's original polemic about the issue.)

Obviously, this is controversial, and Wolfram spends most of his talk rebutting arguments against it. If, after reading them, you're still not convinced that this is a good idea, then start another thread to discuss it. I don't intend this thread to become a blues-vs-greens battleground. Seriously, just start another thread.

On the other hand, if you are convinced that Wolfram is on to something...

My problem with this whole venture is that it's too important (IMO) to be left to the Wolframs.

I mean, come on. Wolfram's basic thesis might be true, but it's no coincidence that this particular truth is being spouted by the brother of the guy who created Mathematica.

And, unfortunately, the Wolframs seem to be the only ones pushing for it. Which means that we won't get any "math, not computation!" courses/textbooks until they can find a taker.

Now I'm guessing that most LWers would want to reap the benefits of Wolfram's basic idea without having to pay his family a fortune for it, and before however long it takes them to convince an education board about it. (How many "How do I go about learning useful math?" threads have we had so far?)

So why don't we give the world a leg-up on the path to the widespread mathematical literacy that Wolfram promises? Why don't we put out a computer-based math course for the world?

Obviously, we'd have to use free stuff... Sage instead of Mathematica, for instance. And whatever we put out would have to be free, because... well, if you could write textbooks that people are likely to pay for, you wouldn't need to be part of an LW community venture to do it.

My major questions, therefore, are:

Are there enough (a) mathematically literate LWers with (b) tons of free time who (c) think computer-based math education is a good cause and (d) are willing to work for free toward a good cause?

## More intuitive explanations!

22 06 January 2012 06:10PM

The post on two easy to grasp explanations on Gödel's theorem and the Banach-Tarski paradox made me think of other explanations that I found easy or insightful and that I could share them as well.

1) Here is a nice proof of the Pythagorean theorem:

2) An easy and concise explanation of expected utility calculations by Luke Muehlhauser:

Decision theory is about choosing among possible actions based on how much you desire the possible outcomes of those actions.

How does this work? We can describe what you want with something called a utility function, which assigns a number that expresses how much you desire each possible outcome (or “description of an entire possible future”). Perhaps a single scoop of ice cream has 40 “utils” for you, the death of your daughter has -⁠274,000 utils for you, and so on. This numerical representation of everything you care about is your utility function.

We can combine your probabilistic beliefs and your utility function to calculate the expected utility for any action under consideration. The expected utility of an action is the average utility of the action’s possible outcomes, weighted by the probability that each outcome occurs.

Suppose you’re walking along a freeway with your young daughter. You see an ice cream stand across the freeway, but you recently injured your leg and wouldn’t be able to move quickly across the freeway. Given what you know, if you send your daughter across the freeway to get you some ice cream, there’s a 60% chance you’ll get some ice cream, a 5% your child will be killed by speeding cars, and other probabilities for other outcomes.

To calculate the expected utility of sending your daughter across the freeway for ice cream, we multiply the utility of the first outcome by its probability: 0.6 × 40 = 24. Then, we add to this the product of the next outcome’s utility and its probability: 24 + (0.05 × -⁠274,000) = -⁠13,676. And suppose the sum of the products of the utilities and probabilities for other possible outcomes was 0. The expected utility of sending your daughter across the freeway for ice cream is thus very low (as we would expect from common sense). You should probably take one of the other actions available to you, for example the action of not sending your daughter across the freeway for ice cream — or, some action with even higher expected utility.

A rational agent aims to maximize its expected utility, because an agent that does so will on average get the most possible of what it wants, given its beliefs and desires.

3) Micro- and macroevolution visualized.

5) Proof of Euler's formula using power series expansions.

12) Löb's theorem.

## Explained: Gödel's theorem and the Banach-Tarski Paradox

10 06 January 2012 05:23PM

I want to share the following explanations that I came across recently and which I enjoyed very much. I can't tell and don't suspect that they come close to an understanding of the original concepts but that they are so easy to grasp that it is worth the time if you don't already studied the extended formal versions of those concepts. In other words, by reading the following explanations your grasp of the matter will be less wrong than before but not necessarily correct.

### World's shortest explanation of Gödel's theorem

by Raymond Smullyan, '5000 BC and Other Philosophical Fantasies' via Mark Dominus (ask me for the PDF of the book)

We have some sort of machine that prints out statements in some sort of language. It needn't be a statement-printing machine exactly; it could be some sort of technique for taking statements and deciding if they are true. But let's think of it as a machine that prints out statements.

In particular, some of the statements that the machine might (or might not) print look like these:

 P*x (which means that the machine will print x) NP*x (which means that the machine will never print x) PR*x (which means that the machine will print xx) NPR*x (which means that the machine will never print xx)

For example, NPR*FOO means that the machine will never print FOOFOO. NP*FOOFOO means the same thing. So far, so good.

Now, let's consider the statement NPR*NPR*. This statement asserts that the machine will never print NPR*NPR*.

Either the machine prints NPR*NPR*, or it never prints NPR*NPR*.

If the machine prints NPR*NPR*, it has printed a false statement. But if the machine never prints NPR*NPR*, then NPR*NPR* is a true statement that the machine never prints.

So either the machine sometimes prints false statements, or there are true statements that it never prints.

So any machine that prints only true statements must fail to print some true statements.

Or conversely, any machine that prints every possible true statement must print some false statements too.

Mark Dominus further writes,

The proof of Gödel's theorem shows that there are statements of pure arithmetic that essentially express NPR*NPR*; the trick is to find some way to express NPR*NPR* as a statement about arithmetic, and most of the technical details (and cleverness!) of Gödel's theorem are concerned with this trick. But once the trick is done, the argument can be applied to any machine or other method for producing statements about arithmetic.

The conclusion then translates directly: any machine or method that produces statements about arithmetic either sometimes produces false statements, or else there are true statements about arithmetic that it never produces. Because if it produces something like NPR*NPR* then it is wrong, but if it fails to produce NPR*NPR*, then that is a true statement that it has failed to produce.

So any machine or other method that produces only true statements about arithmetic must fail to produce some true statements.

### The Banach-Tarski Paradox

by MarkCC

Suppose you have a sphere. You can take that sphere, and slice it into a finite number of pieces. Then you can take those pieces, and re-assemble them so that, without any gaps, you now have two spheres of the exact same size as the original.

[...]

How about this? Take the set of all natural numbers. Divide it into two sets: the set of even naturals, and the set of odd naturals. Now you have two infinite sets, the set {0, 2, 4, 6, 8, ...}, and the set {1, 3, 5, 7, 9, ...}. The size of both of those sets is the ω - which is also the size of the original set you started with.

Now take the set of even numbers, and map it so that for any given value i, f(i) = i/2. Now you've got a copy of the set of natural numbers. Take the set of odd naturals, and map them with g(i) = (i-1)/2. Now you've got a second copy of the set of natural numbers. So you've created two identical copies of the set of natural numbers out of the original set of natural numbers.

[...] math doesn't have to follow conservation of mass [...]. A sphere doesn't have a mass. It's just an uncountably infinite set of points with a particular collection of topological relationship and geometric relationships.

## Intuition and Mathematics

5 31 December 2011 06:58PM

While reading the answer to the question 'What is it like to have an understanding of very advanced mathematics?' I became curious about the value of intuition in mathematics and why it might be useful.

It usually seems to be a bad idea to try to solve problems intuitively or use our intuition as evidence to judge issues that our evolutionary ancestors never encountered and therefore were never optimized to judge by natural selection.

And so it seems to be especially strange to suggest that intuition might be a good tool to make mathematical conjectures. Yet people like fields medalist Terence Tao seem to believe that intuition should not be disregarded when doing mathematics,

...“fuzzier” or “intuitive” thinking (such as heuristic reasoning, judicious extrapolation from examples, or analogies with other contexts such as physics) gets deprecated as “non-rigorous”. All too often, one ends up discarding one’s initial intuition and is only able to process mathematics at a formal level, thus getting stalled at the second stage of one’s mathematical education.

The point of rigour is not to destroy all intuition; instead, it should be used to destroy bad intuition while clarifying and elevating good intuition. It is only with a combination of both rigorous formalism and good intuition that one can tackle complex mathematical problems;

The author mentioned at the beginning also makes the case that intuition is an important tool,

You are often confident that something is true long before you have an airtight proof for it (this happens especially often in geometry). The main reason is that you have a large catalogue of connections between concepts, and you can quickly intuit that if X were to be false, that would create tensions with other things you know to be true, so you are inclined to believe X is probably true to maintain the harmony of the conceptual space. It's not so much that you can imagine the situation perfectly, but you can quickly imagine many other things that are logically connected to it.

But what do those people mean when they talk about 'intuition', what exactly is its advantage? The author hints at an answer,

You go up in abstraction, "higher and higher". The main object of study yesterday becomes just an example or a tiny part of what you are considering today. For example, in calculus classes you think about functions or curves. In functional analysis or algebraic geometry, you think of spaces whose points are functions or curves -- that is, you "zoom out" so that every function is just a point in a space, surrounded by many other "nearby" functions. Using this kind of zooming out technique, you can say very complex things in short sentences -- things that, if unpacked and said at the zoomed-in level, would take up pages. Abstracting and compressing in this way allows you to consider extremely complicated issues while using your limited memory and processing power.

At this point I was reminded of something Scott Aaronson wrote in his essay 'Why Philosophers Should Care About Computational Complexity',

...even if computers were better than humans at factoring large numbers or at solving randomly-generated Sudoku puzzles, humans might still be better at search problems with “higher-level structure” or “semantics,” such as proving Fermat’s Last Theorem or (ironically) designing faster computer algorithms. Indeed, even in limited domains such as puzzle-solving, while computers can examine solutions millions of times faster, humans (for now) are vastly better at noticing global patterns or symmetries in the puzzle that make a solution either trivial or impossible. As an amusing example, consider the Pigeonhole Principle, which says that n+1 pigeons can’t be placed into n holes, with at most one pigeon per hole. It’s not hard to construct a propositional Boolean formula Φ that encodes the Pigeonhole Principle for some fixed value of n (say, 1000). However, if you then feed Φ to current Boolean satisfiability algorithms, they’ll assiduously set to work trying out possibilities: “let’s see, if I put this pigeon here, and that one there ... darn, it still doesn’t work!” And they’ll continue trying out possibilities for an exponential number of steps, oblivious to the “global” reason why the goal can never be achieved. Indeed, beginning in the 1980s, the field of proof complexity—a close cousin of computational complexity—has been able to show that large classes of algorithms require exponential time to prove the Pigeonhole Principle and similar propositional tautologies.

Again back to the answer on 'what it is like to have an understanding of very advanced mathematics'. The author writes,

...you are good at modularizing a conceptual space and taking certain calculations or arguments you don't understand as "black boxes" and considering their implications anyway. You can sometimes make statements you know are true and have good intuition for, without understanding all the details. You can often detect where the delicate or interesting part of something is based on only a very high-level explanation.

Humans are good at 'zooming out' to detect global patterns. Humans can jump conceptual gaps by treating them as "black boxes".

Intuition is a conceptual bird's-eye view that allows humans to draw inferences from high-level abstractions without having to systematically trace out each step. Intuition is a wormhole. Intuition allows us get from here to there given limited computational resources.

If true, it also explains many of our shortcomings and biases. Intuitions greatest feature is also our biggest flaw.

The introduction of suitable abstractions is our only mental aid to organize and master complexity. — Edsger W. Dijkstra

Our computational limitations make it necessary to take shortcuts and view the world as a simplified model. That heuristic is naturally prone to error and introduces biases. We draw connections without establishing them systematically. We recognize patterns in random noise.

Many of our biases can be seen as a side-effect of making judgments under computational restrictions. A trade off between optimization power and resource use.

It it possible to correct for the shortcomings of intuition other than by refining rationality and becoming aware of our biases? That's up to how optimization power scales with resources and if there are more efficient algorithms that work under limited resources.

## What mathematics to learn

6 23 November 2011 06:14PM

There is, of course, Kahn Academy for fundamentals. We have already had a discussion on How to learn math.

What resources exist detailing which mathematics to learn in what order? What resources exist that explain the utility of different mathematical subfields for the purpose of directing studies?

## Which fields of learning have clarified your thinking? How and why?

12 [deleted] 11 November 2011 01:04AM

Did computer programming make you a clearer, more precise thinker? How about mathematics? If so, what kind? Set theory? Probability theory?

Microeconomics? Poker? English? Civil Engineering? Underwater Basket Weaving? (For adding... depth.)

Anything I missed?

Context: I have a palette of courses to dab onto my university schedule, and I don't know which ones to chose. This much is for certain: I want to come out of university as a problem solving beast. If there are fields of inquiry whose methods easily transfer to other fields, it is those fields that I want to learn in, at least initially.

Rip apart, Less Wrong!

## Foundations of Inference

9 31 October 2011 07:48PM

I've recently been getting into all of this wonderful Information Theory stuff and have come across a paper (thanks to John Salvatier) that was written by Kevin H. Knuth:

Foundations of Inference

The paper sets up some intuitive minimal axioms for quantifying power sets and then (seems to) use them to derive Bayesian probability theory, information gain, and Shannon Entropy. The paper also claims to use less assumptions than both Cox and Kolmogorov when choosing axioms. This seems like a significant foundation/unification. I'd like to hear whether others agree and what parts of the paper you think are the significant contributions.

If a 14 page paper is too long for you, I recommend skipping to the conclusion (starting at the bottom of page 12) where there is a nice picture representation of the axioms and a quick summary of what they imply.

## Re-evaluate old beliefs

1 05 October 2011 01:18AM

I've noticed that, although people can become more rational, they don't win noticeably more.  We usually re-calibrate our self-confidence, become more stubborn, and make bigger errors.

Is it possible that the benefit from increasing your prediction accuracy is no greater than the loss incurred from taking riskier bets due to greater self-confidence?

## Free Tutoring in Math/Programming

70 29 September 2011 01:45PM

I enjoy teaching, and I'd like to do my bit for the Less Wrong community. I've tutored a few people on the #lesswrong IRC channel in freenode without causing permanent brain damage. Hence I'm extending my offer of free tutoring from #lesswrong to lesswrong.com.

I offer tutoring in the following programming languages:

2. C
3. Python
4. Scheme

I offer tutoring in the following areas of mathematics:

1. Elementary Algebra
2. Trigonometry
3. Calculus
4. Linear Algebra
5. Analysis
6. Abstract Algebra
7. Logic
8. Category Theory
9. Probability Theory
10. Combinatorics
11. Computational Complexity

If you're interested contact me. Contact details below:

IRC: PatrickRobotham

Skype: grey_fox26

Email: patrick.robotham2@gmail.com

## A math question

-3 18 June 2011 02:22AM

Is (4^^^4)/(3^^((3^^3)^^3)) larger than one?

I need to know for a game of Nomic

## A simple counterexample to deBlanc 2007?

3 30 May 2011 05:09AM

Peter de Blanc submitted a paper to arXiv.org in 2007 called "Convergence of Expected Utilities with Algorithmic Probability Distributions."  It claims to show that a computable utility function can have an expected value only if the utility function is bounded.

This is important because it implies that, if a utility function is unbounded, it is useless.  The purpose of a utility function is to compare possible actions k by choosing the k for which U(k) is maximal.  You can't do this if U(k) is undefined for any k, let alone for every k.

I don't know whether any agent we contemplate can have a truly unbounded utility function, since the universe is finite.  (The multiverse, supposing you believe in that, might not be finite; but as the utility function is meant to choose a single universe from the multiverse, I doubt that's relevant.)  But it is worth exploring, as computable functions are worth exploring despite not having infinitely long tapes for our Turing machines.  I previously objected that the decision process is not computable; but this is not important - we want to know whether the expected value exists, before asking how to compute (or approximate) it.

The math in the paper was too difficult for me to follow all the way through; so instead, I tried to construct a counterexample.  This counterexample does not work; the flaw is explained in one of comments below.  Can you find the flaw yourself?  This type of error is both subtle and common.  (The problem is not that the theorem actually proves that for any unbounded utility function, there is some set of possible worlds for which the expected value does not converge.)

## Training for math olympiads

6 29 May 2011 09:34AM

Lately I've resolved to try harder at teaching myself math so I have a better shot at the international olympiad (IMO). These basically involve getting, say, three really hard math problems and trying your best to solve them within 5 hours.

My current state:

• I have worked through a general math problem-solving guide (Art and Craft of Problem-Solving), a general math olympiad guide (A Primer for Mathematics Competitions) and practice problems.
• I've added all problems and solutions and theorems and techniques into an Anki deck. When reviewing, I do not re-solve the problem, I only try to remember any key insights and outline the solution method.
• I am doing n-back, ~20 sessions (1 hour) daily, in an attempt to increase my general intelligence (my IQ is ~125, sd 15).
• I am working almost permanently; akrasia is not much of a problem.
• I am not _yet_ at the level of IMO medallists.

What does the intrumental-rationality skill of LWers have to say about this? What recommendations do you guys have for improving problem-solving ability, in general and specifically for olympiad-type environments? Specifically,

• How should I spread my time between n-backing, solving problems, and learning more potentially-useful math?
• Should I take any nootropics? I am currently looking to procure some fish oil (I don't consume any normally) and perhaps a racetam. I have been experimenting with cycling caffeine weekends on, weekdays off (to prevent tolerance being developed), with moderate success (Monday withdrawal really sucks, but Saturday is awesome).
• Should I add the problems to Anki? It takes time to create the cards and review them; is that time better spent doing more problems?

## Free Stats Textbook: Principles of Uncertainty

20 24 May 2011 07:45PM

Joseph Kadane, emeritus at Carnegie Mellon, released his new statistics textbook Principles of Uncertainty as a free pdf. The book is written from a Bayesian perspective, covering basic probability, decision theory, conjugate distribution analysis, hierarchical modeling, MCMC simulation, and game theory. The focus is mathematical, but computation with R is touched on. A solid understanding of calculus seems sufficient to use the book. Curiously, the author devotes a fair number of pages to developing the McShane integral, which is equivalent to Lebesgue integration on the real line. There are lots of other unusual topics you don't normally see in an intermediate statistics textbook.

Having came across this today, I can't say whether it is actually very good or not, but the range of topics seems perfectly suited to Less Wrong readers.

## Gödel and Bayes: quick question

1 14 April 2011 06:12AM

Kurt Gödel showed that we could write within a system of arithmetic the statement, "This statement has no proof within the system," in such a way that we couldn't dismiss it as meaningless. This proved that if the system (or part of it) could prove the logical consistency of the whole, it would thereby contradict itself. We nevertheless think arithmetic does not contradict itself because it never has.

From what I understand we could write a version of the Gödel statement for the axioms of probability theory, or even for the system that consists of those axioms plus our current best guess at P(axioms' self-consistency). Edit: or not. According to the comments the Incompleteness Theorem does not apply until you have a stronger set of assumptions than the minimum you need for probability theory. So let's say you possess the current source code of an AGI running on known hardware. It's just now reached the point where it could pass the test of commenting extensively on LW without detection. (Though I guess we shouldn't yet assume this will continue once the AI encounters certain questions.) For some reason it tries to truthfully answer any meaningful question. (So nobody mention the Riemann hypothesis. We may want the AI to stay in this form for a while.) Whenever an internal process ends in a Jaynes-style error message that indicates a contradiction, the AI takes this as strong evidence of a contradiction in the relevant assumptions. Now according to my understanding we can take the source code and ask about a statement which says, "The program will never call this statement true." Happily the AI can respond by calling the statement "likely enough given the evidence." So far so good.

So, can we or can't we write a mathematically meaningful statement Q saying, "The program will never say 'P(Q)≥0.5'"? What about, "The program will never call 'P(Q)≥0.5' true (or logically imply it)"? How does the AI respond to questions about variations of these statements?

It seems as if we could form a similar question by modifying the Halting-Oracle Killer program to refute more possible responses to the question of its run-time, assuming the AI will know this simpler program's source code. Though it feels like a slightly different problem because we'd have to address a lot of possible responses directly – with the previous examples, if the AI doesn't kill us in one sense or another, we can go on to ask for clarification. Or we can say the AI wants to clarify any response that evades the question.

## Open Thread: Mathematics

11 13 February 2011 11:50PM

In Luke's recent post on what sort of posts we would like to see more of, one suggestion was "Open Thread: Math". This suggestion has been voted up by (at least) 12 people. Since it's going to take me less than 2 minutes to type this post, I figured I might as well just go ahead and post the thread, rather than vote up the suggestion.

So, this is an open thread on mathematics. As things stand, I have no idea what the rules should be (I don't know what the people who voted up the post suggestion expected the rules to be), but I guess the general principle should be that we have maths questions which are vaguely related to LW-type ideas, as there are plenty of more appropriate fora for general mathematical discussion already out there.

## An Overview of Formal Epistemology (links)

43 06 January 2011 07:57PM

The branch of philosophy called formal epistemology has very similar interests to those of the Less Wrong community. Formal epistemologists mostly work on (1) mathematically formalizing concepts related to induction, belief, choice, and action, and (2) arguing about the foundations of probability, statistics, game theory, decision theory, and algorithmic learning theory.

Those who value the neglected virtue of scholarship may want to study for themselves the arguments that have lead scholars either toward or against the very particular positions on formalizing language, decision theory, explanation, and probability typically endorsed at Less Wrong. As such, here's a brief overview of the field by way of some helpful links:

Enjoy.

## Recent results on lower bounds in circuit complexity.

7 09 November 2010 05:02AM

There's a new paper which substantially improves lower bounds for circuit complexity. The paper, by Ryan Williams, proves that NEXP does not have ACC circuits of third-exponential size.

This is a somewhat technical result (and I haven't read the proof yet), but there's a summary of what this implies at Scott Aaronson's blog. The main upshot is that this is a substantial improvement over prior circuit complexity bounds. This is relevant since circuit complexity bounds look to be one of the most promising methods to potentially show that P != NP. These results make circuit complexity bounds be still very far off from showing that. But this result looks like it in some ways might get around the relativization barrier and natural proof barriers which are major barriers to resolving P ?=NP.

## Learning the foundations of math

4 24 October 2010 07:29PM

I have recently become interested in the foundations of math. I am interested in tracing the fundamentals of math in a path such as: propositional logic -> first order logic -> set theory -> measure theory. Does anyone have any resources (books, webpages, pdfs etc.) they would like to recommend?

This seems like it would be a popular activity among LWers, so I thought this would be a good place to ask for advice.

My criteria (feel free to post resources which you think others who stumble across this might be interested in):

• The more basic the starting point the better: I would prefer a resource that defines propositional logic in terms of a context free grammar and an evaluation procedure (don't know if that is possible, but that's the sort of thing I am interested in) to one that just describes propositional logic in English; I would prefer a resource which builds first order logic from propositional logic + some definitions to one that just describes how first order logic works; etc.
• The fewer axioms (perhaps that's not quite the right word) the better. I prefer a resource defines describes propositional logic with just two operators (say negation and conjugation) and then builds the other operators of interest to one that defines it with 5 or 6 operators (I've seen many resources which do this).
• I expect that there are multiple ways to build math from basic building blocks. I am more interested in standard ways than than non-standard ways.

## Discuss: How to learn math?

12 [deleted] 09 October 2010 06:07PM

Learning math is hard. Those that have braved some of its depths, what did you discover that allowed you to go deeper?

This is a place to share insights, methods, and tips for learning mathematics effectively, as well as resources that contain this information.