You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

Truth is holistic

9 MrMind 23 April 2015 07:26AM

You already know by now that truth is undefinable: by a famous result of Tarski, no formal system powerful enough (from now on, just system) can consistently talk about the truth of its own sentences.

You may however not know that Hamkins proved that truth is holistic.
Let me explain: while no system can talk about its own truth, it can nevertheless talk about the truth of its own substructures. For example, in every model of ZFC (the standard axioms of set theory) you can consistently define a model of standard arithmetics and a predicate that works as arithmetics' truth predicate. This can happen because ZFC is strictly more powerful than PA (the axioms of standard arithmetics).
Intuitively, one could think that if you have the same substructure in two different models, what they believe is the truth about that substructure is the same in both. Along this line, two models of ZFC ought to believe the same things about standard arithmetics.
However, it turns out this is not the case. Two different models extending ZFC may very well agree on which entities are standard natural numbers, and yet still disagree about which arithmetic sentences are true or false. For example, they could agree about the standard numbers, how the successor and addition operator works, and yet disagree on multiplication (corollary 7.1 in Hamkins' paper).
This means that when you can talk consistently about the truth of a model (that is, when you are in a more powerful formal system), that truth depends not only on the substructure, but on the entire structure you're immersed in. Figuratively speaking, local truth depends on global truth. Truth is holistic.
There's more: suppose that two model agree on the ontology of some common substructure. Suppose also that they agree about the truth predicate on that structure: they could still disagree about the meta-truths. Or the meta-meta-truths, etc., for all the ordinal levels of the definable truth predicates.

Another striking example from the same paper. There are two different extensions of set theory which agree on the structure of standard arithmetics and on the members of a subset A of natural numbers, and yet one thinks that A is first-order definable while the other thinks it's not (theorem 10).

Not even "being a model of ZFC" is an absolute property: there are two models which agree on an initial segment of the set hierarchy, and yet one thinks that the segment is a model of ZFC while the other proves that it's not (theorem 12).

Two concluding remarks: what I wrote was that there are different models which disagrees the truth of standard arithmetics, not that every different model has different arithmetic truths. Indeed, if two models have access one to the truth relation of the other, then they are bound to have the same truths. This is what happens for example when you prove absoluteness results in forcing.
I'm also remembered of de Blanc's ontological crises: changing ontology can screw with your utility function. It's interesting to note that updating (that is, changing model of reality) can change what you believe even if you don't change ontology.

Meditations on Löb's theorem and probabilistic logic [LINK]

10 Quinn 10 August 2014 09:41PM

A post on my own blog following a MIRIx workshop from two weekends ago.

http://qmaurmann.wordpress.com/2014/08/10/meditations-on-l-and-probabilistic-logic/

Reproducing the intro:

This post is a second look at The Definability of Truth in Probabilistic Logic, a preprint by Paul Christiano and other Machine Intelligence Research Institute associates, which I first read and took notes on a little over one year ago.

In particular, I explore relationships between Christiano et al’s probabilistic logic and stumbling blocks for self-reference in classical logic, like the liar’s paradox (“This sentence is false”) and in particular Löb’s theorem.

The original motivation for the ideas in this post was an attempt to prove a probabilistic version of Löb’s theorem to analyze the truth-teller sentences (“This sentence is [probably] true”) of probabilistic logic, an idea that came out of some discussions at a MIRIx workshop that I hosted in Seattle.

What should a Bayesian do given probability of proving X vs. of disproving X?

0 PhilGoetz 07 June 2014 06:40PM

Consider some disputed proposition X. Suppose there appeared to be a limited number of ways of proving and of disproving X. No one has yet constructed a proof or disproof, but you have a feeling for how likely it is that someone will.

For instance, take Fermat's Last Theorem or the 4-color problem. For each of them, at one point in time, there was no proof, but people had some sense of the prior probability of observing the lack of a counterexample given the space searched so far. They could use that to assign a probability of there being a counterexample (and hence a disproof) [1]. Later, there was an alleged proof, and people could estimate the probability that the proof was correct based on the reputation of the prover and the approach used. At that point, people could assign values to both P(will_be_proven(X)) and P(will_be_disproven(X)).

Is it reasonable to assign P(X) = P(will_be_proven(X)) / (P(will_be_proven(X)) + P(will_be_disproven(X))) ?

If so, consider X = "free will exists". One could argue that the term "free will" is defined such that it is impossible to detect it, or to prove that it exists. But if one could prove that the many worlds interpretation of quantum mechanics is correct, that would constitute a disproof of X. Then P(will_be_proven(X)) / (P(will_be_proven(X)) + P(will_be_disproven(X))) = 0.

Is it possible for this to happen when you know that X is not undecidable? If so, what do you do then?

 

1. The computation is not as simple as it might appear, because you need to adjust for the selection effect of mathematicians being interested only in conjectures with no counterexamples.

To like, or not to like?

2 PhilGoetz 14 November 2013 02:26AM

Do you like Shakespeare?

I've been reading the Paris Review interviews with famous authors of the 20th century. Famous authors don't always like other famous authors. Hemingway, Faulkner, Joyce, Fitzgerald — for all of them, you could find some famous author who found them unreadable. (Especially Joyce and Faulkner.)

Except Shakespeare. Everyone loved Shakespeare. In fact, those who mentioned Shakespeare sometimes said he was the best author who has ever lived.

How likely is this?

continue reading »

Notes/blog posts on two recent MIRI papers

21 Quinn 14 July 2013 11:11PM

I've been learning math lately; specifically I've been reading MIRI's recent research preprints and the prerequisite material. In order to actually learn math, I typically have to write it down again, usually with more details and context. I started a blog to make my notes on these papers public, and I think they're of high enough quality that I ought to share them here.

Note: my use of the pronoun "we" is instilled habit; I am not claiming to have helped develop the core ideas herein.

 

  • Löb's Theorem and the Prisoner's Dilemma is an account of the LaVictoire et al paper Robust Cooperation in the Prisoner's Dilemma.
  • Details in Provability Logic is a technical followup to the above, which goes into the details of modal logic needed for the LaVictoire et al paper; namely the normal form theorem, the fixed point theorem, and the decidability of GL via Kripke semantics.
  • Definability of Truth in Probabilistic Logic goes through the Christiano et al paper of the same name. It's a little rougher around the edges on account of being the first blog post I ever wrote (and being produced more hastily than the other two). I note that the construction doesn't truly require the Axiom of Choice.

 

 

[LINK] Cantor's theorem, the prisoner's dilemma, and the halting problem

13 Qiaochu_Yuan 30 June 2013 08:26PM

I wouldn't normally link to a post from my math blog here, but it concerns a cute interpretation of Cantor's theorem that showed up when I was thinking about program equilibria at the April MIRI workshop, so I thought it might be of interest here (e.g. if you're trying to persuade a mathematically inclined friend of yours to attend a future workshop). A short proof of the undecidability of the halting problem falls out as a bonus. 

Probabilistic Löb theorem

24 Stuart_Armstrong 26 April 2013 06:45PM

In this post (based on results from MIRI's recent workshop), I'll be looking at whether reflective theories of logical uncertainty (such as Paul's design) still suffer from Löb's theorem.

Theories of logical uncertainty are theories which can assign probability to logical statements. Reflective theories are theories which know something about themselves within themselves. In Paul's theory, there is an external P, in the meta language, which assigns probabilities to statements, an internal P, inside the theory, that computes probabilities of coded versions of the statements inside the language, and a reflection principle that relates these two P's to each other.

And Löb's theorem is the result that if a (sufficiently complex, classical) system can prove that "a proof of Q implies Q" (often abbreviated as □Q → Q), then it can prove Q. What would be the probabilistic analogue? Let's use □aQ to mean P('Q')≥1-a (so that □0Q is the same as the old □Q; see this post on why we can interchange probabilistic and provability notions). Then Löb's theorem in a probabilistic setting could:

Probabilistic Löb's theorem: for all a<1, if the system can prove □aQ → Q, then the system can prove Q.

To understand this condition, we'll go through the proof of Löb's theorem in a probabilistic setting, and see if and when it breaks down. We'll conclude with an example to show that any decent reflective probability theory has to violate this theorem.

continue reading »

Logic in the language of probability

12 Stuart_Armstrong 26 April 2013 06:45PM

This post is a minor note, to go along with the post on the probabilistic Löb theorem. It simply seeks to justify why terms like "having probability 1" are used interchangeably with "provable" and why implications symbols "→" can be used in a probabilistic setting.

Take a system of classical logic, with a single rule of inference: modus ponens:

From A and A→B, deduce B.

Having a single rule of inference isn't much of a restriction, because you can replace other rules of inference ("from A1,A2,... and An, deduce B") with an axiom or axiom schema ("A1∧A2∧...∧An → B") and then use modus ponens on that axiom to get the other rule of inference.

In this logical system, I'm now going to make some purely syntactical changes - not changing the meaning of anything, just the way we write things. For any sentence A that doesn't contain an implication arrow →, replace

A with P(A)=1.

Similarly, replace any sentence of the type

A → B with P(B|A)=1.

This is recursive, so we replace

(A → B) → C with P(C | P(B|A)=1 )=1.

And instead of using modus ponens, we'll use a combined Bayesian inference and law of total probability:

From P(A)=1 and P(B|A)=1, deduce P(B)=1.

continue reading »

Thoughts on the frame problem and moral symbol grounding

0 Stuart_Armstrong 11 March 2013 04:18PM

(some thoughts on frames, grounding symbols, and Cyc)

The frame problem is a problem in AI to do with all the variables not expressed within the logical formalism - what happens to them? To illustrate, consider the Yale Shooting Problem: a person is going to be shot with a gun, at time 2. If that gun is loaded, the person dies. The gun will get loaded at time 1. Formally, the system is:

  • alive(0)     (the person is alive to start with)
  • ¬loaded(0)     (the gun begins unloaded)
  • true → loaded(1)     (the gun will get loaded at time 1)
  • loaded(2) → ¬alive(3)     (the person will get killed if shot with a loaded gun)

So the question is, does the person actually die? It would seem blindingly obvious that they do, but that isn't formally clear - we know the gun was loaded at time 1, but was it still loaded at time 2? Again, this seems blindingly obvious - but that's because of the words, not the formalism. Ignore the descriptions in italics, and the names of the suggestive LISP tokens.

Since that's hard to do, consider the following example. Alicorn, for instance, hates surprises - they make her feel unhappy. Let's say that we decompose time into days, and that a surprise one day will ruin her next day. Then we have a system:

  • happy(0)     (Alicorn starts out happy)
  • ¬surprise(0)     (nobody is going to surprise her on day 0)
  • true → surprise(1)     (somebody is going to surprise her on day 1)
  • surprise(2) → ¬happy(3)     (if someone surprises her on day 2, she'll be unhappy the next day)

So here, is Alicorn unhappy on day 3? Well, it seems unlikely - unless someone coincidentally surprised her on day 2. And there's no reason to think that would happen! So, "obviously", she's not unhappy on day 3.

Except... the two problems are formally identical. Replace "alive" with "happy" and "loaded" with "surprise". And though our semantic understanding tells us that "(loaded(1) → loaded (2))" (guns don't just unload themselves) but "¬(surprise(1) → surprise(2))" (being surprised one day doesn't mean you'll be surprised the next), we can't tell this from the symbols.

And we haven't touched on all the other problems with the symbolic setup. For instance, what happens with "alive" on any other time than 0 and 3? Does that change from moment to moment? If we want the words to do what we want, we need to put in a lot of logical conditionings, so that our intuitions are all there.

This shows that there's a connection between the frame problem and symbol grounding. If we and the AI both understand what the symbols mean, then we don't need to specify all the conditionals - we can simply deduce them, if asked ("yes, if the person is dead at 3, they're also dead at 4").  But conversely, if we have a huge amount of logical conditioning, then there is less and less that the symbols could actually mean. The more structure we put in our logic, the less structures there are in the real world that fit it ("X(i) → X(i+1)" is something that can apply to being dead, not to being happy, for instance).

This suggests a possible use for the Cyc project - the quixotic attempt to build an AI by formalising all of common sense ("Bill Clinton belongs to the collection of U.S. presidents" and "all trees are plants"). You're very unlikely to get an AI through that approach - but it might be possible to train an already existent AI with it. Especially if the AI had some symbol grounding, then there might not be all that many structures in the real world that could correspond to that mass of logical relations. Some symbol grounding + Cyc + the internet - and suddenly there's not that many possible interpretations for "Bill Clinton was stuck up a tree". The main question, of course, is whether there is a similar restricted meaning for "this human is enjoying a worthwhile life".

Do I think that's likely to work? No. But it's maybe worth investigating. And it might be a way of getting across ontological crises: you reconstruct a model as close as you can to your old one, in the new formalism.

Pigliucci's comment on Yudkowsky's and Dai's stance on morality and logic

1 mapnoterritory 05 January 2013 08:05AM

Pigliucci:

So morality has a lot to do with logic — indeed I have argued that moral reasoning is a type of applied logical reasoning — but it is not logic “all the way down,” it is anchored by certain contingent facts about humanity, bonoboness and so forth.

 

But, despite Yudkowsky’s confident claim, morality isn’t a matter of logic “all the way down,” because it has to start with some axioms, some brute facts about the type of organisms that engage in moral reasoning to begin with. Those facts don’t come from physics (though, like everything else, they better be compatible with all the laws of physics), they come from biology. A reasonable theory of ethics, then, can emerge only from a combination of biology (by which I mean not just evolutionary biology, but also cultural evolution) and logic.

 

http://rationallyspeaking.blogspot.de/2013/01/lesswrong-on-morality-and-logic.html

Morality Isn't Logical

19 Wei_Dai 26 December 2012 11:08PM

What do I mean by "morality isn't logical"? I mean in the same sense that mathematics is logical but literary criticism isn't: the "reasoning" we use to think about morality doesn't resemble logical reasoning. All systems of logic, that I'm aware of, have a concept of proof and a method of verifying with high degree of certainty whether an argument constitutes a proof. As long as the logic is consistent (and we have good reason to think that many of them are), once we verify a proof we can accept its conclusion without worrying that there may be another proof that makes the opposite conclusion. With morality though, we have no such method, and people all the time make moral arguments that can be reversed or called into question by other moral arguments. (Edit: For an example of this, see these posts.)

Without being a system of logic, moral philosophical reasoning likely (or at least plausibly) doesn't have any of the nice properties that a well-constructed system of logic would have, for example, consistency, validity, soundness, or even the more basic property that considering arguments in a different order, or in a different mood, won't cause a person to accept an entirely different set of conclusions. For all we know, somebody trying to reason about a moral concept like "fairness" may just be taking a random walk as they move from one conclusion to another based on moral arguments they encounter or think up.

In a recent post, Eliezer said "morality is logic", by which he seems to mean... well, I'm still not exactly sure what, but one interpretation is that a person's cognition about morality can be described as an algorithm, and that algorithm can be studied using logical reasoning. (Which of course is true, but in that sense both math and literary criticism as well as every other subject of human study would be logic.) In any case, I don't think Eliezer is explicitly claiming that an algorithm-for-thinking-about-morality constitutes an algorithm-for-doing-logic, but I worry that the characterization of "morality is logic" may cause some connotations of "logic" to be inappropriately sneaked into "morality". For example Eliezer seems to (at least at one point) assume that considering moral arguments in a different order won't cause a human to accept an entirely different set of conclusions, and maybe this is why. To fight this potential sneaking of connotations, I suggest that when you see the phrase "morality is logic", remind yourself that morality isn't logical.

 

Help Reform A Philosophy Curriculum

22 JonathanLivengood 08 December 2012 10:45PM

A couple of days ago, Luke posted a recommendation for reforming how philosophy is taught. My department at the University of Illinois is in the midst of some potentially large-scale changes.* Hence, now seems to be a great time to think about concrete steps towards reforming or partially reforming the curriculum in an actual philosophy department. I would appreciate some help thinking through how to make changes that will (a) improve the philosophy education of our undergraduates, (b) recruit and retain better students, (c) improve faculty experiences with teaching philosophy, and (d) be salable to the rest of the philosophy faculty. To some extent, this post is me thinking out loud through what I want to say to my department's curriculum committee (probably in January).

 

How Things Stand Right Now

In this section, I will try to lay out the situation as I see it right now.

First, we have the following problem: philosophy courses that we offer are not sufficiently gated. In the mathematics department at my university, you can't take mathematical logic until you've taken a course called Fundamental Mathematics, which looks to be a class about proof techniques, mathematical induction, etc. And you can't take that until you've taken the second semester of calculus. Computer science, economics, physics, and most every other science curriculum works like this. If you want to take advanced courses, you have to pass through the gates of less advanced courses, which (theoretically, at least) prepare you for the material covered in the more advanced course.

By contrast, in the philosophy department, you may take a senior-level (400 at my school) course after taking a freshman-level (100 at my school) introduction to philosophy. The result is that students who take our 400-level courses are typically unprepared. At least, that has been my experience. (Shockingly, many students taking 400-level classes then complain that they were expected to know things about philosophy!) A big part of the problem here is that we do not presently have enough faculty to cover intermediate-level courses on a regular enough basis, and let's be honest, faculty members don't usually want to teach lower-level courses anyway.

Second, we have the following resource: our department currently has strong and growing connections with several world-class science or science-related departments. We have cross-appointed faculty and/or cross-listed courses with mathematics, linguistics, psychology, and physics, all of which are very strong departments. We have philosophy graduate students who do research and teach courses in these disciplines as well. And I am hoping to expand our connections to include computer science and statistics. I think there ought to be a good way to make use of these resources.

Rather than trying to reform the entire philosophy curriculum all at once, I want to focus first on our logic offerings. What we have now is the following mess.

  • 102 -- Introduction to Logic: A critical thinking course almost never taught by a faculty member.
  • 103 -- Quantitative Introduction to Logic: An introductory formal logic course taught by me about half the time
  • 202 -- Symbolic Logic: A basic symbolic logic course (unclear in how it is different from 103 except in that it is completely restricted to deductive logic)
  • 307 -- Elements of Semantics and Pragmatics: Cross-listed with linguistics
  • 407 -- Logic and Linguistics: Cross-listed with linguistics
  • 453 -- Formal Logic and Philosophy: An extension of 202 but with emphasis on philosophical issues
  • 454 -- Advanced Symbolic Logic: Basically, a math logic course covering completeness, compactness, Lowenheim-Skolem, incompleteness, and undecidability

The only pre-requisite for 453 and 454 is 202, and 202 has no pre-requisites at all; the pre-req for 407 is 307, and 307 depends on a 100-level linguistics course or (more commonly) consent of the instructor. We also have a 400-level philosophy of mathematics course. Along with these, the mathematics department has a 400-level mathematical logic course and a 400-level course on set theory and topology, neither of which is currently cross-listed, but both of which, I think, should be cross-listed as philosophy courses, which would also raise the bar for the philosophy students interested in logic by requiring that they take the calculus sequence and the fundamentals course.

On the defects side, I think we have poor use of gating and spotty coverage of even deductive logic. For example, we have no courses on modal logics, we have no courses on intuitionist/constructivist logic, we have no courses on relevance logic, we have no courses on more exotic logics, we have no courses on set theory or category theory, and we have no courses on computation. We get sort of close to the last two in 453/454, and we might address them more directly by cross-listing with mathematics. But as it is, we do not do those things. And we have a huge gaping hole where inductive logic, probability theory, statistics, causal inference, and so on should be. On an individual level, that hole could be filled somewhat by taking courses in statistics; however, that is not quite the same as having courses available on confirmation theory or inductive logics.


Recommendations

So, now you know the basic situation ... what to do?

Below are some specific recommendations that I want to make to my department's curriculum committee. I would really appreciate input on how to refine my recommendations, how to make them more palatable, and so on.

First, we need to make the 100- and 200-level courses connect in a relevant way. I recommend entirely relabeling (and maybe even renumbering) 102 so that it is clear that it is a terminal, service course intended for non-majors. The course should stand to philosophy education as courses like Physics Made Easy stands to physics education. I further recommend making a relabeled (and maybe renumbered) version of 103 a pre-requisite for all 200-level logic courses. In terms of material covered, ideally 103 would introduce symbolic conventions (to be made as standard as possible across the curriculum), proof skills, and basic ideas in model theory, set theory, and probability theory. (I go back and forth between liking this idea, which fits closely with how I teach 103 now, and wanting to do something more like, deductive logic in the first half and confirmation theory in the second half with no formal exposure to set theory. The biggest barrier to the second approach is in formally developing confirmation theory without set theory.) Then 202 could do more meta-logic, go into more detail on model theory, go into more detail on set theory, or whatever.

Second, we need more courses covering inductive logic, probability theory, statistics, and so on. I recommend adding a 200-level course parallel to 202, which would cover some probability theory and some causal and statistical reasoning. Let's call this proposed course PHIL 204, since we don't offer anything under that number right now. I have in mind something slightly more advanced than CMU's Open Learning Initiative course here.

Third, I recommend expanding our 300-level course-offerings as follows. We need a second semester of inductive logic (etc.) that builds off of 204. And we need a course that does a simple survey of exotic logics, like modal logics, intuitionistic logic, relevance logic, free logic, etc. The 300-level survey of exotic logics need not be a pre-req for 400-level courses, provided the 400-level courses cover the same material that they have been covering. And that seems fine to me, although it might be in our long-term interests to drop our 454 in the event that we cross-list with mathematics on their math logic course. Depending on how their math logic course is taught, we might try to convince them that our 202 should satisfy the pre-req as effectively as their fundamentals course. (I don't know if that would be a hard sell or not.)

Fourth, I recommend cross-listing some courses with mathematics and statistics to get more regular coverage of the tools we want our students to have without necessarily having our faculty teach those courses all the time.

 

What Is The Goal Here?

I haven't spent any time in this write-up thinking about the goal(s) of logic education in philosophy. I am not sure whether it would be worth backing up to address this question or whether there is sufficient implicit agreement about the value and goals of logic to leave it alone. If you think I should be saying something about the place of logic in the overall curriculum or making an argument for teaching logic at all or making an argument for understanding logic broadly enough to get probability and statistics through the door, please tell me and make a suggestion about how to develop the heading.

 

--------------------------------------------------------------------------------------------------------------------

*My department is currently much too small relative to the size of my university, but the powers that be have recently become receptive to our requests to expand (really, to replace a large number of retirements from the last five years). Hence, we are about to undergo an external review, which we hope will result in a plan for phased growth of the department to a little more than twice its current size by the end of the decade. (Yes, our numbers are seriously depleted!)

The Emergence of Math

1 AnotherIdiot 02 November 2012 01:08AM

In his latest post, Logical Pinpointing, Eliezer talks about the nature of math. I have my own views on the subject, but when writing about them, it became too long for a comment, so I'm posting it here in discussion.

I think it's important to clarify that I am not posting this under the guise that I am correct. I'm just posting my current view of things, which may or may not be correct, and which has not been assessed by others yet. So though I have pushed myself to write in a confident (and clear) tone, I am not actually confident in my views ... well, okay, I'm lying. I am confident in my views, but I know I shouldn't be, so I'm trying to pretend to myself that I'm posting this to have my mind changed, when in reality I'm posting it with the stupid expectation that you'll all magically be convinced. Thankfully, despite all that, I don't have a tendency to cling to beliefs which I have seen proven wrong, so I will change my mind when someone kicks my belief's ass. I won't go down without a fight, but once I'm dead, I'm dead.

Before I can share my views, I need to clarify a few concepts, and then I'll go about showing what I believe and why.

Abstractions

Abstract models are models in which some information is ignored. Take, for example, the abstract concept of the ball. I don't care if the ball is made of rubber or steel, nor do I care if it has a radius of 6.7cm or a metre, a ball is a ball. I couldn't care less about the underlying configuration of quarks in the ball, as long as it's a sphere.

Numbers are also abstractions. If I've got some apples, when I abstract this into a number, I don't care that they are apples. All I care about is how many there are. I can conveniently forget about the fact that it's apples, and about the concept of time, and the hole in my bag through which the apples might fall.

Axiomatic Systems (for example, Peano Arithmetic)

Edit: clarified this paragraph a bit.

I don't have much to say about these for now, except that they can all be reduced to physics. Given that mathematicians store axiomatic systems in their minds, and use them to prove things, they cannot help but be reducible to physical things, unless the mind itself is not physical. I think most LessWrongers, being reductionists, believe this, so I won't go into much more detail. I'll just say that this will be important later on.

I Can Prove Things About Numbers Using Apples

Let's say I have 2 apples. Then someone gives me 2 more. I now have 4. I have just shown that 2+2=4, assuming that apples behave like natural numbers (which they do).

But let's continue this hypothetical. As I put my 2 delicious new apples into my bag, one falls out through a hole in my bag. So if I count how many I have, I see 3. I had 2 to begin with, and 2 more were given to me. It seems I have just shown 2+2=3, if I can prove things about numbers using apples.

The problem lies in the information that I abstracted away during the conversion from apples to numbers. Because my conversion from apples to numbers failed to include information about the hole, my abstract model gave incorrect results. Like my predictions about balls might end up being incorrect if I don't take into account every quark that composes it. Upon observing that the apple had fallen through the hole, I would realize that an event which rendered my model erroneous had occurred, so I would abstract this new event into -1, which would fix the error: 2+2-1=3.

To summarize this section, I can "prove" things about numbers using apples, but because apples are not simple numbers (they have many properties which numbers don't), when I fail to take into account certain apple properties which will affect the number of apples I have, I will get incorrect results about the numbers.

Apples vs Peano Arithmetic

We know that Peano Arithmetic describes numbers very well. Numbers emerge in PA; we designed PA to do so. If PA described unicorns instead, it wouldn't be very useful. And if PA emerges from the laws of physics (we can see PA emerge in mathematicians' minds, and even on pieces of paper in the form of writing), then the numbers which emerge from PA emerge from the laws of physics. So there is nothing magical about PA. It's just a system of "rules" (physical processes) from which numbers emerge, like apples (I patched up the hole in my bag ;) ).

Of course, PA is much more convenient for proving things about numbers than apples. But they are inherently just physical processes from which I have decided to ignore most details, to focus only on the numbers. In my bag of 3 apples, if I ignore that it's apples there, I get the number 3. In SSS0, if I forget about the whole physical process giving emergence to PA, I am just left with 3.

So I can go from 3 apples to the number 3 by removing details, and from the number 3 to PA by adding in a couple of details. I can likewise go from PA to numbers, and then to apples.

To Conclude, Predictions are "Proofs"

From all this, I conclude that numbers are simply an abstraction which emerges in many places thanks to our uniform laws of physics, much like the abstract concept "ball". I also conclude that what we classify as a "prediction" is in fact a "proof". It's simply using the rules to find other truths about the object. If I predict the trajectory of a ball, I am using the rules behind balls to get more information about the ball. If I use PA or apples to prove something about numbers, I am using the rules behind PA (or apples) to prove something about the numbers which emerge from PA (or apples). Of course, the proof with PA (or apples) is much more general than the "proof" about the ball's trajectory, because numbers are much more abstract than balls, and so they emerge in more places.

So my response to this part of Eliezer's post:

Never mind the question of why the laws of physics are stable - why is logic stable?

Logic is stable for the same reasons the laws of physics are stable. Logic emerges from the laws of physics, and the laws of physics themselves are stable (or so it seems). In this way, I dissolve the question and fold it into the question of why the laws of physics are stable -- a question about which I don't know enough to attempt an answer.

 

Edit: I'm going to retract this and try to write a clearer post. I still have not seen arguments which have fully convinced me I am wrong, though I still have a bit to digest.

Formalising cousin_it's bounded versions of Gödel's theorem

10 Stuart_Armstrong 29 June 2012 11:24PM

In this post, I'll try to put cousin_it's bounded version of Gödel's theorem on a rigorous footing. This will be logic-heavy, not programming-heavy. The key unproved assumptions are listed at the bottom of the post.

Notes on terminology:

  • T is our deductive system.
  • ⊥ is a canonical contradiction in T. ¬A is defined to be A→⊥.
  • T ⊢ A means that T proves the proposition A.
  • T ⊢j A means that T proves A in a proof requiring no more than j symbols.
  • □ A is the statement "there exists a number that is an encoding in Gödel numbering of a valid proof in T of A".
  • □n A is the statement "there exists a number that is an encoding in Gödel numbering of a valid proof in T of A with no more than n symbols in the proof".

The first needed fact is that there is a computable function g such that if T ⊢n A, then T ⊢g(n) □n A. In informal language, this means that if T can prove A in n symbols, it can prove that it can do so, in g(n) symbols. This g will be the critical piece of the infrastructure.

Now, these arguments work with general definitions of proofs, but I like to put exact bounds where I can and move uncertainty to a single place. So I will put some restrictions on the language we use, and what counts as a proof. First, I will assume the symbols "(", ")", "j", and "=" are part of the language - ( and ) as parentheses, and j as a free variable.

I will require that proofs start and end with a parenthesis. A proof need not state what statement it is proving! As long as there is a computable process that can check that "p is a proof of A", p need not mention A. This means that I can make the following assumptions:

  • If p is a proof of A→B and q a proof of A, then (pq) is a proof of B (modus ponens).
  • If p is a proof of A→B and q a proof of ¬B, then (pq) is a proof of ¬A (modus tollens).
  • A proof that A↔B will be treated as a valid proof of A→B and B→A also.
  • Well-formed sentences with free variables can also have proofs (as in "j=0 or j>0" and "j>3 → j²>9").
  • If A and B have free variable j, and p is a proof that A→B, then (p(j=n)) is a proof that A(n)→B(n).
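As a rough illustration of the symbol-counting behind g, here is a sketch under the assumption above that proofs are parenthesis-delimited strings (the proof strings themselves are hypothetical placeholders):

```python
# Combining two proofs by modus ponens costs exactly two extra symbols
# under the parenthesis convention above. This is an illustration of the
# bookkeeping, not the actual formalisation.

def modus_ponens(p, q):
    """If p proves A->B and q proves A, then (pq) proves B."""
    return "(" + p + q + ")"

p = "(proof-of-A-implies-B)"   # hypothetical proof strings
q = "(proof-of-A)"
r = modus_ponens(p, q)
assert r == "((proof-of-A-implies-B)(proof-of-A))"
assert len(r) == len(p) + len(q) + 2  # the length bound grows by 2 per application
```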
continue reading »

Adaptive Logics: the best of both worlds?

7 Stuart_Armstrong 19 April 2012 09:39AM

Consider the following premises from which we would like to reason:

  1. Abraham Lincoln was born in Michigan.
  2. Abraham Lincoln was born in Kentucky.
  3. Eliezer Yudkowsky has a beard or he has blond hair.
  4. Eliezer Yudkowsky doesn't have blond hair.

We'd like to be able to conclude that Eliezer Yudkowsky has a beard, but we have a problem doing that. Classical logic can certainly derive that result, but because (1) and (2) contradict each other, by the principle of explosion, classical logic can derive any result - including the fact that Eliezer doesn't have a beard. If we use paraconsistent logics then we're safe from explosion. But because we've been rejecting the disjunctive syllogism in order to avoid explosion, we can't actually put (3) and (4) together in order to get the result we want.
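The explosion can be verified by a brute-force semantic check of classical entailment: since no valuation satisfies the contradictory premises, every conclusion is vacuously entailed. (Variable names M, K, B, H are my own shorthand for the four premises, not from the post.)

```python
from itertools import product

# Classical entailment: premises |= conclusion iff the conclusion holds in
# every valuation satisfying all premises. With contradictory premises no
# valuation qualifies, so *everything* is entailed - the principle of explosion.

def entails(premises, conclusion, variables):
    for vals in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, vals))
        if all(p(env) for p in premises):
            if not conclusion(env):
                return False
    return True  # vacuously true if no valuation satisfies the premises

variables = ["M", "B", "H"]        # Michigan, Beard, blond Hair
premises = [
    lambda e: e["M"],              # (1) born in Michigan
    lambda e: not e["M"],          # (2) born in Kentucky (contradicts (1))
    lambda e: e["B"] or e["H"],    # (3) beard or blond hair
    lambda e: not e["H"],          # (4) not blond hair
]
assert entails(premises, lambda e: e["B"], variables)      # has a beard...
assert entails(premises, lambda e: not e["B"], variables)  # ...and doesn't!
```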

Informally, what we'd want to do is something like "use classical logic wherever there's no contradictions, but restrict to paraconsistent logic whenever there are". Adaptive logics attempt to implement this idea. The presentation in this post will be even more informal than usual, as the ideas of adaptive logics are more useful than the specific implementation I've seen.

Between sky and earth, between classical and relevance

Adaptive logics make use of two logical systems: an upper limit logic ULL (generally taken to be classical logic) and a weaker lower limit logic LLL (we will be using a relevance logic for this). Adaptive logics will then start with a collection of premises (that may include contradictions), and reason from there.

The basic rules are that proofs using the LLL are always valid (since it is a weaker logic, they are also valid for the ULL). Proofs using the ULL are valid if they did not use a contradiction in their reasoning. In the example above, we would freely apply classical logic when using premises (3) and (4), but would restrict to relevance logic when using premises (1) and (2).

This might be enough for UDT; we would label the antecedent A()==a in the (A()==a → U()==u) statement as (potentially) contradictory, and only allow relevance logic to use it as a premise, while hitting everything else that moves with classical logic.

continue reading »

If you can't be right, at least be relevant

13 Stuart_Armstrong 18 April 2012 10:17AM

We've done the motivation, and the warm-ups, and it's now time to get to the meat of this arc: relevance logics. These try to recapture a more intuitive meaning to the material implication, "if... then..." and "→". Consider for instance the following:

  • "It rains in Spain" → ("Darwin died for our sins" → "It rains in Spain")
  • ("It rains in Spain" and "It doesn't rain in Spain") → "Ravens are the stupidest of birds"

Both of these are perfectly acceptable classical logical theorems. The problem is that there doesn't seem to be any connection between the different components: no logical link between rain, Darwin, or ravens. These are all tautologies, so we could substitute any other propositions in here and they would still be true. In common speech, "if A then B" implies some sort of relevant connection between A and B. Relevance logics attempt to do the same for the formal "→".

Unfortunately, it is easier to motivate relevance logics, and write formal axiomatic systems for them, than it is to pin down exactly what "relevance" means - see the later section on relevance semantics. One oft-mentioned requirement is that the expressions on both sides of "→" share propositional variables. A propositional variable is a basic sentence such as A:"It rains in Spain". Sharing A would mean that the whole expression would be something like:

(blah ∧ blah ∨ A ∧ blah)  →  (blah ∧ ¬A ∨ blah ∧ blah)

Unfortunately, though this is a necessary condition for relevance, it is not sufficient. Even the hated disjunctive syllogism (the very thing we are trying to avoid) shares variables. In theorem form, it is:

(A ∨ B) ∧ ¬B  →  A
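A toy version of the variable-sharing test (my own sketch; it deliberately checks only the necessary condition, which is exactly why it lets the disjunctive syllogism through):

```python
# The variable-sharing requirement: both sides of "->" must mention a common
# propositional variable. Necessary for relevance, but not sufficient.

def shares_variables(lhs_vars, rhs_vars):
    return bool(set(lhs_vars) & set(rhs_vars))

# Disjunctive syllogism, (A v B) & ~B -> A: shares A, so it passes the test,
# yet relevance logics still reject it.
assert shares_variables({"A", "B"}, {"A"})
# A -> (C -> A), the first classical theorem quoted earlier: also shares A.
assert shares_variables({"A"}, {"C", "A"})
# A genuinely irrelevant conditional fails the test.
assert not shares_variables({"A"}, {"C"})
```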

So let's defer worrying about what relevance means, and look at what relevance is.

continue reading »

Who's afraid of impossible worlds?

12 Stuart_Armstrong 17 April 2012 12:56PM

In order to clarify the semantics of paraconsistent and relevance logics, we need to make a detour into impossible worlds - a fruitful detour opening up fun new vistas. Note that this is an intuitive introduction to the subject, and logic is probably the area of mathematics where it is the most dangerous to rely on your intuition; this is no substitute for rigorously going through the formal definitions. With that proviso in mind, let's get cracking.

 

Possible worlds: the meaning of necessity

Modal logics were developed around the concept of necessity and possibility. They do this with two extra operators: the necessity operator □, and possibility operator ◊. The sentence □A is taken to mean "it is necessary that A" and ◊A means "it is possible that A". The two operators are dual to each other – thus "it is necessary that A" is the same as "it is not possible that not-A" (in symbols: □A ↔ ¬◊¬A). A few intuitive axioms then lead to an elegant theory.

There was just one problem: early modal logicians didn't have a clue what they were talking about. They had the syntax, the symbols, the formal rules, but they didn't have the semantics, the models, the meanings of their symbols.

To see the trouble they had, imagine someone tossing a coin and covering it with their hand. Call wH the world in which it comes out heads and wT the world in which it comes out, you guessed it, tails. Now, is the coin necessarily heads? Is it possibly heads?

Intuitively, the answers should be no and yes. But this causes a problem. We may be in the world wH. So if we agree that the coin is not necessarily heads, then it is not necessarily heads even though it is actually heads (forget your Bayescraft here and start thinking like a logician). Similarly, in wT, the coin is in actuality tails yet it is possibly heads.

Saul Kripke found the breakthrough: necessity and possibility are not about individual worlds, but about collections of possible worlds, and relationships between them. In this case, there is an indistinguishability relationship between wT and wH, because we can't (currently) tell them apart.

Because of this relationship, the statement A:"the coin is heads" is possible in both wT and wH. The rule is that a statement is possible in world w if it is true in at least one world that is related to w. For w=wH, ◊A is true because A is true in wH and wH is related to itself. Similarly, for w=wT, ◊A is true because A is true in wH and wH is related to wT.

Conversely B:"the coin is heads or the coin is tails" is necessary in both wT and wH: here the rule is that a statement is necessary in world w if it is true in all worlds related to w. wT and wH are related only to each other through the indistinguishability relationship, and B is true in both of them, so □B is also true in both of them. However □A is not true in either wT or wH, because both those worlds are related to wT and A is false in wT.
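The two evaluation rules can be sketched directly. This is an illustrative model of the coin example, with world and proposition names as above:

```python
# Kripke evaluation: possibly-A holds at w if A holds in SOME world related
# to w; necessarily-A holds at w if A holds in ALL worlds related to w.
# Two-world coin model with the (reflexive, symmetric) indistinguishability
# relation.

related = {"wH": ["wH", "wT"], "wT": ["wH", "wT"]}
truths = {"wH": {"heads"}, "wT": {"tails"}}

def possibly(prop, w):
    return any(prop(v) for v in related[w])

def necessarily(prop, w):
    return all(prop(v) for v in related[w])

A = lambda v: "heads" in truths[v]                       # "the coin is heads"
B = lambda v: "heads" in truths[v] or "tails" in truths[v]

assert possibly(A, "wH") and possibly(A, "wT")           # possibly-A: true in both
assert not necessarily(A, "wH") and not necessarily(A, "wT")  # necessarily-A: false in both
assert necessarily(B, "wH") and necessarily(B, "wT")     # necessarily-B: true in both
```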

continue reading »

Logic of Paradox: a (too) simple paraconsistent logic

7 Stuart_Armstrong 05 April 2012 10:10AM

The logic of paradox (LP) is the simplest, and one of the oldest, of the paraconsistent logics. Instead of assigning truth values to statements A, it instead uses binary relations v(A,1) and v(A,0). "A is true" is encoded by v(A,1); "A is false" is similarly encoded by v(A,0). Each statement A is required to be true or false, but it can be both (in which case we could say that "A is undetermined"). "A is strictly true" means v(A,1) but not v(A,0); strict falsity is the converse.

The usual symbols →, ∨, ∧, and ¬ retain their standard meanings, and a compound statement takes all possible values it could take, given all the possible values its components could take. So, for instance, if A is (strictly) true, then ¬A is (strictly) false. If A is undetermined, then ¬A is undetermined. If A is undetermined and B is strictly false, then A ∨ B is undetermined and A ∧ B is strictly false - though if B were strictly true, then A ∨ B would be strictly true and A ∧ B undetermined.

These properties make LP quite easy to work with, and one can determine the truths of many statements using truth tables. In fact, every tautology of classical logic is a tautology of LP. This derives from the fact that tautologies are true regardless of the truth values of their components; hence they remain true in LP whether we take undetermined statements to be true or false. Consequently, all of the following are true in LP:

  1. A  → (B → A)
  2. (A ∧ ¬A)  → B
  3. ((A ∨ B)  ∧ ¬A) → B
  4. A ∧ (A → B)  → B

But wait a second. Isn't the second line a statement of the principle of explosion - the fact that we can derive anything from a contradiction? Indeed it is. LP can state the principle of explosion as a (true) theorem - but it can't actually use it as a rule of deduction. Similarly, the third line is a statement of the disjunctive syllogism - a true theorem, but not a valid rule of deduction. That is easy to see: let A be undetermined, and B strictly false. Then A ∨ B is true, and so is ¬A - and yet we cannot deduce that B is true from this information.

So LP can accept contradictions without blowing up, has all the tautologies of classical logic, but lacks some of the rules of inference. "Some" of the rules of inference? LP even lacks modus ponens! As before, let A be undetermined and B strictly false; then A and (A → B) are both true, but B is not.
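These truth tables are easy to mechanise. Here is a sketch in my own encoding (a value is the set of classical truth values a statement takes: strict truth is {1}, strict falsity is {0}, undetermined is {0,1}), checking both a classical tautology and the modus ponens counterexample above:

```python
# LP values as sets of classical values. A compound takes every value it
# could take given the values of its parts; "true" (designated) means 1 is
# among the values.

T, F, U = frozenset({1}), frozenset({0}), frozenset({0, 1})

def neg(a):
    return frozenset(1 - x for x in a)

def disj(a, b):
    return frozenset(max(x, y) for x in a for y in b)

def implies(a, b):      # LP keeps the material conditional: A -> B is ~A v B
    return disj(neg(a), b)

def designated(a):
    return 1 in a

assert designated(T) and not designated(F)

A, B = U, F                           # A undetermined, B strictly false
assert designated(disj(A, neg(A)))    # excluded middle holds even here
assert designated(A)                  # A is true
assert designated(implies(A, B))      # A -> B is true
assert not designated(B)              # ...yet B is not: modus ponens fails
```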

So while LP is a pleasant logic to play with, it isn't particularly useful. Another weakness is that it still defines the material conditional (A → B) as (¬A ∨ B): false statements still imply anything, and we haven't solved the Löbian problem for UDT. In the next post, I'll look at relevance logics, which have a more restricted use of →, and do allow modus ponens.

Paraconsistency and relevance: avoid logical explosions

9 Stuart_Armstrong 04 April 2012 11:57AM

EDIT: corrected from previous version.

If the moon is made of cheese, then Rafael Delago was elected president of Ecuador in 2005.

If you believe that Kennedy was shot in 1962, then you must believe that Santa Claus is the Egyptian god of the dead.

Both of these are perfectly valid theorems of classical logic. The antecedent is false, hence the conditional is true, no matter what the consequent is: if A is false, then A→B is true.

It does feel counterintuitive, though, especially because human beliefs do not work in this way. Consider instead the much more intuitive statement:

If you believe that Kennedy was shot in 1962, then you must believe that Lee Harvey Oswald was also shot in 1962.

Here there seems to be a connection between the two clauses; we feel A→B is more justified when "→" actually does some work in establishing a relationship between A and B. But can this intuition be formalised?

One way to do so is to use relevance logics, which are a subset of "paraconsistent" logics. Paraconsistent logics are those that avoid the principle of explosion. This is the rule in classical logic that if you accept one single contradiction - one single (A and not-A) - then you can prove everything. This is akin to accepting one false belief that contradicts your other beliefs - after that, anything goes. The contradiction explodes and takes everything down with it. But why would we be interested in avoiding either the principle of explosion or unjustified uses of "→"?

continue reading »

Second order logic, in first order set-theory: what gives?

10 Stuart_Armstrong 23 February 2012 12:29PM

With thanks to Paul Christiano

My previous post left one important issue unresolved. Second order logic needed to make use of set theory in order to work its magic, pin down a single copy of the reals and natural numbers, and so on. But set theory is a first order theory, with all the problems that this brings - multiple models, of uncontrollable sizes. How can these two facts be reconciled?

Quite simply, it turns out: for any given model of set theory, the uniqueness proofs still work. Hence the proper statement is:

  • For any model of set theory, there is a unique model of reals and natural numbers obeying the second order axioms.

Often, different models of set theory will have the same model of the reals inside them; but not always. Countable models of set theory, for instance, will have a countable model of the reals. So models of the reals can be divided into three categories:

  1. The standard model of the reals, the unique field that obeys the second order axioms inside standard models of set theory (and some non-standard models of set theory as well).
  2. Non-standard models of the reals, that obey the second order axioms inside non-standard models of set theory.
  3. Non-standard models of the reals that obey the first order axioms, but do not obey the second order axioms in any model of set theory.

And similarly for the natural numbers.

"The Conditional Fallacy in Contemporary Philosophy"

2 gwern 31 January 2012 09:06PM

Split from "Against Utilitarianism: Sobel's attack on judging lives' goodness" for length.

Robert K. Shope, back in his 1978 paper "The Conditional Fallacy in Contemporary Philosophy", identified a kind of argument that we transhumanists will find painfully familiar: you propose idea X; the other person says that bad thing Y would be a possible counterexample if X were true, so X can't be true - ignoring that Y may not happen, and that X can just be modified to deal with Y if it's really that important.

("If we augment our brains, we may forget how to love!" "So don't remove love when you're augmenting, sheesh." "But it might not be possible!" "But wouldn't you agree that augmentation without loss of love would be better than the status quo?")

Excerpts follow:

continue reading »

Bayes Slays Goodman's Grue

0 potato 17 November 2011 10:45AM

This is a first stab at solving Goodman's famous grue problem. I haven't seen a post on LW about the grue paradox, and this surprised me, since I had figured that if any argument would be raised against Bayesian LW doctrine, it would be the grue problem. I haven't looked at many proposed solutions to this paradox, besides some of the basic ones in "The New Riddle of Induction". So, I apologize now if my solution is wildly unoriginal. I am willing to put you through this, dear reader, because:

  1. I wanted to see how I would fare against this still largely open, devastating, and classic problem, using only the arsenal provided to me by my minimal Bayesian training, and my regular LW reading.
  2. I wanted the first LW article about the grue problem to attack it from a distinctly Lesswrongian approach, without the benefit of hindsight knowledge of the solutions of non-LW philosophy.
  3. And lastly, because, even if this solution has been found before, if it is the right solution, it is to LW's credit that its students can solve the grue problem with only the use of LW skills and cognitive tools.

I would also like to warn the savvy subjective Bayesian that just because I think that probabilities model frequencies, and that I require frequencies out there in the world, does not mean that I am a frequentist or a realist about probability. I am a formalist with a grain of salt. There are no probabilities anywhere in my view, not even in minds; but the theorems of probability theory, when interpreted, share a fundamental contour with many important tools of the inquiring mind, including both the nature of frequency and the set of rational subjective belief systems. There is nothing more to probability than that system which produces its theorems.

Lastly, I would like to say that even if I have not succeeded here (which I think I have), there is likely something valuable that can be made from the leftovers of my solution after the onslaught of penetrating critiques that I expect from this community. Solving this problem is essential to LW's methods, and our arsenal is fit to handle it. If we are going to be taken seriously in the philosophical community as a new movement, we must solve serious problems from academic philosophy, and we must do it in distinctly Lesswrongian ways.

 


 

"The first emerald ever observed was green.
The second emerald ever observed was green.
The third emerald ever observed was green.
… etc.
The nth emerald ever observed was green.
(conclusion):
There is a very high probability that a never before observed emerald will be green."

That is the inference that the grue problem threatens, courtesy of Nelson Goodman.  The grue problem starts by defining "grue":

"An object is grue iff it is first observed before time T, and it is green, or it is first observed after time T, and it is blue."

So you see that before time T, from the list of premises:

"The first emerald ever observed was green.
 The second emerald ever observed was green.
 The third emerald ever observed was green.
 … etc.
 The nth emerald ever observed was green."
 (we will call these the green premises)

it follows that:

"The first emerald ever observed was grue.
The second emerald ever observed was grue.
The third emerald ever observed was grue.
… etc.
The nth emerald ever observed was grue."
(we will call these the grue premises)

The proposer of the grue problem asks at this point: "So if the green premises are evidence that the next emerald will be green, why aren't the grue premises evidence for the next emerald being grue?" If an emerald is grue after time T, it is not green. Let's say that the green premises bring the probability of "A new unobserved emerald is green." to 99%. In the skeptic's hypothesis, by symmetry they should also bring the probability of "A new unobserved emerald is grue." to 99%. But of course after time T, this would mean that the probability of observing a green emerald is 99%, and the probability of not observing a green emerald is also at least 99%. Since these sentences have no intersection - they cannot happen together - the probability of their disjunction is the sum of their individual probabilities. This gives a number at least as big as 198%, which contradicts the Kolmogorov axioms: we should not be able to form a statement with a probability greater than one.

This threatens the whole of science, because you cannot simply keep this isolated to emeralds and color. We may think of the emeralds as trials, and green as the value of a random variable. Ultimately, every result of a scientific instrument is a random variable, with a very particular and useful distribution over its values. If we can't justify inferring probability distributions over random variables based on their previous results, we cannot justify a single bit of natural science. This, of course, says nothing about how it works in practice. We all know it works in practice. "A philosopher is someone who says, 'I know it works in practice; I'm trying to see if it works in principle.'" - Dan Dennett

We may look at an analogous problem. Let's suppose that there is a table, that balls are being dropped onto this table, and that there is an infinitely thin line drawn perpendicular to the edge of the table at a position we are unaware of. The problem is to figure out the probability of the next ball landing right of the line, given the previous results. Our first prediction should be that there is a 50% chance of the ball being right of the line, by symmetry. If we get the result that one ball landed right of the line, by Laplace's rule of succession we infer that there is a 2/3 chance that the next ball will be right of the line. After n trials, if every trial gives a positive result, the probability we should assign to the next trial being positive as well is (n+1)/(n+2).
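For concreteness, the rule of succession used above can be sketched in code (using exact fractions to avoid rounding):

```python
from fractions import Fraction

# Laplace's rule of succession: after n trials that all came out positive,
# the probability that the next one is positive is (n+1)/(n+2).

def laplace(successes, trials):
    return Fraction(successes + 1, trials + 2)

assert laplace(0, 0) == Fraction(1, 2)    # before any trial: 50%, by symmetry
assert laplace(1, 1) == Fraction(2, 3)    # one ball landed right: 2/3
assert laplace(10, 10) == Fraction(11, 12)
```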

If this line was placed two-thirds of the way along the table, we should expect the ratio of Rights to Lefts to approach 2:1. This gives us a 2/3 chance of the next ball being a Right, and the fraction of Rights out of trials approaches 2/3 ever more closely as more trials are performed.

Now let us suppose a grue skeptic approaching this situation. He might make up two terms "reft" and "light". Defined as you would expect, but just in case:

"A ball is reft of the line iff it is right of it before time T when it lands, or if it is left of it after time T when it lands.
 A ball is light of the line iff it is left of the line before time T when it lands, or if it is right of the line after time T when it first lands."

The skeptic would continue:

"Why should we treat the observation of several occurrences of Right, as evidence for 'The next ball will land on the right.' and not as evidence for 'The next ball will land reft of the line.'?"

Things for some reason become perfectly clear at this point for the defender of Bayesian inference, because now we have an easily imaginable model. Of course, if a ball landing right of the line is evidence for Right, then it cannot possibly be evidence for ~Right; to be evidence for Reft, after time T, is to be evidence for ~Right, because after time T, Reft is logically identical to ~Right; hence it is not evidence for Reft, after time T, for the same reasons it is not evidence for ~Right. Of course, before time T, any evidence for Reft is evidence for Right for analogous reasons.

But now the grue skeptic can say something brilliant that stops much of what the Bayesian has proposed dead in its tracks:

"Why can't I just repeat that paragraph back to you and swap every occurrence of 'right' with 'reft' and 'left' with 'light', and vice versa? They are perfectly symmetrical in terms of their logical relations to one another.
If we take 'reft' and 'light' as primitives, then we have to define 'right' and 'left' in terms of 'reft' and 'light' with the use of time intervals."

What can we possibly reply to this? Can he/she not do this with every argument we propose then? Certainly, the skeptic admits that Bayes, and the contradiction in Right & Reft, after time T, prohibits previous Rights from being evidence of both Right and Reft after time T; where he is challenging us is in choosing Right as the result which it is evidence for, even though "Reft" and "Right" have a completely symmetrical syntactical relationship. There is nothing about the definitions of reft and right which distinguishes them from each other, except their spelling. So is that it? No, this simply means we have to propose an argument that doesn't rely on purely syntactical reasoning. So that if the skeptic performs the swap on our argument, the resulting argument is no longer sound.

What would happen in this scenario if it were actually set up? I know that seems like a strangely concrete question for a philosophy text, but its answer is a helpful hint. What would happen is that after time T, the behavior of the ratio Rights:Lefts as more trials were added would proceed as expected, while the ratio Refts:Lights would approach the reciprocal of the ratio Rights:Lefts. The only way for this not to happen is for us to have been calling the right side of the table "reft", or for the line to have moved. We can only figure out where the line is by knowing where the balls landed relative to it; anything we can figure out about where the line is from knowing which balls landed Reft and which landed Light, we can only figure out because, in knowing this and the time, we can know whether the ball landed left or right of the line.
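This prediction is easy to check by simulation. A sketch with illustrative parameters (my own choices, not from the post): the line two-thirds of the way along, and time T falling after 500 of 10,000 trials:

```python
import random

# Each trial lands "right" with probability 2/3. Before T, "reft" means
# right; after T, it means left. So the running fraction of Refts drifts
# away from the Rights fraction and toward its complement, exactly as the
# paragraph above predicts.

random.seed(0)
T, trials = 500, 10000
rights = refts = 0
for i in range(trials):
    right = random.random() < 2 / 3
    rights += right
    reft = right if i < T else not right   # "reft" flips meaning at time T
    refts += reft

assert abs(rights / trials - 2 / 3) < 0.05   # Rights stabilise near 2/3
assert abs(refts / trials - 1 / 3) < 0.10    # Refts drift toward 1/3 instead
```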

To this I know of no reply the grue skeptic can make. If he/she says the paragraph back to me with the proper words swapped, it is no longer true, because in the hypothetical where we have a table, a line, and we are calling one side right and the other side left, the only way for Refts:Lights to behave as expected as more trials are added is to move the line (if even that); otherwise the ratio of Refts to Lights will approach the reciprocal of Rights to Lefts.

This thin line is analogous to the frequency of emeralds that turn out green out of all the emeralds that get made. This is why we can assume that the line will not move, because that frequency has one precise value, which never changes. Its other important feature is reminding us that even if two terms are syntactically symmetrical, they may have semantic conditions for application which are ignored by the syntactical model, e.g., checking to see which side of the line the ball landed on.

 


 

In conclusion:

Every random variable has as a part of it, stored in its definition/code, a frequency distribution over its values. By the fact that some things happen sometimes, and others happen other times, we know that the world contains random variables, even if they are never fundamental in the source code. Note that "frequency" is not used here as a state of partial knowledge; it is a fact about a set and one of its subsets.

The reason that:

"The first emerald ever observed was green.
The second emerald ever observed was green.
The third emerald ever observed was green.
… etc.
The nth emerald ever observed was green.
(conclusion):
There is a very high probability that a never before observed emerald will be green."

is a valid inference, but the grue equivalent isn't, is that grue is not a property that the emerald construction sites of our universe deal with. They are blind to the grueness of their emeralds; they only say anything about whether or not the next emerald will be green. It may be that the rule the emerald construction sites use to get either a green or non-green emerald changes at time T, but the frequency of some particular result out of all trials will never change; the line will not move.

As long as we know what symbols we are using for what values, observing many green emeralds is evidence that the next one will be grue, as long as it is before time T; after time T, every record of an observation of a green emerald is evidence against a grue one. "Grue" changes meaning from green to blue at time T, while the meaning of "green" stays the same, since we are using the same physical test to determine green-hood as before; just as we use the same test to tell whether the ball landed right or left of the line.

There is no reft in the universe's source code, and there is no grue. Green is not fundamental in the source code either, but green can be reduced to some particular range of quanta states; if you had the universe's source code, you couldn't write grue without first writing green, whereas writing green without knowing a thing about grue would be no harder than writing it while knowing grue. Having a physical test, or primary condition for applicability, is what privileges green over grue after time T; to have a consistent physical test is the same as to reduce to a specifiable range of physical parameters. The existence of such a test is what prevents the skeptic from performing his/her swaps on our arguments.


Take this more as a brainstorm than as a final solution. It wasn't originally, but it should have been. I'll write something more organized and concise after I think about the comments more, and after I make some graphics I've designed that make my argument much clearer, even to myself. But keep those comments coming, and tell me if you want specific credit for anything you may have added to my grue toolkit in the comments.

Edward Nelson claims proof of inconsistency in Peano Arithmetic

13 JoshuaZ 27 September 2011 12:46PM

We've discussed Edward Nelson's beliefs and work before. Now, he claims to have a proof of a contradiction in Peano Arithmetic, which, if correct, is not specific to PA but imports itself into much weaker systems. I'm skeptical of the proof but haven't had the time to look at it in detail. There seem to be two possible weak points in his approach. He constructs a system Q_0^* which looks almost, but not quite, like a fragment of PA, and then tries to show that PA both proves this system's consistency and proves its inconsistency.

First, he may be misapplying the Hilbert-Ackermann theorem: when it applies is a highly technical matter and can be subtle. I don't know enough to comment on that in detail. The second issue is that, in trying to show by finitary methods that there's a contradiction in Q_0^*, he may have proven something closer to Q_0^* being omega-inconsistent. Right now, I'm extremely skeptical of this result.

If anyone is going to find an actual contradiction in PA or ZFC, it would probably be Nelson. There is some clearly interesting material here, such as using a formalization of the surprise examination/unexpected hanging paradox to get a new proof of Gödel's Second Incompleteness Theorem. The exact conditions under which this version of Gödel's theorem applies may be different from the conditions under which the standard theorem can be proven.

Syntacticism

-3 ec429 23 September 2011 06:49AM

I've mentioned in comments a couple of times that I don't consider formal systems to talk about themselves, and that consequently Gödelian problems are irrelevant.  So what am I actually on about?

It's generally accepted in mathematical logic that a formal system which embodies Peano Arithmetic (PA) is able to talk about itself, by means of Gödel numberings; statements and proofs within the system can be represented as positive integers, at which point "X is a valid proof in the system" becomes equivalent to an arithmetical statement about #X, the Gödel number representing X.  This is then diagonalised to produce the Gödel sentence (roughly, g = "There is no proof X such that the last line of X is g"), and incompleteness follows.  We can also do things like defining □ ("box") as the function from S to "There is a proof X in PA whose last line is S" (intuitively, □S says "S is provable in PA").  This then also lets us define the Löb sentence, and many other interesting things.
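A Gödel numbering of the kind mentioned above can be sketched concretely. This is a toy prime-exponent encoding, not the numbering any particular textbook uses: a sequence of symbol codes c1, c2, c3, ... becomes the single integer 2^c1 · 3^c2 · 5^c3 · ..., and unique factorization lets us decode it again.

```python
def primes():
    # Simple trial-division prime generator.
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def godel_number(codes):
    # Encode a sequence of positive symbol codes as one integer.
    g = 1
    for p, c in zip(primes(), codes):
        g *= p ** c
    return g

def decode(g):
    # Recover the exponents by repeated division.  (Trailing zero codes
    # are lost, so sequences are assumed to end with a nonzero code.)
    codes = []
    for p in primes():
        if g == 1:
            break
        c = 0
        while g % p == 0:
            g //= p
            c += 1
        codes.append(c)
    return codes

assert godel_number([1, 2, 3]) == 2 * 3**2 * 5**3  # = 2250
assert decode(2250) == [1, 2, 3]
```

Once statements and proofs are integers like this, "X is a valid proof" becomes an arithmetical claim about the number #X, which is the move the post goes on to question.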

But how do we know that □S ⇔ there is a proof of S in PA?  Only by applying some meta-theory.  And how do we know that statements reached in the meta-theory of the form "thus-and-such is true of PA" are true of PA?  Only by applying a meta-meta-theory.  There is no a-priori justification for the claim that "A formal system is in principle capable of talking about other formal systems", which claim is used by the proof that PA can talk about itself.  (If I remember correctly, to prove that □ does what we think it does, we have to appeal to second-order arithmetic; and how do we know second-order arithmetic applies to PA?  Either by invoking third-order arithmetic to analyse second-order arithmetic, or by recourse to an informal system.)

Note also that the above is not a strange loop through the meta-level; we justify our claims about arithmetic_n by appeal to arithmetic_{n+1}, which is a separate thing; we never find ourselves back at arithmetic_n.

Thus the claim that formal systems can talk about themselves involves ill-founded recursion, what is sometimes called a "skyhook".  While it may be a theorem of second-order arithmetic that "the strengthened finite Ramsey theorem is unprovable in PA", one cannot conclude from second-order arithmetic alone that the "PA" in that statement refers to PA.  It is however provable in third-order arithmetic that "What second-order arithmetic calls "PA" is PA", but that hasn't gained us much - it only tells us that second- and third-order arithmetic call the same thing "PA", it doesn't tell us whether this "PA" is PA.  Induct on the arithmetic hierarchy to reach the obvious conclusion.  (Though note that none of this prevents the Paris-Harrington Theorem from being a theorem of n-th order arithmetic ∀n≥2)

What, then, is the motivation for the above?  Well, it is a basic principle of my philosophy that the only objects that are real (in a Platonic sense) are formal systems (or rather, syntaxes).  That is to say, my ontology is the class of formal systems.  (This is not incompatible with the apparent reality of a physical universe; if this isn't obvious, I'll explain why in another post.)  But if we allow these systems to have semantics, that is, we claim that there is such a thing as a "true statement", we start to have problems with completeness and consistency (namely, that we can't achieve the one and we can't prove the other, assuming PA).  Tarski's undefinability theorem protects us from having to deal with systems which talk about truth in themselves (because they are necessarily inconsistent, assuming some basic properties), but if systems can talk about each other, and if systems can talk about provability within themselves (that is, if analogues to the □ function can be constructed), then nasty Gödelian things end up happening (most of which are, to a Platonist mathematician, deeply unsatisfying).

So instead we restrict the ontology to syntactic systems devoid of any semantics; the statement ""Foo" is true" is meaningless.  There is a fact-of-the-matter as to whether a given statement can be reached in a given formal system, but that fact-of-the-matter cannot be meaningfully talked about in any formal system.  This is a remarkably bare ontology (some consider it excessively so), but is at no risk from contradiction, inconsistency or paradox.  For, what is "P∧¬P" but another, syntactic, sentence?  Of course, applying a system which proves "P∧¬P" to the 'real world' is likely to be problematic, but the paradox or the inconsistency lies in the application of the system, and does not inhere in the system itself.

EDIT: I am actually aiming to get somewhere with this, it's not just for its own sake (although the ontological and epistemological status of mathematics is worth caring about for its own sake).  In particular I want to set up a framework that lets me talk about Eliezer's "infinite set atheism", because I think he's asking a wrong question.

Followed up by: The Apparent Reality of Physics

A potential problem with using Solomonoff induction as a prior

13 JoshuaZ 07 April 2011 07:27PM

There's a problem that has occurred to me that I haven't seen discussed anywhere: I don't think people actually want to assign zero probability to all hypotheses which are not Turing computable. Consider the following hypothetical: we come up with a theory of everything that seems to explain all the laws of physics, but there's a single open parameter (say the fine structure constant). We compute a large number of digits of this constant, and someone notices that, when it is expressed in base 2, the nth digit seems to be 1 iff the nth Turing machine halts on the blank tape, for some fairly natural ordering of all Turing machines. If we confirm this for a large number of digits (not necessarily consecutive digits; obviously some of the 0s won't be confirmable), shouldn't we consider the hypothesis that the digits really are given by this simple but non-computable function? But if our priors assign zero probability to all non-computable hypotheses, then this hypothesis must always be stuck with zero probability.

If the universe is finite, we could approximate this function with one which is instead "halts within K steps", where K is some large number, but intuitively this seems like a more complicated hypothesis than the original one.
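The step-limited approximation can be sketched as follows. This is purely illustrative: Collatz trajectories (does n reach 1?) stand in for "does the nth Turing machine halt?", since an actual machine enumeration is beside the point here. The computable stand-in replaces the unbounded question with a K-step bound.

```python
def halts_within(n, K):
    # Stand-in for "machine n halts within K steps": does the Collatz
    # trajectory starting at n reach 1 within K steps?
    for _ in range(K):
        if n == 1:
            return True
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return False

def constant_digits(num_digits, K):
    # The base-2 digits of the hypothetical physical constant, under the
    # computable "halts within K steps" approximation.
    return [1 if halts_within(n, K) else 0 for n in range(1, num_digits + 1)]

# Increasing K can only turn 0s into 1s, never the reverse: the
# approximations converge to the uncomputable limit from below.
assert all(a <= b
           for a, b in zip(constant_digits(50, 10), constant_digits(50, 1000)))
```

The extra parameter K is exactly what makes the approximating hypothesis feel more complicated than the clean but uncomputable original.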

I'm not sure what a reasonable prior that handles this sort of thing would look like. We don't want an uncountable set of priors. It might make sense to use something like the hypotheses which are describable in Peano arithmetic.

 

Certainty estimates in areas outside one's expertise

7 JoshuaZ 27 December 2010 08:56PM

One issue that I've noticed in discussions on Less Wrong is that I'm much less certain about the likely answers to specific questions than some other people here. But the questions where this seems most pronounced are mathematical questions that are close to my area of expertise (such as whether P = NP). In areas outside my expertise, my confidence is apparently often higher. For example, at a recent LW meet-up I expressed a much lower probability estimate that cold fusion is real than others in the conversation did. This suggests that I may be systematically overestimating my confidence in areas that I don't study as much, essentially a variant of the Dunning-Kruger effect. Have other people here experienced the same pattern with their own confidence estimates?

Bayesian Doomsday Argument

-5 DanielLC 17 October 2010 10:14PM

First, if you don't already know it, Frequentist Doomsday Argument:

There's some number of total humans. There's a 95% chance that you come after the first 5% of them. There have been about 60 to 120 billion people so far, so there's a 95% chance that the total will be less than 1.2 to 2.4 trillion.
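The arithmetic behind that bound is just a factor of 20 (1/0.05), which a quick sketch makes explicit (the function name is mine, not standard terminology):

```python
def doomsday_bound(people_so_far, tail=0.05):
    # If with 95% confidence you are not among the first `tail` fraction
    # of all humans, the total is at most people_so_far / tail.
    return people_so_far / tail

for so_far in (60e9, 120e9):
    print(doomsday_bound(so_far))  # roughly 1.2e12 and 2.4e12
```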

I've modified it to be Bayesian.

First, find the priors:

Do you think it's possible that the total number of sentients that have ever lived or will ever live is less than a googolplex? I'm not asking if you're certain, or even if you think it's likely. Is it more likely than one in infinity? I think it is too. This means that the prior must be normalizable.

If we take P(T=n) ∝ 1/n, where T is the total number of people, it can't be normalized, as 1/1 + 1/2 + 1/3 + ... is an infinite sum. If it decreases faster, it can at least be normalized. As such, we can use 1/n as an upper limit.

Of course, that's just the limit of the upper tail, so maybe that's not a very good argument. Here's another one:

We're not so much dealing with lives as life-years. Year is a pretty arbitrary measurement, so we'd expect the distribution to be pretty close for the majority of it if we used, say, days instead. This would require the 1/n distribution.

After that,

T = total number of people
U = your number (i.e., you are the mth person)

P(T=n) ∝ 1/n
P(U=m|T=n) ∝ 1/n (for m ≤ n)

P(T=n|U=m) = P(U=m|T=n) * P(T=n) / P(U=m)
= (1/n^2) / P(U=m)

P(T>n|U=m) = ∫ P(T=n'|U=m) dn' (integrating from n' = n to ∞)
= (1/n) / P(U=m)

And to normalize, note that P(T>m|U=m) = 1:
1 = (1/m) / P(U=m)
so P(U=m) = 1/m, and therefore
P(T>n|U=m) = m/n

So, the probability of the total exceeding 1 trillion people, if there have been 100 billion so far, is 1/10.
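The closed form just derived can be evaluated directly; a minimal sketch, with `p_total_exceeds` being a name I've made up for it:

```python
def p_total_exceeds(n, m):
    # P(T > n | U = m) = m/n for n >= m, under the 1/n prior derived above.
    return min(1.0, m / n)

# The example from the text: 100 billion people so far, probability that
# the total exceeds 1 trillion.
print(p_total_exceeds(1e12, 1e11))  # 0.1
```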

There's still a few issues with this. It assumes P(U=m|T=n) ∝ 1/n. This seems like it makes sense. If there's a million people, there's a one-in-a-million chance of being the 268,547th. But if there's also a trillion sentient animals, the chance of being the nth person won't change that much between a million and a billion people. There's a few ways I can amend this.

First: a = number of sentient animals. P(U=m|T=n) ∝ 1/(a+n). This would make the end result P(T>n|U=m) = (m+a)/(n+a).

Second: Just replace every mention of people with sentients.

Third: Take this as a prediction of the number of sentients who aren't humans who have lived so far.

The first would work well if we can find the number of sentient animals without knowing how many humans there will be. Assuming we don't take the time to terraform every planet we come across, this should work okay.

The second would work well if we did terraform every planet we came across.

The third seems a bit weird. It gives a smaller answer than the other two. It gives a smaller answer than what you'd expect for animals alone. It does this because it builds in a Doomsday Argument against animals being sentient. You can work that out separately: just say T is the total number of humans, and U is the total number of animals. Unfortunately, you have to know the total number of humans to work out how many animals are sentient, and vice versa. As such, the combined argument may be more useful. It won't tell you how many of the denizens of the planets we colonise will be animals, but I don't think it's actually possible to tell that.

One more thing, you have more information. You have a lifetime of evidence, some of which can be used in these predictions. The lifetime of humanity isn't obvious. We might make it to the heat death of the universe, or we might just kill each other off in a nuclear or biological war in a few decades. We also might be annihilated by a paperclipper somewhere in between. As such, I don't think the evidence that way is very strong.

The evidence for animals is stronger. Emotions aren't exclusive to intelligent beings. It doesn't seem animals would have to be that intelligent to be sentient. Even so, how sure can you really be? This is much more subjective than the doomsday part, and the evidence against their sentience is staggering. I think so anyway; how many animals are there at different levels of intelligence?

Also, there's the priors for total human population so far. I've read estimates vary between 60 and 120 billion. I don't think a factor of two really matters too much for this discussion.

So, what can we use for these priors?

Another issue is that this is for all of space and time, not just Earth.

Consider that you're the mth person (or sentient) from the lineage of a given planet. l(m) is the number of planets with a lineage of at least m people. N is the total number of people ever, n is the number on the average planet, and p is the number of planets.

l(m)/N
=l(m)/(n*p)
=(l(m)/p)/n

l(m)/p is the portion of planets that made it this far. This increases with n, so it weakens my argument, but only to a limited extent; I'm not sure by how much, though. Instinct says that l(m)/p is 50% when m = n, but the mean is not the median. I'd expect a left skew, which would make l(m)/p much lower than that. Even so, if you placed it at 0.01%, this would mean that it's a thousand times less likely at that value. The argument still takes the estimate down orders of magnitude from what you'd otherwise think, so that's not really that significant.

Also, a back-of-the-envelope calculation:

Assume, against all odds, there are a trillion times as many sentient animals as humans, and we happen to be the humans. Also, assume humans only increase their own numbers, and they're at the top percentile for the populations you'd expect. Also, assume 100 billion humans so far.

n = 1,000,000,000,000 * 100,000,000,000 * 100

n = 10^12 * 10^11 * 10^2

n = 10^25

Here's more what I'd expect:

Humanity eventually puts up a satellite to collect solar energy. Once they do one, they might as well do another, until they have a Dyson swarm. Assume 1% efficiency. Also, assume humans still use their whole bodies instead of being brains in vats. Finally, assume they get fed with 0.1% efficiency. And assume an 80-year lifetime.

n = solar luminosity * 1% / power of a human * 0.1% * lifetime of Sun / lifetime of human

n = 4 * 10^26 Watts * 0.01 / 100 Watts * 0.001 * 5,000,000,000 years / 80 years

n = 2.5 * 10^27

By the way, the value I used for power of a human is after the inefficiencies of digesting.
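Both estimates can be reproduced directly; every input below is one of the post's stated assumptions, not an established figure:

```python
# Extreme case: a trillion sentient animals per human, 100 billion humans
# so far, and a top-percentile factor of 100.
n_extreme = 10**12 * 10**11 * 10**2
print(n_extreme)  # 10**25

# Dyson-swarm case, with the stated efficiencies and lifetimes.
solar_luminosity_w = 4e26
capture_efficiency = 0.01
human_power_w = 100.0        # already net of digestive inefficiency
feeding_efficiency = 0.001
sun_lifetime_yr = 5e9
human_lifetime_yr = 80.0

n_dyson = (solar_luminosity_w * capture_efficiency / human_power_w
           * feeding_efficiency * sun_lifetime_yr / human_lifetime_yr)
print(n_dyson)  # roughly 2.5e27
```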

Even with assumptions that extreme, we couldn't use this planet to its full potential. Granted, that requires mining pretty much the whole planet, but with a Dyson sphere you could do that in a week, or two years with the efficiency I gave.

It actually works out to about 150 tons of Earth per person. How much do you need to get the elements to make a person?

Incidentally, I rewrote the article, so don't be surprised if some of the comments don't make sense.