Confound it! Correlation is (usually) not causation! But why not?
It is widely understood that statistical correlation between two variables ≠ causation. But despite this admonition, people are routinely overconfident in claiming correlations to support particular causal interpretations and are surprised by the results of randomized experiments, suggesting that they are biased & systematically underestimating the prevalence of confounds/common-causation. I speculate that in realistic causal networks or DAGs, the number of possible correlations grows faster than the number of possible causal relationships. So confounds really are that common, and since people do not think in DAGs, the imbalance also explains overconfidence.
I’ve noticed I seem to be unusually willing to bite the correlation≠causation bullet, and I think it’s due to an idea I had some time ago about the nature of reality.
1.1 The Problem
One of the constant problems I face in my reading is that I want to know about causal relationships but usually only have correlational data, and as we all know, correlation≠causation. If the general public naively thinks correlation=causation, most geeks know better: correlation≠causation; but then some go meta and point out that correlation and causation do tend to correlate, and so correlation weakly implies causation. But how much evidence…? If I suspect that A→B, and I collect data and establish beyond doubt that A & B correlate at r = 0.7, how much evidence do I have that A→B?
Now, the correlation could be an illusory correlation thrown up by all the standard statistical problems we all know about, such as too-small n, false positives from sampling error (A & B just happened to sync together due to randomness), multiple testing, p-hacking, data snooping, selection bias, publication bias, misconduct, inappropriate statistical tests, etc. I’ve read about those problems at length, and despite knowing about all that, there still seems to be a problem: I don’t think those issues explain away all the correlations which turn out to be confounds. Correlation too often ≠ causation.
To measure this directly you need a clear set of correlations which are proposed to be causal, randomized experiments to establish what the true causal relationship is in each case, and both categories need to be sharply delineated in advance, to avoid issues of cherry-picking and retroactively confirming a correlation. Then you’d be able to say something like ‘11 out of the 100 proposed A→B causal relationships panned out’, and start with a prior of 11% that in your case, A→B. This sort of dataset is pretty rare, although the few examples I’ve found from medicine tend to indicate that our prior should be under 10%. Not great. Why are our best guesses at causal relationships so bad?
We’d expect that the a priori odds are good: 1/3! After all, you can divvy up the possibilities as:
1. A causes B
2. B causes A
3. both A and B are caused by a C (possibly in a complex way like Berkson’s paradox or conditioning on unmentioned variables; for example, a phone-based survey inadvertently generating conclusions valid only for the phone-using part of the population, causing amusing pseudo-correlations)
If it’s either #1 or #2, we’re good and we’ve found a causal relationship; it’s only outcome #3 which leaves us baffled & frustrated. Even if we were guessing at random, you’d expect us to be right at least 33% of the time, if not much more often, because of all the knowledge we can draw on. (Because we can draw on other knowledge, like temporal order or biological plausibility. For example, in medicine you can generally rule out some of the relationships this way: if you find a correlation between taking super-duper-tetrohydracyline™ and pancreas cancer remission, it seems unlikely that #2 (curing pancreas cancer causes a desire to take super-duper-tetrohydracyline™), so the causal relationship is probably either #1 (super-duper-tetrohydracyline™ cures cancer) or #3 (a common cause, like ‘doctors prescribe super-duper-tetrohydracyline™ to patients who are getting better’).)
I think a lot of people tend to put a lot of weight on any observed correlation because of this intuition that a causal relationship is normal & probable because, well, “how else could this correlation happen if there’s no causal connection between A & B‽” And fair enough: there’s no grand cosmic conspiracy arranging matters to fool us by always putting in place a C factor to cause scenario #3, right? If you question people, of course they know correlation doesn’t necessarily mean causation (everyone knows that), since there’s always a chance of a lurking confound, and it would be great if you had a randomized experiment to draw on; but you think with the data you have, not the data you wish you had, and can’t let the perfect be the enemy of the better. So when someone finds a correlation between A and B, it’s no surprise that suddenly their language & attitude change and they seem to place great confidence in their favored causal relationship, even if they piously acknowledge “Yes, correlation is not causation, but… [obviously hanging out with fat people can be expected to make you fat] [surely giving babies antibiotics will help them] [apparently female-named hurricanes increase death tolls] etc etc”.
So, correlations tend to not be causation because it’s almost always #3, a shared cause. This commonness is contrary to our expectations, based on a simple & unobjectionable observation that of the 3 possible relationships, 2 are causal; and so we often reason as though correlation were strong evidence for causation. This leaves us with a paradox: experimental results seem to contradict intuition. To resolve the paradox, I need to offer a clear account of why shared causes/confounds are so common, and hopefully motivate a different set of intuitions.
1.2 What a Tangled Net We Weave When First We Practice to Believe
Here’s where Bayes nets & causal networks (seen previously on LW & Michael Nielsen) come up. When networks are inferred on real-world data, they often start to look pretty gnarly: tons of nodes, tons of arrows pointing all over the place. Daphne Koller, early on in her Probabilistic Graphical Models course, shows an example from a medical setting where the network has something like 600 nodes and you can’t understand it at all. When you look at a biological causal network like that, you start to appreciate how everything might be correlated with everything, but not cause each other.
This is not too surprising if you step back and think about it: life is complicated, we have limited resources, and everything has a lot of moving parts. (How many discrete parts does an airplane have? Or your car? Or a single cell? Or think about a chess player analyzing a position: ‘if my bishop goes there, then the other pawn can go here, which opens up a move there or here, but of course, they could also do that or try an en passant in which case I’ll be down in material but up on initiative in the center, which causes an overall shift in tempo…’) Fortunately, these networks are still simple compared to what they could be, since most nodes aren’t directly connected to each other, which tamps down on the combinatorial explosion of possible networks. (How many different causal networks are possible if you have 600 nodes to play with? The exact answer is complicated, but it’s much larger than 2^600: so very large!)
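For the curious, the exact count of labelled DAGs on n nodes is given by Robinson’s recurrence (OEIS A003024), which is easy to compute for small n; a quick sketch:

```python
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labelled DAGs on n nodes, via Robinson's recurrence:
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n,k) * 2^(k(n-k)) * a(n-k)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

# first few values: 1 empty graph, then 1, 3, 25, 543 DAGs
assert [num_dags(n) for n in range(5)] == [1, 1, 3, 25, 543]

# every edge-subset of a fixed node ordering is a distinct DAG, so
# num_dags(n) > 2^(n choose 2); already at n = 25 that's more than 2^300
assert num_dags(25) > 2 ** 300
```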
One interesting thing I managed to learn from PGM (before concluding it was too hard for me and I should try it later) was that in a Bayes net even if two nodes were not in a simple direct correlation relationship A→B, you could still learn a lot about A from setting B to a value, even if the two nodes were ‘way across the network’ from each other. You could trace the influence flowing up and down the pathways to some surprisingly distant places if there weren’t any blockers.
The bigger the network, the more possible combinations of nodes to look for a pairwise correlation between (e.g. if there are 10 nodes/variables and you are looking at bivariate correlations, then you have (10 choose 2) = 45 possible comparisons; with 20 nodes, 190; and with 40, 780. 40 variables is not that many for many real-world problems.) A lot of these combos will yield some sort of correlation. But does the number of causal relationships go up as fast? I don’t think so (although I can’t prove it).
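The counts above are just binomial coefficients; a one-liner check:

```python
from math import comb

# pairwise comparisons among n variables: n choose 2
assert comb(10, 2) == 45
assert comb(20, 2) == 190
assert comb(40, 2) == 780
```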
If not, then as causal networks get bigger, the number of genuine correlations will explode but the number of genuine causal relationships will increase slower, and so the fraction of correlations which are also causal will collapse.
(Or more concretely: suppose you generated a randomly connected causal network with x nodes and y arrows, perhaps using the algorithm in Kuipers & Moffa 2012, where each arrow has some random noise in it; count how many pairs of nodes are in a causal relationship; now, n times initialize the root nodes to random values, generate a possible state of the network & store the values for each node; count how many pairwise correlations there are between all the nodes using the n samples (using an appropriate significance test & alpha if one wants); divide # of causal relationships by # of correlations, store; return to the beginning and resume with x+1 nodes and y+1 arrows… As one graphs each value of x against its respective estimated fraction, does the fraction head toward 0 as x increases? My thesis is it does. Or, since there must be at least as many causal relationships in a graph as there are arrows, you could simply use that as an upper bound on the fraction.)
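A minimal runnable version of that experiment might look like the following (with simplifying assumptions: a linear-Gaussian model, a crude fixed correlation threshold in place of a proper significance test, and a naive DAG sampler rather than Kuipers & Moffa’s uniform one):

```python
import itertools
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def random_dag(n_nodes, n_edges, rng):
    """Naive random DAG: edges point from lower to higher index."""
    return set(rng.sample(list(itertools.combinations(range(n_nodes), 2)), n_edges))

def causal_pairs(edges, n_nodes):
    """Unordered node pairs {a, b} connected by a directed path."""
    adj = {i: [b for (a, b) in edges if a == i] for i in range(n_nodes)}
    pairs = set()
    for s in range(n_nodes):
        stack, seen = list(adj[s]), set()
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                stack.extend(adj[u])
        pairs |= {frozenset((s, t)) for t in seen}
    return pairs

def causal_fraction(n_nodes, n_edges, n_samples=500, threshold=0.1, seed=0):
    """# causal pairs / # 'significantly' correlated pairs, one random DAG."""
    rng = random.Random(seed)
    edges = random_dag(n_nodes, n_edges, rng)
    data = []
    for _ in range(n_samples):
        row = [0.0] * n_nodes
        for node in range(n_nodes):  # index order is a topological order
            row[node] = sum(row[a] for (a, b) in edges if b == node) + rng.gauss(0, 1)
        data.append(row)
    correlated = sum(
        abs(pearson([r[a] for r in data], [r[b] for r in data])) > threshold
        for a, b in itertools.combinations(range(n_nodes), 2))
    return len(causal_pairs(edges, n_nodes)) / max(correlated, 1)
```

Sweeping `n_nodes` upward (with edges growing proportionally) and plotting `causal_fraction` against it is the proposed test of the thesis.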
It turns out, we weren’t supposed to be reasoning ‘there are 3 categories of possible relationships, so we start with 33%’, but rather: ‘there is only one explanation “A causes B”, only one explanation “B causes A”, but there are many explanations of the form “C_{1} causes A and B”, “C_{2} causes A and B”, “C_{3} causes A and B”…’, and the more nodes in a field’s true causal networks (psychology or biology vs physics, say), the bigger this last category will be.
The real world is the largest of causal networks, so it is unsurprising that most correlations are not causal, even after we clamp down our data collection to narrow domains. Hence, our prior for “A causes B” is not 50% (it’s either true or false) nor is it 33% (either A causes B, B causes A, or mutual cause C) but something much smaller: the number of causal relationships divided by the number of pairwise correlations for a graph, which ratio can be roughly estimated on a fieldbyfield basis by looking at existing work or directly for a particular problem (perhaps one could derive the fraction based on the properties of the smallest inferrable graph that fits large datasets in that field). And since the larger a correlation relative to the usual correlations for a field, the more likely the two nodes are to be close in the causal network and hence more likely to be joined causally, one could even give causality estimates based on the size of a correlation (eg. an r=0.9 leaves less room for confounding than an r of 0.1, but how much will depend on the causal network).
This is exactly what we see. How do you treat cancer? Thousands of treatments get tried before one works. How do you deal with poverty? Most programs are not even wrong. Or how do you fix societal woes in general? Most attempts fail miserably, and the higher-quality your studies, the worse attempts look (leading to Rossi’s Metallic Rules). This even explains why ‘everything correlates with everything’ and Andrew Gelman’s dictum about how coefficients are never zero: the reason datasets like those mentioned by Cohen or Meehl find most of their variables to have non-zero correlations (often reaching statistical-significance) is because the data is being drawn from large complicated causal networks in which almost everything really is correlated with everything else.
And thus I was enlightened.
1.3 Comment
Since I know so little about causal modeling, I asked our local causal researcher Ilya Shpitser to maybe leave a comment about whether the above was trivially wrong / already-proven / well-known folklore / etc.; for convenience, I’ll excerpt the core of his comment:
But does the number of causal relationships go up just as fast? I don’t think so (although at the moment I can’t prove it).
I am not sure exactly what you mean, but I can think of a formalization where this is not hard to show. We say A “structurally causes” B in a DAG G if and only if there is a directed path from A to B in G. We say A is “structurally dependent” with B in a DAG G if and only if there is a marginal d-connecting path from A to B in G.
A marginal d-connecting path between two nodes is a path with no consecutive edges of the form * → * ← * (that is, no colliders on the path). In other words, all directed paths are marginal d-connecting, but the opposite isn’t true.
The justification for this definition is that if A “structurally causes” B in a DAG G, then if we were to intervene on A, we would observe B change (but not vice versa) in “most” distributions that arise from causal structures consistent with G. Similarly, if A and B are “structurally dependent” in a DAG G, then in “most” distributions consistent with G, A and B would be marginally dependent (e.g. what you probably mean when you say ‘correlations are there’).
I qualify with “most” because we cannot simultaneously represent dependences and independences by a graph, so we have to choose. People have chosen to represent independences. That is, if in a DAG G some arrow is missing, then in any distribution (causal structure) consistent with G, there is some sort of independence (missing effect). But if the arrow is not missing we cannot say anything. Maybe there is dependence, maybe there is independence. An arrow may be present in G, and there may still be independence in a distribution consistent with G. We call such distributions “unfaithful” to G. If we pick distributions consistent with G randomly, we are unlikely to hit on unfaithful ones (the subset of distributions consistent with G that are unfaithful to G has measure zero), but Nature does not pick randomly, so unfaithful distributions are a worry. They may arise for systematic reasons (maybe the equilibrium of a feedback process in bio?).
If you accept above definition, then clearly for a DAG with n vertices, the number of pairwise structural dependence relationships is an upper bound on the number of pairwise structural causal relationships. I am not aware of anyone having worked out the exact combinatorics here, but it’s clear there are many many more paths for structural dependence than paths for structural causality.
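This combinatorial claim can be checked by brute force on a small example; here is a sketch (the example DAG is made up) counting directed paths versus marginal d-connecting paths:

```python
import itertools

# a small made-up DAG, with a -> b encoded as (a, b)
edges = {(0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (0, 5), (5, 6), (2, 6)}
nodes = range(7)
nbrs = {v: sorted({a for (a, b) in edges if b == v} |
                  {b for (a, b) in edges if a == v})
        for v in nodes}

def simple_paths(a, b):
    """All simple paths from a to b in the skeleton (edge directions ignored)."""
    out, stack = [], [(a, (a,))]
    while stack:
        u, path = stack.pop()
        if u == b:
            out.append(path)
            continue
        for v in nbrs[u]:
            if v not in path:
                stack.append((v, path + (v,)))
    return out

def is_directed(path):
    return all((path[i], path[i + 1]) in edges for i in range(len(path) - 1))

def no_collider(path):
    """Marginal d-connecting: no node on the path has both path-edges into it."""
    return not any((path[i - 1], path[i]) in edges and (path[i + 1], path[i]) in edges
                   for i in range(1, len(path) - 1))

n_directed = n_connecting = 0
for a, b in itertools.combinations(nodes, 2):
    for p in simple_paths(a, b):
        n_directed += is_directed(p) or is_directed(p[::-1])
        n_connecting += no_collider(p)

# every directed path is d-connecting, but not vice versa (forks, chains)
assert n_connecting > n_directed > 0
```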
But what you actually want is not a DAG with n vertices, but another type of graph with n vertices. The “Universe DAG” has a lot of vertices, but what we actually observe is a very small subset of these vertices, and we marginalize over the rest. The trouble is, if you start with a distribution that is consistent with a DAG, and you marginalize over some things, you may end up with a distribution that isn’t well represented by a DAG. Or “DAG models aren’t closed under marginalization.”
That is, if our DAG is A → B ← H → C ← D, and we marginalize over H because we do not observe H, what we get is a distribution where no DAG can properly represent all conditional independences. We need another kind of graph.
In fact, people have come up with a mixed graph (containing → arrows and ↔ arrows) to represent margins of DAGs. Here → means the same as in a causal DAG, but ↔ means “there is some sort of common cause/confounder that we don’t want to explicitly write down.” Note: ↔ is not a correlative arrow; it is still encoding something causal (the presence of a hidden common cause or causes). I am being loose here – in fact it is the absence of arrows that means things, not the presence.
I do a lot of work on these kinds of graphs, because these graphs are the sensible representation of data we typically get – drawn from a marginal of a joint distribution consistent with a big unknown DAG.
But the combinatorics work out the same in these graphs – the number of marginal d-connecting paths is much bigger than the number of directed paths. This is probably the source of your intuition. Of course what often happens is you do have a (weak) causal link between A and B, but a much stronger non-causal link between A and B through an unobserved common parent. So the causal link is hard to find without “tricks.”
1.4 Heuristics & Biases
Now assuming the foregoing to be right (which I’m not sure about; in particular, I’m dubious that correlations in causal nets really do increase much faster than causal relations do), what’s the psychology of this? I see a few major ways that people might be incorrectly reasoning when they overestimate the evidence given by a correlation:

1. They might be aware of the imbalance between correlations and causation, but underestimate how much more common correlation becomes compared to causation.
This could be shown by giving causal diagrams and seeing how elicited probability changes with the size of the diagrams: if the probability is constant, then the subjects would seem to be considering the relationship in isolation and ignoring the context.
It might be remediable by showing a network and jarring people out of a simplistic comparison approach.
2. They might not be reasoning in a causal-net framework at all, but starting from the naive 33% base-rate you get when you treat all 3 kinds of causal relationships as equally likely.
This could be shown by eliciting estimates and seeing whether the estimates tend to look like base rates of 33% and modifications thereof.
Sterner measures might be needed: could we draw causal nets with not just arrows showing influence but also another kind of arrow showing correlations? For example, the causal arrows could be drawn in black, inverse correlations in red, and regular correlations in green. The picture would be rather messy, but simply by comparing how few black arrows there are to how many green and red ones, it might visually make the case that correlation is much more common than causation.
3. Alternately, they may really be reasoning causally and suffer from a truly deep & persistent cognitive illusion: that when people say ‘correlation’ it’s really a kind of causation, and they don’t understand the technical meaning of ‘correlation’ in the first place (which is not as unlikely as it may sound, given examples like David Hestenes’s demonstration of the persistence of Aristotelian folk-physics in physics students, as if all they had learned was guessing passwords; on the test used, see e.g. Halloun & Hestenes 1985 & Hestenes et al 1992); in which case it’s not surprising that if they think they’ve been told a relationship is ‘causation’, then they’ll think the relationship is causation. Ilya remarks:
Pearl has this hypothesis that a lot of probabilistic fallacies/paradoxes/biases are due to the fact that causal and not probabilistic relationships are what our brain natively thinks about. So e.g. Simpson’s paradox is surprising because we intuitively think of a conditional distribution (where conditioning can change anything!) as a kind of “interventional distribution” (no Simpson’s-type reversal under interventions: “Understanding Simpson’s Paradox”, Pearl 2014 [see also Pearl’s comments on Nielsen’s blog]).
This hypothesis would claim that people who haven’t looked into the math just interpret statements about conditional probabilities as about “interventional probabilities” (or whatever their intuitive analogue of a causal thing is).
This might be testable by trying to identify simple examples where the two approaches diverge, similar to Hestenes’s quiz for diagnosing belief in folk-physics.
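For readers unfamiliar with Simpson’s reversal, the often-cited kidney-stone dataset (Charig et al 1986) shows it concretely:

```python
# (successes, patients) for two treatments, stratified by stone size
small = {"A": (81, 87), "B": (234, 270)}
large = {"A": (192, 263), "B": (55, 80)}

def rate(successes, patients):
    return successes / patients

# treatment A wins within each stratum...
assert rate(*small["A"]) > rate(*small["B"])   # 0.93 > 0.87
assert rate(*large["A"]) > rate(*large["B"])   # 0.73 > 0.69

# ...yet pooling the strata reverses the comparison
total_A = (small["A"][0] + large["A"][0], small["A"][1] + large["A"][1])
total_B = (small["B"][0] + large["B"][0], small["B"][1] + large["B"][1])
assert rate(*total_A) < rate(*total_B)         # 0.78 < 0.83
```

The reversal happens because doctors assigned treatment A disproportionately to the harder (large-stone) cases: conditioning on the confounder and intervening give different answers.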
This was originally posted to an open thread but due to the favorable response I am posting an expanded version here.
Very Basic Model Theory
In this post I'll discuss some basic results of model theory. It may be helpful to read through my previous post if you haven't yet. Model Theory is an implicit context for the Heavily Advanced Epistemology sequence and for a few of the recent MIRI papers, so casual readers may find this brief introduction useful. And who knows, maybe it will pique your interest:
A tale of two logics
Propositional logic is the "easy logic", built from basic symbols and the connectives "and" and "not". Remember that all other connectives can be built from these two: With Enough NAND Gates You Can Rule The World, and all that. Propositional logic is sometimes called the "sentential logic", because it's not like any other logics are "of or relating to sentences" (/sarcasm).
First-order logic is the "nice logic". It has quantifiers ("there exists", "for all") and an internal notion of equality. Its sentences contain constants, functions, and relations. This lets you say lots of cool stuff that you can't say in propositional logic. First-order logic turns out to be quite friendly (as we'll see below). However, it's not strong enough to talk about certain crazy/contrived ideas that humans cook up (such as "the numbers").
There are many other logics available (second-order logic AKA "the heavy guns", ω-logic AKA "please just please can I talk about numbers", and many more). In this post we'll focus on propositional and first-order logics.
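The "NAND suffices" claim about propositional logic is a two-minute exercise to verify by truth table, using the standard constructions:

```python
# all connectives from NAND alone, checked over every truth assignment
def nand(a, b):
    return not (a and b)

def not_(a):          # NOT x  ==  x NAND x
    return nand(a, a)

def and_(a, b):       # x AND y  ==  NOT (x NAND y)
    return not_(nand(a, b))

def or_(a, b):        # x OR y  ==  (NOT x) NAND (NOT y)
    return nand(not_(a), not_(b))

for a in (False, True):
    for b in (False, True):
        assert not_(a) == (not a)
        assert and_(a, b) == (a and b)
        assert or_(a, b) == (a or b)
```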
Mental Context for Model Theory
I'm reviewing the books on the MIRI course list. After my first four book reviews I took a week off, followed up on some dangling questions, and kept up other side projects. Then I dove into Model Theory, by Chang and Keisler.
It has been three weeks. I have gained a decent foundation in model theory (by my own assessment), but I have not come close to completing the textbook. There are a number of other topics I want to touch upon before December, so I'm putting Model Theory aside for now. I'll be revisiting it in either January or March to finish the job.
In the meantime, I do not have a complete book review for you. Instead, this is the first of three posts on my experience with model theory thus far.
This post will give you some framing and context for model theory. I had to hop a number of conceptual hurdles before model theory started making sense — this post will contain some pointers that I wish I'd had three weeks ago. These tips and realizations are somewhat general to learning any logic or math; hopefully some of you will find them useful.
Shortly, I'll post a summary of what I've learned so far. For the casual reader, this may help demystify some heavily advanced parts of the Heavily Advanced Epistemology sequence (if you find it mysterious), and it may shed some light on some of the recent MIRI papers. On a personal note, there's a lot I want to write down & solidify before moving on.
In a follow-up post, I'll discuss my experience struggling to learn something difficult on my own — model theory has required significantly more cognitive effort than did the previous textbooks.
A Voting Puzzle, Some Political Science, and a Nerd Failure Mode
In grade school, I read a series of books titled Sideways Stories from Wayside School by Louis Sachar, who you may know as the author of the novel Holes which was made into a movie in 2003. The series included two books of math problems, Sideways Arithmetic from Wayside School and More Sideways Arithmetic from Wayside School, the latter of which included the following problem (paraphrased):
The students in Mrs. Jewls's class have been given the privilege of voting on the height of the school's new flagpole. She has each of them write down what they think would be the best height for the flagpole. The votes are distributed as follows:
- 1 student votes for 6 feet.
- 1 student votes for 10 feet.
- 7 students vote for 25 feet.
- 1 student votes for 30 feet.
- 2 students vote for 50 feet.
- 2 students vote for 60 feet.
- 1 student votes for 65 feet.
- 3 students vote for 75 feet.
- 1 student votes for 80 feet, 6 inches.
- 4 students vote for 85 feet.
- 1 student votes for 91 feet.
- 5 students vote for 100 feet.
At first, Mrs. Jewls declares 25 feet the winning answer, but one of the students who voted for 100 feet convinces her there should be a runoff between 25 feet and 100 feet. In the runoff, each student votes for the height closest to their original answer. But after that round of voting, one of the students who voted for 85 feet wants their turn, so 85 feet goes up against the winner of the previous round of voting, and the students vote the same way, with each student voting for the height closest to their original answer. Then the same thing happens again with the 50 foot option. And so on, with each number, again and again, "very much like a game of tether ball."
Question: if this process continues until it settles on an answer that can't be beaten by any other answer, how tall will the new flagpole be?
Answer (rot13'd): fvkglsvir srrg, orpnhfr gung'f gur zrqvna inyhr bs gur bevtvany frg bs ibgrf. Naq abj lbh xabj gur fgbel bs zl svefg rapbhagre jvgu gur zrqvna ibgre gurberz.
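If you'd rather check the answer by simulation than by un-rot13ing it, here is a sketch of the tether-ball process (warning: the final assertion states what the stable winner equals):

```python
import statistics

# the 29 students' original votes, from the list above (80 ft 6 in = 80.5)
votes = ([6, 10] + [25] * 7 + [30] + [50] * 2 + [60] * 2 + [65] + [75] * 3
         + [80.5] + [85] * 4 + [91] + [100] * 5)

def runoff(a, b):
    """Each student votes for whichever of a, b is closer to their own answer."""
    a_count = sum(abs(v - a) < abs(v - b) for v in votes)
    b_count = sum(abs(v - b) < abs(v - a) for v in votes)
    return a if a_count > b_count else b

champion = 25
for challenger in sorted(set(votes)):   # every proposed height gets its turn
    champion = runoff(champion, challenger)

# the final champion beats every other proposed height head-to-head...
assert all(runoff(champion, c) == champion for c in set(votes))
# ...and it is exactly the median of the original votes
assert champion == statistics.median(votes)
```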
Why am I telling you this? There's a minor reason and a major reason. The minor reason is that this shows it is possible to explain little-known academic concepts, at least certain ones, in a way that grade schoolers will understand. It's a data point that fits nicely with what Eliezer has written about how to explain things. The major reason, though, is that a month ago I finished my systematic read-through of the sequences, and while I generally agree that they're awesome (perhaps more so than most people; I didn't see the problem with the metaethics sequence), I thought the mini-discussion of political parties and voting was, on reflection, weak and indicative of a broader nerd failure mode.
TL;DR (courtesy of lavalamp):
- Politicians probably conform to the median voter's views.
- Most voters are not the median, so most people usually dislike the winning politicians.
- But people dislike the politicians for different reasons.
- Nerds should avoid giving advice that boils down to "behave optimally". Instead, analyze the reasons for the current failure to behave optimally and give more targeted advice.
Robust Cooperation in the Prisoner's Dilemma
I'm proud to announce the preprint of Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic, a joint paper with Mihaly Barasz, Paul Christiano, Benja Fallenstein, Marcello Herreshoff, Patrick LaVictoire (me), and Eliezer Yudkowsky.
This paper was one of three projects to come out of the 2nd MIRI Workshop on Probability and Reflection in April 2013, and had its genesis in ideas about formalizations of decision theory that have appeared on LessWrong. (At the end of this post, I'll include links for further reading.)
Below, I'll briefly outline the problem we considered, the results we proved, and the (many) open questions that remain. Thanks in advance for your thoughts and suggestions!
Background: Writing programs to play the PD with source code swap
(If you're not familiar with the Prisoner's Dilemma, see here.)
The paper concerns the following setup, which has come up in academic research on game theory: say that you have the chance to write a computer program X, which takes in one input and returns either Cooperate or Defect. This program will face off against some other computer program Y, but with a twist: X will receive the source code of Y as input, and Y will receive the source code of X as input. And you will be given your program's winnings, so you should think carefully about what sort of program you'd write!
Of course, you could simply write a program that defects regardless of its input; we call this program DefectBot, and call the program that cooperates on all inputs CooperateBot. But with the wealth of information afforded by the setup, you might wonder if there's some program that might be able to achieve mutual cooperation in situations where DefectBot achieves mutual defection, without thereby risking a sucker's payoff. (Douglas Hofstadter would call this a perfect opportunity for superrationality...)
Previously known: CliqueBot and FairBot
And indeed, there's a way to do this that's been known since at least the 1980s. You can write a computer program that knows its own source code, compares it to the input, and returns C if and only if the two are identical (and D otherwise). Thus it achieves mutual cooperation in one important case where it intuitively ought to: when playing against itself! We call this program CliqueBot, since it cooperates only with the "clique" of agents identical to itself.
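A toy version of the source-code-swap setup (with "source code" simplified to plain strings, purely for illustration) shows both CliqueBot's clique behavior and its brittleness:

```python
# hypothetical string stand-ins for program source code
CLIQUEBOT_SRC = "cooperate iff opponent source == my source"
VARIANT_SRC   = "cooperate iff  opponent source == my source"  # extra space!

def cliquebot(my_src, opp_src):
    """Cooperate if and only if the opponent's source is byte-identical to mine."""
    return "C" if opp_src == my_src else "D"

def defectbot(my_src, opp_src):
    """Defect regardless of input."""
    return "D"

# two byte-identical CliqueBots achieve mutual cooperation
assert cliquebot(CLIQUEBOT_SRC, CLIQUEBOT_SRC) == "C"

# a functionally identical but syntactically different variant gets defected against
assert cliquebot(CLIQUEBOT_SRC, VARIANT_SRC) == "D"

# and CliqueBot at least never gets suckered by DefectBot
assert cliquebot(CLIQUEBOT_SRC, "defect always") == "D"
```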
There's one particularly irksome issue with CliqueBot, and that's the fragility of its cooperation. If two people write functionally analogous but syntactically different versions of it, those programs will defect against one another! This problem can be patched somewhat, but not fully fixed. Moreover, mutual cooperation might be the best strategy against some agents that are not even functionally identical, and extending this approach requires you to explicitly delineate the list of programs that you're willing to cooperate with. Is there a more flexible and robust kind of program you could write instead?
As it turns out, there is: in a 2010 post on LessWrong, cousin_it introduced an algorithm that we now call FairBot. Given the source code of Y, FairBot searches for a proof (of less than some large fixed length) that Y returns C when given the source code of FairBot, and then returns C if and only if it discovers such a proof (otherwise it returns D). Clearly, if our proof system is consistent, FairBot only cooperates when that cooperation will be mutual. But the really fascinating thing is what happens when you play two versions of FairBot against each other. Intuitively, it seems that either mutual cooperation or mutual defection would be stable outcomes, but it turns out that if their limits on proof lengths are sufficiently high, they will achieve mutual cooperation!
The proof that they mutually cooperate follows from a bounded version of Löb's Theorem from mathematical logic. (If you're not familiar with this result, you might enjoy Eliezer's Cartoon Guide to Löb's Theorem, which is a correct formal proof written in much more intuitive notation.) Essentially, the asymmetry comes from the fact that both programs are searching for the same outcome, so that a short proof that one of them cooperates leads to a short proof that the other cooperates, and vice versa. (The opposite is not true, because the formal system can't know it won't find a contradiction. This is a subtle but essential feature of mathematical logic!)
Generalization: Modal Agents
Unfortunately, FairBot isn't what I'd consider an ideal program to write: it happily cooperates with CooperateBot, when it could do better by defecting. This is problematic because in real life, the world isn't separated into agents and nonagents, and any natural phenomenon that doesn't predict your actions can be thought of as a CooperateBot (or a DefectBot). You don't want your agent to be making concessions to rocks that happened not to fall on them. (There's an important caveat: some things have utility functions that you care about, but don't have sufficient ability to predicate their actions on yours. In that case, though, it wouldn't be a true Prisoner's Dilemma if your values actually prefer the outcome (C,C) to (D,C).)
However, FairBot belongs to a promising class of algorithms: those that decide on their action by looking for short proofs of logical statements that concern their opponent's actions. In fact, there's a really convenient mathematical structure that's analogous to the class of such algorithms: the modal logic of provability (known as GL, for GödelLöb).
So that's the subject of this preprint: what can we achieve in decision theory by considering agents defined by formulas of provability logic?
Tiling Agents for Self-Modifying AI (OPFAI #2)
An early draft of publication #2 in the Open Problems in Friendly AI series is now available: Tiling Agents for Self-Modifying AI, and the Löbian Obstacle. ~20,000 words, aimed at mathematicians or the highly mathematically literate. The research reported on was conducted by Yudkowsky and Herreshoff, substantially refined at the November 2012 MIRI Workshop with Mihaly Barasz and Paul Christiano, and refined further at the April 2013 MIRI Workshop.
Abstract:
We model self-modification in AI by introducing 'tiling' agents whose decision systems will approve the construction of highly similar agents, creating a repeating pattern (including similarity of the offspring's goals). Constructing a formalism in the most straightforward way produces a Gödelian difficulty, the Löbian obstacle. By technical methods we demonstrate the possibility of avoiding this obstacle, but the underlying puzzles of rational coherence are thus only partially addressed. We extend the formalism to partially unknown deterministic environments, and show a very crude extension to probabilistic environments and expected utility; but the problem of finding a fundamental decision criterion for self-modifying probabilistic agents remains open.
Commenting here is the preferred venue for discussion of the paper. This is an early draft and has not been reviewed, so it may contain mathematical errors, and reporting of these will be much appreciated.
The overall agenda of the paper is to introduce the conceptual notion of a self-reproducing decision pattern which includes reproduction of the goal or utility function, by exposing a particular possible problem with a tiling logical decision pattern and coming up with some partial technical solutions. This then makes it conceptually much clearer to point out the even deeper problems with "We can't yet describe a probabilistic way to do this because of non-monotonicity" and "We don't have a good bounded way to do this because maximization is impossible, satisficing is too weak, and Schmidhuber's swapping criterion is underspecified." The paper uses first-order logic (FOL) because FOL has a lot of useful standard machinery for reflection which we can then invoke; in real life, FOL is of course a poor representational fit to most real-world environments outside a human-constructed computer chip with thermodynamically expensive crisp variable states.
As further background, the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety, but rather about the idea that the total probability of catastrophic failure should not have a significant conditionally independent component on each self-modification, and that self-modification will (at least in initial stages) take place within the highly deterministic environment of a computer chip. This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals). Mathematical proofs have the property that they are as strong as their axioms and have no significant conditionally independent per-step failure probability if their axioms are semantically true, which suggests that something like mathematical reasoning may be appropriate for certain particular types of self-modification during some developmental stages.
Thus the content of the paper is very far off from how a realistic AI would work, but conversely, if you can't even answer the kinds of simple problems posed within the paper (both those we partially solve and those we only pose), then you must be very far off from being able to build a stable self-modifying AI. Being able to say how to build a theoretical device that would play perfect chess given infinite computing power is very far off from the ability to build Deep Blue. However, if you can't even say how to play perfect chess given infinite computing power, you are confused about the rules of chess or the structure of chess-playing computation in a way that would make it entirely hopeless for you to figure out how to build a bounded chess-player. Thus "In real life we're always bounded" is no excuse for not being able to solve the much simpler unbounded form of the problem, and being able to describe the infinite chess-player would be substantial and useful conceptual progress compared to not being able to do that. We can't be absolutely certain that an analogous situation holds between solving the challenges posed in the paper, and realistic self-modifying AIs with stable goal systems, but every line of investigation has to start somewhere.
Parts of the paper will be easier to understand if you've read Highly Advanced Epistemology 101 For Beginners, including the parts on correspondence theories of truth (relevant to section 6) and model-theoretic semantics of logic (relevant to sections 3, 4, and 6), and there are footnotes intended to make the paper somewhat more accessible than usual, but the paper is still essentially aimed at mathematically sophisticated readers.
Reflection in Probabilistic Logic
Paul Christiano has devised a new fundamental approach to the "Löb Problem" wherein Löb's Theorem seems to pose an obstacle to AIs building successor AIs, or adopting successor versions of their own code, that trust the same amount of mathematics as the original. (I am currently writing up a more thorough description of the question this preliminary technical report is working on answering. For now the main online description is in a quick Summit talk I gave. See also Benja Fallenstein's description of the problem in the course of presenting a different angle of attack. Roughly the problem is that mathematical systems can only prove the soundness of, aka 'trust', weaker mathematical systems. If you try to write out an exact description of how AIs would build their successors or successor versions of their code in the most obvious way, it looks like the mathematical strength of the proof system would tend to be stepped down each time, which is undesirable.)
Paul Christiano's approach is inspired by the idea that whereof one cannot prove or disprove, thereof one must assign probabilities: and that although no mathematical system can contain its own truth predicate, a mathematical system might be able to contain a reflectively consistent probability predicate. In particular, it looks like we can have:
∀a, b: (a < P(φ) < b) ⇒ P(a < P('φ') < b) = 1
∀a, b: P(a ≤ P('φ') ≤ b) > 0 ⇒ a ≤ P(φ) ≤ b
Suppose I present you with the human and probabilistic version of a Gödel sentence, the Whitely sentence "You assign this statement a probability less than 30%." If you disbelieve this statement, it is true. If you believe it, it is false. If you assign 30% probability to it, it is false. If you assign 29% probability to it, it is true.
Paul's approach resolves this problem by restricting your belief about your own probability assignment to within epsilon of 30% for any epsilon. So Paul's approach replies, "Well, I assign almost exactly 30% probability to that statement; maybe a little more, maybe a little less; in fact I think there's about a 30% chance that I'm a tiny bit under 0.3 probability and a 70% chance that I'm a tiny bit over 0.3 probability." A standard fixed-point theorem then implies that a consistent assignment like this should exist. If asked whether the probability is over 0.2999 or under 0.30001, you will reply with a definite yes.
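As a quick numerical illustration (the discretization and the simplifying assumption that a fully self-transparent agent's credence must equal the sentence's truth value are mine), you can check that no classical fixed point exists for the Whitely sentence: the map from your credence to the sentence's truth value always jumps away from the diagonal by at least 0.3, which is exactly the gap the epsilon-smeared probabilistic assignment dissolves.

```python
# Whitely sentence: "You assign this statement a probability less than 30%."
# If a perfectly self-transparent agent holds credence p, the sentence is
# true (1.0) when p < 0.3 and false (0.0) otherwise. A classically
# consistent agent would need a fixed point p = whitely(p).

def whitely(p):
    return 1.0 if p < 0.3 else 0.0

# Scan a fine grid: the distance to a fixed point never drops below 0.3,
# so no consistent classical assignment exists.
gap = min(abs(whitely(i / 10000) - i / 10000) for i in range(10001))
print(gap)  # -> 0.3
```

Below 0.3 the sentence is true, so a consistent credence would have to be 1.0 (a gap of at least 0.7); at or above 0.3 it is false, forcing credence 0.0 (a gap of at least 0.3). Smearing the belief to within epsilon of 0.3 is what escapes this dilemma.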
Second-Order Logic: The Controversy
Followup to: Gödel's Completeness and Incompleteness Theorems
"So the question you asked me last time was, 'Why does anyone bother with first-order logic at all, if second-order logic is so much more powerful?'"
Right. If first-order logic can't talk about finiteness, or distinguish the size of the integers from the size of the reals, why even bother?
"The first thing to realize is that first-order theories can still have a lot of power. First-order arithmetic does narrow down the possible models by a lot, even if it doesn't narrow them down to a single model. You can prove things like the existence of an infinite number of primes, because every model of the first-order axioms has an infinite number of primes. First-order arithmetic is never going to prove anything that's wrong about the standard numbers. Anything that's true in all models of first-order arithmetic will also be true in the particular model we call the standard numbers."
Even so, if first-order theory is strictly weaker, why bother? Unless second-order logic is just as incomplete relative to third-order logic, which is weaker than fourth-order logic, which is weaker than omega-order logic...
"No, surprisingly enough: there's tricks for making second-order logic encode any proposition in third-order logic, and so on. If there's a collection of third-order axioms that characterizes a model, there's a collection of second-order axioms that characterizes the same model. Once you make the jump to second-order logic, you're done; so far as anyone knows (so far as I know), there's nothing more powerful than second-order logic in terms of which models it can characterize."
Then if there's one spoon which can eat anything, why not just use the spoon?
"Well... this gets into complex issues. There are mathematicians who don't believe there is a spoon when it comes to secondorder logic."
Like there are mathematicians who don't believe in infinity?
"Kind of. Look, suppose you couldn't use secondorder logic  you belonged to a species that doesn't have secondorder logic, or anything like it. Your species doesn't have any native mental intuition you could use to construct the notion of 'all properties'. And then suppose that, after somebody used firstorder set theory to prove that firstorder arithmetic had many possible models, you stood around shouting that you believed in only one model, what you called the standard model, but you couldn't explain what made this model different from any other model "
Well... a lot of times, even in math, we make statements that genuinely mean something, but take a while to figure out how to define. I think somebody who talked about 'the numbers' would mean something even before secondorder logic was invented.
"But here the hypothesis is that you belong to a species that can't invent secondorder logic, or think in secondorder logic, or anything like it."
Then I suppose you want me to draw the conclusion that this hypothetical alien is just standing there shouting about standardness, but its words don't mean anything because they have no way to pin down one model as opposed to another one. And I expect this species is also magically forbidden from talking about all possible subsets of a set?
"Yeah. They can't talk about the largest powerset, just like they can't talk about the smallest model of Peano arithmetic."
Then you could arguably deny that shouting about the 'standard' numbers would mean anything, to the members of this particular species. You might as well shout about the 'fleem' numbers, I guess.
"Right. Even if all the members of this species did have a builtin sense that there was a special model of firstorder arithmetic that was fleemer than any other model, if that fleemness wasn't bound to anything else, it would be meaningless. Just a floating word. Or if all you could do was define fleemness as floobness and floobness as fleemness, you'd have a loop of floating words; and that might give you the impression that each particular word had a meaning, but the loop as a whole wouldn't be connected to reality. That's why it doesn't help to say that the standard model of numbers is the smallest among all possible models of Peano arithmetic, if you can't define 'smallest possible' any more than you can define 'connected chain' or 'finite number of predecessors'."
But secondorder logic does seem to have consequences firstorder logic doesn't. Like, what about all that Godelian stuff? Doesn't secondorder arithmetic semantically imply... its own Godel statement? Because the unique model of secondorder arithmetic doesn't contain any number encoding a proof of a contradiction from secondorder arithmetic? Wait, now I'm confused.
"No, that's correct. It's not paradoxical, because there's no effective way of finding all the semantic implications of a collection of secondorder axioms. There's no analogue of Godel's Completeness Theorem for secondorder logic  no syntactic system for deriving all the semantic consequences. Secondorder logic is sound, in the sense that anything syntactically provable from a set of premises, is true in any model obeying those premises. But secondorder logic isn't complete; there are semantic consequences you can't derive. If you take secondorder logic at face value, there's no effectively computable way of deriving all the consequences of what you say you 'believe'... which is a major reason some mathematicians are suspicious of secondorder logic. What does it mean to believe something whose consequences you can't derive?"
But secondorder logic clearly has some experiential consequences firstorder logic doesn't. Suppose I build a Turing machine that looks for proofs of a contradiction from firstorder Peano arithmetic. If PA's consistency isn't provable in PA, then by the Completeness Theorem there must exist nonstandard models of PA where this machine halts after finding a proof of a contradiction. So firstorder logic doesn't tell me that this machine runs forever  maybe it has nonstandard halting times, i.e., it runs at all standard N, but halts at 2* somewhere along a disconnected chain. Only secondorder logic tells me there's no proof of PA's inconsistency and therefore this machine runs forever. Only secondorder logic tells me I should expect to see this machine keep running, and not expect  note falsifiability  that the machine ever halts.
"Sure, you just used a secondorder theory to derive a consequence you didn't get from a firstorder theory. But that's not the same as saying that you can only get that consequence using secondorder logic. Maybe another firstorder theory would give you the same prediction."
Like what?
"Well, for one thing, firstorder set theory can prove that firstorder arithmetic has a model. ZermeloFraenkel set theory can prove the existence of a set such that all the firstorder Peano axioms are true about that set. It follows within ZF that sound reasoning on firstorder arithmetic will never prove a contradiction. And since ZF can prove that the syntax of classical logic is sound "
What does it mean for set theory to prove that logic is sound!?
"ZF can quote formulas as structured, and encode models as sets, and then represent a finite ZFformula which says whether or not a set of quoted formulas is true about a model. ZF can then prove that no step of classical logic goes from premises that are true inside a setmodel, to premises that are false inside a setmodel. In other words, ZF can represent the idea 'formula X is semantically true in model Y' and 'these syntactic rules never produce semantically false statements from semantically true statements'."
Wait, then why can't ZF prove itself consistent? If ZF believes in all the axioms of ZF, and it believes that logic never produces false statements from true statements...
"Ah, but ZF can't prove that there exists any set which is a model of ZF, so it can't prove the ZF axioms are consistent. ZF shouldn't be able to prove that some set is a model of ZF, because that's not true in all models. Many models of ZF don't contain any individual set well-populated enough for that one set to be a model of ZF all by itself."
I'm kind of surprised in a Gödelian sense that ZF can contain sets as large as the universe of ZF. Doesn't any given set have to be smaller than the whole universe?
"Inside any particular model of ZF, every set within that model is smaller than that model. But not all models of ZF are the same size; in fact, models of ZF of every size exist, by the Löwenheim-Skolem theorem. So you can have some models of ZF (some universes in which all the elements collectively obey the ZF relations) containing individual sets which are larger than other entire models of ZF. A set that large is called a Grothendieck universe, and assuming it exists is equivalent to assuming the existence of 'strongly inaccessible cardinals', sizes so large that you provably can't prove inside set theory that anything that large exists."
Whoa.
(Pause.)
But...
"But?"
I agree you've shown that one experiential consequence of second-order arithmetic (namely, that a machine looking for proofs of inconsistency from PA won't be seen to halt) can be derived from first-order set theory. Can you get all the consequences of second-order arithmetic in some particular first-order theory?
"You can't get all the semantic consequences of second-order logic, taken at face value, in any theory or any computable reasoning. What about the halting problem? Taken at face value, it's a semantic consequence of second-order logic that any given Turing machine either halts or doesn't halt..."
Personally I find it rather intuitive to imagine that a Turing machine either halts or doesn't halt. I mean, if I'm walking up to a machine and watching it run, telling me that its halting or not-halting 'isn't entailed by my axioms' strikes me as not describing any actual experience I can have with the machine. Either I'll see it halt eventually, or I'll see it keep running forever.
"My point is that the statements we can derive from the syntax of current second-order logic are limited by that syntax. And by the halting problem, we shouldn't ever expect a computable syntax that gives us all the semantic consequences. There's no possible theory you can actually use to get a correct advance prediction about whether an arbitrary Turing machine halts."
Okay. I agree that no computable reasoning, on secondorder logic or anything else, should be able to solve the halting problem. Unless time travel is possible, but even then, you shouldn't be able to solve the expanded halting problem for machines that use time travel.
"Right, so the syntax of secondorder logic can't prove everything. And in fact, it turns out that, in terms of what you can prove syntactically using the standard syntax, secondorder logic is identical to a manysorted firstorder logic."
Huh?
"Suppose you have a firstorder logic  one that doesn't claim to be able to quantify over all possible predicates  which does allow the universe to contain two different sorts of things. Say, the logic uses lowercase letters for all type1 objects and uppercase letters for all type2 objects. Like, '∀x: x = x' is a statement over all type1 objects, and '∀Y: Y = Y' is a statement over all type2 objects. But aside from that, you use the same syntax and proof rules as before."
Okay...
"Now add an elementrelation x∈Y, saying that x is an element of Y, and add some firstorder axioms for making the type2 objects behave like collections of type1 objects, including axioms for making sure that most describable type2 collections exist  i.e., the collection X containing just x is guaranteed to exist, and so on. What you can prove syntactically in this theory will be identical to what you can prove using the standard syntax of secondorder logic  even though the theory doesn't claim that all possible collections of type1s are type2s, and the theory will have models where many 'possible' collections are missing from the type2s."
Wait, now you're saying that second-order logic is no more powerful than first-order logic?
"I'm saying that the supposed power of second-order logic derives from interpreting it a particular way, and taking on faith that when you quantify over all properties, you're actually talking about all properties."
But then second-order arithmetic is no more powerful than first-order arithmetic in terms of what it can actually prove?
"Second-order arithmetic is way more powerful than first-order arithmetic. But that's because first-order set theory is more powerful than arithmetic, and adding the syntax of second-order logic corresponds to adding axioms with set-theoretic properties. In terms of which consequences can be syntactically proven, second-order arithmetic is more powerful than first-order arithmetic, but weaker than first-order set theory. First-order set theory can prove the existence of a model of second-order arithmetic (ZF can prove there's a collection of numbers and sets of numbers which models a many-sorted logic with syntax corresponding to second-order logic), and so ZF can prove second-order arithmetic consistent."
But first-order logic, including first-order set theory, can't even talk about the standard numbers!
"Right, but first-order set theory can syntactically prove more statements about 'numbers' than second-order arithmetic can prove. And when you combine that with the semantic implications of second-order arithmetic not being computable, and with any second-order logic being syntactically identical to a many-sorted first-order logic, and first-order logic having neat properties like the Completeness Theorem... well, you can see why some mathematicians would want to give up entirely on this whole second-order business."
But if you deny second-order logic you can't even say the word 'finite'. You would have to believe the word 'finite' was a meaningless noise.
"You'd define finiteness relative to whatever first-order model you were working in. Like, a set might be 'finite' only on account of the model not containing any one-to-one mapping from that set onto a smaller subset of itself..."
But that set wouldn't actually be finite. There wouldn't actually be, like, only a billion objects in there. It's just that all the mappings which could prove the set was infinite would be mysteriously missing.
"According to some other model, maybe. But since there is no one true model "
How is this not crazy talk along the lines of 'there is no one true reality'? Are you saying there's no really smallest set of numbers closed under succession, without all the extra infinite chains? Doesn't talking about how these theories have multiple possible models, imply that those possible models are logical thingies and one of them actually does contain the largest powerset and the smallest integers?
"The mathematicians who deny secondorder logic would see that reasoning as invoking an implicit background universe of set theory. Everything you're saying makes sense relative to some particular model of set theory, which would contain possible models of Peano arithmetic as sets, and could look over those sets and pick out the smallest in that model. Similarly, that set theory could look over a proposed model for a manysorted logic, and say whether there were any subsets within the larger universe which were missing from the manysorted model. Basically, your brain is insisting that it lives inside some particular model of set theory. And then, from that standpoint, you could look over some other set theory and see how it was missing subsets that your theory had."
Argh! No, damn it, I live in the set theory that really does have all the subsets, with no mysteriously missing subsets or mysterious extra numbers, or denumerable collections of all possible reals that could like totally map onto the integers if the mapping that did it hadn't gone missing in the Australian outback 
"But everybody says that."
Okay...
"Yeah?"
Screw set theory. I live in the physical universe where when you run a Turing machine, and keep watching forever in the physical universe, you never experience a time where that Turing machine outputs a proof of the inconsistency of Peano Arithmetic. Furthermore, I live in a universe where space is actually composed of a real field and space is actually infinitely divisible and contains all the points between A and B, rather than space only containing a denumerable number of points whose existence can be proven from the first-order axiomatization of the real numbers. So to talk about physics (forget about mathematics) I've got to use second-order logic.
"Ah. You know, that particular response is not one I have seen in the previous literature."
Yes, well, I'm not a pure mathematician. When I ask whether I want an Artificial Intelligence to think in second-order logic or first-order logic, I wonder how that affects what the AI does in the actual physical universe. Here in the actual physical universe where times are followed by successor times, I strongly suspect that we should only expect to experience standard times, and not experience any nonstandard times. I think time is connected, and global connectedness is a property I can only talk about using second-order logic. I think that every particular time is finite, and yet time has no upper bound (that there are all finite times, but only finite times), and that's a property I can only characterize using second-order logic.
"But if you can't ever tell the difference between standard and nonstandard times? I mean, local properties of time can be described using first-order logic, and you can't directly see global properties like 'connectedness'..."
But I can tell the difference. There are only nonstandard times where a proof-checking machine, running forever, outputs a proof of inconsistency from the Peano axioms. So I don't expect to experience seeing a machine do that, since I expect to experience only standard times.
"Set theory can also prove PA consistent. If you use set theory to define time, you similarly won't expect to see a time where PA is proven inconsistent  those nonstandard integers don't exist in any model of ZF."
Why should I anticipate that my physical universe is restricted to having only the nonstandard times allowed by a more powerful set theory, instead of nonstandard times allowed by first-order arithmetic? If I then talk about a nonstandard time where a proof-enumerating machine proves ZF inconsistent, will you say that only nonstandard times allowed by some still more powerful theory can exist? I think it's clear that the way you're deciding which experimental outcomes you'll have to excuse is by secretly assuming that only standard times exist, regardless of which theory is required to narrow it down.
"Ah... hm. Doesn't physics say this universe is going to run out of negentropy before you can do an infinite amount of computation? Maybe there's only a bounded amount of time, like it stops before googolplex or something. That can be characterized by first-order theories."
We don't know that for certain, and I wouldn't want to build an AI that just assumed lifespans had to be finite, in case it was wrong. Besides, should I use a different logic than if I'd found myself in Conway's Game of Life, or something else really infinite? Wouldn't the same sort of creatures evolve in that universe, having the same sort of math debates?
"Perhaps no universe like that can exist; perhaps only finite universes can exist, because only finite universes can be uniquely characterized by firstorder logic."
You just used the word 'finite'! Furthermore, taken at face value, our own universe doesn't look like it has a finite collection of entities related by firstorder logical rules. Space and time both look like infinite collections of points  continuous collections, which is a secondorder concept  and then to characterize the size of that infinity we'd need secondorder logic. I mean, by the LowenheimSkolem theorem, there aren't just denumerable models of firstorder axiomatizations of the reals, there's also unimaginably large cardinal infinities which obey the same premises, and that's a possibility straight out of H. P. Lovecraft. Especially if there are any things hiding in the invisible cracks of space."
"How could you tell if there were inaccessible cardinal quantities of points hidden inside a straight line? And anything that locally looks continuous each time you try to split it at a describable point, can be axiomatized by a firstorder schema for continuity."
That brings up another point: Who'd really believe that the reason Peano arithmetic works on physical times, is because that whole infinite axiom schema of induction, containing for every Φ a separate rule saying...
(Φ(0) ∧ (∀x: Φ(x) → Φ(Sx))) → (∀n: Φ(n))
...was used to specify our universe? How improbable would it be for an infinitely long list of rules to be true, if there wasn't a unified reason for all of them? It seems much more likely that the real reason first-order PA works to describe time is that all properties which are true at a starting time, and true of the successor of any time where they're true, are true of all later times; and this generalization over properties makes induction hold for first-order formulas as a special case. If my native thought process is first-order logic, I wouldn't see the connection between each individual formula in the axiom schema; it would take separate evidence to convince me of each one; they would feel like independent mathematical facts. But after doing scientific induction over the fact that many properties true at zero, with succession preserving truth, seem to be true everywhere, and after generalizing the simple, compact second-order theory of numbers and times, you could invent an infinite first-order theory to approximate it.
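The whole infinite schema collapses into a single statement once you can quantify over properties directly; in the second-order language, induction is the one axiom:

∀Φ: (Φ(0) ∧ (∀x: Φ(x) → Φ(Sx))) → (∀n: Φ(n))

Each instance of the first-order schema is just this axiom specialized to one definable Φ.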
"Maybe that just says you need to adjust whatever theory of scientific induction you're using, so that it can more easily induct infinite axiom schemas."
But why the heck would you need to induct infinite axiom schemas in the first place, if Reality natively ran on first-order logic? Isn't it far more likely that the way we ended up with these infinite lists of axioms was that Reality was manufactured (forgive the anthropomorphism) using an underlying schema in which time is a connected series of events, and space is a continuous field, and these are properties which happen to require second-order logic for humans to describe? I mean, if you picked out first-order theories at random, what's the chance we'd end up inside an infinitely long axiom schema that just happened to look like the projection of a compact second-order theory? Aren't we ignoring a sort of hint?
"A hint to what?"
Well, I'm not that sure myself, at this depth of philosophy. But I would currently say that finding ourselves in a physical universe where times have successor times, and space looks continuous, seems like a possible hint that the Tegmark Level IV multiverse  or the way Reality was manufactured, or whatever  might look more like causal universes characterizable by compact secondorder theories than causal universes characterizable by firstorder theories.
"But since any secondorder theory can just as easily be interpreted as a manysorted firstorder theory with quantifiers that can range over either elements or sets of elements, how could using secondorder syntax actually improve an Artificial Intelligence's ability to handle a reality like that?"
Good question. One obvious answer is that the AI would be able to induct what you would call an infinite axiom schema, as a single axiom  a simple, finite hypothesis.
"There's all sorts of obvious hacks to scientific induction of firstorder axioms which would let you assign nonzero probability to computable infinite sequences of axioms "
Sure. So beyond that... I would currently guess that the basic assumption behind 'behaving as if' secondorder logic is true, says that the AI should act as if only the 'actually smallest' numbers will ever appear in physics, relative to some 'true' mathematical universe that it thinks it lives in, but knows it can't fully characterize. Which is roughly what I'd say human mathematicians do when they take secondorder logic at face value; they assume that there's some true mathematical universe in the background, and that secondorder logic lets them talk about it.
"And what behaviorally, experimentally distinguishes the hypothesis, 'I live in the true ultimate math with fully populated powersets' from the hypothesis, 'There's some random model of first-order set-theory axioms I happen to be living in'?"
Well... one behavioral consequence is suspecting that your time obeys an infinitely long list of first-order axioms with induction schemas for every formula. And then moreover believing that you'll never experience a time when a proof-checking machine outputs a proof that Zermelo-Fraenkel set theory is inconsistent, even though there are provably some models with times like that, which fit the axiom schema you just inducted. That sounds like secretly believing that there's a background 'true' set of numbers that you think characterizes physical time, and that it's the smallest such set. Another suspicious behavior is that as soon as you suspect Zermelo-Fraenkel set theory is consistent, you suddenly expect not to experience any physical time which ZF proves isn't a standard number. You don't think you're in the nonstandard time of some weaker theory like Peano arithmetic. You think you're in the minimal time expressible by any and all theories, so as soon as ZF can prove some number doesn't exist in the minimal set, you think that 'real time' lacks such a number. All of these sound like behaviors you'd carry out if you thought there was a single 'true' mathematical universe that provided the best model for describing all physical phenomena, like time and space, which you encounter, and believing that this 'true' backdrop used the largest powersets and smallest numbers.
"How exactly do you formalize all that reasoning, there? I mean, how would you actually make an AI reason that way?"
Er... I'm still working on that part.
"That makes your theory a bit hard to criticize, don't you think? Personally, I wouldn't be surprised if any such formalized reasoning looked just like believing that you live inside a first-order set theory."
I suppose I wouldn't be too surprised either; it's hard to argue with the results on many-sorted logics. But if comprehending the physical universe is best done by assuming that real phenomena are characterized by a 'true' mathematics containing the powersets and the natural numbers, and thus expecting that no mathematical model we can formulate will ever contain any larger powersets or smaller numbers than those of the 'true' backdrop to physics, then I'd call that a moral victory for second-order logic. In first-order logic we aren't even supposed to be able to talk about such things.
"Really? To me that sounds like believing you live inside a model of first-order set theory, and believing that all models of any theories you can invent must also be sets in the larger model. You can prove the Completeness Theorem inside ZF plus the Axiom of Choice, so ZFC already proves that all consistent theories have models which are sets, although of course it can't prove that ZFC itself is such a theory. So, anthropomorphically speaking, no model of ZFC expects to encounter a theory that has fewer numbers or larger powersets than itself. No model of ZFC expects to encounter any quoted-model, any set that a quoted theory entails, which contains larger powersets than the ones in its own Powerset Axiom. A first-order set theory can even make the leap from the finite statement, 'P is true of all my subsets of X', to believing, 'P is true of all my subsets of X that can be described by this denumerable collection of formulas'; it can encompass the jump from a single axiom over 'all my subsets', to a quoted axiom schema over formulas. I'd sum all that up by saying, 'second-order logic is how first-order set theory feels from the inside'."
Maybe. Even in the event that neither human nor superintelligent cognition will ever be able to 'properly talk about' unbounded finite times, global connectedness, particular infinite cardinalities, or true spatial continuity, it doesn't follow that Reality is similarly limited in which physics it can privilege.
Part of the sequence Highly Advanced Epistemology 101 for Beginners
Previous post: "Gödel's Completeness and Incompleteness Theorems"
Gödel's Completeness and Incompleteness Theorems
Followup to: Standard and Nonstandard Numbers
So... last time you claimed that using first-order axioms to rule out the existence of nonstandard numbers (other chains of numbers besides the 'standard' numbers starting at 0) was forever and truly impossible, even unto a superintelligence, no matter how clever the first-order logic used, even if you came up with an entirely different way of axiomatizing the numbers.
"Right."
How could you, in your finiteness, possibly know that?
"Have you heard of Gödel's Incompleteness Theorem?"
Of course! Gödel's Theorem says that for every consistent mathematical system, there are statements which are true within that system, which can't be proven within the system itself. Gödel came up with a way to encode theorems and proofs as numbers, and wrote a purely numerical formula to detect whether a proof obeyed proper logical syntax. The basic trick was to use prime factorization to encode lists; for example, the ordered list <3, 7, 1, 4> could be uniquely encoded as:
2^{3} * 3^{7} * 5^{1} * 7^{4}
And since prime factorizations are unique, and prime powers don't mix, you could inspect this single number, 210,039,480, and get the unique ordered list <3, 7, 1, 4> back out. From there, going to an encoding for logical formulas was easy; for example, you could use the 2 prefix for NOT and the 3 prefix for AND and get, for any formulas Φ and Ψ encoded by the numbers #Φ and #Ψ:
¬Φ = 2^{2} * 3^{#Φ}
Φ ∧ Ψ = 2^{3} * 3^{#Φ} * 5^{#Ψ}
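The prime-power encoding above is easy to check mechanically. Here is a toy sketch (the function names `encode_list`, `decode_list`, and `encode_not` are my own, purely for illustration):

```python
def primes():
    """Yield 2, 3, 5, 7, ... by trial division (fine for a toy)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1

def encode_list(exponents):
    """Encode an ordered list as a product of prime powers."""
    n, gen = 1, primes()
    for e in exponents:
        n *= next(gen) ** e
    return n

def decode_list(n):
    """Recover the list by reading off each prime's exponent."""
    out, gen = [], primes()
    while n > 1:
        p, e = next(gen), 0
        while n % p == 0:
            n //= p
            e += 1
        out.append(e)
    return out

def encode_not(phi):
    """The NOT-prefix rule from the text: #(¬Φ) = 2^2 * 3^#Φ."""
    return 2 ** 2 * 3 ** phi

# The worked example from the text:
assert encode_list([3, 7, 1, 4]) == 210039480
assert decode_list(210039480) == [3, 7, 1, 4]
```

Because prime factorizations are unique, `decode_list` is a genuine inverse of `encode_list`, which is exactly what lets a single number stand in for a structured formula.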
It was then possible, by dint of crazy amounts of work, for Gödel to come up with a gigantic formula of Peano Arithmetic □(p, c) meaning, 'P encodes a valid logical proof using first-order Peano axioms of C', from which directly followed the formula □c, meaning, 'There exists a number P such that P encodes a proof of C' or just 'C is provable in Peano arithmetic.'
Gödel then put in some further clever work to invent statements which referred to themselves, by having them contain sub-recipes that would reproduce the entire statement when manipulated by another formula.
And then Gödel's Statement encodes the statement, 'There does not exist any number P such that P encodes a proof of (this statement) in Peano arithmetic', or in simpler terms, 'I am not provable in Peano arithmetic'. If we assume first-order arithmetic is consistent and sound, then no proof of this statement within first-order arithmetic exists, which means the statement is true but can't be proven within the system. That's Gödel's Theorem.
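Stated compactly, in the notation of the provability formula above: the diagonal (fixed-point) lemma yields a sentence G such that Peano arithmetic itself proves the equivalence

```latex
\mathrm{PA} \;\vdash\; G \;\leftrightarrow\; \neg \exists p.\ \mathrm{Proof}(p, \ulcorner G \urcorner)
```

where ⌜G⌝ is the Gödel number of G; that is, G is provably equivalent to the assertion of its own unprovability.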
"Er... no."
No?
"No. I've heard rumors that Gödel's Incompleteness Theorem is horribly misunderstood in your Everett branch. Have you heard of Gödel's Completeness Theorem?"
Is that a thing?
"Yes! Gödel's Completeness Theorem says that, for any collection of first-order statements, every semantic implication of those statements is syntactically provable within first-order logic. If something is a genuine implication of a collection of first-order statements (if it actually does follow, in the models pinned down by those statements), then you can prove it, within first-order logic, using only the syntactical rules of proof, from those axioms."
Standard and Nonstandard Numbers
Followup to: Logical Pinpointing
"Oh! Hello. Back again?"
Yes, I've got another question. Earlier you said that you had to use second-order logic to define the numbers. But I'm pretty sure I've heard about something called 'first-order Peano arithmetic' which is also supposed to define the natural numbers. Going by the name, I doubt it has any 'second-order' axioms. Honestly, I'm not sure I understand this second-order business at all.
"Well, let's start by examining the following model:"
"This model has three properties that we would expect to be true of the standard numbers: 'Every number has a successor', 'If two numbers have the same successor they are the same number', and '0 is the only number which is not the successor of any number'. All three of these statements are true in this model, so in that sense it's quite number-like -"
And yet this model clearly is not the numbers we are looking for, because it's got all these mysterious extra numbers like C and 2*. That C thing even loops around, which I certainly wouldn't expect any number to do. And then there's that infinite-in-both-directions chain which isn't connected to anything else.
"Right, so, the difference between first-order logic and second-order logic is this: In first-order logic, we can get rid of the ABC loop, by making a statement which rules out any model that has a loop of numbers like that. But we can't get rid of the infinite chain underneath it. In second-order logic we can get rid of the extra chain."