## New forum for MIRI research: Intelligent Agent Foundations Forum

Today, the Machine Intelligence Research Institute is launching a new forum for research discussion: the Intelligent Agent Foundations Forum! It's already been seeded with a bunch of new work on MIRI topics from the last few months.

We've covered most of the (what, why, how) subjects on the forum's new welcome post and the How to Contribute page, but this post is an easy place to comment if you have further questions (or if, maths forbid, there are technical issues *with* the forum instead of *on* it).

But before that, go ahead and check it out!

(Major thanks to Benja Fallenstein, Alice Monday, and Elliott Jin for their work on the forum code, and to all the contributors so far!)

**EDIT 3/22:** Jessica Taylor, Benja Fallenstein, and I wrote forum digest posts summarizing and linking to recent work (on the IAFF and elsewhere) on reflective oracle machines, on corrigibility, utility indifference, and related control ideas, and on updateless decision theory and the logic of provability, respectively! These are pretty excellent resources for reading up on those topics, in my biased opinion.

## Dark Arts 101: Be rigorous, on average

I'm reading George Steiner's 1989 book on literary theory, *Real Presences*. Steiner is a literary theorist who achieved the trifecta of having appointments at Oxford, Cambridge, and Harvard. His book demonstrates an important Dark Arts method of argument.

So far, Steiner's argument appears to be:

- Human language is an undecidable symbol-system.
- Every sentence therefore carries with it an infinite amount of meaning, the accumulation of all connotations, contexts, and historical associations invoked, and invoked by those invocations, etc. Alternately, every sentence contains no meaning at all, since none of those words can refer to things in the world.
- The meaning of a sentence, therefore, is not finite or analyzable, but transcendent.
- The transcendent is the search for God.
- Therefore, all good literature is a search for God.

The critics quoted on the back of the book, and its reviews on Amazon, praise Steiner's rigor and learning. It is impressive. Within a single paragraph he may show the relationship between Homer, 12th-century theological works, Racine, Shakespeare, and Schoenberg. And his care and precision with words is exemplary; I have the impression, even when he speaks of meaning in music or other qualia-laden subjects, that I know exactly what he means.

He was intelligent enough to trace the problems he was grappling with out past the edges of his domain of expertise. The key points of his argument lie not in literary theory, but in information theory, physics, artificial intelligence, computability theory, linguistics, and transfinite math.

Unfortunately, he knows almost nothing about any of those fields, and his language is precise enough to be wrong, which he is when he speaks on any of those subjects. How did he get away with it?

Answer: He took a two-page argument about things he knew little about, spread it across 200 pages, and filled the gaps with tangential statements of impressive rigor and thoroughness on things he was expert in.

## Logics for Mind-Building Should Have Computational Meaning

**The Workshop**

Late in July I organized and held MIRIx Tel-Aviv with the goal of investigating the currently-open (to my knowledge) Friendly AI problem called "logical probability": the issue of assigning probabilities to formulas in a first-order proof system, in order to use the reflective consistency of the probability predicate to get past the Loebian Obstacle to building a self-modifying reasoning agent that will trust itself and its successors. Vadim Kosoy, Benjamin and Joshua Fox, and myself met at the Tel-Aviv Makers' Insurgence for six hours, and each presented our ideas. I spent most of it sneezing due to my allergies to TAMI's resident cats.

My idea was to go with the proof-theoretic semantics of logic and attack computational construction of logical probability via the Curry-Howard Isomorphism between programs and proofs: this yields a rather direct translation between computational constructions of logical probability and the learning/construction of an optimal function from sensory inputs to actions required by Updateless Decision Theory.

The best I can give as a mathematical result is as follows:

** **

The capital is a set of hypotheses/axioms/assumptions, and the English letters are metasyntactic variables (like "foo" and "bar" in programming lessons). The lower-case letters denote proofs/programs, and the upper-case letters denote propositions/types. The turnstile just means "deduces": the *judgement* can be read here as "an agent whose set of beliefs is denoted will believe that the evidence *a* proves the proposition *A*." The performs a "reversed" substitution, with the result reading: "for all *y* proving/of-type *B*, substitute *x* for *y* in *a*". This means that we algorithmically build a new proof/construction/program from *a* in which any and all constructions proving the proposition *B* are replaced with the logically-equivalent hypothesis *x*, which we have added to our hypothesis-set .

Thus the first equation reads, "the probability of *a* proving *A* conditioned on *b* proving *B* equals the probability of *a* proving *A* when we assume the truth of *B* as a hypothesis." The second equation then uses this definition of conditional probability to give the normal Product Rule of probabilities for the logical product (the operator), defined proof-theoretically. I strongly believe I could give a similar equation for the normal Sum Rule of probabilities for the logical sum (the operator) if I could only access the relevant paywalled paper, in which the λμ-calculus acting as an algorithmic interpretation of the natural-deduction system for classical propositional logic (rather than intuitionistic) is given.

The third item given there is an inference rule, which reads, "if *x* is a free variable/hypothesis imputed to have type/prove proposition *A*, not bound in the hypothesis-set , then the probability with which we believe *x* proves *A* is given by the Solomonoff Measure of type *A* in the λμ-calculus". We can define that measure simply as the summed Solomonoff Measure of every program/proof possessing the relevant type, and I don't think going into the details of its construction here would be particularly productive. Free variables in λ-calculus are isomorphic to unproven hypotheses in natural deduction, and so a probabilistic proof system could learn how much to believe in some free-standing hypothesis via Bayesian evidence rather than algorithmic proof.

The final item given here is trivial: anything assumed has probability 1.0, that of a logical tautology.

The upside to invoking the strange, alien λμ-calculus instead of the more normal, friendly λ-calculus is that we thus reason inside classical logic rather than intuitionistic, which means we can use the classical axioms of probability rather than intuitionistic Bayesianism. We *need* classical logic here: if we switch to intuitionistic logics (Heyting algebras rather than Boolean algebras) we do get to make computational decidability a first-class citizen of our logic, but the cost is that we can then believe *only* computationally provable propositions. As Benjamin Fox pointed out to me at the workshop, Loeb's Theorem then becomes a triviality, with real self-trust rendered no easier.

**The Apologia**

My motivation and core idea for all this was very simple: I am a devout computational trinitarian, believing that logic must be set on foundations which describe reasoning, truth, and evidence in a non-mystical, non-Platonic way. The study of first-order logic and *especially* of incompleteness results in metamathematics, from Goedel on up to Chaitin, *aggravates* me in its relentless Platonism, and especially in the way Platonic mysticism about logical incompleteness so often leads to the belief that minds are mystical. (It aggravates other people, too!)

The slight problem which I ran into is that there's a shit-ton I don't know about logic. I am now working to remedy this grievous hole in my previous education. Also, this problem is really deep, actually.

I thus apologize for ending the rigorous portion of this write-up here. Everyone expecting proper rigor, you may now pack up and go home, if you were ever paying attention at all. Ritual seppuku will duly be committed, followed by hors d'oeuvre. My corpse will be duly recycled to make paper-clips, in the proper fashion of a failed LessWrongian.

**The Parts I'm Not Very Sure About**

With any luck, that previous paragraph got rid of all the serious people.

I do, however, still think that the (beautiful) equivalence between computation and logic can yield some insights here. After all, the whole reason for the strange incompleteness results in first-order logic (shown by Boolos in his textbook, I'm told) is that first-order logic, as a reasoning system, contains sufficient computational machinery to encode a Universal Turing Machine. The bidirectionality of this reduction (Hilbert and Gentzen both have given computational descriptions of first-order proof systems) is just another demonstration of the equivalence.

In fact, it seems to me (right now) to yield a rather intuitively satisfying explanation of why the Gaifman-Carnot Condition (that every instance we see of provides Bayesian evidence in favor of ) for logical probabilities is not computably approximable. What would we need to interpret the Gaifman Condition from an algorithmic, type-theoretic viewpoint? From this interpretation, we would need a proof of our universal generalization. This would have to be a dependent product of form , a function taking any construction to a construction of type , which itself has type **Prop**. To learn such a dependent function from the examples would be to search for an optimal (simple, probable) construction (program) constituting the relevant proof object: effectively, an individual act of Solomonoff Induction. Solomonoff Induction, however, is already only semicomputable, which would then make a Gaifman-Hutter distribution (is there another term for these?) doubly semicomputable, since even generating it involves a semiprocedure.

The *benefit* of using the constructive approach to probabilistic logic here is that we know perfectly well that however incomputable Solomonoff Induction and Gaifman-Hutter distributions might be, both existing humans and existing proof systems succeed in building proof-constructions for quantified sentences *all the time*, even in higher-order logics such as Coquand's Calculus of Constructions (the core of a popular constructive proof assistant) or Luo's Logic-Enriched Type Theory (the core of a popular dependently-typed programming language and proof engine based on classical logic). Such logics and their proof-checking algorithms constitute, going all the way back to Automath, the first examples of computational "agents" which acquire specific "beliefs" in a mathematically rigorous way, subject to human-proved theorems of soundness, consistency, and programming-language-theoretic completeness (rather than meaning that every true proposition has a proof, this means that every program which does not become operationally stuck has a type and is thus the proof of some proposition). If we want our AIs to believe in accordance with soundness and consistency properties we can prove *before* running them, while being composed of computational artifacts, I personally consider this the foundation from which to build.

Where we *can* acquire probabilistic evidence in a sound and computable way, as noted above in the section on free variables/hypotheses, we can do so for propositions which we cannot algorithmically prove. This would bring us closer to our actual goals of using logical probability in Updateless Decision Theory or of getting around the Loebian Obstacle.

**Some of the Background Material I'm Reading**

Another reason why we should use a Curry-Howard approach to logical probability is one of the simplest possible reasons: the burgeoning field of probabilistic programming is already being built on it. The Computational Cognitive Science lab at MIT is publishing papers showing that their languages are universal for computable and semicomputable probability distributions, and getting strong results in the study of human general intelligence. Specifically: they are hypothesizing that we can dissolve "learning" into "inducing probabilistic programs via hierarchical Bayesian inference", "thinking" into "simulation" into "conditional sampling from probabilistic programs", and "uncertain inference" into "approximate inference over the distributions represented by probabilistic programs, conditioned on some fixed quantity of sampling that has been done."

In fact, one might even look at these ideas and think that, perhaps, an agent which could find some way to sample quickly and more accurately, or to learn probabilistic programs more efficiently (in terms of training data), than was built into its original "belief engine" could then rewrite its belief engine to use these new algorithms to perform strictly better inference and learning. Unless I'm as completely wrong as I usually am about these things (that is, very extremely completely wrong based on an utterly unfounded misunderstanding of the whole topic), it's a potential engine for recursive self-improvement.

They also have been studying how to implement statistical inference techniques for their generate modeling languages which do not obey Bayesian soundness. While most of machine learning/perception works according to error-rate minimization rather than Bayesian soundness (exactly because Bayesian methods are *often* too computationally expensive for real-world use), I would prefer someone at least study the implications of employing unsound inference techniques for more general AI and cognitive-science applications in terms of how often such a system would "misbehave".

Many of MIT's models are currently dynamically typed and appear to leave type soundness (the logical rigor with which agents come to believe things by deduction) to future research. And yet: they got to this problem first, so to speak. We really ought to be collaborating with them, with the full-time grant-funded academic researchers, rather than trying to armchair-reason our way to a full theory of logical probability as a large group of amateurs or part-timers and only a small core cohort of full-time MIRI and FHI staff investigating AI safety issues.

(I admit to having a nerd crush, and I am actually planning to go visit the Cocosci Lab this coming week, and want/intend to apply to their PhD program.)

They have also uncovered something else I find highly interesting: human learning of both concepts and causal frameworks seems to take place via hierarchical Bayesian inference, gaining a "blessing of abstraction" to countermand the "curse of dimensionality". The natural interpretation of these abstractions in terms of constructions and types would be that, as in dependently-typed programming languages, constructions have types, and types are constructions, but for hierarchical-learning purposes, it would be useful to suppose that *types* have specific, structured types more informative than **Prop** or **Type**_{n} (for some universe level *n*). Inference can then proceed from giving constructions or type-judgements as evidence at the bottom level, up the hierarchy of types and meta-types to give probabilistic belief-assignments to very general knowledge. Even very different objects could have similar meta-types at some level of the hierarchy, allowing hierarchical inference to help transfer Bayesian evidence between seemingly different domains, giving insight into how efficient general intelligence can work.

**Just-for-fun Postscript**

If we really buy into the model of thinking as conditional simulation, we can use that to dissolve the modalities "possible" and "impossible". We arrive at (by my count) three different ways of considering the issue computationally:

- Conceivable/imaginable: the generative models which constitute my current beliefs do or do not yield a path to make some logical proposition true or to make some causal event happen (planning can be done as inference, after all), with or without some specified level of probability.
- Sensibility/absurdity: the generative models which constitute my current beliefs place a desirably high or undesirably low probability on the known path(s) by which a proposition might be true or by which an event might happen. The level which constitutes "desirable" could be set as the value for a hypothesis test, or some other value determined decision-theoretically. This could relate to Pascal's Mugging: how probable must something be before I consider it
*real*rather than an artifact of my own hypothesis space? - Consistency or Contradiction: the generative models which constitute my current beliefs, plus the hypothesis that some proposition is true or some event can come about, do or do not yield a logical contradiction with some probability (that is, we should believe the contradiction exists only to the degree we believe in our existing models in the first place!).

I mostly find this fun because it lets us talk rigorously about when we should "shut up and do the 1,2!impossible" and when something is very definitely 3!impossible.

## Biases of Intuitive and Logical Thinkers

Any intuition-dominant thinker who's struggled with math problems or logic-dominant thinker who's struggled with small-talk knows how difficult and hopeless the experience feels like. For a long time I was an intuition thinker, then I developed a logical thinking style and soon it ended up dominating -- granting me the luxury of experiencing both kinds of struggles. I eventually learned to apply the thinking style better optimized for the problem I was facing. Looking back, I realized why I kept sticking to one extreme.

I hypothesize that one-sided thinkers develop biases and tendencies that prevent them from improving their weaker mode of thinking. These biases cause a positive feedback loop that further skews thinking styles in the same direction.

The reasons why one style might be overdeveloped and the other underdeveloped vary greatly. Genes have a strong influence, but environment also plays a large part. A teacher may have inspired you to love learning science at a young age, causing you to foster to a thinking style better for learning science. Or maybe you grew up very physically attractive and found socializing with your peers a lot more rewarding than studying after school, causing you to foster a thinking style better for navigating social situations. Environment can be changed to help develop certain thinking styles, but it should be supplementary to exposing and understanding the biases you already have. Entering an environment that penalizes your thinking style can be uncomfortable, stressful and frustrating without being prepared. (Such a painful experience is part of why these biases cause a positive feedback loop, by making us avoid environments that require the opposite thinking style.)

Despite genetic predisposition and environmental circumstances, there's room for improvement and exposing these biases and learning to account for them is a great first step.

Below is a list of a few biases that worsen our ability to solve a certain class of problems and keep us from improving our underdeveloped thinking style.

## Intuition-dominant Biases

**Overlooking crucial details**

Details matter in order to understand technical concepts. Overlooking a word or sentence structure can cause complete misunderstanding -- a common blunder for intuition thinkers.

Intuition is really good at making fairly accurate predictions without complete information, enabling us to navigate the world without having a deep understanding of it. As a result, intuition trains us to experience the feeling we understand something without examining every detail. In most situations, paying close attention to detail is unnecessary and sometimes dangerous. When learning a technical concept, every detail matters and the premature feeling of understanding stops us from examining them.

This bias is one that's more likely to go away once you realize it's there. You often don't know what details you're missing after you've missed them, so merely remembering that you tend to miss important details should prompt you to take closer examinations in the future. **Expecting solutions to sound a certain way**

The Internship has a great example of this bias (and a few others) in action. The movie is about two middle-aged unemployed salesmen (intuition thinkers) trying to land an internship with Google. Part of Google's selection process has the two men participate in several technical challenges. One challenge required the men and their team to find a software bug. In a flash of insight, Vince Vaughn's character, Billy, shouts "Maybe the answer is in the question! Maybe it has something to do with the word bug. A fly!" After enthusiastically making several more word associations, he turns to his team and insists they take him seriously.

Why is it believable to the audience that Billy can be so confident about his answer?

Billy's intuition made an association between the challenge question and riddle-like questions he's heard in the past. When Billy used his intuition to find a solution, his confidence in a riddle-like answer grew. Intuition recklessly uses irrelevant associations as reasons for narrowing down the space of possible solutions to technical problems. When associations pop in your mind, it's a good idea to legitimize those associations with supporting reasons. **Not recognizing precise language**

Intuition thinkers are multi-channel learners -- all senses, thoughts and emotions are used to construct a complex database of clustered knowledge to predict and understand the world. With robust information-extracting ability, correct grammar/word-usage is, more often than not, unnecessary for meaningful communication.

Communicating technical concepts in a meaningful way requires precise language. Connotation and subtext are stripped away so words and phrases can purely represent meaningful concepts inside a logical framework. Intuition thinkers communicate with imprecise language, gathering meaning from context to compensate. This makes it hard for them to recognize when to turn off their powerful information extractors.

This bias explains part of why so many intuition thinkers dread math "word problems". Introducing words and phrases rich with meaning and connotation sends their intuition running wild. It's hard for them to find correspondences between words in the problem and variables in the theorems and formulas they've learned.

The noise intuition brings makes it hard to think clearly. It's hard for intuition thinkers to tell whether their automatic associations should be taken seriously. Without a reliable way to discern, wrong interpretations of words go undetected. For example, without any physics background, an intuition thinker may read the statement "Matter can have both wave and particle properties at once" and believe they completely understand it. Unrelated associations of what matter, wave and particle mean, blindly take precedence over technical definitions.

The slightest uncertainty about what a sentence means should raise a red flag. Going back and finding correspondence between each word and how it fits into a technical framework will eliminate any uncertainty. **Believing their level of understanding is deeper than what it is**

Intuition works on an unconscious level, making intuition thinkers unaware of how they know what they know. Not surprisingly, their best tool to learn what it means to understand is intuition. The concept "understanding" is a collection of associations from experience. You may have learned that part of understanding something means being able to answer questions on a test with memorized factoids, or knowing what to say to convince people you understand, or just knowing more facts than your friends. These are not good methods for gaining a deep understanding of technical concepts.

When intuition thinkers optimize for understanding, they're really optimizing for a fuzzy idea of what they think understanding means. This often leaves them believing they understand a concept when all they've done is memorize some disconnected facts. Not knowing what it feels like to have deeper understanding, they become conditioned to always expect some amount of surprise. They can feel max understanding with less confidence than logical thinkers when they feel max understanding. This lower confidence disincentivizes intuition thinkers to invest in learning technical concepts, further keeping their logical thinking style underdeveloped.

One way I overcame this tendency was to constantly ask myself "why" questions, like a curious child bothering their parents. The technique helped me uncover what used to be unknown unknowns that made me feel overconfident in my understanding.

**Logic-dominant Biases**

**Ignoring information they cannot immediately fit into a framework**

Logical thinkers have and use intuition -- problem is they don't feed it enough. They tend to ignore valuable intuition-building information if it doesn't immediately fit into a predictive model they deeply understand. While intuition thinkers don't filter out enough noise, logical thinkers filter too much.

For example, if a logical thinker doesn't have a good framework for understanding human behavior, they're more likely to ignore visual input like body language and fashion, or auditory input like tone of voice and intonation. Human behavior is complicated, there's no framework to date that can make perfectly accurate predictions about it. Intuition can build powerful models despite working with many confounding variables.

Bayesian probability enables logical thinkers to build predictive models from noisy data without having to use intuition. But even then, the first step of making a Bayesian update is data collection.

Combatting this tendency requires you to pay attention to input you normally ignore. Supplement your broader attentional scope with a researched framework as a guide. Say you want to learn how storytelling works. Start by grabbing resources that teach storytelling and learn the basics. Out in the real-world, pay close attention to sights, sounds, and feelings when someone starts telling a story and try identifying sensory input to the storytelling elements you've learned about. Once the basics are subconsciously picked up by habit, your conscious attention will be freed up to make new and more subtle observations. **Ignoring their emotions**

Emotional input is difficult to factor, especially because you're emotional at the time. Logical thinkers are notorious for ignoring this kind of messy data, consequently starving their intuition of emotional data. Being able to "go with your gut feelings" is a major function of intuition that logical thinkers tend to miss out on.

Your gut can predict if you'll get along long-term with a new SO, or what kind of outfit would give you more confidence in your workplace, or if learning tennis in your free time will make you happier, or whether you prefer eating a cheeseburger over tacos for lunch. Logical thinkers don't have enough data collected about their emotions to know what triggers them. They tend to get bogged down and mislead with objective, yet trivial details they manage to factor out. A weak understanding of their own emotions also leads to a weaker understanding of other's emotions. You can become a better empathizer by better understanding yourself.

You could start from scratch and build your own framework, but self-assessment biases will impede productivity. Learning an existing framework is a more realistic solution. You can find resources with some light googling and I'm sure CFAR teaches some good ones too. You can improve your gut feelings too. One way is making sure you're always consciously aware of the circumstances you're in when experiencing an emotion.**Making rules too strict**

Logical thinkers build frameworks in order to understand things. When adding a new rule to a framework, there's motivation to make the rule strict. The stricter the rule, the more predictive power, the better the framework. When the domain you're trying to understand has multivariable chaotic phenomena, strict rules are likely to break. The result is something like the current state of macroeconomics: a bunch of logical thinkers preoccupied by elegant models and theories that can only exist when useless in practice.

Following rules that are too strict can have bad consequences. Imagine John the salesperson is learning how to make better first impressions and has built a rough framework so far. John has a rule that smiling always helps make people feel welcomed the first time they meet him. One day he makes a business trip to Russia to meet with a prospective client. The moment he meet his russian client, he flashes a big smile and continues to smile despite negative reactions. After a few hours of talking, his client reveals she felt he wasn't trustworthy at first and almost called off the meeting. Turns out that in Russia smiling to strangers is a sign of insincerity. John's strict rule didn't account for cultural differences, blindsiding him from updating on his clients reaction, putting him in a risky situation.

The desire to hold onto strict rules can make logical thinkers susceptible to confirmation bias too. If John made an exception to his smiling rule, he'd feel less confident about his knowledge of making first impressions, subsequently making him feel bad. He may also have to amend some other rule that relates to the smiling rule, which would further hurt his framework and his feelings.

When feeling the urge to add on a new rule, take note of circumstances in which the evidence for the rule was found in. Add exceptions that limit the rule's predictive power to similar circumstances. Another option is to entertain multiple conflicting rules simultaneously, shifting weight from one to the other after gathering more evidence.

## Robust Cooperation in the Prisoner's Dilemma

I'm proud to announce the preprint of Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic, a joint paper with Mihaly Barasz, Paul Christiano, Benja Fallenstein, Marcello Herreshoff, Patrick LaVictoire (me), and Eliezer Yudkowsky.

This paper was one of three projects to come out of the 2nd MIRI Workshop on Probability and Reflection in April 2013, and had its genesis in ideas about formalizations of decision theory that have appeared on LessWrong. (At the end of this post, I'll include links for further reading.)

Below, I'll briefly outline the problem we considered, the results we proved, and the (many) open questions that remain. Thanks in advance for your thoughts and suggestions!

## Background: Writing programs to play the PD with source code swap

(If you're not familiar with the Prisoner's Dilemma, see here.)

The paper concerns the following setup, which has come up in academic research on game theory: say that you have the chance to write a computer program **X**, which takes in one input and returns either *Cooperate* or *Defect*. This program will face off against some other computer program **Y**, but with a twist: **X** will receive the source code of **Y** as input, and **Y** will receive the source code of **X** as input. And you will be given your program's winnings, so you should think carefully about what sort of program you'd write!

Of course, you could simply write a program that defects regardless of its input; we call this program **DefectBot**, and call the program that cooperates on all inputs **CooperateBot**. But with the wealth of information afforded by the setup, you might wonder if there's some program that might be able to achieve mutual cooperation in situations where **DefectBot** achieves mutual defection, without thereby risking a sucker's payoff. (Douglas Hofstadter would call this a perfect opportunity for superrationality...)

## Previously known: CliqueBot and FairBot

And indeed, there's a way to do this that's been known since at least the 1980s. You can write a computer program that knows its own source code, compares it to the input, and returns *C* if and only if the two are identical (and *D* otherwise). Thus it achieves mutual cooperation in one important case where it intuitively ought to: when playing against itself! We call this program **CliqueBot**, since it cooperates only with the "clique" of agents identical to itself.

There's one particularly irksome issue with **CliqueBot**, and that's the fragility of its cooperation. If two people write functionally analogous but syntactically different versions of it, those programs will defect against one another! This problem can be patched somewhat, but not fully fixed. Moreover, mutual cooperation might be the best strategy against some agents that are not even functionally identical, and extending this approach requires you to explicitly delineate the list of programs that you're willing to cooperate with. Is there a more flexible and robust kind of program you could write instead?

As it turns out, there is: in a 2010 post on LessWrong, cousin_it introduced an algorithm that we now call **FairBot**. Given the source code of **Y**, **FairBot** searches for a proof (of less than some large fixed length) that **Y** returns *C* when given the source code of **FairBot**, and then returns *C* if and only if it discovers such a proof (otherwise it returns *D*). Clearly, if our proof system is consistent, **FairBot** only cooperates when that cooperation will be mutual. But the really fascinating thing is what happens when you play two versions of **FairBot** against each other. Intuitively, it seems that *either* mutual cooperation or mutual defection would be stable outcomes, but it turns out that if their limits on proof lengths are sufficiently high, they will achieve mutual cooperation!

The proof that they mutually cooperate follows from a bounded version of Löb's Theorem from mathematical logic. (If you're not familiar with this result, you might enjoy Eliezer's Cartoon Guide to Löb's Theorem, which is a correct formal proof written in much more intuitive notation.) Essentially, the asymmetry comes from the fact that both programs are searching for the same outcome, so that a short proof that one of them cooperates leads to a short proof that the other cooperates, and vice versa. (The opposite is not true, because the formal system can't know it won't find a contradiction. This is a subtle but essential feature of mathematical logic!)

## Generalization: Modal Agents

Unfortunately, **FairBot** isn't what I'd consider an ideal program to write: it happily cooperates with **CooperateBot**, when it could do better by defecting. This is problematic because in real life, the world isn't separated into agents and non-agents, and any natural phenomenon that doesn't predict your actions can be thought of as a **CooperateBot** (or a **DefectBot**). You don't want your agent to be making concessions to rocks that happened not to fall on them. (There's an important caveat: some things have utility functions that you care about, but don't have sufficient ability to predicate their actions on yours. In that case, though, it wouldn't be a true Prisoner's Dilemma if your values actually prefer the outcome (*C*,*C*) to (*D*,*C*).)

However, **FairBot** belongs to a promising class of algorithms: those that decide on their action by looking for short proofs of logical statements that concern their opponent's actions. In fact, there's a really convenient mathematical structure that's analogous to the class of such algorithms: the modal logic of provability (known as GL, for Gödel-Löb).

So that's the subject of this preprint: **what can we achieve in decision theory by considering agents defined by formulas of provability logic?**

## Second-Order Logic: The Controversy

**Followup to**: Godel's Completeness and Incompleteness Theorems

"So the question you asked me last time was, 'Why does anyone bother with first-order logic at all, if second-order logic is so much more powerful?'"

Right. If first-order logic can't talk about finiteness, or distinguish the size of the integers from the size of the reals, why even bother?

"The first thing to realize is that first-order theories can still have a *lot *of power. First-order arithmetic does narrow down the possible models by a lot, even if it doesn't narrow them down to a *single *model. You can prove things like the existence of an infinite number of primes, because *every *model of the first-order axioms has an infinite number of primes. First-order arithmetic is never going to prove anything that's *wrong *about the standard numbers. Anything that's true in *all *models of first-order arithmetic will also be true in the *particular *model we call the standard numbers."

Even so, if first-order theory is strictly weaker, why bother? Unless second-order logic is just as incomplete relative to third-order logic, which is weaker than fourth-order logic, which is weaker than omega-order logic -

"No, surprisingly enough - there's tricks for making second-order logic encode any proposition in third-order logic and so on. If there's a collection of third-order axioms that characterizes a model, there's a collection of second-order axioms that characterizes the same model. Once you make the jump to second-order logic, you're *done *- so far as anyone knows (so far as I know) there's *nothing *more powerful than second-order logic in terms of *which models it can characterize*."

Then if there's one spoon which can eat anything, why not just use the spoon?

"Well... this gets into complex issues. There are mathematicians who don't believe there *is *a spoon when it comes to second-order logic."

Like there are mathematicians who don't believe in infinity?

"Kind of. Look, suppose you *couldn't* use second-order logic - you belonged to a species that doesn't have second-order logic, or anything like it. Your species doesn't have any native mental intuition you could use to construct the notion of 'all properties'. And then suppose that, after somebody used first-order set theory to prove that first-order arithmetic had many possible models, you stood around shouting that you believed in only *one *model, what you called the *standard *model, but you couldn't explain what made this model different from any other model -"

Well... a lot of times, even in math, we make statements that genuinely mean something, but take a while to figure out how to define. I think somebody who talked about 'the numbers' would mean something even before second-order logic was invented.

"But here the hypothesis is that you belong to a species that *can't* invent second-order logic, or think in second-order logic, or anything like it."

Then I suppose you want me to draw the conclusion that this hypothetical alien is just standing there shouting about standardness, but its words don't mean anything because they have no way to pin down one model as opposed to another one. And I expect this species is also magically forbidden from talking about all possible subsets of a set?

"Yeah. They can't talk about the largest powerset, just like they can't talk about the smallest model of Peano arithmetic."

Then you could arguably deny that shouting about the 'standard' numbers would mean anything, to the members of this particular species. You might as well shout about the 'fleem' numbers, I guess.

"Right. Even if all the members of this species *did *have a built-in sense that there was a special model of first-order arithmetic that was fleemer than any other model, if that fleem-ness wasn't bound to anything else, it would be meaningless. Just a floating word. Or if all you could do was define fleemness as floobness and floobness as fleemness, you'd have a loop of floating words; and that might give you the impression that each particular word had a meaning, but the loop as a whole wouldn't be connected to reality. That's why it doesn't help to say that the standard model of numbers is the smallest among all possible models of Peano arithmetic, if you can't define 'smallest possible' any more than you can define 'connected chain' or 'finite number of predecessors'."

But second-order logic *does *seem to have consequences first-order logic doesn't. Like, what about all that Godelian stuff? Doesn't second-order arithmetic semantically imply... its own Godel statement? Because the unique model of second-order arithmetic doesn't contain any number encoding a proof of a contradiction from second-order arithmetic? Wait, now I'm confused.

"No, that's correct. It's not paradoxical, because there's no effective way of finding all the *semantic *implications of a collection of second-order axioms. There's no analogue of Godel's Completeness Theorem for second-order logic - no *syntactic* system for deriving *all *the semantic consequences. Second-order logic is *sound*, in the sense that anything syntactically provable from a set of premises, is true in any model obeying those premises. But second-order logic isn't *complete*; there are semantic consequences you can't derive. If you take second-order logic at face value, there's no effectively computable way of deriving all the consequences of what you say you 'believe'... which is a major reason some mathematicians are suspicious of second-order logic. What does it mean to believe something whose consequences you can't derive?"

But second-order logic clearly has *some *experiential consequences first-order logic doesn't. Suppose I build a Turing machine that looks for proofs of a contradiction from first-order Peano arithmetic. If PA's consistency isn't provable in PA, then by the Completeness Theorem there must exist nonstandard models of PA where this machine halts after finding a proof of a contradiction. So first-order logic doesn't tell me that this machine runs forever - maybe it has nonstandard halting times, i.e., it runs at all standard N, but halts at -2* somewhere along a disconnected chain. Only second-order logic tells me there's no proof of PA's inconsistency *and therefore* this machine runs forever. Only second-order logic tells me I should *expect to see this machine keep runnin*g, and *not expect* - note falsifiability - that the machine ever halts.

"Sure, you just used a second-order theory to derive a consequence you didn't get from a first-order theory. But that's not the same as saying that you can *only *get that consequence using second-order logic. Maybe another first-order theory would give you the same prediction."

Like what?

"Well, for one thing, first-order *set theory* can prove that first-order *arithmetic *has a model. Zermelo-Fraenkel set theory can prove the existence of a set such that all the first-order Peano axioms are true about that set. It follows within ZF that sound reasoning on first-order arithmetic will never prove a contradiction. And since ZF can prove that the syntax of classical logic is sound -"

What does it mean for *set theory* to prove that *logic *is sound!?

"ZF can quote formulas as structured, and encode models as sets, and then represent a finite ZF-formula which says whether or not a set of quoted formulas is true about a model. ZF can then prove that no step of classical logic goes from premises that are true inside a set-model, to premises that are false inside a set-model. In other words, ZF can represent the idea 'formula X is semantically true in model Y' and 'these syntactic rules never produce semantically false statements from semantically true statements'."

Wait, then why can't ZF prove *itself *consistent? If ZF believes in all the axioms of ZF, and it believes that logic never produces false statements from true statements -

"Ah, but ZF can't prove that there exists any set which is a model of ZF, so it can't prove the ZF-axioms are consistent. ZF *shouldn't* be able to prove that some set is a model of ZF, because that's not true in all models. Many models of ZF don't contain any *individual *set well-populated enough for that one set to be a model of ZF all by itself."

I'm kind of surprised in a Godelian sense that ZF *can *contain sets as large as the universe of ZF. Doesn't any given set have to be smaller than the whole universe?

"Inside *any particular model* of ZF, every set *within *that model is smaller than that model. But not all models of ZF are the same size; in fact, models of ZF of *every *size exist, by the Lowenheim-Skolem theorem. So you can have *some *models of ZF - some universes in which all the elements collectively obey the ZF-relations - containing individual sets which are larger than *other *entire models of ZF. A set that large is called a *Grothendieck universe* and assuming it exists is equivalent to assuming the existence of 'strongly inaccessible cardinals', sizes so large that you provably can't prove inside set theory that anything that large exists."

Whoa.

(Pause.)

But...

"But?"

I agree you've shown that *one *experiential consequence of second-order arithmetic - namely that a machine looking for proofs of inconsistency from PA, won't be seen to halt - can be derived from first-order set theory. Can you get *all *the consequences of second-order arithmetic in some particular first-order theory?

"You can't get all the *semantic *consequences of second-order logic, taken at face value, in *any *theory or *any *computable reasoning. What about the halting problem? Taken at face value, it's a semantic consequence of second-order logic that any given Turing machine either halts or doesn't halt -"

Personally I find it rather intuitive to imagine that a Turing machine either halts or doesn't halt. I mean, if I'm walking up to a machine and watching it run, telling me that its halting or not-halting 'isn't entailed by my axioms' strikes me as not describing any actual experience I can have with the machine. Either I'll see it halt eventually, or I'll see it keep running forever.

"My point is that the statements we *can *derive from the syntax of current second-order logic is limited by that syntax. And by the halting problem, we shouldn't ever expect a computable syntax that gives us *all *the semantic consequences. There's no possible theory you can *actually use* to get a correct advance prediction about whether an arbitrary Turing machine halts."

Okay. I agree that no computable reasoning, on second-order logic or anything else, should be able to solve the halting problem. Unless time travel is possible, but even then, you shouldn't be able to solve the expanded halting problem for machines that use time travel.

"Right, so the *syntax *of second-order logic can't prove everything. And in fact, it turns out that, in terms of what you can *prove syntactically* using the standard syntax, second-order logic is identical to a many-sorted first-order logic."

Huh?

"Suppose you have a first-order logic - one that doesn't claim to be able to quantify over all possible predicates - which does allow the universe to contain two different sorts of things. Say, the logic uses lower-case letters for all type-1 objects and upper-case letters for all type-2 objects. Like, '∀x: x = x' is a statement over all type-1 objects, and '∀Y: Y = Y' is a statement over all type-2 objects. But aside from that, you use the same syntax and proof rules as before."

Okay...

"Now add an element-relation x∈Y, saying that x is an element of Y, and add some first-order axioms for making the type-2 objects behave like collections of type-1 objects, including axioms for making sure that most describable type-2 collections exist - i.e., the collection X containing just x is guaranteed to exist, and so on. What you can *prove syntactically* in this theory will be identical to what you can prove using the standard syntax of second-order logic - even though the theory doesn't claim that *all possible* collections of type-1s are type-2s, and the theory will have models where many 'possible' collections are missing from the type-2s."

Wait, now you're saying that second-order logic is no more powerful than first-order logic?

"I'm saying that the supposed power of second-order logic derives from *interpreting* it a particular way, and taking on faith that when you quantify over *all properties*, you're actually talking about all properties."

But then second-order arithmetic is no more powerful than first-order arithmetic in terms of what it can actually *prove*?

"2nd-order arithmetic is *way *more powerful than first-order arithmetic. But that's because first-order set theory is more powerful than arithmetic, and adding the syntax of second-order logic corresponds to adding axioms with set-theoretic properties. In terms of which consequences can be *syntactically *proven, second-order arithmetic is more powerful than first-order arithmetic, but *weaker* than first-order set theory. First-order set theory can prove the existence of a model of second-order arithmetic - ZF can prove there's a collection of numbers and sets of numbers which models a many-sorted logic with syntax corresponding to second-order logic - and so ZF can prove second-order arithmetic consistent."

But first-order logic, including first-order set theory, can't even *talk about* the standard numbers!

"Right, but first-order set theory can *syntactically prove *more statements about 'numbers' than second-order arithmetic can prove. And when you combine that with the *semantic *implications of second-order arithmetic not being computable, and with any second-order logic being syntactically identical to a many-sorted first-order logic, and first-order logic having neat properties like the Completeness Theorem... well, you can see why some mathematicians would want to give up entirely on this whole second-order business."

But if you deny second-order logic you *can't even say the word 'finite'*. You would have to believe the word 'finite' was a *meaningless noise*.

"You'd define finiteness relative to whatever first-order model you were working in. Like, a set might be 'finite' only on account of the model not containing any one-to-one mapping from that set onto a smaller subset of itself -"

But that set wouldn't *actually *be finite. There wouldn't actually be, like, only a billion objects in there. It's just that all the mappings which could *prove *the set was infinite would be mysteriously missing.

"According to some *other *model, maybe. But since there is no one true model -"

How is this not crazy talk along the lines of 'there is no one true reality'? Are you saying there's no *really *smallest set of numbers closed under succession, without all the extra infinite chains? Doesn't talking about how these theories have multiple possible models, imply that those possible models are *logical thingies* and one of them actually *does *contain the largest powerset and the smallest integers?

"The mathematicians who deny second-order logic would see that reasoning as invoking an implicit background universe of set theory. Everything you're saying makes sense *relative *to some *particular model* of set theory, which would contain possible models of Peano arithmetic as sets, and could look over those sets and pick out the smallest *in that model*. Similarly, that set theory could look over a proposed model for a many-sorted logic, and say whether there were any subsets *within *the larger universe which were missing from the many-sorted model. Basically, your brain is insisting that it lives inside some *particular *model of set theory. And then, from that standpoint, you could look over some *other *set theory and see how it was missing subsets that *your* theory had."

Argh! No, damn it, I live in the set theory that *really does* have all the subsets, with no mysteriously missing subsets or mysterious extra numbers, or denumerable collections of all possible reals that could like totally map onto the integers if the mapping that did it hadn't gone missing in the Australian outback -

"But *everybody *says that."

Okay...

"Yeah?"

Screw set theory. I live in the *physical universe* where when you run a Turing machine, and keep watching forever *in the physical universe*, you never *experience a time* where that Turing machine outputs a proof of the inconsistency of Peano Arithmetic. Furthermore, I live in a universe where space is *actually *composed of a real field and space is *actually *infinitely divisible and contains *all *the points between A and B, rather than space only containing a denumerable number of points whose existence can be proven from the first-order axiomatization of the real numbers. So to talk about *physics *- forget about mathematics - I've got to use second-order logic.

"Ah. You know, that particular response is not one I have seen in the previous literature."

Yes, well, I'm not a pure mathematician. When I ask whether I want an Artificial Intelligence to think in second-order logic or first-order logic, I wonder how that affects what the AI does in *the actual physical universe*. Here in the *actual physical universe* where times are followed by successor times, I *strongly suspect* that we should only expect to experience *standard *times, and not experience any nonstandard times. I think time is *connected*, and global connectedness is a property I can only talk about using second-order logic. I think that every *particular* time is finite, and yet time has no upper bound - that there are all finite times, but only finite times - and that's a property I can only characterize using second-order logic.

"But if you can't ever tell the difference between standard and nonstandard times? I mean, *local *properties of time can be described using first-order logic, and you can't directly *see *global properties like 'connectedness' -"

But I *can *tell the difference. There are only nonstandard times where a proof-checking machine, running forever, outputs a proof of inconsistency from the Peano axioms. So I don't expect to experience seeing a machine do that, since I expect to experience only standard times.

"Set theory can also prove PA consistent. If you use set theory to define time, you similarly won't expect to see a time where PA is proven inconsistent - those nonstandard integers don't exist in any model of ZF."

Why should I anticipate that my physical universe is restricted to having only the nonstandard times allowed by a *more *powerful set theory, instead of nonstandard times allowed by first-order arithmetic? If I then talk about a nonstandard time where a proof-enumerating machine proves ZF inconsistent, will you say that only nonstandard times allowed by some still more powerful theory can exist? I think it's clear that the way you're deciding which experimental outcomes you'll have to excuse, is by secretly assuming that *only standard times* exist regardless of which theory is required to narrow it down.

"Ah... hm. Doesn't physics say this universe is going to run out of negentropy before you can do an infinite amount of computation? Maybe there's only a bounded amount of time, like it stops before googolplex or something. That can be characterized by first-order theories."

We don't know that for certain, and I wouldn't want to build an AI that just *assumed* lifespans had to be finite, in case it was wrong. Besides, should I use a different *logic *than if I'd found myself in Conway's Game of Life, or something else really infinite? Wouldn't the same sort of creatures evolve in that universe, having the same sort of math debates?

"Perhaps no universe like that *can *exist; perhaps only finite universes can exist, because only finite universes can be uniquely characterized by first-order logic."

You just used the word 'finite'! Furthermore, taken at face value, our own universe *doesn't* look like it has a finite collection of entities related by first-order logical rules. Space and time both look like infinite collections of points - continuous collections, which is a second-order concept - and then to characterize the *size *of that infinity we'd need second-order logic. I mean, by the Lowenheim-Skolem theorem, there aren't just *denumerable *models of first-order axiomatizations of the reals, there's also *unimaginably large cardinal infinities* which obey the same premises, and that's a possibility straight out of H. P. Lovecraft. Especially if there are any *things *hiding in the *invisible cracks of space*."

"How could *you *tell if there were inaccessible cardinal quantities of points hidden inside a straight line? And anything that *locally *looks continuous each time you try to split it at a describable point, can be axiomatized by a first-order schema for continuity."

That brings up another point: Who'd *really *believe that the reason Peano arithmetic works on physical times, is because that whole infinite axiom *schema *of induction, containing for every Φ a *separate *rule saying...

(Φ(0) ∧ (∀x: Φ(x) → Φ(Sx))) → (∀n: Φ(n))

...was used to specify our universe? How improbable would it be for an* infinitely long* list of rules to be true, if there wasn't a unified reason for all of them? It seems much more likely that the *real *reason first-order PA works to describe time, is that all *properties *which are true at a starting time and true of the successor of any time where they're true, are true of all later times; and this generalization over *properties* makes induction hold for *first-order formulas* as a special case. If my native thought process is first-order logic, I wouldn't see the connection between each individual formula in the axiom schema - it would take separate evidence to convince me of each one - they would feel like independent mathematical facts. But after doing *scientific *induction over the fact that many *properties *true at zero, with succession preserving truth, seem to be true everywhere - after generalizing the simple, compact *second-order* theory of numbers and times - *then *you could invent an infinite first-order theory to approximate it.

"Maybe that just says you need to adjust whatever theory of scientific induction you're using, so that it can more easily induct infinite axiom schemas."

But why the heck would you *need *to induct infinite axiom schemas in the first place, if Reality *natively *ran on first-order logic? Isn't it far more likely that the way we ended up with these infinite lists of axioms was that Reality was manufactured - forgive the anthropomorphism - that Reality was manufactured using an underlying schema in which time is a *connected *series of events, and space is a *continuous* field, and these are properties which happen to require second-order logic for humans to describe? I mean, if you picked out first-order theories at random, what's the chance we'd end up inside an infinitely long axiom schema that just *happened *to look like the projection of a compact second-order theory? Aren't we ignoring a sort of *hint*?

"A hint to what?"

Well, I'm not that sure myself, at this depth of philosophy. But I would currently say that finding ourselves in a physical universe where times have successor times, and space looks continuous, seems like a possible *hint *that the Tegmark Level IV multiverse - or the way Reality was manufactured, or whatever - might look more like *causal universes characterizable by compact second-order theories* than *causal universes characterizable by first-order theories*.

"But since any second-order theory can just as easily be *interpreted *as a many-sorted first-order theory with quantifiers that can range over either elements or sets of elements, how could using second-order syntax actually *improve *an Artificial Intelligence's ability to handle a reality like that?"

Good question. One obvious answer is that the AI would be able to induct what *you* would call an infinite axiom schema, as a single axiom - a simple, finite hypothesis.

"There's all *sorts *of obvious hacks to scientific induction of first-order axioms which would let you assign nonzero probability to computable infinite sequences of axioms -"

Sure. So beyond that... I would currently guess that the basic assumption behind 'behaving as if' second-order logic is true, says that the AI should act as if only the 'actually smallest' numbers will ever appear in physics, relative to some 'true' mathematical universe that it thinks it lives in, but knows it can't fully characterize. Which is roughly what I'd say human mathematicians do when they take second-order logic at face value; they assume that there's some *true *mathematical universe in the background, and that second-order logic lets them talk about it.

"And what behaviorally, experimentally distinguishes the hypothesis, 'I live in the true ultimate math with fully populated powersets' from the hypothesis, 'There's some random model of first-order set-theory axioms I happen to be living in'?"

Well... one behavioral consequence is suspecting that your time obeys an infinitely long list of first-order axioms with induction schemas for *every formula*. And then moreover believing that you'll never experience a time when a proof-checking machine outputs a proof that Zermelo-Fraenkel set theory is inconsistent - even though there's *provably *some models with times like that, which fit the axiom schema you just inducted. That sounds like secretly believing that there's a background 'true' set of numbers that you think characterizes physical time, and that it's the *smallest *such set. Another suspicious behavior is that as soon as you suspect Zermelo-Fraenkel set theory is consistent, you suddenly expect not to experience any *physical *time which ZF proves isn't a standard number. You don't think you're in the nonstandard time of some weaker theory like Peano arithmetic. You think you're in the minimal time expressible by *any and all theories*, so as soon as ZF can prove some number doesn't exist in the minimal set, you think that 'real time' lacks such a number. All of these sound like behaviors you'd carry out if you thought there was a single 'true' mathematical universe that provided the best model for describing all *physical *phenomena, like time and space, which you encounter - and believing that this 'true' backdrop used the *largest *powersets and *smallest *numbers.

"How *exactly *do you formalize all that reasoning, there? I mean, how would you *actually *make an AI reason that way?"

Er... I'm still working on that part.

"That makes your theory a bit hard to criticize, don't you think? Personally, I wouldn't be surprised if any such *formalized* reasoning looked just like believing that you live inside a first-order set theory."

I suppose I wouldn't be *too *surprised either - it's hard to argue with the results on many-sorted logics. But if comprehending the physical universe is best done by assuming that real phenomena are characterized by a 'true' mathematics containing *the *powersets and *the *natural numbers - and thus expecting that no mathematical model we can formulate will ever contain any larger powersets or smaller numbers than those of the 'true' backdrop to physics - then I'd call that a moral victory for second-order logic. In first-order logic we aren't even supposed to be able to talk about such things.

"Really? To me that sounds like believing you live inside a model of first-order set theory, and believing that all models of any theories *you *can invent must *also *be sets in the larger model. You can prove the Completeness Theorem inside ZF plus the Axiom of Choice, so ZFC already proves that all consistent theories have models which are sets, although of course it can't prove that ZFC itself is such a theory. So - anthropomorphically speaking - no model of ZFC *expects *to encounter a theory that has fewer numbers or larger powersets than itself. No model of ZFC expects to encounter any quoted-model, any set that a quoted theory entails, which contains larger powersets than the ones in its own Powerset Axiom. A first-order set theory can even make the leap from the finite statement, 'P is true of all my subsets of X', to believing, 'P is true of all my subsets of X that can be described by this denumerable collection of formulas' - it can encompass the jump from a single axiom over 'all my subsets', to a quoted axiom *schema *over formulas. I'd sum all that up by saying, 'second-order logic is how first-order set theory feels from the inside'."

Maybe. Even in the event that neither human nor superintelligent cognition will ever be able to 'properly talk about' unbounded finite times, global connectedness, particular infinite cardinalities, or true spatial continuity, it doesn't follow that Reality is similarly limited in which physics it can privilege.

Part of the sequence *Highly Advanced Epistemology 101 for Beginners*

Previous post: "Godel's Completeness and Incompleteness Theorems"

## Godel's Completeness and Incompleteness Theorems

**Followup to**: Standard and Nonstandard Numbers

So... last time you claimed that using first-order axioms to rule out the existence of nonstandard numbers - other chains of numbers besides the 'standard' numbers starting at 0 - was *forever and truly impossible*, even unto a superintelligence, no matter *how *clever the first-order logic used, even if you came up with an entirely different way of axiomatizing the numbers.

"Right."

How could you, in your finiteness, possibly know that?

"Have you heard of Godel's Incompleteness Theorem?"

Of course! Godel's Theorem says that for every consistent mathematical system, there are statements which are *true *within that system, which can't be *proven* within the system itself. Godel came up with a way to encode theorems and proofs as numbers, and wrote a purely numerical formula to detect whether a proof obeyed proper logical syntax. The basic trick was to use prime factorization to encode lists; for example, the ordered list <3, 7, 1, 4> could be uniquely encoded as:

2^{3} * 3^{7} * 5^{1} * 7^{4}

And since prime factorizations are unique, and prime powers don't mix, you could inspect this single number, 210,039,480, and get the unique ordered list <3, 7, 1, 4> back out. From there, going to an encoding for logical formulas was easy; for example, you could use the 2 prefix for NOT and the 3 prefix for AND and get, for any formulas Φ and Ψ encoded by the numbers #Φ and #Ψ:

¬Φ = 2^{2} * 3^{#Φ}

Φ ∧ Ψ = 2^{3} * 3^{#Φ} * 5^{#Ψ}

It was then possible, by dint of crazy amounts of work, for Godel to come up with a gigantic formula of Peano Arithmetic [](p, c) meaning, 'P encodes a valid logical proof using first-order Peano axioms of C', from which directly followed the formula []c, meaning, 'There exists a number P such that P encodes a proof of C' or just 'C is provable in Peano arithmetic.'

Godel then put in some *further *clever work to invent statements which referred to *themselves*, by having them contain sub-recipes that would reproduce the entire statement when manipulated by another formula.

And then Godel's Statement encodes the statement, 'There does not exist any number P such that P encodes a proof of (this statement) in Peano arithmetic' or in simpler terms 'I am not provable in Peano arithmetic'. If we assume first-order arithmetic is consistent and sound, then no *proof *of this statement *within *first-order arithmetic exists, which means the statement is *true *but can't be proven within the system. That's Godel's Theorem.

"Er... no."

No?

"No. I've heard rumors that Godel's Incompleteness Theorem is horribly misunderstood in your Everett branch. Have you heard of Godel's *Completeness *Theorem?"

Is that a thing?

"Yes! Godel's Completeness Theorem says that, for any collection of first-order statements, *every semantic implication of those statements is syntactically provable within first-order logic*. If something is a genuine implication of a collection of first-order statements - if it actually *does *follow, in the models pinned down by those statements - then you can *prove *it, *within *first-order logic, using *only* the syntactical rules of proof, from those axioms."

## Standard and Nonstandard Numbers

**Followup to**: Logical Pinpointing

"Oh! Hello. Back again?"

Yes, I've got another question. Earlier you said that you *had *to use second-order logic to define the numbers. But I'm pretty sure I've heard about something called 'first-order Peano arithmetic' which is also supposed to define the natural numbers. Going by the name, I doubt it has any 'second-order' axioms. Honestly, I'm not sure I understand this second-order business at all.

"Well, let's start by examining the following model:"

"This model has three properties that we would expect to be true of the standard numbers - 'Every number has a successor', 'If two numbers have the same successor they are the same number', and '0 is the only number which is not the successor of any number'. All three of these statements are true in this model, so in that sense it's quite numberlike -"

And yet this model clearly is *not* the numbers we are looking for, because it's got all these mysterious extra numbers like C and -2*. That C thing even loops around, which I certainly wouldn't expect any number to do. And then there's that infinite-in-both-directions chain which isn't corrected to anything else.

"Right, so, the difference between first-order logic and second-order logic is this: In first-order logic, we can get rid of the ABC - make a statement which *rules out* any model that has a loop of numbers like that. But we can't get rid of the infinite chain underneath it. In second-order logic we can get rid of the extra chain."

## Logical Pinpointing

**Followup to**: Causal Reference, Proofs, Implications and Models

The fact that one apple added to one apple invariably gives two apples helps in the teaching of arithmetic, but has no bearing on the truth of the proposition that 1 + 1 = 2.

-- James R. Newman, The World of Mathematics

*Previous meditation 1:* If we can only meaningfully talk about parts of the universe that can be pinned down by chains of cause and effect, where do we find the fact that 2 + 2 = 4? Or did I just make a meaningless noise, there? Or if you claim that "2 + 2 = 4"*isn't* meaningful or true, then what alternate property does the sentence "2 + 2 = 4" have which makes it so much more useful than the sentence "2 + 2 = 3"?

*Previous **meditation** 2:* It has been claimed that logic and mathematics is the study of which conclusions follow from which premises. But when we say that 2 + 2 = 4, are we really just *assuming* that? It seems like 2 + 2 = 4 was true well before anyone was around to assume it, that two apples equalled two apples before there was anyone to count them, and that we couldn't make it 5 just by assuming differently.

Speaking conventional English, we'd say the sentence 2 + 2 = 4 is "true", and anyone who put down "false" instead on a math-test would be marked wrong by the schoolteacher (and not without justice).

But what can *make* such a belief true, what is the belief *about,* what is the truth-condition of the belief which can make it true or alternatively false? The sentence '2 + 2 = 4' is true if and only if... what?

In the previous post I asserted that the study of logic is the study of which conclusions follow from which premises; and that although this sort of inevitable implication is sometimes called "true", it could more specifically be called "valid", since checking for inevitability seems quite different from comparing a belief to our own universe. And you could claim, accordingly, that "2 + 2 = 4" is 'valid' because it is an inevitable implication of the axioms of Peano Arithmetic.

And yet thinking about 2 + 2 = 4 doesn't really *feel* that way. Figuring out facts about the natural numbers doesn't feel like the operation of making up assumptions and then deducing conclusions from them. It feels like the numbers are just *out* there, and the only point of making up the axioms of Peano Arithmetic was to *allow* mathematicians to talk about them. The Peano axioms might have been convenient for *deducing* a set of theorems like 2 + 2 = 4, but really all of those theorems were true *about *numbers to begin with. Just like "The sky is blue" is true about the sky, regardless of whether it follows from any particular assumptions.

So comparison-to-a-standard does seem to be at work, just as with *physical* truth... and yet this notion of 2 + 2 = 4 seems different from "stuff that makes stuff happen". Numbers don't occupy space or time, they don't arrive in any order of cause and effect, there are no *events* in numberland.

**Meditation: **What are we talking *about* when we talk about numbers? We can't navigate to them by following causal connections - so how do we get there from here?

## Proofs, Implications, and Models

**Followup to**: Causal Reference

From a math professor's blog:

One thing I discussed with my students here at HCSSiM yesterday is the question of what is a proof.

They’re smart kids, but completely new to proofs, and they often have questions about whether what they’ve written down constitutes a proof. Here’s what I said to them.

A proof is a social construct – it is what we need it to be in order to be convinced something is true. If you write something down and you want it to count as a proof, the only real issue is whether you’re completely convincing.

This is not quite the definition I would give of what constitutes "proof" in mathematics - perhaps because I am so used to isolating arguments that are convincing, but ought not to be.

Or here again, from "An Introduction to Proof Theory" by Samuel R. Buss:

There are two distinct viewpoints of what a mathematical proof is. The first view is that proofs are social conventions by which mathematicians convince one another of the truth of theorems. That is to say, a proof is expressed in natural language plus possibly symbols and figures, and is sufficient to convince an expert of the correctness of a theorem. Examples of social proofs include the kinds of proofs that are presented in conversations or published in articles. Of course, it is impossible to precisely define what constitutes a valid proof in this social sense; and, the standards for valid proofs may vary with the audience and over time. The second view of proofs is more narrow in scope: in this view, a proof consists of a string of symbols which satisfy some precisely stated set of rules and which prove a theorem, which itself must also be expressed as a string of symbols. According to this view, mathematics can be regarded as a 'game' played with strings of symbols according to some precisely defined rules. Proofs of the latter kind are called "formal" proofs to distinguish them from "social" proofs.

In modern mathematics there is a much better answer that could be given to a student who asks, "What exactly is a proof?", which does not match *either* of the above ideas. So:

Meditation: What distinguishes a correct mathematical proof from an incorrect mathematical proof - what does it mean for a mathematical proof to be good? And why, in the real world, would anyone ever be interested in a mathematical proof of this type, or obeying whatever goodness-rule you just set down? How could you use your notion of 'proof' to improve the real-world efficacy of an Artificial Intelligence?

View more: Next