As an exercise, I take a scrap of argumentation, expand it into a tree diagram (using FreeMind), and then formalize the argument (in Automath). This towards the goal of creating  "rationality augmentation" software. In the short term, my suspicion is that such software would look like a group of existing tools glued together with human practices.

About my choice of tools: I investigated Araucaria, Rationale, Argumentative, and Carneades. With the exception of Rationale, they're not as polished graphically as FreeMind, and the rigid argumentation-theory structure was annoying in the early stages of analysis. Using a general-purpose mapping/outlining tool may not be ideal, but it's easy to obtain. The primary reason I used Automath to formalize the argument was because I'm somewhat familiar with it. Another reason is that it's easy to obtain and build (at least, on GNU/Linux).

Automath is an ancient and awesomely flexible proof checker. (Of course, other more modern proof-checkers are often just as flexible, maybe more flexible, and may be more useable.) The amount of "proof checking" done in this example is trivial - roughly, what the checker is checking is: "after assuming all of these bits and pieces of opaque human reasoning, do they form some sort of tree?" - but cutting down a powerful tool leaves a nice upgrade path, in case people start using exotic forms of logic.  However, the argument checkers built into the various argumentation-theory tools do not have such upgrade paths, and so are not really credible as candidates to formalize the arguments on this site.

Here's a piece of argumentation, abstracted from something that I was really thinking about at work:

There aren't any memory leaks in this method, but how would I argue it? If I had tested it with a tool like Valgrind or mtrace, I would have some justification - but I didn't. By eye, it doesn't look like it does any allocations from the heap. Of course, if a programmer violated coding standards, they could conceal an allocation from casual inspection. However, the author of the code wouldn't violate the coding standard that badly. Why do I believe that they wouldn't violate the coding standard? Well, I can't think of a reason for them to violate the coding standard deliberately, and they're competent enough to avoid making such an egregious mistake by accident.

In order to apply the argument diagramming technique, try to form a tree structure, with claims supported by independent reasons. Reasons are small combinations of claims, which do not support the conclusion alone, but do support it together. This is what I came up with (png, svg, FreeMind's native mm format).

Some flaws in this analysis:

1. The cursory inspection might be an independent reason for believing there are no allocations.

2. There isn't any mention in the analysis of the egregiousness of the mistake.

3. The treatment of benevolence and competence is asymmetrical.

(My understanding with argument diagramming so far is that bouncing between "try to diagram the argument as best you can" and "try to explain in what ways the diagram is still missing aspects of the original text" is helpful.)

In order to translate this informal argument into Automath, I used a very simple logical framework. Propositions are a kind of thing, and each justification is of some proposition. There are no general-purpose inference rules like modus ponens or "From truth of a proposition, conclude necessity of that proposition". This means that every reason in the argument needs to assume its own special-case inference rule, warranting that step.

# A proposition is a kind of thing, by assumption.
* proposition
  : TYPE
  = PRIM

# a justification of a proposition is a kind of thing, by assumption
* [p : proposition]
  justification
  : TYPE
  = PRIM

To translate a claim in this framework, we create (by assuming) a the proposition, with a chunk like this:

* claim_foo
: proposition
= PRIM

To translate a reason for a claim "baz" from predecessor claims "foo" and "bar", first we create (by assuming) the inference rule, with a chunk like this:

* [p:justification(claim_foo)]
[q:justification(claim_bar)]
reason_baz
: justification(claim_baz)
= PRIM

Secondly, we actually use the inference rule (exactly once), with a chunk like this:

* justification_baz
: justification(claim_baz)
= reason_baz(justification_foo, justification_bar)

My formalization of the above scrap of reasoning is here. Running "aut" is really easy. Do something like:

aut there_are_no_leaks.aut

Where to go next:

  • Formalize more arguments, and bigger arguments.
  • Investigate more tools (a more modern proof checker?).
  • The translation from Freemind diagram to Automath is probably automatable via FreeMind's XSLT export.
  • How do we incorporate other tools into the workflow? For example Google, Genie&Smile, or some sort of checklist tool?
  • What argument schemes (reflective inconsistency? don't be continually surprised?) do we actually use?
  • Try to get some experimental evidence that formalization actually helps rationality somehow.
New Comment
22 comments, sorted by Click to highlight new comments since:

Strongly in support. There are probably benefits we aren't yet imagining from knowing the things that formal tools like this would help us know. (As a special case: Intelligence enhancement as existential risk mitigation.)

Seconding the recommendation to learn about Bayesian inference engines like SMILE, PyMC (use with ipython in pylab mode for interactive analysis and plotting), the Matlab Bayes Net Toolbox, VIBES, JAGS, or the Open Source Probabilistic Networks Library, and play with interfaces like GeNIe, the Matlab interface to BUGS, the R interface to BUGS, BUGS itself (if you have an experienced user to look over your shoulder and explain which buttons need to be pushed in what order after clicking on which model source code windows or typing variable names into which boxes), or the Netica demo version.

It will take a long time to build a community that knows how to use these formal tools. And most arguments can't be represented on their object level in simple Bayesian networks or even Bayesian decision networks. Some forms of evidence or reasoning (especially assertions of expert opinion or human judgment) can fail in correlated ways which break independence assumptions. To formalize this would require hyperprior models for the reliability of each kind of possible connection from reality to observation, and possibly also hierarchical models of this reliability, to transfer evidence about reliability from arguments to similar arguments. In some arguments, some of the evidence could have been faked or selectively presented; to formalize this would require game theory. Other arguments are about public policy and would need game theory to model the incentives that policies would create. Some arguments might be too complex to formalize completely or compute formal conclusions to; these might require meta-arguments about reliability of approximations. A user community will need experience with many arguments before it can know which formalizations can be used in what situations.

Also related: David Brin's disputation arenas essay, and the extensive hierarchically organized resources list on the EU Mapping Controversies on Science for Politics project-funded Mapping Controversies web site.

Reading this comment makes me think that the problem of building formal tools to aid inference is um, not exactly a million miles away from the object-level problem of building AGI. Hierarchical models, bayes nets, meta-arguments about reliability of approximations. Perhaps the next thing we'll be asking for is some way to do interventions in the real world to support causal reasoning?

What evidence would convince you personally that some purported Seed AGI was in fact Friendly?

Sound arguments are central to Friendly AGI research. The arguments cannot be too long, either. If someone hands you a giant proof, and you accept it as evidence without understanding it, then your implicit argument is something like "from expert testimony" or "long proofs that look valid to spot-checks are likely to be sound".

Perhaps the next thing we'll be asking for is some way to do interventions in the real world to support causal reasoning?

Do you mean formal arguments that someone should do an experiment because knowledge of the results would improve future decision-making? These would be a special case of formal arguments that someone should do something.

Do you mean that if someone automated the experiments there might be an AI danger? I know a lot of arguments for ways there could be dangers similar to AI dangers from these kinds of tools, but they are mostly about automation of the reasoning.

Do you mean formal arguments that someone should do an experiment because knowledge of the results would improve future decision-making?

yes, for example.

I was just pointing out that the two are similar problems: improving human group decisionmaking using formal tools vs. making a thinking machine. The implications of this are complex, and could include both dangers and benefits. One benefit is that as AI gets more advanced, better decisionmaking tools will become available. For example, the notion of a "cognitive bias" probably depends causally upon us having a notion of normative rationality to compare humans with.

This is really interesting. I had the idea to do something like this about 10 years ago (when I was 15). I never did get anywhere with it though, and I haven't spent very much time at all working on it. Seems that you've gotten a lot farther than I have. In fact, I think this article, about practical guidelines for debating productively, is much more useful than anything I've done on formalization.

I think the central idea of the project is reducing arguments to their constituent parts. I call the basic parts "atoms". Practically, there's no way to completely reduce them to first principles, the way it's done in formalizations of mathematical systems (like the libraries of Isabelle, Coq, or Metamath). Doing that would probably be GAI-complete. So the best we can do is come up with some practical rules of thumb, some set of criteria that can be applied relatively unambiguously to determine whether some assertion or whatnot makes a good atom, or needs to be decomposed further. And further analysis, looking deeper into the issue, can always uncover the need for decomposition where it was previously not recognized. In fact, such further decomposition would probably be a central part of explicating new arguments in the system.

I think the first main hurdle, and maybe the last one, is determining what types of atoms are necessary, and what types of relationships between them, and how they correspond to the rules of probability. If you want automated checking, each atom has to be related by a rather basic, simple relationship to other atoms. Otherwise you can't formalize it. So what are the relationship types? What are the atom types?

The system has to have some way of representing facts/beliefs, but those can be disputed. People assign different probabilities to them. So how do you deal with that? You can't automate that away. Maybe you have some way of letting the user fill in their own belief ratios for all the tree's leaf nodes (i.e. the premises) and then propagate those up the tree. Perhaps you give the premises default values, given by the argument's author, or a consensus number, or both.

How do you handle the extra-logical parts of the argument, those relating to semantics? For instance, special definitions of terms used in other atoms. A lot of argument comes down to arguing over definitions.

Your initial piece of argumentation was more readable than any of the later formalizations (either plaintext or graphical), but for more complex arguments the approach may have merit. I was immediately reminded of this ongoing attempt to map William Lane Craig's argument for theism.

I'd like to emphasize that readability is not the goal - drawing correct conclusions is.

An argument based on a colorful analogy might be readable, but not sound. (In particular, I'm thinking of neural nets, genetic algorithms, and simulated annealing, each of which are motivated by a colorful analogy).

Suppose someone gives you a thousand-line piece of code, never previously compiled or run. Many pieces of code crash when compiled and run for the first time. Which is stronger evidence, a well-written textual argument that it will not crash, or running it once?

With logical links as weak as those in your example, most arguments longer than 10 steps will reach incorrect conclusions anyway.

Agreed.

I have some notion that an argument tree could be translated or incorporated into a Bayes net model. There's an intuition (which we share) that, given links with a particular imperfect strength, arguments consisting of a few long chains are weaker than arguments that are "bushy" (offering many independent reasons for the conclusion). A Bayes net model would quantify that intuition.

In a perfect world, bushiness would indeed imply high reliability. Unfortunately in our world the different branches of the bush can have hidden dependencies, either accidental or maliciously inserted - they could even all be subtly different rewordings of one same argument - and the technique won't catch that. So ultimately I don't think we have invented a substitute for common sense just yet.

Such tools can be useful. Here's a link: http://experimentalphilosophy.typepad.com/experimental_philosophy/2009/09/philosophers-just-aint-twitterers.html

"n a new paper with Justin Sytsma, Adam Feltz, and Edouard Machery, we attack the problem from a different angle: Rather than looking at what philosophers do, we examine who philosophers are, and rather than confronting philosophical methodology, we consider philosophical temperament."

The paper uses Tetrad to check their models of correlation and causality, and decides that while their data doesn't rule out a more complex set of interactions, it does support a simple 'philosophical training improves critical thinking' conclusion. Which is interesting, I think.

This web service, although not implemented yet, seems relevant: https://docs.google.com/document/d/1GTTFO7RgEAJxy8HRUprCIKZYpmF4KJiVAGRHXF_Sa70

Anyone interested in contacting me about this or talking with me about it, feel free to get in touch (contact info can be found at the bottom of the document).

[-]DS3618-20

I think you're re-inventing the wheel here.

"This towards the goal of creating "rationality augmentation" software. In the short term, my suspicion is that such software would look like a group of existing tools glued together with human practices."

Look at current work in AI, automated reasoning systems, and automated theorem proving.

Yes, these are old ideas. Argument diagramming (in a more complex form) was advocated by Wigmore in 1931, and N.G. de Bruijn developed Automath in 1968.

My guess is that many people read and write arguments (e.g. on this site) using text, and only bring out a scrap of paper or a calculator app on special occasions.

There are very capable research tools like SOAR, Twelf, or Prover9 (and this list is nowhere near exhaustive). If you have a workflow for using a more modern automated reasoning tool to understand/improve natural-language, "philosophical" arguments, please do post it.

See cousin_it's previous post on the formalization of mathematics. As I stated in a comment in that thread, formalized mathematics tools are still extremely primitive from a HCI standpoint.

Unfortunately, philosophical arguments are not likely to be amenable to formalized reasoning. Paul Graham has pointed this out in one of his essays:

Philosophy is like math's ne'er-do-well brother. It was born when Plato and Aristotle looked at the works of their predecessors and said in effect "why can't you be more like your brother?" ... Math is the precise half of the most abstract ideas, and philosophy the imprecise half. Bad math is merely boring, whereas bad philosophy is nonsense. And yet there are some good ideas in the imprecise half.

These tools are definitely very primitive. If you look at it from a "glass is half full" standpoint, that is a good thing - it means nobody else has gotten there first! :)

[+]DS3618-140