I've always wondered why there isn't more interest in automated provers. What happens when you make a term-rewriting system with a taste for symmetric, interrelated systems of theorems and have it just search for proofs (and shorter versions of those proofs) for a few years? Do fully artificial mathematicians ever come up with novel theorems? Are they any good at condensing proofs enough that they become comprehensible to humans?
Edit: Had a cursory look through lesswrong history
A post by JonahSinick from Jul 2013, asking the very question Progress on automated mathematical theorem proving?
2015, JoshuaZ: I predict that within 20 years a major conjecture will be made that is essentially found by a computer with no human intervention. Note that this has already happened in some limited contexts for minor math problems. https://cs.uwaterloo.ca/journals/JIS/colton/joisol.html
2013, Kaj: There's apparently an annual automated theorem proving competition, looking at the kinds of problems there might be useful.
The competition still seems to be active. Last year's winners were Satallax 3.3, Vampire 4.3 (in three categories), iProver 2.8, and MaLARea 0.6
This is a great post! Thank you in particular for the Stephen Boyd recommendation.
~~~~
In terms of frameworks and ML, you might be interested in this blog post announcing the release of the DiffEqFlux.jl package in the Julia programming language. I was reminded of this because it seems to be an example of merging already-mature frameworks, specifically a machine learning framework (Flux) and a differential equation framework (DifferentialEquations.jl).
A recent paper called Neural Ordinary Differential Equations explored the problem of parameterizing a term in an ODE instead of specifying a sequence of layers, using ML techniques. The benefit of the combined framework is that it allows us to incorporate what we already know about the structure of the problem via the ODE, and apply neural networks to the parts we don't.
I'm not deeply familiar with either framework, but I have been following DifferentialEquations.jl for a while because I had hoped it would let me tackle a few problems I have had my eye on.
~~~~
I feel like Geometric Algebra is pointing in a similar direction to what you describe. It is not directly an applied framework, but rather a kind of ur-framework for teaching and developing other frameworks on top of it. The idea is that they make geometric objects elements of the algebra, which really boils down to making the math match the structure of space and in this way preserve geometric intuitions throughout. Its advocates hope it will serve as a unified framework for physics and engineering applications. The primary theorist is David Hestenes, and his argument for why this was necessary is found here; there are surveys/tutorials from other people who have done work in the field here and here.
Geometric Algebra and its extension Geometric Calculus are both still pretty firmly in the 'reorganizing math' kind of endeavor, with lots of priority put on proving results and developing algorithms. But there are a few areas where they are growing into an applied framework of the kind you describe, specifically robotics, computer graphics, and computer vision.
It was through the robotics applications that I found it; they promised an intuitive explanation of quaternions, which otherwise were just this scary way to calculate motion.
Geometric algebra is really neat, thanks for the links. I've been looking for something like that since I first encountered Pauli matrices back in quantum. I would describe it as an improved language for talking about lots of physical phenomena; that makes it a component of a potentially-better interface layer for many different mathematical frameworks. That's really the key to a successful declarative framework: having an interface layer/language which makes it easy to recognize and precisely formulate the kinds of problems the framework can handle.
I'm generally suspicious of anything combining neural nets and nonlinear DEs. As James Mickens would say, putting a numerical solver for a chaotic system in the middle of another already-notoriously-finicky-and-often-unstable-system is like asking Godzilla to prevent Mega-Godzilla from terrorizing Japan. This does not lead to rising property values in Tokyo! That said, it does seem like something along those lines will have to work inside learning frameworks sooner or later, so it's cool to see a nice implementation that puts everything under one roof.
It’s interesting that you mention that; I had never even considered applying the combination blindly to a new problem. I figured it would only really be useful in cases where you knew what kind of physical problem you were working with already.
I can think of two examples for which ML is currently not very valuable but this new trick opens up. First, the blog post gave the example of Maxwell’s Equations - in EE there has been some exploration of things like genetic algorithms for circuit and antenna design, which boils down to a different form of searching the problem space for a solution. The hitch is, we can specify exactly what things to vary and what to keep constant; if we were to use ML it would always be starting from scratch, which is very inefficient. Until now, anyway.
The second one is also from engineering, but on the applied math side. The field of Continuum Mechanics has a certain generalization to thermodynamics called Rational Thermodynamics, which uses a method of moments. The idea there is simply to add another differential term to the field equation for every behavior you want to describe (heat, stress, deformation, etc). As of the 1980’s they had proved both that Navier Stokes is a special case and also that including many new terms does not significantly outperform Navier Stokes. The guess at that time was that it might take hundreds of thousands of terms to get a much better description, which of course was infeasible with analysis by hand. But having a lot of differentials to figure out seems exactly like what this would be good for.
Those are great examples! That's exactly the sort of thing I see the tools currently associated with neural nets being most useful for long term - applications which aren't really neural nets at all. Automated differentiation and optimization aren't specific to neural nets, they're generic mathematical tools. The neural network community just happens to be the main group developing them.
I really look forward to the day when I can bust out a standard DE solver, use it to estimate the frequency of some stable nonlinear oscillator, and then compute the sensitivity of that frequency to each of the DE's parameters with an extra two lines of code.
To your request for examples, my impression is that Black Box Variational Inference is slowly but surely becoming the declarative replacement to MCMC for a lot of generative modeling stuff.
Users don’t need to know what’s going on under the hood; the algorithms and proofs generally “just work” without the user needing to worry about the details. The user’s job is to understand the language of the framework, the interface, and translate their own problems into that language.
Interesting, my gut reaction to this approach as applied to math was "ugh, that sounds horrible, I don't want to ignore the under the hood details, the whole point of math is understanding what's going on" .
Yet when I consider the same approach to programming and computer sciency stuff my reaction is "well duh, of course we're trying to find good abstractions to package away as much of the nitty gritty details as possible, otherwise you can't make/build really big interesting stuff."
I'll think more about why these feel different.
Perhaps the difference is what you're imagining as "under the hood". Nobody wants to think about the axiom of choice when solving a differential equation.
of course we're trying to find good abstractions to package away as much of the nitty gritty details as possible, otherwise you can't make/build really big interesting stuff."
As a user, when chrome wheeled out it's new "you can search bookmarks from the same place you type URLs" "feature" the first response was "where can I disable this in settings, when I don't want it on". When it became apparent they didn't do that, the second response was "how do we change what shows up in this manner" - it may be useful to bookmark this page, for instance, but we might not want it showing up everytime we start typing in lesswrong, and just want to go to the main page. When it became apparent they didn't do that, the third response was "let's try some other browsers".
Users do want to be able to crack open the hood in at least one place - settings, or advanced settings.
Programmers generally distinguish between “imperative” languages in which you specify what to do (e.g. C) versus “declarative” languages in which you specify what you want, and let the computer figure out how to do it (e.g. SQL). Over time, we generally expect programming to become more declarative, as more of the details are left to the compiler/interpreter. Good examples include the transition to automated memory management and, more recently, high-level tools for concurrent/parallel programming.
It’s hard to say what programming languages will look like in twenty or fifty years, but it’s a pretty safe bet that they’ll be a lot more declarative.
I expect that applied mathematics will also become much more declarative, for largely the same reasons: as computers grow in power and software expands its reach, there will be less and less need for (most) humans to worry about the details of rote computation.
What does this look like? Well, let’s start with a few examples of “imperative” mathematics:
Contrast to the declarative counterparts:
In the declarative case, most of the work is in formulating the problem, figuring out what questions to ask, and translating it all into a language which a computer can work with - numbers, or matrices, or systems of equations.
This is all pretty standard commentary at the level of mathematics education, but the real importance is in shaping the goals of applied mathematics. For the past century, the main objectives of mathematical research programs would be things like existence & uniqueness, or exhaustive classification of some objects, or algorithms for solving some problem (a.k.a. constructive solution/proof). With the shift toward declarative mathematics, there will be more focus on building declarative frameworks for solving various kinds of problems.
The best example I know of is convex analysis, in the style taught by Stephen Boyd (course, book). Boyd’s presentation is the user’s guide to convex optimization: it addresses what kinds of questions can be asked/answered, how to recognize relevant applications in the wild, how to formulate problems, what guarantees are offered in terms of solutions, and of course a firehose of examples from a wide variety of fields. In short, it includes exactly the pieces needed to use the tools of convex analysis as a declarative framework. By contrast, the internals of optimization algorithms are examined only briefly, with little depth and a focus on things which a user might need to tweak. Complicated proofs are generally omitted altogether, the relevant results simply stated as tools available for use.
This is what a mature declarative mathematical framework looks like: it provides a set of tools for practitioners to employ on practical problems. Users don’t need to know what’s going on under the hood; the algorithms and proofs generally “just work” without the user needing to worry about the details. The user’s job is to understand the language of the framework, the interface, and translate their own problems into that language. Once they’ve expressed what they want, the tools take over and handle the rest.
That’s the big goal of future mathematical disciplines: provide a practical framework which practitioners can use to solve real-world problems in the wild, without having to know all the little details and gotchas under the hood.
One last example, which is particularly relevant to me and to ML/AI research. One of the overarching goals of probability/statistics/ML is to be able to code up a generative model, pass it into a magical algorithm, and get back parameter estimates and uncertainties. The “language” of generative models is very intuitive and generally easy to work with, making it an excellent interface to a declarative mathematical toolkit. Unfortunately, the behind-the-scenes part of the toolkit remains relatively finicky and inefficient. As of today, the “magical algorithm” part is usually MCMC, which is great in terms of universality but often super-exponentially slow for multimodal problems, especially in high dimensions, and can converge very slowly even in simple unimodal problems. It’s not really reliable enough to use without thinking about what’s under the hood. Better mathematical tools and guarantees are needed before this particular framework fully matures.
If anyone has other examples of maturing or up-and-coming declarative mathematical frameworks, I’d be very interested to hear about them.