The following is sorta-kinda carried on from a recent comments thread, where I was basically saying I wasn't going to yack about what I'm thinking until I'd spent the time to fully formalize it. Well, Luke got interested, I spewed the entire sketch and intuition to him, and he asked me to put it up where others can participate. So the following is it.
Basically, Algorithmic Information Theory as started by Solomonoff and Kolmogorov, and then continued by Chaitin, contains a theorem called Chaitin's Incompleteness Theorem, which says (in short, colloquial terms) "you can't prove a 20kg theorem with 10kg of axioms". Except it says this in fairly precise mathematical terms, all grounded in the undecidability of the Halting Problem. Possessing "more kilograms" of axioms is mathematically equivalent to being able to computationally decide the halting behavior of "more kilograms" of Turing Machines, or to being able to compress strings to smaller sizes.
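Stated a bit more precisely (hedging: the exact constant depends on the encoding of proofs and machines, and I'm glossing that over): for any consistent, recursively axiomatizable theory $T$, there is a constant $c_T$, roughly the size of $T$'s axioms, such that

```latex
T \nvdash \; K(s) > c_T \quad \text{for every string } s,
```

even though all but finitely many strings really do have Kolmogorov complexity above $c_T$. The "kilograms" in the slogan are $c_T$.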
Now consider the Curry-Howard Isomorphism, which says that logical systems as computation machines and logical systems as mathematical logics are, in certain precise ways, the same thing. Now consider ordinal logic as started in Turing's PhD thesis, which begins with ordinary first-order logic and extends it with axioms saying "first-order logic is consistent", "first-order logic extended with the previous axiom is consistent", and so on, all the way up to the first infinite ordinal ω (and then, I believe but haven't checked, further into the transfinite ordinals).
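In symbols, the progression looks something like this (a sketch; Turing actually indexed his theories by ordinal *notations* rather than raw ordinals, which I'm glossing over):

```latex
T_0 = \mathrm{PA}, \qquad
T_{\alpha+1} = T_\alpha + \mathrm{Con}(T_\alpha), \qquad
T_\lambda = \bigcup_{\beta < \lambda} T_\beta \;\; \text{for limit ordinals } \lambda .
```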
In a search problem with partial information, as you gain more information you close in on a smaller and smaller portion of your search space. Thus, Turing's ordinal logics don't violate Goedel's Second Incompleteness Theorem: they specify more axioms, and therefore a smaller "search space" of models -- models that agree with the standard model of first-order arithmetic up to the corresponding level of the hierarchy. Goedel's Completeness Theorem says that a sentence is provable from a first-order theory iff it is true in every model of that theory. The clearest, least mystical presentation of Goedel's First Incompleteness Theorem is then: nonstandard models of first-order arithmetic exist, in which Goedel Sentences are false. The corresponding statement of Goedel's Second Incompleteness Theorem follows: nonstandard models of first-order arithmetic exist which satisfy the sentence "first-order arithmetic is inconsistent". To rule those out and capture only the models that get consistency right, you need to specify the additional axiom "first-order arithmetic is consistent", and so on up the ordinal hierarchy.
Back to learning and AIT! Your artificial agent, let us say, starts with a program 10kg large. Through learning, it acquires, let us say, 10kg of empirical knowledge, giving it 20kg of "mass" in total. Depending on how precisely we can characterize the bound in Chaitin's Incompleteness Theorem (he just said "there exists a constant L which is a function of the 10kg", more or less), we would then have an agent whose empirical knowledge enables it to reason about a 12kg agent. It can't reason about the 12kg agent plus the remaining 8kg of empirical knowledge -- that would be 20kg, and it's only a 20kg agent even with its strongest empirical data -- but it can formally prove universally-quantified theorems about how the 12kg agent will behave as an agent (i.e., about its goal functions, the soundness of its reasoning under empirical data, etc.). So it can "trust" the 12kg agent, hand its 10kg of empirical data over, shut itself down, "come back online" as the new 12kg agent, and learn from the remaining 8kg of data, thus becoming a smarter, self-improved agent. The hope is that the 12kg agent, possessing a stronger mathematical theory, can generalize more quickly and precisely from its sensory data than its predecessor could, speeding it through the process of compressing all available information from its environment and approaching the reasoning power of something like a Solomonoff Inducer (i.e., one with a Turing Oracle supplying accurate Kolmogorov complexities).
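Here's the bookkeeping of that story as a toy Python sketch. Everything in it is illustrative: the "kg" figures are just the numbers from the paragraph above, and the overhead constant stands in for Chaitin's constant L, which in reality we can't simply pick.

```python
# Toy accounting for the self-improvement handoff. The overhead constant
# is hypothetical -- a stand-in for Chaitin's L, not a derived value.
CHAITIN_OVERHEAD = 8  # "kg" of proof-theoretic overhead, purely illustrative

def can_verify(prover_kg, target_kg, overhead=CHAITIN_OVERHEAD):
    """A prover can only prove theorems about agents at least `overhead`
    kilograms lighter than itself ("can't prove 20kg with 10kg of axioms")."""
    return target_kg + overhead <= prover_kg

def handoff(program_kg, data_kg, successor_kg):
    """If the agent's total mass suffices to verify the successor, hand over
    the successor plus the leftover data mass; otherwise refuse (None)."""
    total = program_kg + data_kg
    if not can_verify(total, successor_kg):
        return None
    return successor_kg, total - successor_kg

# The post's numbers: a 10kg program with 10kg of data can vouch for a
# 12kg successor, leaving 8kg of data for the successor to re-learn.
print(handoff(10, 10, 12))  # -> (12, 8)
```

The design point the sketch makes explicit: the successor's size is capped by the verifier's *total* mass minus the overhead, which is why the leftover data has to be re-learned rather than verified along with the successor.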
This is the sketch and the intuition. As a theory, it does one piece of very convenient work: it explains why we can't solve the Halting Problem in general (we do not possess correct formal systems of infinite size with which to reason about halting), but also explains precisely why we appear to be able to solve it in so many of the cases we "care about" (namely: we are reasoning about programs small enough that our theories are strong enough to decide their halting behavior -- and we discover new formal axioms to describe our environment).
So yeah. I really have to go now. Mathematical input and criticism are very welcome; the inevitable questions to clear things up for people feeling confusion about what's going on will be answered eventually.
Think more like an inductive human being than a deductive formal system: "x doesn't ever halt" would require proof, but "x really seems not to halt for a very long time and seems to contain an infinitely-growing recursion" gives us good reason to believe "x doesn't halt", and guides us in trying to assemble a proof.
(Likewise, as a side note, it can actually be quite difficult to prove "x does halt (on all inputs)", because merely being observed to halt on many inputs is no guarantee at all. Unless the looping/recursion is trivially structural on the input, proving "x does halt" to a type system's termination checker can be quite difficult!)
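To make that concrete, here's a minimal step-bounded runner in Python (a sketch under my own conventions: `step` maps a state to the next state, or to `None` on halting, and the budget is arbitrary). It either witnesses halting outright or, when the budget runs out, hands back the last state as *inductive evidence* about the program's trajectory -- not a proof either way:

```python
def bounded_run(step, state, budget):
    """Run `step` for at most `budget` steps; report ("halted", steps_taken)
    if it halts, else ("gave up", last_state) as evidence, not proof."""
    for n in range(budget):
        nxt = step(state)
        if nxt is None:
            return ("halted", n + 1)
        state = nxt
    return ("gave up", state)

# A loop that plainly halts: count down from 10 (10 decrements + 1 halt check).
print(bounded_run(lambda s: s - 1 if s > 0 else None, 10, 1000))  # -> ('halted', 11)

# A loop whose state grows without bound: doubling. Exhausting the budget
# with an ever-growing state is exactly the "seems to contain an
# infinitely-growing recursion" observation from above.
print(bounded_run(lambda s: s * 2, 1, 50))  # -> ('gave up', 2**50)
```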
I think that considering "computable" or "uncomputable" to be ontological properties of universes is... contrary to the philosophical viewpoint being taken here? At least from my point of view here, you're not breaking an Ontological Law of the Platonic Plane of Maths if you "decide the undecidable"; you're breaking a law of thermodynamics (namely, you're trying to act as if you possessed infinite entropy with which to compute). It's physically impossible, not Platonically impossible.
If I may invoke a phrase, I'm basically taking the point of view that a formal system is a map of some mathematical territory (the facts about which Turing machines halt), but that it is possible to have increasingly detailed and accurate maps (systems with stronger axioms) of the same territory (deciding the behavior of more and more Turing machines). The hard nut to crack is: how do we acquire better maps?
Further, the universe doesn't necessarily have to be computable in any ontological sense. To talk about the physical, empirical universe, we only need to make predictions about events a finite period of time into the future. If we're composing programs whose termination we want to guarantee, we can use inductive and coinductive type systems to guarantee termination (for inductive data) and productivity, i.e. stepwise progress (for coinductive data), where we need them, and add hard time limits where we can't be sure.
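A toy illustration of that last point, in Python rather than a real dependently-typed language (so the "guarantee" here is by inspection, not enforced by any checker):

```python
# Structural recursion on an inductive type: every recursive call gets a
# strictly smaller input, so termination follows by induction on the list.
def length(xs):
    if not xs:
        return 0
    return 1 + length(xs[1:])

# Where no structural argument is available, fall back on a hard limit:
# burn one unit of "fuel" per step and stop unconditionally when it's gone.
def iterate_with_fuel(f, x, fuel):
    while fuel > 0:
        x = f(x)
        fuel -= 1
    return x

print(length([3, 1, 4]))                         # -> 3
print(iterate_with_fuel(lambda n: n + 1, 0, 5))  # -> 5
```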
As Schmidhuber put it regarding Goedel Machines: there may be very rare situations in which some part of reality is just Platonically, ontologically incomputable, but a human being is bound by the same laws of entropy as anything else, so the Platonic, ontological level of things doesn't really matter for purposes of doing real-world math or AI. (We find it convenient to pretend it does matter, but actually we only care about the theorems we can demonstrate to be true among real-world human beings (which includes Goedel Statements for Peano Arithmetic) rather than about Platonic Forms of Pure Randomness.)
Busy beavers seem not to halt for a very long time and spend most of their time in a seemingly infinitely-growing recursion, and then they halt.
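For anyone who hasn't played with one: here's a minimal Turing-machine runner and the 2-state, 2-symbol busy beaver. This tiny one halts after only 6 steps, but the same "runaway-looking growth, then a sudden halt" shape scales up -- the 5-state champion runs 47,176,870 steps before stopping.

```python
# Transition table: (state, symbol) -> (write, move, next_state); 'H' halts.
BB2 = {
    ('A', 0): (1, +1, 'B'), ('A', 1): (1, -1, 'B'),
    ('B', 0): (1, -1, 'A'), ('B', 1): (1, +1, 'H'),
}

def run(machine, max_steps=10**6):
    """Simulate on an initially blank tape; return (steps taken, ones written)."""
    tape, pos, state, steps = {}, 0, 'A', 0
    while state != 'H' and steps < max_steps:
        write, move, state = machine[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1
    return steps, sum(tape.values())

print(run(BB2))  # -> (6, 4): halts after 6 steps with four 1s on the tape
```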
You could argue that busy beavers are pathological examples.