I'm pleased to announce a new paper from MIRI: Questions of Reasoning Under Logical Uncertainty.
Abstract:
A logically uncertain reasoner would be able to reason as if they know both a programming language and a program, without knowing what the program outputs. Most practical reasoning involves some logical uncertainty, but no satisfactory theory of reasoning under logical uncertainty yet exists. A better theory of reasoning under logical uncertainty is needed in order to develop the tools necessary to construct highly reliable artificial reasoners. This paper introduces the topic, discusses a number of historical results, and describes a number of open problems.
Following Corrigibility and Toward Idealized Decision Theory, this is the third in a series of six papers motivating MIRI's technical research agenda. This paper mostly motivates and summarizes the state of the field, and contains one very minor new technical result. Readers looking for more technical meat can find it in Paul Christiano's paper Non-Omniscience, Probabilistic Inference, and Metamathematics, published mid-2014. This paper is instead intended to motivate the study of logical uncertainty as relevant to the design of highly reliable smarter-than-human systems. The introduction runs as follows:
Consider a black box with one input chute and two output chutes. The box is known to take a ball in the input chute and then (via some complex Rube Goldberg machine) deposit the ball in one of the output chutes.
An environmentally uncertain reasoner does not know which Rube Goldberg machine the black box implements. A logically uncertain reasoner may know which machine the box implements, and may understand how the machine works, but does not (for lack of computational resources) know how the machine behaves.
Standard probability theory is a powerful tool for reasoning under environmental uncertainty, but it assumes logical omniscience: once a probabilistic reasoner has determined precisely which Rube Goldberg machine is in the black box, they are assumed to know which output chute will take the ball. By contrast, realistic reasoners must operate under logical uncertainty: we often know how a machine works, but not precisely what it will do.
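To make the distinction concrete, here is a minimal Python sketch (not from the paper; all names are illustrative) of a reasoner who knows exactly which machine is in the box but cannot afford to run it:

```python
# Illustrative sketch (not from the paper): the "machine" is fully known,
# yet a bounded reasoner cannot afford to determine its output.

def rube_goldberg(ball: int) -> str:
    """A fully known, deterministic machine: once you have this source code,
    no environmental uncertainty about the box remains."""
    x = ball
    for _ in range(10**9):  # far too many steps for the reasoner's budget
        x = (x * 6364136223846793005 + 1442695040888963407) % 2**64
    return "left" if x % 2 == 0 else "right"

# A logically omniscient reasoner would simply evaluate rube_goldberg(3).
# A bounded reasoner cannot, and so must hold a subjective credence over
# the outputs; that credence reflects logical uncertainty, not a fact about the box.
credence_over_chutes = {"left": 0.5, "right": 0.5}
```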
General intelligence, at the human level, mostly consists of reasoning that involves logical uncertainty. Reasoning about the output of a computer program, about the behavior of other actors in the environment, or about the implications of a surprising observation is done under logical (in addition to environmental) uncertainty. The same would be true of smarter-than-human systems: constructing a completely coherent Bayesian probability distribution in a complex world is intractable. Any artificially intelligent system that writes software or evaluates complex plans must perform some reasoning under logical uncertainty.
When constructing smarter-than-human systems, the stakes are incredibly high: superintelligent machines could have an extraordinary impact upon humanity (Bostrom 2014), and if that impact is not beneficial, the results could be catastrophic (Yudkowsky 2008). If such a system is to attain superintelligence by way of self-modification, logically uncertain reasoning will be critical to its reliability: the initial system must be able to reason about the unknown behavior of a known program (the contemplated self-modification) in order to understand the result of modifying itself.
In order to pose the question of whether a practical system reasons well under logical uncertainty, it is first necessary to gain a theoretical understanding of logically uncertain reasoning. Yet, despite significant research started by Łoś (1955) and Gaifman (1964), and continued by Halpern (2003), Hutter (2013), Demski (2012), Christiano (2014a), and many, many others, this theoretical understanding does not yet exist.
It is natural to consider extending standard probability theory to include the consideration of worlds which are "logically impossible" (such as where a deterministic Rube Goldberg machine behaves in a way that it doesn't). This gives rise to two questions: What, precisely, are logically impossible possibilities? And, given some means of reasoning about impossible possibilities, what is a reasonable prior probability distribution over them?
This paper discusses the field of reasoning under logical uncertainty. At present, study into logically uncertain reasoning is largely concerned with the problem of reasoning probabilistically about sentences of logic. Sections 2 and 3 discuss the two problems posed above in that context. Ultimately, our understanding of logical uncertainty will need to move beyond the domain of logical sentences; this point is further explored in Section 4. Section 5 concludes by relating these problems back to the design of smarter-than-human systems which are reliably aligned with human interests.
The optimal "A" in my question is a way of assigning probabilities to hypotheses of the form "word x belongs to NP-language X" which are efficiently computable (also, I believe it is possible to extend this at least to languages in L^NP).
Consider a sentence phi in some formal theory T. For any given k, the hypothesis that phi is provable in T by a proof of length at most k is of the above form. The same is true of not-phi. This way we can assign phi the probability
P_k(phi) = P(phi has a proof of length <= k) / (P(phi has a proof of length <= k) + P(not-phi has a proof of length <= k))
It is not hard to see that the properties of A imply that, for consistent T, P_k(phi) -> 1 as k -> infinity when phi is provable in T, and P_k(phi) -> 0 when phi is refutable in T.
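Here is a minimal sketch of how this construction might look in code, assuming a hypothetical estimator A that returns an efficiently computed probability that an NP-instance is a "yes" instance; both A and the encoding function are placeholders, not something defined in this comment:

```python
# Sketch of the P_k construction, under the assumption of a hypothetical
# estimator A.  Bounded proof search ("is there a T-proof of phi of length
# at most k?") is an NP-type question, so it is the kind of hypothesis A
# is assumed to handle.

def encode_bounded_proof_search(sentence: str, theory: str, k: int) -> str:
    """Placeholder encoding of 'sentence has a proof in theory of length <= k'
    as an instance of an NP language."""
    return f"{theory} proves {sentence} with a proof of length <= {k}"

def A(instance: str) -> float:
    """Hypothetical NP-estimator: returns a probability strictly between 0 and 1."""
    raise NotImplementedError  # stand-in for the estimator discussed above

def P_k(sentence: str, theory: str, k: int) -> float:
    p_provable = A(encode_bounded_proof_search(sentence, theory, k))
    p_refutable = A(encode_bounded_proof_search(f"not({sentence})", theory, k))
    return p_provable / (p_provable + p_refutable)
```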
I think this establishes some relevance to logical uncertainty in the "classical" sense, but it doesn't seem to solve the tiling agents problem, since it is still restricted to reasoning within a fixed theory T. However, there are other reasons why NP-estimators might be able to solve tiling agents.
One way to think of the tiling agents problem is as the computational impossibility of simulating the successor agent in reasonable time. Now, suppose that we put some restrictions on the form of agents that we allow: namely, our agents are search algorithms that divide the strategy space into "chunks" and perform optimization within each chunk in some way (which can be anything). In the end, the best of all found strategies is selected. If the parent agent knew which chunk the child agent would select, it could simulate the child by running it on that one chunk only. Our NP-estimator allows getting rid of exactly the type of exponential slow-down that the multitude of chunks creates, enabling the parent to perform a probabilistic "pseudosimulation".
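A toy sketch of that agent shape and of the parent's shortcut; every name here is hypothetical, and estimate_winning_chunk stands in for whatever use of the NP-estimator predicts where the child's search ends up:

```python
# Toy sketch (all names hypothetical) of the restricted "chunked search" agent
# and of the parent's probabilistic pseudosimulation of it.

from typing import Callable, Sequence, TypeVar

Strategy = TypeVar("Strategy")
Chunk = Sequence[Strategy]

def child_agent(chunks: Sequence[Chunk],
                optimize: Callable[[Chunk], Strategy],
                utility: Callable[[Strategy], float]) -> Strategy:
    """Allowed agent form: optimize within every chunk, return the best result.
    Cost grows with the number of chunks."""
    candidates = [optimize(chunk) for chunk in chunks]
    return max(candidates, key=utility)

def parent_pseudosimulation(chunks: Sequence[Chunk],
                            optimize: Callable[[Chunk], Strategy],
                            utility: Callable[[Strategy], float],
                            estimate_winning_chunk: Callable[..., int]) -> Strategy:
    """Parent's cheap approximation: guess (probabilistically, via the
    estimator) which chunk the child's best strategy lies in, then simulate
    the child on that single chunk."""
    i = estimate_winning_chunk(chunks, utility)
    return optimize(chunks[i])
```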
Possibly this logic can be generalized to show that the estimator can be used to estimate in logarithmic time any computation within NC, thus allowing any agent algorithm as long as it's sufficiently parallelizable. This requires further thought.
It should also be possible to use A directly to construct an efficiently computable approximation of Solomonoff induction. Namely, once we introduce a cutoff on the runtime of the hypotheses we allow within our Solomonoff ensemble, computing Solomonoff induction amounts to a counting problem. Counting problems have efficient approximations modulo an NP-oracle, and it seems possible to replace the oracle by a "probability assigner" to yield an efficient (but obviously less accurate) approximation in P.
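A rough sketch of the counting problem in question, with the brute-force enumeration written out explicitly; the interpreter and the enumeration bound are placeholders, and the proposal above would replace this exponential enumeration (or the NP-oracle in the approximate-counting result) with calls to the probability assigner:

```python
# Rough sketch (all helpers hypothetical) of time-bounded induction as a
# counting problem: with a runtime cutoff, predicting the next bit reduces to
# a length-weighted count of short programs whose output extends the data.

def run_with_cutoff(program: str, n_bits: int, max_steps: int):
    """Stand-in for a time-bounded interpreter: run `program` (a bitstring) for
    at most `max_steps` steps and return its first `n_bits` output bits, or
    None on timeout or error."""
    raise NotImplementedError

def bounded_induction(observed: str, max_len: int, max_steps: int) -> float:
    """Brute-force reference version: P(next bit is 1 | observed), computed by
    enumerating all programs up to length `max_len`.  The exponential loop over
    programs is exactly what one would hope to replace with calls to the
    probability assigner A."""
    agree = total = 0.0
    for length in range(1, max_len + 1):
        for code in range(2 ** length):
            program = format(code, f"0{length}b")
            out = run_with_cutoff(program, len(observed) + 1, max_steps)
            if out is None or not out.startswith(observed):
                continue
            weight = 2.0 ** (-length)          # shorter programs weigh more
            total += weight
            if out[len(observed)] == "1":
                agree += weight
    return agree / total if total else 0.5
```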
This computable induction can be used to remove another obstruction to "pseudosimulating" the child agent, namely the unknown behavior of the environment in the future.