Overcoming the Loebian obstacle using evidence logic
In this post I intend to:
- Briefly explain the Loebian obstacle and it's relevance to AI (feel free to skip it if you know what the Loebian obstacle is).
- Suggest a solution in the form a formal system which assigns probabilities (more generally probability intervals) to mathematical sentences (and which admits a form of "Loebian" self-referential reasoning). The method is well-defined both for consistent and inconsistent axiomatic systems, the later being important in analysis of logical counterfactuals like in UDT.
Background
Logic
When can we consider a mathematical theorem to be established? The obvious answer is: when we proved it. Wait, proved it in what theory? Well, that's debatable. ZFC is popular choice for mathematicians, but how do we know it is consistent (let alone sound, i.e. that it only proves true sentences)? All those spooky infinite sets, how do you know it doesn't break somewhere along the line? There's lots of empirical evidence, but we can't prove it, and it's proofs we're interesting in, not mere evidence, right?
Peano arithmetic seems like a safer choice. After all, if the natural numbers don't make sense, what does? Let's go with that. Suppose we have a sentence s in the language of PA. If someone presents us with a proof p in PA, we believe s is true. Now consider the following situations: instead of giving you a proof of s, someone gave you a PA-proof p1 that p exists. After all, PA admits defining "PA-proof" in PA language. Common sense tells us that p1 is a sufficient argument to believe s. Maybe, we can prove it within PA? That is, if we have a proof of "if a proof of s exists then s" and a proof of R(s)="a proof of s exists" then we just proved s. That's just modus ponens.
There are two problems with that.
First, there's no way to prove the sentence L:="for all s if R(s) then s", since it's not a PA-sentence at all. The problem is that "for all s" references s as a natural number encoding a sentence. On the other hand, "then s" references s as the truth-value of the sentence. Maybe we can construct a PA-formula T(s) which means "the sentence encoded by the number s is true"? Nope, that would get us in trouble with the liar paradox (it would be possible to construct a sentence saying "this sentence is false").
Second, Loeb's theorem says that if we can prove L(s):="if R(s) exists then s" for a given s, then we can prove s. This is a problem since it means there can be no way to prove L(s) for all s in any sense, since it's unprovable for s which are unprovable. In other words, if you proved not-s, there is no way to conclude that "no proof of s exists".
What if we add an inference rule Q to our logic allowing to go from R(s) to s? Let's call the new formal system PA1. p1 appended by a Q-step becomes an honest proof of s in PA1. Problem solved? Not really! Now someone can give you a proof of
R1(s):="a PA1-proof of s exists". Back to square one! Wait a second, what if we add a new rule Q1 allowing to go from R1(s) to s? OK, but now we got R2(s):="a PA2-proof of s exists". Hmm, what if add an infinite number of rules Qk? Fine, but now we got Rω(s):="a PAω-proof of s exists". And so on, and so forth, the recursive ordinals are a plenty...
Bottom line, Loeb's theorem works for any theory containing PA, so we're stuck.
AI
Suppose you're trying to build a self-modifying AGI called "Lucy". Lucy works by considering possible actions and looking for formal proofs that taking one of them will increase expected utility. In particular, it has self-modifying actions in its strategy space. A self-modifying action creates essentially a new agent: Lucy2. How can Lucy decide that becoming Lucy2 is a good idea? Well, a good step in this direction would be proving that Lucy2 would only take actions that are "good". I.e., we would like Lucy to reason as follows "Lucy2 uses the same formal system as I, so if she decides to take action a, it's because she has a proof p of the sentence s(a) that 'a increases expected utility'. Since such a proof exits, a does increase expected utility, which is good news!" Problem: Lucy is using L in there, applied to her own formal system! That cannot work! So, Lucy would have a hard time self-modifying in a way which doesn't make its formal system weaker.
As another example where this poses a problem, suppose Lucy observes another agent called "Kurt". Lucy knows, by analyzing her sensory evidence, that Kurt proves theorems using the same formal system as Lucy. Suppose Lucy found out that Kurt proved theorem s, but she doesn't know how. We would like Lucy to be able to conclude s is, in fact, true (at least with the probability that her model of physical reality is correct). Alas, she cannot.
See MIRI's paper for more discussion.
Evidence Logic
Here, cousin_it explains a method to assign probabilities to sentences in an inconsistent theory T. It works as follows. Consider sentence s. Since T is inconsistent, there are T-proofs both of s and of not-s. Well, in a courtroom both sides are allowed to have arguments, why not try the same approach here? Let's weight the proofs as a function of their length, analogically to weighting hypotheses in Solomonoff induction. That is, suppose we have a prefix-free encoding of proofs as bit sequences. Then, it makes sense to consider a random bit sequence and ask whether it is a proof of something. Define the probability of s to be
P(s) := (probability of a random sequence to be a proof of s) / (probability of a random sequence to be a proof of s or not-s)
Nice, but it doesn't solve the Loebian obstacle yet.
I will now formulate an extension of this idea that allows assigning an interval of probabilities [Pmin(s), Pmax(s)] to any sentence s. This interval is a sort of "Knightian uncertainty". I have some speculations how to extract a single number from this interval in the general case, but even without that, I believe that Pmin(s) = Pmax(s) in many interesting cases.
First, the general setting:
- With every sentence s, there are certain texts v which are considered to be "evidence relevant to s". These are divided into "negative" and "positive" evidence. We define sgn(v) := +1 for positive evidence, sgn(v) := -1 for negative evidence.
- Each piece of evidence v is associated with the strength of the evidence strs(v) which is a number in [0, 1]
- Each piece of evidence v is associated with an "energy" function es,v : [0, 1] -> [0, 1]. It is a continuous convex function.
- The "total energy" associated with s is defined to b es := ∑v 2-l(v) es,v where l(v) is the length of v.
- Since es,v are continuous convex, so is es. Hence it attains its minimum on a closed interval which is
[Pmin(s), Pmax(s)] by definition.
- A piece of evidence v for s is defined to be one of the following:
- a proof of s
- sgn(v) := +1
- strs(v) := 1
- es,v(q) := (1 - q)2
- a proof of not-s
- sgn(v) := -1
- strs(v) := 1
- es,v(q) := q2
- a piece of positive evidence for the sentence R-+(s, p) := "Pmin(s) >= p"
- sgn(v) := +1
- strs(v) := strR-+(s, p)(v) p
- es,v(q) := 0 for q > p; strR-+(s, p)(v) (q - p)2 for q < p
- a piece of negative evidence for the sentence R--(s, p) := "Pmin(s) < p"
- sgn(v) := +1
- strs(v) := strR--(s, p)(v) p
- es,v(q) := 0 for q > p; strR--(s, p)(v) (q - p)2 for q < p
- a piece of negative evidence for the sentence R++(s, p) := "Pmax(s) > p"
- sgn(v) := -1
- strs(v) := strR++(s, p)(v) (1 - p)
- es,v(q) := 0 for q < p; strR-+(s, p)(v) (q - p)2 for q > p
- a piece of positive evidence for the sentence R+-(s, p) := "Pmax(s) <= p"
- sgn(v) := -1
- strs(v) := strR+-(s, p)(v) (1 - p)
- es,v(q) := 0 for q < p; strR-+(s, p)(v) (q - p)2 for q > p
- a proof of s
Updateless Intelligence Metrics in the Multiverse
Followup to: Intelligence Metrics with Naturalized Induction using UDT
In the previous post I have defined an intelligence metric solving the duality (aka naturalized induction) and ontology problems in AIXI. This model used a formalization of UDT using Benja's model of logical uncertainty. In the current post I am going to:
- Explain some problems with my previous model (that section can be skipped if you don't care about the previous model and only want to understand the new one).
- Formulate a new model solving these problems. Incidentally, the new model is much closer to the usual way UDT is represented. It is also based on a different model of logical uncertainty.
- Show how to define intelligence without specifying the utility function a priori.
- Since the new model requires utility functions formulated with abstract ontology i.e. well-defined on the entire Tegmark level IV multiverse. These are generally difficult to construct (i.e. the ontology problem resurfaces in a different form). I outline a method for constructing such utility functions.
Problems with UIM 1.0
The previous model postulated that naturalized induction uses a version of Solomonoff induction updated in the direction of an innate model N with a temporal confidence parameter t. This entails several problems:
- The dependence on the parameter t whose relevant value is not easy to determine.
- Conceptual divergence from the UDT philosophy that we should not update at all.
- Difficulties with counterfactual mugging and acausal trade scenarios in which G doesn't exist in the "other universe".
- Once G discovers even a small violation of N at a very early time, it loses all ground for trusting its own mind. Effectively, G would find itself in the position of a Boltzmann brain. This is especially dangerous when N over-specifies the hardware running G's mind. For example assume N specifies G to be a human brain modeled on the level of quantum field theory (particle physics). If G discovers that in truth it is a computer simulation on the merely molecular level, it loses its epistemic footing completely.
UIM 2.0
I now propose the following intelligence metric (the formula goes first and then I explain the notation):
IU(q) := ET[ED[EL[U(Y(D)) | Q(X(T)) = q]] | N]
- N is the "ideal" model of the mind of the agent G. For example, it can be a universal Turing machine M with special "sensory" registers e whose values can change arbitrarily after each step of M. N is specified as a system of constraints on an infinite sequence of natural numbers X, which should be thought of as the "Platonic ideal" realization of G, i.e. an imagery realization which cannot be tempered with by external forces such as anvils. As we shall see, this "ideal" serves as a template for "physical" realizations of G which are prone to violations of N.
- Q is a function that decodes G's code from X e.g. the program loaded in M at time 0. q is a particular value of this code whose (utility specific) intelligence IU(q) we are evaluating.
- T is a random (as in random variable) computable hypothesis about the "physics" of X, i.e a program computing X implemented on some fixed universal computing model (e.g. universal Turing machine) C. T is distributed according to the Solomonoff measure however the expectation value in the definition of IU(q) is conditional on N, i.e. we restrict to programs which are compatible with N. From the UDT standpoint, T is the decision algorithm itself and the uncertainty in T is "introspective" uncertainty i.e. the uncertainty of the putative precursor agent PG (the agent creating G e.g. an AI programmer) regarding her own decision algorithm. Note that we don't actually need to postulate a PG which is "agenty" (i.e. use for N a model of AI hardware together with a model of the AI programmer programming this hardware), we can be content to remain in a more abstract framework.
- D is a random computable hypothesis about the physics of Y, where Y is an infinite sequence of natural numbers representing the physical (as opposed to "ideal") universe. D is distributed according to the Solomonoff measure and the respective expectation value is unconditional (i.e. we use the raw Solomonoff prior for Y which makes the model truly updateless). In UDT terms, D is indexical uncertainty.
- U is a computable function from infinite sequences of natural numbers to [0, 1] representing G's utility function.
- L represents logical uncertainty. It can be defined by the model explained by cousin_it here, together with my previous construction for computing logical expectation values of random variables in [0, 1]. That is, we define EL(dk) to be the probability that a random string of bits p encodes a proof of the sentence "Q(X(T)) = q implies that the k-th digit of U(Y(D)) is 1" in some prefix-free encoding of proofs conditional on p encoding the proof of either that sentence or the sentence "Q(X(T)) = q implies that the k-th digit of U(Y(D)) is 0". We then define
EL[U(Y(D)) | Q(X(T)) = q] := Σk 2-k EL(dk). Here, the sentences and the proofs belong to some fixed formal logic F, e.g. Peano arthimetics or ZFC.
Discussion
- G's mental architecture N is defined in the "ideal" universe X where it is inviolable. However, G's utility function U inhabits the physical universe Y. This means that a highly intelligent q is designed so that imperfect realizations of G inside Y generate as many utilons as possible. A typical T is a low Kolmogorov complexity universe which contains a perfect realization of G. Q(X(T)) is L-correlated to the programming of imperfect realizations of G inside Y because T serves as an effective (approximate) model of the formation of these realizations. For abstract N, this means q is highly intelligent when a Solomonoff-random "M-programming process" producing q entails a high expected value of U.
- Solving the Loebian obstacle requires a more sophisticated model of logical uncertainty. I think I can formulate such a model. I will explain it in another post after more contemplation.
- It is desirable that the encoding of proofs p satisfies a universality property so that the length of the encoding can only change by an additive constant, analogically to the weak dependence of Kolmogorov complexity on C. It is in fact not difficult to formulate this property and show the existence of appropriate encodings. I will discuss this point in more detail in another post.
Generic Intelligence
It seems conceptually desirable to have a notion of intelligence independent of the specifics of the utility function. Such an intelligence metric is possible to construct in a way analogical to what I've done in UIM 1.0, however it is no longer a special case of the utility-specific metric.
Assume N to consist of a machine M connected to a special storage device E. Assume further that at X-time 0, E contains a valid C-program u realizing a utility function U, but that this is the only constraint on the initial content of E imposed by N. Define
I(q) := ET[ED[EL[u(Y(D); X(T)) | Q(X(T)) = q]] | N]
Here, u(Y(D); X(T)) means that we decode u from X(T) and evaluate it on Y(D). Thus utility depends both on the physical universe Y and the ideal universe X. This means G is not precisely a UDT agent but rather a "proto-agent": only when a realization of G reads u from E it knows which other realizations of G in the multiverse (the Solomonoff ensemble from which Y is selected) should be considered as the "same" agent UDT-wise.
Incidentally, this can be used as a formalism for reasoning about agents that don't know their utility functions. I believe this has important applications in metaethics I will discuss in another post.
Utility Functions in the Multiverse
UIM 2.0 is a formalism that solves the diseases of UIM 1.0 at the price of losing N in the capacity of the ontology for utility functions. We need the utility function to be defined on the entire multiverse i.e. on any sequence of natural numbers. I will outline a way to extend "ontology-specific" utility functions to the multiverse through a simple example.
Suppose G is an agent that cares about universes realizing the Game of Life, its utility function U corresponding to e.g. some sort of glider maximization with exponential temporal discount. Fix a specific way DC to decode any Y into a history of a 2D cellular automaton with two cell states ("dead" and "alive"). Our multiversal utility function U* assigns Ys for which DC(Y) is a legal Game of Life the value U(DC(Y)). All other Ys are treated by dividing the cells into cells O obeying the rules of Life and cells V violating the rules of Life. We can then evaluate U on O only (assuming it has some sort of locality) and assign V utility by some other rule, e.g.:
- zero utility
- constant utility per V cell with temporal discount
- constant utility per unit of surface area of the boundary between O and V with temporal discount
Discussion
- The construction of U* depends on the choice of DC. However, U* only depends on DC weakly since given a hypothesis D which produces a Game of Life wrt some other low complexity encoding, there is a corresponding hypothesis D' producing a Game of Life wrt DC. D' is obtained from D by appending a corresponding "transcoder" and thus it is only less Solomonoff-likely than D by an O(1) factor.
- Since the accumulation between O and V is additive rather than e.g. multiplicative, a U*-agent doesn't behave as if it a priori expects the universe the follow the rules of Life but may have strong preferences about the universe actually doing it.
- This construction is reminiscent of Egan's dust theory in the sense that all possible encodings contribute. However, here they are weighted by the Solomonoff measure.
TLDR
The intelligence of a physicalist agent is defined to be the UDT-value of the "decision" to create the agent by the process creating the agent. The process is selected randomly from a Solomonoff measure conditional on obeying the laws of the hardware on which the agent is implemented. The "decision" is made in an "ideal" universe in which the agent is Cartesian, but the utility function is evaluated on the real universe (raw Solomonoff measure). The interaction between the two "universes" is purely via logical conditional probabilities (acausal).
If we want to discuss intelligence without specifying a utility function up front, we allow the "ideal" agent to read a program describing the utility function from a special storage immediately after "booting up".
Utility functions in the Tegmark level IV multiverse are defined by specifying a "reference universe", specifying an encoding of the reference universe and extending a utility function defined on the reference universe to encodings which violate the reference laws by summing the utility of the portion of the universe which obeys the reference laws with some function of the space-time shape of the violation.
Intelligence Metrics with Naturalized Induction using UDT
Followup to: Intelligence Metrics and Decision Theory
Related to: Bridge Collapse: Reductionism as Engineering Problem
A central problem in AGI is giving a formal definition of intelligence. Marcus Hutter has proposed AIXI as a model of perfectly intelligent agent. Legg and Hutter have defined a quantitative measure of intelligence applicable to any suitable formalized agent such that AIXI is the agent with maximal intelligence according to this measure.
Legg-Hutter intelligence suffers from a number of problems I have previously discussed, the most important being:
- The formalism is inherently Cartesian. Solving this problem is known as naturalized induction and it is discussed in detail here.
- The utility function Legg & Hutter use is a formalization of reinforcement learning, while we would like to consider agents with arbitrary preferences. Moreover, a real AGI designed with reinforcement learning would tend to wrestle control of the reinforcement signal from the operators (there must be a classic reference on this but I can't find it. Help?). It is straightword to tweak to formalism to allow for any utility function which depends on the agent's sensations and actions, however we would like to be able to use any ontology for defining it.
- Define a formalism for logical uncertainty. When I started writing this I thought this formalism might be novel but now I see it is essentially the same as that of Benja.
- Use this formalism to define a non-constructive formalization of UDT. By "non-constructive" I mean something that assigns values to actions rather than a specific algorithm like here.
- Apply the formalization of UDT to my quasi-Solomonoff framework to yield an intelligence metric.
- Slightly modify my original definition of the quasi-Solomonoff measure so that the confidence of the innate model becomes a continuous rather than discrete parameter. This leads to an interesting conjecture.
- Propose a "preference agnostic" variant as an alternative to Legg & Hutter's reinforcement learning.
- Discuss certain anthropic and decision-theoretic aspects.
Logical Uncertainty
The formalism introduced here was originally proposed by Benja.
Fix a formal system F. We want to be able to assign probabilities to statements s in F, taking into account limited computing resources. Fix D a natural number related to the amount of computing resources that I call "depth of analysis".
Define P0(s) := 1/2 for all s to be our initial prior, i.e. each statement's truth value is decided by a fair coin toss. Now define
PD(s) := P0(s | there are no contradictions of length <= D).
Consider X to be a number in [0, 1] given by a definition in F. Then dk(X) := "The k-th digit of the binary expansion of X is 1" is a statement in F. We define ED(X) := Σk 2-k PD(dk(X)).
Remarks
- Clearly if s is provable in F then for D >> 0, PD(s) = 1. Similarly if "not s" is provable in F then for D >> 0,
PD(s) = 0. - If each digit of X is decidable in F then limD -> inf ED(X) exists and equals the value of X according to F.
- For s of length > D, PD(s) = 1/2 since no contradiction of length <= D can involve s.
- It is an interesting question whether limD -> inf PD(s) exists for any s. It seems false that this limit always exists and equals 0 or 1, i.e. this formalism is not a loophole in Goedel incompleteness. To see this consider statements that require a high (arithmetical hierarchy) order halting oracle to decide.
- In computational terms, D corresponds to non-deterministic spatial complexity. It is spatial since we assign truth values simultaneously to all statements so in any given contradiction it is enough to retain the "thickest" step. It is non-deterministic since it's enough for a contradiction to exists, we don't have an actual computation which produces it. I suspect this can be made more formal using the Curry-Howard isomorphism, unfortunately I don't understand the latter yet.
Non-Constructive UDT
Consider A a decision algorithm for optimizing utility U, producing an output ("decision") which is an element of C. Here U is just a constant defined in F. We define the U-value of c in C for A at depth of analysis D to be
VD(c, A; U) := ED(U | "A produces c" is true). It is only well defined as long as "A doesn't produce c" cannot be proved at depth of analysis D i.e. PD("A produces c") > 0. We define the absolute U-value of c for A to be
V(c, A; U) := ED(c, A)(U | "A produces c" is true) where D(c, A) := max {D | PD("A produces c") > 0}. Of course D(c, A) can be infinite in which case Einf(...) is understood to mean limD -> inf ED(...).
For example V(c, A; U) yields the natural values for A an ambient control algorithm applied to e.g. a simple model of Newcomb's problem. To see this note that given A's output the value of U can be determined at low depths of analysis whereas the output of A requires a very high depth of analysis to determine.
Naturalized Induction
Our starting point is the "innate model" N: a certain a priori model of the universe including the agent G. This model encodes the universe as a sequence of natural numbers Y = (yk) which obeys either specific deterministic or non-deterministic dynamics or at least some constraints on the possible histories. It may or may not include information on the initial conditions. For example, N can describe the universe as a universal Turing machine M (representing G) with special "sensory" registers e. N constraints the dynamics to be compatible with the rules of the Turing machine but leaves unspecified the behavior of e. Alternatively, N can contain in addition to M a non-trivial model of the environment. Or N can be a cellular automaton with the agent corresponding to a certain collection of cells.
However, G's confidence in N is limited: otherwise it wouldn't need induction. We cannot start with 0 confidence: it's impossible to program a machine if you don't have even a guess of how it works. Instead we introduce a positive real number t which represents the timescale over which N is expected to hold. We then assign to each hypothesis H about Y (you can think about them as programs which compute yk given yj for j < k; more on that later) the weight QS(H) := 2-L(H) (1 - e-t(H)/t). Here L(H) is the length of H's encoding in bits and t(H) is the time during which H remains compatible with N. This is defined for N of deterministic / constraint type but can be generalized to stochastic N.
The weights QS(H) define a probability measure on the space of hypotheses which induces a probability measure on the space of histories Y. Thus we get an alternative to Solomonoff induction which allows for G to be a mechanistic part of the universe, at the price of introducing N and t.
Remarks
- Note that time is discrete in this formalism but t is continuous.
- Since we're later going to use logical uncertainties wrt the formal system F, it is tempting to construct the hypothesis space out of predicates in F rather than programs.
Intelligence Metric
To assign intelligence to agents we need to add two ingredients:
- The decoding Q: {Y} -> {bit-string} of the agent G from the universe Y. For example Q can read off the program loaded into M at time k=0.
- A utility function U: {Y} -> [0, 1] representing G's preferences. U has to be given by a definition in F. Note that N provides the ontology wrt which U is defined.
Instead, we define I(Q0) := EQS(Emax(U(Y(H)) | "Q(Y(H)) = Q0" is true)). Here the subscript max stands for maximal depth of analysis, as in the construction of absolute UDT value above.
Remarks
- IMO the correct way to look at this is intelligence metric = value of decision for the decision problem "what should I program into my robot?". If N is a highly detailed model including "me" (the programmer of the AI), this literally becomes the case. However for theoretical analysis it is likely to be more convenient to work with simple N (also conceptually it leaves room for a "purist" notion of agent's intelligence, decoupled from the fine details of its creator).
- As opposed to usual UDT, the algorithm (H) making the decision (Q) is not known with certainty. I think this represents a real uncertainty that has to be taken into account in decision problems in general: the decision-maker doesn't know her own algorithm. Since this "introspective uncertainty" is highly correlated with "indexical" uncertainty (uncertainty about the universe), it prevents us from absorbing the later into the utility function as proposed by Coscott.
- For high values of t, G can improve its understanding of the universe by bootstrapping the knowledge it already has. This is not possible for low values of t. In other words, if I cannot trust my mind at all, I cannot deduce anything. This leads me to an interesting conjecture: There is a a critical value t* of t from which this bootstrapping becomes possible (the positive feedback look of knowledge becomes critical). I(Q) is non-smooth at t* (phase transition).
- If we wish to understand intelligence, it might be beneficial to decouple it from the choice of preferences. To achieve this we can introduce the preference formula as an unknown parameter in N. For example, if G is realized by a machine M, we can connect M to a data storage E whose content is left undetermined by N. We can then define U to be defined by the formula encoded in E at time k=0. This leads to I(Q) being a sort of "general-purpose" intelligence while avoiding the problems associated with reinforcement learning.
- As opposed to Legg-Hutter intelligence, there appears to be no simple explicit description for Q* maximizing I(Q) (e.g. among all programs of given length). This is not surprising, since computational cost considerations come into play. In this framework it appears to be inherently impossible to decouple the computational cost considerations: G's computations have to be realized mechanistically and therefore cannot be free of time cost and side-effects.
- Ceteris paribus, Q* deals efficiently with problems like counterfactual mugging. The "ceteris paribus" conditional is necessary here since because of cost and side-effects of computations it is difficult to make absolute claims. However, it doesn't deal efficiently with counterfactual mugging in which G doesn't exist in the "other universe". This is because the ontology used for defining U (which is given by N) assumes G does exist. At least this is the case for simple ontologies like described above: possibly we can construct N in which G might or might not exist. Also, if G uses a quantum ontology (i.e. N describes the universe in terms of a wavefunction and U computes the quantum expectation value of an operator) then it does take into account other Everett universes in which G doesn't exist.
- For many choices of N (for example if the G is realized by a machine M), QS-induction assigns well-defined probabilities to subjective expectations, contrary to what is expected from UDT. However:
- This is not the case for all N. In particular, if N admits destruction of M then M's sensations after the point of destruction are not well-defined. Indeed, we better allow for destruction of M if we want G's preferences to behave properly in such an event. That is, if we don't allow it we get a "weak anvil problem" in the sense that G experiences an ontological crisis when discovering its own mortality and the outcome of this crisis is not obvious. Note though that it is not the same as the original ("strong") anvil problem, for example G might come to the conclusion the dynamics of "M's ghost" will be some sort of random.
- These probabilities probably depend significantly on N and don't amount to an elegant universal law for solving the anthropic trilemma.
- Indeed this framework is not completely "updateless", it is "partially updated" by the introduction of N and t. This suggests we might want the updates to be minimal in some sense, in particular t should be t*.
- The framework suggests there is no conceptual problem with cosmologies in which Boltzmann brains are abundant. Q* wouldn't think it is a Boltzmann brain since the long address of Boltzmann brains within the universe makes the respective hypotheses complex thus suppressing them, even disregarding the suppression associated with N. I doubt this argument is original but I feel the framework validates it to some extent.
Solomonoff induction without perfect memory
In this post I construct a variant of Solomonoff induction which allows for agents with imperfect memory.
Solomonoff induction is a mathematical formalization of Occam's razor, and is supposed to be a "master equation" of epistemic rationality. In order words, forming rational expectations about the future is supposed to be tantamount to computing Solomonoff induction (approximately, since it's incomputable).
Solomonoff induction operates by assigning probabilities to continuations
of a given finite sequence
. In the simplest formalism, the sequence elements are bits.
represents past observations and
represents a possible future.
A rational agent is supposed to use
to evaluate the probability of
. There are two problems with this. One is that
is uncomputable whereas
has a limited amount of computing resources in its disposal. For now, we are going to ignore this. Another problem is that
doesn't have direct access to
.
can only estimate
from
's memory. Moreover, this estimation depends on knowledge
has about reality which is supposed to be generated by Solomonoff induction itself. In other words, inferring the future from the past only makes sense in the approximation of perfect memory, in general we need to be able to infer both the past and the future from the present.
The bit-sequence formalism is not well-adapted to this, since the present state of has to contain all of
's memory about the past so it has to be more than a single bit. I suggest solving this by considering sequences
of natural numbers instead of bit-sequences
. Generalizing regular Solomonoff induction to natural numbers is straight-forward: the random program
computes convergent sequences of lower bounds for the transition probabilities
Now we want a new induction procedure whose input is a single natural number representing the state of
's consciousness which allows assigning probabilities to sequences of natural numbers containing
. We achieve this by assigning the following probabilities to Solomonoff hypotheses
:
Here, is a normalization factor,
stands for expectation value and
is defined by
where is the infinite sequence of natural numbers "generated" by
.
The factor penalizes hypotheses in which
appears late in
. Its form was chosen to achieve equal suppression of two spurious hypotheses we want to avoid:
- The "timeless hypothesis"
which computes "true"
silently (i.e. w/o outputting it), extracts
by evaluating
and generates the constant sequence
. It gains a factor of about
on the "correct" hypothesis by reducing age to 0 but loses a factor of about
due to the increased complexity of specifying
explicitly. Its overall suppression is thus about
- The "Boltzmann brain" hypothesis
which generates the sequence
. It gains due to its reduced complexity but is suppressed by
due to increased age. Since an agent is supposed to record memories, we expect
to be exponential in
. We thus get the same order-of-magnitude suppression as before
I think it should be possible to define a class of cellular automata and initial conditions
s.t. the state of the automaton evolves indefinitely rather than e.g. becoming periodic, for which my induction correctly infers the rules of
from the state of the automaton at a sufficiently late time. Note that for many cellular automata, the state is exponential in the time since changes propagate at a fixed velocity.
Another application is to the anthropic trilemma. This formalism suggests continuity of experience is fundamentally meaningful and that subjective probabilities behave just like objective probabilities. We thus eliminate the second and third horns. However, explicitly applying this formalism to solve solve the trilemma remains challenging. In a future post I am going to argue the first horn is actually correct but without giving a qualitative proof using this formalism.
A somewhat speculative application is applying the formalism repeatedly w/o recording the past. Given consciousness state we are able to compute the probability distribution of next consciousness state
. If we wish to consistently continue one step further, we should use both
and
. However,
can only use
to estimate probabilities. This way we are led to a Markovian stochastic process. The physical meaning of this construction is not clear but in some ways this Markovian process is more interesting than the non-Markovian process you get with regular Solomonoff induction or with recording the past in the current formalism. In regular Solomonoff induction, as time progresses
gains information but the physics is fixed. So to speak, the map improves but the territory stays the same. The Markovian version allows for actual physics to change but
cannot notice it! The process follows a pattern but the pattern keeps randomly evolving. Maybe this construction is more than just a mathematical artifact?
Metatickle Intelligence Metrics and Friendly Utility Functions
Related to: Intelligence Metrics and Decision Theories
Previously I presented a formalism for dealing with the Duality and Ontology problems associated with attempts to define a formal metric of general intelligence. It also solves the environment distribution problem. This formalism ran into problems closely related to the problems of decision theory. I tried to solve these problems using a formalization of UDT suitable for this context.
Here I'm going to pursue a different approach, which I believe to be analogous to the "metatickle" version of EDT. I will argue that, as opposed to decision theory, metatickling is a good approach to intelligence metrics. I will also present an analogous formalism for multi-agent systems. Finally, I will suggest an approach for constructing friendly utility functions using this formalism.
Review of Quasi-Solomonoff Distributions
In this section I will remind the idea behind quasi-Solomonoff distributions, glossing over mathematical details. For more details consult the previous article.
Most attempts at constructing a formal general intelligence metric are based on Legg and Hutter and involve considering an agent A interacting with an environment V through actions that A applies to V and observations A makes on V (the latter being information flowing from V to A). The problem with this is that such an agent is indestructible since no process in V can force a change in the inner workings of A. Thus an AI programmed in accord with this formalism will consider it an a priori truth that its mind cannot be tampered with in any way, an obviously false assumption.
In order to deal with this we can make A a part of V, as suggested by Orseau and Ring. This creates another problem, namely it's unclear what prior for V should we use. Legg and Hutter suggest using the Solomonoff distribution which makes sense since a perfectly rational agent is supposed to use the Solomonoff distribution as a prior. However, if A is a part of V, the Solomonoff distribution is clearly too general. For example if our A is implemented on a computing machine M, the rules according to which M works have to be part of our assumptions about V, since without making such assumptions it is impossible to program M in any sensible way.
Enters the quasi-Solomonoff distribution. Suppose you are a programmer building an AI A. Then it makes sense for you to impart to A the knowledge you have about the universe, including at least the rules according to which A's hardware M works. Denoting this knowledge (the tenative model) D, the distribution to use is the Solomonoff distribution updated by a period t of observing D-behavior, where the time parameter t represents your own certainty about D.
It is now tempting to introduce the intelligence metric
where {υi} is a sequence of natural numbers representing the universe Y, U is the utility function, Q is the data ("program") representing A and the expectation value is taken w.r.t. the quasi-Solomonoff distribution.
IEDT suffers from problems analogous to its associated decision theory EDT. Namely, suppose a certain Q is very likely to exist in a universe containing pink unicorns (maybe because pink unicorns have a fetish for this Q). Suppose further that pink unicorns yield very high utility, even though Q itself yields little utility directly. Then IEDT(Q) might be high even though Q has few of the attributes we associate with intelligence.
Alternatively we can introduce the intelligence metric
This time the expectation value is unconditional however we postulated a "divine intervention" which brings Q into existence at time t regardless of the physics selected from the quasi-Solomonoff distribution for Y. This unphysical assumption is an artifact analogous to the use of counterfactuals in CDT.
Metatickling
To make my way out of this conundrum I observe that the increase of probability of pink unicorns in the EDT example is "unjustified", since if you are a programmer building an AI then you know the AI came out the way it is because of you, not because of pink unicorns. One way to interpret this is that D isn't a sufficiently detailed model. However including a model of the AI programmer into D seems impractical. Therefore I suggest instead including a generic intelligence optimization process O.
Denote the probability assigned by the quasi-Solomonoff distribution to program R as the physics of Y. Then, the metatickle quasi-Solomonoff distribution assigns to R the probability
Here Z is a normalization factor, β is a constant representing O's power of optimization, ER is expectation value in an R-universe and I is the yet unspecified intelligence metric (yep, we're going self-referential).
The metatickle intelligence metric is then defined as the solution to the equation
where the expectation value is taken w.r.t. the metatickle quasi-Solomonoff distribution defined by D and IβMTDT.
It is suggestive to apply some fixed-point theorem to get results about existence and/or uniqueness of IβMTDT, something I haven't done yet.
Metatickle EDT suffers from a problem common with CDT, namely it two-boxes on Newcomb's problem. The same applies here. Specifically, suppose Ω is a superintelligence which is able to predict Q by modeling O and it places utility in the first box iff A(Q) one-boxes. Then two-boxers are assigned high intelligence for the self-referential reason that Ω predicts this and leaves the first box empty. However, I claim that in this context this behavior is sensible. From A(Q)'s point-of-view, it would gain nothing by one-boxing since one-boxing would only mean Q came out with a statistical fluke and Ω still left the first box empty (since the fluke is unpredictable). From O's point-of-view, it is doing everything right since its purpose is maximizing intelligence, not utility. O behaves just like the philosophy students Yudkowsky likes to describe, which prefer winning arguments over winning money. Indeed, we can consider a situation in which Ω would generate utility on the condition that Q is made unintelligent. In this case there is nothing paradoxical about the resulting negative relation between intelligence and utility. This argument seems similar to the standard defense of CDT: "Ω simply discriminates in favor of irrational behavior". However, the standard counterargument "it isn't discrimination as long as only the final decisions are involved rather than the intrinsic process leading to the decisions" doesn't apply here, since from O's point-of-view Q is a final decision.
Note that MTDT-intelligent agents cope fantastically with the usual Newcomb's problem. That is, if Ω prepares the boxes after formation of A (moment t) then an MTDT-intelligent A will one-box (ceteris paribus i.e. ignoring possible non-decision-theoretic effects of programming A in this way).
Note also that it doesn't mean we can use ICDT just as well. CDT-intelligent agents suffer from a severe mental disability, namely they are unable to deduce anything about the universe from the properties of their own self. They are blinded by faith in a divine intervention which created them. This problem doesn't apply to IMTDT.
It seems useful to let β go to infinity, which corresponds to "perfect" O. I suspect I∞MTDT converges in many cases. In this limit the "pink unicorn" effect is likely to entirely disappear, since for highly intelligent Q any hypothesis of Q's origin that doesn't involve something that looks like an actual intelligence optimization process would be suppressed by its complexity.
Multi-Agent Systems
It is also interesting to consider a system of N agents Ak(Qk), each with its own utility function Uk and program Qk encoded in the universe as qk(υt). In this case the metatickle quasi-Solomonoff distribution is defined by
and we have the system of equations
The functions Ik define a game and it's interesting to study e.g. its Nash equilibria.
In this case the limit is more complicated since depending on the relative speed of growth of the different β's, there might be different results.
Friendly Utility Functions
The problem of constructing a friendly utility function can be regarded as inverse to the problem of constructing strong AI. Namely the latter is creating an optimal agent given a utility function whereas the former is finding the utility function of the agent which is already known (homo sapiens).
Fix a specific agent Alice, w.r.t. which the utility functions should be friendly. Our prior for Alice's unknown utility function U will be that it's computed by some uniformly distributed program T (like in the definition of the Solomonoff distribution). The information we use to update this prior is that Alice was generated by an intelligence optimization process of power β, i.e. she was selected from a metatickle quasi-Solomonoff distribution corresponding to U. We then take the expectation value U*(Alice) of the resulting distribution on utility functions1.
It is not so clear how to determine β. Evidently is not a good approach here since we don't consider humans to be optimal for their own terminal values. A possible solution is to postulate a Solomonoff-type prior for β as well.
1It might fail to converge. To remedy this, we can restrict ourselves to utility functions taking values in the interval [0, 1].
Intelligence Metrics and Decision Theories
In this article I review a few previously proposed mathematical metrics of general intelligence and propose my own approach. This approach has a peculiar relation to decision theory, different decision theories corresponding to different notions of "maximally intelligent agent". I propose a formalization of UDT in this context.
Background
The first (to my knowledge) general intelligence metric was proposed by Legg and Hutter in "Universal Intelligence". Their proposal was to consider an agent A interacting with environment V where A exerts a sequence of actions ai on V and in turn receives from V the observations oi. Their original proposal was including in the observations a special utility channel ui, so that the utility of A is defined by
This is essentially reinforcement learning. However, general utility functions of the form can be just as easily accommodated by the method.
The Legg-Hutter intelligence metric is given by
Where the expectation value is taken with respect to a Solomonoff distribution for V.
The maximally intelligent agent w.r.t. this metric is the famous AIXI.
This definition suffers from a number of problems:
- Efficiency problem: The are no constraints on the function
implemented by A. The computing resources allowed for evaluating it are unlimited and it can even be uncomputable (as it is in the case of AIXI).
- Duality (Anvil) problem: A is not a part of the physical universe (V) it attempts to model. A is "unbreakble" i.e. nothing that happens in V can modify it as far as the evaluation of U is concerned.
- Ontology problem1: U is defined in terms of actions and observations rather than in terms of intrinsic properties of the universe. A Legg-Hutter intelligent A with a naively constructed U would be inclined to wirehead itself.
- Degeneracy problem (my own observation): This problem is superficially absent in the original (reinforcement learning) formulation, but for a general U the question arises whether this truly corresponds to the intuitive notion of intelligence. It is clear that for some "degenerate" choices of U very simple agents would be optimal which wouldn't do any of the things we commonly ascribe to intelligence (e.g. modeling the external universe). In other words, it might make sense to ask whether a given agent is intelligent without specifying U in advance.
- Verifiability problem (my own observation): A descriptive model of intelligence should work for real-world intelligent agents (humans). In the real world, intelligence developed by natural selection and therefore must be a verifiable property: that it, it must be possible to measure the intelligence of an agent using a limit amount of computing resource (since Nature only possesses a limited amount of computing resources, as far as we know). Now, the Solomonoff distribution involves randomly generating a program R for computing the transition probabilities of V. Since there are no limitations on the computing resources consumed by R, simulating V can be very costly. On average it will insanely costly, thx to the busy beaver function. This inhibits any feasible way of computing I(A).
Side note: due toit might be that the "correct" definition of I(A) is efficiently computable but it's not feasible to compute A with human-like level of I(A) (that is it's not possible to compute it using a short program and scarce computing resources). In this case the success of natural evolution of intelligence would have to be blamed on the Anthropic Principle. It would also mean we cannot construct an AI without using the human brain as a "template" in some sense. Moreover there would be a physical bound on constructing intelligent agents which might rule out the "AI foom" scenario.
Orseau and Ring2 proposed solutions for the efficiency and duality problem.
Solving the efficiency problem is straightforward. Consider a computing machine M (e.g. a universal Turing machine) which produces an action a every fixed cycle (for example a is read off a register in M) and able to receive an observation o every fixed cycle (for example the o is written to a register in M). We can then evaluate I(A(Q)), where A(Q) is the abstract (Legg-Hutter) agent computed by M primed with the program Q. We can then ask for Q*(n), the maximally intelligent program of length at most n. In general it seems to be uncomputable.
The duality problem is more tricky. Orseau and Ring suggest considering a natural number Q which primes the stochastic process
Given a utility function we can define the intelligence of Q to be
For example we can imagine Q to be the state of a computer controlling a robot, or the state of a set of cells within a cellular automaton. This removes the unphysical separation of A and V. However, for me this framework seems too general if we don't impose any constraints on ρ.
Intelligence in a Quasi-Solomonoff Universe
Consider a universe Y described by a sequence of states {υi}. In order for the Solomonoff distribution to be applicable, we need the state υi to be represented as a natural number. For example Y can consist of a computing machine M interacting with an environment V through actions a and observations o. Or, Y can be a cellular automaton with all but a finite number of cells set to a stationary "vacuum" state.
Consider further a tentative model D of the dynamics of Y. There are several types of model which might be worth considering:
- A constraint model only distinguishes between admissible and inadmissible state transitions, without assigning any probabilities when multiple state transitions are possible. This allows considering the minimalistic setting in which Y is a computing machine M with input register o and D-admissible transitions respect the dynamics of M while allowing the value of o to change in arbitrary way.
- A complete model prescribes the conditional probabilities
for all i. This allows elaborate models with a rich set of degrees of freedom. In particular it will serve to solve the ontology problem since U will be allowed to depend on any degrees of freedom we include the model.
- A dynamics model prescribes conditional probabilities of the form
. Here
is fixed and ρD is a function of
variables. In particular D doesn't define
for
. That is, D describes time-translation invariant dynamics with unspecified initial conditions. This also allows for elaborate models.
The quasi-Solomonoff probability distribution for {υi} associated with model D and learning time t is defined by
- For a constraint model, the conditional probabilities of a Solomonoff distribution given that all transitions are D-admissible for
.
- For a complete model we consider a stochastic process the transition probabilities of which are governed by D for
and by the Solomonoff distribution for
.
- For a dynamics model, the distribution is constructed as follows. Consider a joint distribution for two universes
and
where
is distributed according to Solomonoff and
is distributed according to D with
serving as initial conditions. Now take the conditional distribution given that
for
.
Here t is a parameter representing the amount of evidence in favor of D.
The agent A is now defined by a property q(υt) of υt, for example some part of the state of M. Given a utility function we can consider
where the expectation value is taken with respect to the quasi-Solomonoff distribution. This can be interpreted as an intelligence metric.
We now take the meta-level point of view where Q is regarded as a decision allowing to make the link to decision theory. For example we can think of an AI programmer making the decision of what code to enter into her robot, or natural evolution making a "decision" regarding the DNA of h. sapiens. According to this point of view, IEDT(Q) is the quality of the decision Q according to Evidential Decision Theory. It is possible to define the Causal Decision Theory analogue ICDT(Q) by imaging a "divine intervention" which changes the q(υt) into Q regardless of previous history. We have
where is expectation value with respect to an unconditional quasi-Solomonoff distribution modified by "divine intervention" as above.
These intelligence metrics are prone to the standard criticism of the respective decision theories. IEDT(Q) factors in the indirect effect Q has on U by giving more weight to universes in which Q is likely. ICDT(Q) favors agents that fail on certain Newcomb-like problems. Specifically, suppose A discovers that the universe contains another agent Ω which created two boxes B1 and B2 and placed utility in each before t. Then a CDT-intelligent A would prefer taking B1 + B2 over taking B1 only even if it knows Ω predicted A's decision and adjusted the utility in the boxes in such a way that taking B1 only would be preferable.
I suggest solving the problem by applying a certain formalization of UDT.
The Solomonoff distribution works by randomly producing a program R which computes3 the transition probabilities of the universe. When we apply the Solomonoff distribution to Y, we can imagine the event space as consisting of pairs (R, {ηi,υ}). Here are uniformly distributed numbers used to determine the occurrence of random events. That is, (R, {ηi,υ}) determines {υi} by the recursive rule
Here PR stands for the conditional probability computed by R and
We define UT as the program receiving (R, {ηi,υ}) as input (it's infinite but it doesn't pose a problem) and producing a binary fraction as output, s.t.
- UT halts in time T at most on any input
is minimal among all such programs. Here the expectation value is taken w.r.t. the quasi-Solomonoff distirbution.
Further, we define U*T as the program receiving (R, {ηi,υ}, Q) as input and producing a binary fraction as output, s.t.
- U*T halts in time T at most on any input
is minimal among all such programs.
Thus UT and U*T are least-squares optimal predictors for U under the constraint of running time T where U*T is provided the additional information q(υt).
Consider
u(Q,T) is the gain in estimated utility obtained by adding the information q(υt)=Q, provided that the underlying physics (R, {ηi,υ}) of Y (which allows predicting q(υt)=Q) is already known. For T >> 0, u(Q,T) approaches 0 (since the additional piece of information that q(υt)=Q can be easily deduced within given computing resources). For T approaching 0, u(Q,T) approaches 0 as well, since there's not enough time to process the actual input anyway. Hence it is interesting to consider the quantity
It is tempting to interpret u* as an intelligence metric. However it doesn't seem sensible to compare u*(Q) for different values of Q. Denote T* the value of T for which the maximum above is achieved (). Instead of defining an intelligence metric per se, we can define Q to be a maximally UDT-intelligent agent if we have
This represents the (non-vacuous) self-referential principle that Q is the best agent given the information that Y is the sort of universe that gives rise to Q. The role of T* is to single out the extent of logical uncertainty for which Q cannot be predicted but given Q the consequences for U can be predicted.
Directions for Further Research
- It is important to obtain existence results for maximally UDT-intelligent agents for the concept to make sense.
- We need a way to consider non-maximally UDT-intelligent agents. For example we can consider the parameter
as an intelligence metric but I'm not sure this is the correct choice.
- My proposal doesn't solve the degeneracy and verifiability problems. The latter problem can be tackled by replacing the Solomonoff distribution by something allowing efficient induction. For example we can use the distribution with the minimal Fisher distance to the Solomonoff distribution under some computing resources constraints. However I'm not convinced this would be conceptually correct.
- It is possible to consider multi-agent systems. In this settings, the problem transforms from a sort-of optimization problem to a game-theory problem. It should then be possible to study Nash equilibria etc.
- It is tempting to look for a way to close the loop between the basic point of view "Q is a program" and the meta point of view "Q is a decision". This might provide an interesting attack angle on self-improving AI. Of course the AI in this framework is already self-improving if M capable of reprogramming itself.
Footnotes
1The name "ontology problem" is courtesy pengvado
2I was exposed to this work by reading Anja
3It computes them in the weak sense of producing a convergent sequence of lower bounds
View more: Prev
Subscribe to RSS Feed
= f037147d6e6c911a85753b9abdedda8d)