Followup to: Agents with Cartesian childhood and Physicalist adulthood

In previous posts I have defined a formalism for quantifying the general intelligence of an abstract agent (program). This formalism relies on counting proofs in a given formal system F (as in regular UDT), which makes it susceptible to the Loebian obstacle. That is, if we imagine the agent itself making decisions by looking for proofs in the same formal system F, then it would be impossible to present a general proof of its trustworthiness, since no formal system can assert its own soundness. Thus the agent might fail to qualify for a high intelligence ranking according to the formalism. We can assume the agent uses a weaker formal system whose soundness is provable in F, but then we still run into difficulties if we want the agent to be self-modifying (as we expect it to be). Such an agent would have to trust its descendants, which means that subsequent agents use weaker and weaker formal systems until self-modification becomes impossible.

One known solution to this is Benja's parametric polymorphism. In this post I adapt parametric polymorphism to the updateless intelligence metric framework. The formal shape of this union looks harmonious, but it raises questions which I currently don't fully understand.

"Ineffable mystery" using oracles instead of timeouts

In the original parametric polymorphism, a constant κ is introduced (informally known as "the number of ineffable mystery") s.t. the agent has to prove its actions are "safe" for a time period κ (i.e. have no "bad" consequences during this period). Since it is impossible for the agent to prove any upper bound on κ, effectively its actions have to be safe indefinitely. I found that for our purposes it's better to cast the formalism differently. Instead of κ playing the role of a "timeout", the agent is provided with an oracle that answers questions of the form "κ > n?" for any n. Contrary to what the agent thinks, the oracle provides an affirmative answer for all n.
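To make the oracle's role concrete, here is a minimal Python sketch (the names and encoding are mine, purely for illustration): the agent interacts with κ only through queries of the form "κ > n?", and the true oracle answers every such query affirmatively, so no amount of querying ever yields an upper bound on κ.

```python
def kappa_oracle(n: int) -> bool:
    """The oracle as it actually behaves: asked "kappa > n?", it answers
    affirmatively for every n, so the agent can never extract an upper
    bound on kappa from it."""
    return True

# The agent can confirm arbitrarily large lower bounds on kappa...
assert kappa_oracle(10 ** 6)
# ...but from its point of view kappa remains unbounded ("ineffable"),
# so anything it proves safe under the assumption "kappa > n" must in
# fact be safe indefinitely.
```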

Let's do some math. The formalism I introduce at this point is very close to that of Yudkowsky and Herreshoff, with some adjustments.

Consider an infinite chain of agents A0, A1, A2... where A{i+1} is created by Ai. Formally, each Ai is a program for a universal oracle machine whose output is a pair consisting of a "motor action" a_i and the next agent A{i+1}; by slight abuse of notation I also write this output as a_i = (a_i, A{i+1}). I'm ignoring sensory input for the sake of simplicity (of course sensory input will reappear when we return to intelligence metrics).
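A hypothetical Python sketch of this interface (run_agent, act and the string encoding of programs are my own illustrative choices): each agent is a program that, given access to the oracle, returns a motor action together with the source code of its successor.

```python
from typing import Callable, Tuple

Oracle = Callable[[int], bool]   # answers queries "kappa > n?"
Action = int                     # motor actions encoded as integers; 0 is the harmless default
Source = str                     # source code of an agent program

def run_agent(source: Source, oracle: Oracle) -> Tuple[Action, Source]:
    """Toy stand-in for the universal oracle machine: running agent A_i with
    access to the oracle yields the pair (a_i, A_{i+1})."""
    namespace: dict = {}
    exec(source, namespace)          # interpret the agent's source code
    return namespace["act"](oracle)  # conventional entry point

# A trivial two-link chain: A1 takes the harmless action 0 and creates no successor,
# while A0 takes action 1 and creates A1.
A1 = "def act(oracle):\n    return (0, '')"
A0 = "def act(oracle):\n    return (1, " + repr(A1) + ")"
print(run_agent(A0, lambda n: True))   # -> (1, "def act(oracle): ...")
```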

Denote by T_κ the usual theory of parametric polymorphism. That is, T_κ results from starting with a theory T (e.g. Peano Arithmetic), extending the language by the symbol κ (which behaves like a natural number), and adding the (self-referential) axiom schema

(1) For all formulae φ: κ > 0 -> For all n: (@_{T_κ} "φ(n)" -> φ(n)[κ\κ-1])

Here '@_S' is a formula with the semantics of "provable in theory S" and φ(n)[κ\κ-1] is φ(n) with all occurrences of κ replaced by κ-1. Thus (1) is a self-referential soundness schema saved from paradoxicality by the "counter" κ. See Yudkowsky and Herreshoff for the proof that T_κ is sound (it has a model for each possible value of κ).

Assume the licensing condition for A0 to perform action b_0 is

(2) Exists n: T_κ | κ > n -> (A0() = b_0 -> G)

Here S | φ means that φ is provable in S, G is a formula denoting the "goal" of the agent (as in Yudkowsky and Herreshoff, the agent is for now assumed to be a satisficer), A0() is a formula meaning "the result of the computation A0", where the oracle is interpreted as answering the question "κ > x?" (in particular the formula A0() involves κ). Such a licensing condition can be implemented in practice by making A0 a proof searcher which is allowed to use the oracle to introduce assumptions of the form κ > n. In particular, our agent employs ambient control.
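The following Python sketch shows only the control flow of such a proof searcher, under a toy interface of my own (the proves stub stands in for an actual bounded proof search in T_κ, and the quoted formulas gloss over the self-referential use of A0's own source code):

```python
def proves(axioms, formula) -> bool:
    """Stub for a bounded proof search in T_kappa extended by the given axioms.
    A real agent would enumerate proofs here; the stub simply reports failure."""
    return False

def A0_step(oracle, candidate_actions, max_n=100):
    """Schematic licensing loop corresponding to condition (2): perform b_0 once
    some n is found such that T_kappa + "kappa > n" proves "A0() = b_0 -> G".
    The assumption "kappa > n" is only introduced after the oracle confirms it."""
    for n in range(max_n):
        if not oracle(n):          # ask the oracle "kappa > n?"
            continue
        for b0 in candidate_actions:
            if proves(axioms=[f"kappa > {n}"], formula=f"A0() = {b0} -> G"):
                return b0
    return 0                       # default harmless action if no license is found

print(A0_step(lambda n: True, candidate_actions=[1, 2]))   # -> 0 with the stub prover
```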

We then have the following proposition, showing that A0 is able to tile indefinitely (create an infinite chain of meaningful descendants). The proposition is formulated for A1, but the rest of the chain follows by induction, since the form of the licensing condition is preserved (up to a certain twist) by the tiling.

Proposition

Assume b_0 = (b_0, B1) and B1 is s.t.

(3) T_κ | For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and @_{T_κ} "(κ+1 > n and B1()[κ\κ+1] = b_1) -> G[κ\κ+1]")))

Here 0 is some default "harmless" action, performed if no appropriate theorem is proved.

Further assume that if B1 is created and does 0, the goal is achieved (in this framework "achieving the goal" is to be thought of as "not doing anything unsafe").

(4) T_κ | (A0() = b_0 and B1() = 0) -> G

Then A0 is allowed to create B1, i.e. licensing condition (2) holds.

B1's licensing condition contains a "twist": namely, it believes that when the oracle receives input n it answers the question "κ+1 > n?" rather than "κ > n?", and its goal is shifted in the same fashion. Note that the goal typically depends on κ, since the agent itself, and thus its κ-oracle, is part of the universe in which this goal is to be attained. For example, the goal can be a function of the numbers (a0, a1, a2...) defined as the outputs of the corresponding computations with κ-oracles. Thus the shift in G can also be interpreted as a shifted interpretation of the oracle by B1.

Proof

(3) trivially implies that

T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and @_{T_κ} "(κ+1 > n and B1()[κ\κ+1] = b_1) -> G[κ\κ+1]"))))

Using the premise κ > 0, we can apply the soundness schema (1) to the formula under the provability operator; the substitution κ\κ-1 turns κ+1 back into κ, so the quoted formula becomes (κ > n and B1() = b_1) -> G. We obtain:

T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and ((κ > n and B1() = b_1) -> G)))))

T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (B1() = b_1 -> G))))

T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or (B1() = b_1 -> G))))

T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or G)))

This trivially implies that

T_κ | κ > 0 -> (For all b_1: ((A0() = b_0 and B1() = b_1) -> (b_1 = 0 or G)))

Applying (4), we finally get T_κ | κ > 0 -> (A0() = b_0 -> G), which is exactly (2) for n = 0.

Back to intelligence metrics

To apply parametric polymorphism to the updateless intelligence formalism, we need to do the following:

  • Assume the machine M on which the agent is running is an oracle machine.
  • Assume the Solomonoff measure of the ideal (Cartesian) universe X is defined using a universal oracle machine. The oracle in M has to correspond to the oracle in the hypothesis T describing X: this correspondence becomes part of the rules N.
  • Assume the universal program u defining the Solomonoff measure for the physical universe is a universal oracle program, i.e. the hypotheses D describing the physical universe are also allowed to invoke the oracle.
  • Assume the logical expectation value E_L is computed using T_κ extended by N applied to the given T (this is provable in T_κ anyway, but we want the proof to be short) and by the axiom schema {κ > n} for every natural number n. The latter extension is consistent, since adding any finite number of such axioms admits models. The proofs counted in E_L interpret the oracle as answering the question "κ > n?". That is, they are proofs of theorems of the form "if this oracle-program T computes q when the oracle is taken to be κ > n, then the k-th digit of the expected utility is 0/1, where the expected utility is defined by a Solomonoff sum over oracle programs with the oracle again taken to be κ > n" (see the toy sketch after this list).
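As a purely illustrative sketch of the quantity such proofs talk about (the names and the finite list of hypotheses are mine; in the actual formalism the sum ranges over all oracle programs and the oracle stays a symbolic predicate inside T_κ):

```python
from typing import Callable, List, Tuple

Oracle = Callable[[int], bool]

def expected_utility(hypotheses: List[Tuple[float, Callable[[Oracle], float]]],
                     oracle: Oracle) -> float:
    """Toy stand-in for the Solomonoff sum over oracle programs: each hypothesis
    is a pair (prior weight, program taking the oracle), and every program is run
    with the same interpretation of the oracle."""
    return sum(weight * program(oracle) for weight, program in hypotheses)

# Illustrative hypotheses; the weights stand in for 2^-length(program) priors.
hypotheses = [
    (0.5,  lambda oracle: 1.0 if oracle(3) else 0.0),
    (0.25, lambda oracle: 0.5),
]

# The interpretation used throughout: on input n the oracle answers "kappa > n?",
# which (contrary to what the agent can prove) is affirmative for every n.
print(expected_utility(hypotheses, lambda n: True))   # -> 0.625
```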

Discussion

  • Such an agent, when considering hypotheses consistent with given observations, will always face a large number of different compatible hypotheses of similar complexity. These hypotheses result from arbitrary insertions of the oracle (which increase complexity, of course, but not drastically). It is not entirely clear to me what such an epistemology will look like.
  • The formalism admits naturalistic trust to the extent the agent believes that the other agent's oracle is "genuine" and carries a sufficient "twist". This will often be ambiguous, so trust will probably be limited to some finite probability. If the other agent is equivalent to the given one on the level of physical implementation, then the trust probability is likely to be high.
  • The agent is able to quickly confirm κ > n for any n small enough to fit into memory. For the sake of efficiency we might want to enhance this ability by allowing the agent to confirm that (Exists n: φ(n)) -> Exists n: (φ(n) and κ > n) for any given formula φ.
  • For the sake of simplicity I neglected multi-phase AI development, but the corresponding construction seems to be straightforward.
  • Overall I retain the feeling that a good theory of logical uncertainty should allow the agent to assign a high probability to the soundness of its own reasoning system (a la Christiano et al). Whether this will make parametric polymorphism redundant remains to be seen.
