You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

Tiling agents with transfinite parametric polymorphism

2 Squark 09 May 2014 05:32PM

The formalism presented in this post turned out to be erroneous (as opposed to the formalism in the previous post). The problem is that the step in the proof of the main proposition in which the soundness schema is applied cannot be generalized to the ordinal setting since we don't know whether ακ is a successor ordinal so we can't replace it by ακ'=ακ-1. I'm not deleting this post primarily to preserve the useful discussion in the comments.


Followup to: Parametric polymorphism in updateless intelligence metric

In the previous post, I formulated a variant of Benja's parametric polymorphism suitable for constructing updateless intelligence metrics. More generally, this variants admits agents which are utility maximizers (in the informal sense of trying their best to maximize a utility function, not in the formal sense of finding the absolutely optimal solution; for example they might be "meliorizers" to use the terminology of Yudkowsky and Herreshoff) rather than satisficers. The agents using this formalism labor under a certain "delusion", namely, since they believe that κ ("the number of ineffable mystery") is an actual finite number (whereas it is secretly infinite, in a sense), they think that the chain of tiling agents is effectively finite as well (since at some point agent #n in the chain will discover that κ > n is false and will fail to construct a subsequent agent of the same "epistemic strength"). The same problem exists in Weaver's intuitionistic assertability predicate formalism.

To overcome this limitation, I suggest to extend κ's semantics from natural numbers to elements of a certain recursive ordinal (at least morally; technically it is done a bit differently, see below). In Benja's original formulation this doesn't appear a valid option, since κ is interpreted as a time interval. However, in my variant κ is just an abstract parameter queries about which can be directed to a special "oracle", so there is no such limitation.

Formalism

Fix α a recursive ordinal and αi a system of notations for ordinals smaller than α such that α0=0, the function f defined by αf(i)i+1 is recursive and the bit-valued function g(i,j) which equals 1 iff αi < αj is also recursive. We also assume that

(0) T |- For all i: g(i,f(i))=1

where T is our starting theory. We define Tκ(α) to be T extended by the symbol κ (which stands for the index of the ordinal, not the ordinal itself) and the following self-referential soundness schema:

(1) For all formulae φ: For all n, i: ((@_Tκ(α) "φ(n)" and g(i,κ)=1) -> φ(n)[κ\i])

Tκ(α) is proven to be sound analogously to Yudkowsky and Herreshoff, with transfinite induction over ordinals below α replacing the ordinary induction.

The licensing condition for agent A0 to perform action b0 is

(2) Exists n: Tκ(α) |- (g(n,κ)=1 -> (A0()=b-> G))

The property which is sufficient for agent B1 to be creatable by agent A0 is

(3) Tκ(α) |- For all b1: (B1()=b1 -> (b1=0 or Exists n: (g(n,κ)=1 and @_Tκ(α) "(g(n,f(κ))=1 and B1()[κ\f(κ)]=b1) -> G[κ\f(κ)]")))

The proof that (3) indeed enables tiling is analogous to the previous post (but requires using (0)).

Discussion

In order for A0 to know that the tiling chain is infinite, it is enough for her to verify ακ >= ω. In order for her to know that A1 is going to know the tiling chain is infinite, she needs to verify ακ >= ω+1. In order for her to know that all agents are going to know the tiling chain is infinite, she needs to verify ακ >= 2ω. In order for her to know that all agents are going to know that, she needs to verify ακ >= 3ω et cetera.

It remains to decide which ordinal should we actually use. My intuition is that the correct ordinal is the least α with the property that α is the proof-theoretic ordinal of Tκ(α) extended by the axiom schema {g(i,κ)=1}. This seems right since the agent shouldn't get much from ακ > β for β above the proof theoretic ordinal. However, a more formal justification is probably in order.

Parametric polymorphism in updateless intelligence metrics

4 Squark 25 April 2014 07:46PM

Followup to: Agents with Cartesian childhood and Physicalist adulthood

In previous posts I have defined a formalism for quantifying the general intelligence of an abstract agent (program). This formalism relies on counting proofs in a given formal system F (like in regular UDT), which makes it susceptible to the Loebian obstacle. That is, if we imagine the agent itself making decisions by looking for proofs in the same formal system F then it would be impossible to present a general proof of its trustworthiness, since no formal system can assert is own soundness. Thus the agent might fail to qualify for high intelligence ranking according to the formalism. We can assume the agent uses a weaker formal system the soundness of which is provable in F but then we still run into difficulties if we want the agent to be self-modifying (as we expect it to be). Such an agent would have to trust its descendants which means that subsequent agents use weaker and weaker formal systems until self-modification becomes impossible.

One known solution to this is Benja's parametric polymorphism. In this post I adapt parametric polymorphism to the updateless intelligence metric framework. The formal form of this union looks harmonious but it raises questions which I currently don't fully understand.

"Ineffable mystery" using oracles instead of timeouts

In the original parametric polymorphism, a constant κ is introduced (informally known as "the number of ineffable mystery") s.t. the agent has to prove its actions are "safe" for time period κ (i.e. have no "bad" consequences during this period). Since it's impossible to the agent to prove any upper bound on κ, effectively its actions have to be safe indefinitely. I found that for our purposes it's better to cast the formalism otherwise. Instead of κ playing the role of a "timeout", the agent is provided with an oracle that answers questions of the form "κ > n?" for any n. Contrary to what the agent thinks, the oracle provides an affirmative answer for all n.

Let's do some math. The formalism I introduce at this point is very close to Yudkowsky and Herreshoff, with some adjustments.

Consider an infinite chain of agents A0, A1, A2... where A{i+1} is created by Ai. Formally, each Ai is a program for a universal oracle machine which computes a_i = (a_i, A{i+1}). Here a_i is a "motor action". I'm ignoring sensory input for the sake of simplicity (of course sensory input will reappear when we return to intelligence metrics).

Denote T_κ the usual theory of parametric polymorphism. That is T_κ results from starting with a theory T (e.g. Peano Arithmetic), extending the language by the symbol κ (which behaves like a natural number) and adding the (self-referential) axiom schema 

(1) For all formulae φ: κ > 0 -> For all n: (@_{T_κ} "φ(n)" -> φ(n)[κ\κ-1])

Here '@_S' is a formula with the semantics of "provable in theory S" and φ(n)[κ\κ-1] is φ(n) with all occurrences of κ replaced by κ-1. Thus (1) is a self-referential soundness schema saved from paradoxicality by the "counter" κ. See Yudkowsky and Herreshoff for the proof that T_κ is sound (it has 1 model for each possible value of κ).

Assume the licensing condition for A0 to perform action b_0 is

(2) Exists n: T_κ | κ > n -> (A0() = b_0 -> G)

Here S | φ means that φ is provable in S, G is a formula denoting the "goal" of the agent (as in Yudkowsky and Herreshoff, the agent is for now assumed to be a satisficer), A0() is a formula meaning "the result of the computation A0", where the oracle is interpreted as answering the question "κ > x?" (in particular the formula A0() involves κ). Such a licensing condition can be implemented in practice by making A0 a proof searcher which is allowed to use the oracle to introduce assumptions of the form κ > n. In particular, our agent employs ambient control.

We then have the following proposition, showing that A0 is able to tile indefinitely (create an infinite chain of meaningful descendants). The proposition is formulated for A1 but the rest of the chain follows by induction since the form of the licensing condition is preserved (up to a certain twist) by the tiling.

Proposition

Assume b_0 = (b_0, B1) and B1 is s.t.

(3) T_κ | For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and @_{T_κ} "(κ+1 > n and B1()[κ\κ+1] = b_1) -> G[κ\κ+1]")))

Here 0 is some default "harmless" action, performed if no appropriate theorem is proved.

Further assume that if B1 is created and does 0, the goal is achieved (in this framework "achieving the goal" is to be thought of as "not doing anything unsafe").

(4) T_κ | (A0() = b_0 and B1() = 0) -> G

Then A0 is allowed to create B1 i.e. licensing condition (2) holds.

B1's licensing condition contains a "twist", namely it believes that when the oracle receives input n it answers the question "κ+1 > n?" rather than "κ > n" and its goal is shifted in the same fashion. Note that the goal typically depends on κ since the agent itself and thus its κ-oracle is part of the universe in which this goal is to be attained. For example the goal can be a function of the numbers (a0, a1, a2...) defined as the output of corresponding computations with κ-oracles. Thus the shift in G can also be interpreted as a shifted interpretation of the oracle by B1.

Proof

(3) trivially implies that

T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and @_{T_κ} "(κ+1 > n and B1()[κ\κ+1] = b_1) -> G[κ\κ+1]"))))

Using the premise κ > 0, we can apply the soundness schema (1). We obtain:

T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and ((κ > n and B1() = b_1) -> G)))))

T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (B1() = b_1 -> G))))

T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or (B1() = b_1 -> G))))

T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or G)))

This trivially implies that

T_κ | κ > 0 -> (For all b_1: ((A0() = b_0 and B1() = b_1) -> (b_1 = 0 or G)))

Applying (4) we finally get (2) for n=0

Back to intelligence metrics

To apply parametric polymorphism to the updateless intelligence formalism, we need to do the following:

  • Assume the machine M on which the agent is running is an oracle machine.
  • Assume the Solomonoff measure of the ideal (Cartesian) universe X is defined using a universal oracle machine. The oracle in M has to correspond to the oracle in the hypothesis T describing X: this correspondence becomes part of the rules N.
  • Assume the universal program u defining the Solomonoff measure for the physical universe is a universal oracle program, i.e. the hypotheses D describing the physical universe are also allowed to invoke the oracle.
  • Assume the logical expectation value EL is computed using T_κ extended by N applied to the given T (this is provable in T_κ anyway but we want the proof to be short) and the axiom schema {κ > n} for every natural number n. The latter extension is consistent since adding any finite number of such axioms admits models. The proofs counted in Einterpret the oracle as answering the the question "κ > n?". That is, they are proofs of theorems of the form "if this oracle-program T computes q when the oracle is taken to be κ > n, then the k-th digit of the expected utility is 0/1 where the expected utility is defined by a Solomonoff sum over oracle programs with the oracle again taken to be κ > n".

Discussion

  • Such an agent, when considering hypotheses consistent with given observations, will always face a large number of different compatible hypothesis with similar complexity. These hypotheses result from arbitrary insertions of the oracle (which increase complexity of course, but not drastically). It is not entirely clear to me how such an epistemology will look like.
  • The formalism admits naturalistic trust to the extent the agent believes that the other agent's oracle is "genuine" and carries a sufficient "twist". This will often be ambiguous so trust will probably be limited to some finite probability. If the other agent is equivalent to the given one on the level of physical implementation then the trust probability is likely to be high.
  • The agent is able to quickly confirm κ > n for any n small enough to fit into memory. For the sake of efficiency we might want to enhance this ability by allowing the agent to confirm that (Exist n: φ(n)) -> Exist n: (φ(n) and κ > n) for any given formula φ.
  • For the sake of simplicity I neglected multi-phase AI development, but the corresponding construction seems to be straightforward.
  • Overall I retain the feeling that a good theory of logical uncertainty should allow the agent to assign a high probability the soundness of its own reasoning system (a la Christiano et al). Whether this will make parametric polymorphism redundant remains to be seen.

Logical thermodynamics: towards a theory of self-trusting uncertain reasoning

5 Squark 28 March 2014 04:06PM

Followup to: Overcoming the Loebian obstacle using evidence logic

In the previous post I proposed a probabilistic system of reasoning for overcoming the Loebian obstacle. For a consistent theory it seems natural the expect such a system should yield a coherent probability assignment in the sense of Christiano et al. This means that

a. provably true sentences are assigned probability 1

b. provably false sentences are assigned probability 0

c. The following identity holds for any two sentences φ, ψ

[1] P(φ) = P(φ and ψ) + P(φ and not-ψ)

In the previous formalism, conditions a & b hold but condition c is violated (at least I don't see any reason it should hold).

In this post I attempt to achieve the following:

  • Solve the problem above.
  • Generalize the system to allow for logical uncertainty induced by bounded computing resources. Note that although the original system is already probabilistic, in is not uncertain in the sense of assigning indefinite probability to the zillionth digit of pi. In the new formalism, the extent of uncertainty is controlled by a parameter playing the role of temperature in a Maxwell-Boltzmann distribution.

Construction

Define a probability field to be a function p : {sentences} -> [0, 1] satisfying the following conditions:

  • If φ is a tautology in propositional calculus (e.g. φ = ψ or not-ψ) then p(φ) = 1
  • For all φ: p(not-φ) = 1 - p(φ)
  • For all φ, ψ: P(φ) = P(φ and ψ) + P(φ and not-ψ)
Probability fields are a convex set: a convex linear combination of probability fields is a probability field. Essentially, probability fields are probability measures in the space of truth assignments consistent w.r.t. propositional calculus.

We define the energy of a probability field p to be E(p) := Σφ Σv 2-l(v) Eφ,v(p(φ)). Here v are pieces of evidence as defined in the previous post, Eφ,v are their associated energy functions and l(v) is the length of (the encoding of) v. We assume  that the encoding of v contains the encoding of the sentence φ for which it is evidence and Eφ,v(p(φ)) := 0 for all φ except the relevant one. Note that the associated energy functions are constructed in the same way as in the previous post, however they are not the same because of the self-referential nature of the construction: it refers to final probability assignment.

The final probability assignment is defined to be

P(φ) = Integralp [e-E(p)/T p(φ)] / Integralp e-E(p)/T

Here T >= 0 is a parameter representing the magnitude of logical uncertainty. The integral is infinite-dimensional so it's not obviously well-defined. However, I suspect it can be defined by truncating to a finite set of statements and taking a limit wrt this set. In the limit T -> 0, the expression should correspond to computing the centroid of the set of minima of E (which is convex because E is convex).

Remarks

  • Obviously this construction is merely a sketch and work is required to show that
    • The infinite-dimensional integrals are well-defined
    • The resulting probability assignment is coherent for consistent theories and T = 0
    • The system overcomes the Loebian obstacle for tiling agents in some formal sense
  • For practical application to AI we'd like an efficient way to evaluate these probabilities. Since the form of the probabilities is analogous to statistical physics, it is suggestive to use similarly inspired Monte Carlo algorithms.

 

Overcoming the Loebian obstacle using evidence logic

4 Squark 14 March 2014 06:34PM

In this post I intend to:

  • Briefly explain the Loebian obstacle and it's relevance to AI (feel free to skip it if you know what the Loebian obstacle is).
  • Suggest a solution in the form a formal system which assigns probabilities (more generally probability intervals) to mathematical sentences (and which admits a form of "Loebian" self-referential reasoning). The method is well-defined both for consistent and inconsistent axiomatic systems, the later being important in analysis of logical counterfactuals like in UDT.

Background

Logic

When can we consider a mathematical theorem to be established? The obvious answer is: when we proved it. Wait, proved it in what theory? Well, that's debatable. ZFC is popular choice for mathematicians, but how do we know it is consistent (let alone sound, i.e. that it only proves true sentences)? All those spooky infinite sets, how do you know it doesn't break somewhere along the line? There's lots of empirical evidence, but we can't prove it, and it's proofs we're interesting in, not mere evidence, right?

Peano arithmetic seems like a safer choice. After all, if the natural numbers don't make sense, what does? Let's go with that. Suppose we have a sentence s in the language of PA. If someone presents us with a proof p in PA, we believe s is true. Now consider the following situations: instead of giving you a proof of s, someone gave you a PA-proof p1 that p exists. After all, PA admits defining "PA-proof" in PA language. Common sense tells us that p1 is a sufficient argument to believe s. Maybe, we can prove it within PA? That is, if we have a proof of "if a proof of s exists then s" and a proof of R(s)="a proof of s exists" then we just proved s. That's just modus ponens

There are two problems with that.

First, there's no way to prove the sentence L:="for all s if R(s) then s", since it's not a PA-sentence at all. The problem is that "for all s" references s as a natural number encoding a sentence. On the other hand, "then s" references s as the truth-value of the sentence. Maybe we can construct a PA-formula T(s) which means "the sentence encoded by the number s is true"? Nope, that would get us in trouble with the liar paradox (it would be possible to construct a sentence saying "this sentence is false").

Second, Loeb's theorem says that if we can prove L(s):="if R(s) exists then s" for a given s, then we can prove s. This is a problem since it means there can be no way to prove L(s) for all s in any sense, since it's unprovable for s which are unprovable. In other words, if you proved not-s, there is no way to conclude that "no proof of s exists".

What if we add an inference rule Q to our logic allowing to go from R(s) to s? Let's call the new formal system PA1p1 appended by a Q-step becomes an honest proof of s in PA1. Problem solved? Not really! Now someone can give you a proof of 
R1(s):="a PA1-proof of s exists". Back to square one! Wait a second, what if we add a new rule Q1 allowing to go from R1(s) to s? OK, but now we got R2(s):="a PA2-proof of s exists". Hmm, what if add an infinite number of rules Qk? Fine, but now we got Rω(s):="a PAω-proof of s exists". And so on, and so forth, the recursive ordinals are a plenty...

Bottom line, Loeb's theorem works for any theory containing PA, so we're stuck.

AI

Suppose you're trying to build a self-modifying AGI called "Lucy". Lucy works by considering possible actions and looking for formal proofs that taking one of them will increase expected utility. In particular, it has self-modifying actions in its strategy space. A self-modifying action creates essentially a new agent: Lucy2. How can Lucy decide that becoming Lucy2 is a good idea? Well, a good step in this direction would be proving that Lucywould only take actions that are "good". I.e., we would like Lucy to reason as follows "Lucyuses the same formal system as I, so if she decides to take action a, it's because she has a proof p of the sentence s(a) that 'a increases expected utility'. Since such a proof exits, a does increase expected utility, which is good news!" Problem: Lucy is using L in there, applied to her own formal system! That cannot work! So, Lucy would have a hard time self-modifying in a way which doesn't make its formal system weaker

As another example where this poses a problem, suppose Lucy observes another agent called "Kurt". Lucy knows, by analyzing her sensory evidence, that Kurt proves theorems using the same formal system as Lucy. Suppose Lucy found out that Kurt proved theorem s, but she doesn't know how. We would like Lucy to be able to conclude s is, in fact, true (at least with the probability that her model of physical reality is correct). Alas, she cannot.

See MIRI's paper for more discussion.

Evidence Logic

Here, cousin_it explains a method to assign probabilities to sentences in an inconsistent theory T. It works as follows. Consider sentence s. Since T is inconsistent, there are T-proofs both of s and of not-s. Well, in a courtroom both sides are allowed to have arguments, why not try the same approach here? Let's weight the proofs as a function of their length, analogically to weighting hypotheses in Solomonoff induction. That is, suppose we have a prefix-free encoding of proofs as bit sequences. Then, it makes sense to consider a random bit sequence and ask whether it is a proof of something. Define the probability of s to be

P(s) := (probability of a random sequence to be a proof of s) / (probability of a random sequence to be a proof of s or not-s)

Nice, but it doesn't solve the Loebian obstacle yet.

I will now formulate an extension of this idea that allows assigning an interval of probabilities [Pmin(s), Pmax(s)] to any sentence s. This interval is a sort of "Knightian uncertainty". I have some speculations how to extract a single number from this interval in the general case, but even without that, I believe that Pmin(s) = Pmax(s) in many interesting cases.

First, the general setting:

  • With every sentence s, there are certain texts v which are considered to be "evidence relevant to s". These are divided into "negative" and "positive" evidence. We define sgn(v) := +1 for positive evidence, sgn(v) := -1 for negative evidence.
  • Each piece of evidence v is associated with the strength of the evidence strs(v) which is a number in [0, 1]
  • Each piece of evidence v is associated with an "energy" function es,v : [0, 1] -> [0, 1]. It is a continuous convex function.
  • The "total energy" associated with s is defined to b es := ∑v 2-l(ves,v where l(v) is the length of v.
  • Since es,v are continuous convex, so is es. Hence it attains its minimum on a closed interval which is 
    [Pmin(s), Pmax(s)] by definition.
Now, the details:
  • A piece of evidence v for s is defined to be one of the following:
    • a proof of s
      • sgn(v) := +1
      • strs(v) := 1
      • es,v(q) := (1 - q)2
    • a proof of not-s
      • sgn(v) := -1
      • strs(v) := 1
      • es,v(q) := q2
    • a piece of positive evidence for the sentence R-+(s, p) := "Pmin(s) >= p"
      • sgn(v) := +1
      • strs(v) := strR-+(s, p)(v) p
      • es,v(q) := 0 for q > p; strR-+(s, p)(v) (q - p)2 for q < p
    • a piece of negative evidence for the sentence R--(s, p) := "Pmin(s) < p"
      • sgn(v) := +1
      • strs(v) := strR--(s, p)(v) p
      • es,v(q) := 0 for q > p; strR--(s, p)(v) (q - p)2 for q < p
    • a piece of negative evidence for the sentence R++(s, p) := "Pmax(s) > p"
      • sgn(v) := -1
      • strs(v) := strR++(s, p)(v) (1 - p)
      • es,v(q) := 0 for q < p; strR-+(s, p)(v) (q - p)2 for q > p
    • a piece of positive evidence for the sentence R+-(s, p) := "Pmax(s) <= p"
      • sgn(v) := -1
      • strs(v) := strR+-(s, p)(v) (1 - p)
      • es,v(q) := 0 for q < p; strR-+(s, p)(v) (q - p)2 for q > p
Technicality: I suggest that for our purposes, a "proof of s" is allowed to be a proof of sentence equivalent to s in 0-th order logic (e.g. not-not-s). This ensures that our probability intervals obey the properties we'd like them to obey wrt propositional calculus.

Now, consider again our self-modifying agent Lucy. Suppose she makes her decisions according to a system of evidence logic like above. She can now reason along the lines of "Lucyuses the same formal system as I. If she decides to take action a, it's because she has strong evidence for the sentence s(a) that 'a increases expected utility'. I just proved that there would be strong evidence for the expected utility increasing. Therefore, the expected utility would have a high value with high logical probability. But evidence for high logical probability of a sentence is evidence for the sentence itself. Therefore, I now have evidence that expected utility will increase!"

This analysis is very sketchy, but I think it lends hope that the system leads to the desired results.