The Role of Physics in UDT: Part I
Followup to: Anatomy of Multiversal Utility Functions: Tegmark Level IV
Outline: In the previous post, I discussed the properties of utility functions in the extremely general setting of the Tegmark level IV multiverse. In the current post, I am going to show how the discovery of a theory of physics allows the agent to perform a certain approximation in its decision theory. I'm doing this with an eye towards analyzing decision theory and utility calculus in universes governed by realistic physical theories (quantum mechanics, general relativity, eternal inflation...)
A Naive Approach
Previously, we have used the following expression for the expected utility:

[1] E[U] = ∫_X U(x) dμ(x)

where μ is the Solomonoff measure on the space X of binary sequences.
Since the integral is over the entire "level IV multiverse" (the space of binary sequences), [1] makes no reference to a specific theory of physics. On the other hand, a realistic agent is usually expected to use its observations to form theories about the universe it inhabits, subsequently optimizing its action with respect to the theory.
Since this process crucially depends on observations, we need to make the role of observations explicit. Since we assume the agent uses some version of UDT, we are not supposed to update on observations, instead evaluating the logical conditional expectation values

[2] EU(π) = E_L[U | A() = π]

Here A is the agent, π is a potential policy for the agent (a mapping from sensory inputs to actions) and E_L is the expectation value with respect to logical uncertainty.
Now suppose A made observations o, leading it to postulate physical theory T. For the sake of simplicity, we suppose A is only deciding its actions in the universes in which the observations o were made1. Thus, we assume that the input space factors as a product, and we're only interested in inputs in the part corresponding to o. This simplification leads to replacing [2] by

[3] EU(π_o) = E_L[U | A's policy on o-inputs = π_o]

where π_o is a "partial" policy referring to the o-universe only.
The discovery of T allows A to perform a certain approximation of [3]. A naive guess of the form of the approximation is

[4'] EU(π_o) ≈ c + E_L[∫_X U(x) dμ_T(x) | A's policy on o-inputs = π_o]
Here, c is a constant representing the contributions of the universes in which T is not valid (whose logical-uncertainty correlation with π_o we neglect) and μ_T is a measure on X corresponding to T. Now, physical theories in the real world often specify time evolution equations without saying anything about the initial conditions. Such theories are "incomplete" from the point of view of the current formalism. To complete such a theory we need a measure on the space of initial conditions: a "cosmology". A simple example of a "complete" theory T: a cellular automaton with deterministic (or probabilistic) evolution rules and a measure on the space of initial conditions (e.g. set each cell to an independently random state).
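As a minimal sketch of such a "complete" theory (the cyclic grid and the choice of elementary rule 110 are illustrative assumptions, not part of the text): the evolution rule is deterministic, while the "cosmology" is the i.i.d. measure on initial conditions.

```python
import random

def sample_initial_state(n_cells, rng):
    # The "cosmology": a measure on initial conditions -- each cell
    # independently random.
    return [rng.randint(0, 1) for _ in range(n_cells)]

def step(state):
    # Deterministic evolution rule: elementary CA rule 110 on a cyclic grid
    # (the specific rule and the cyclic boundary are illustrative choices).
    n = len(state)
    return [(110 >> (4 * state[(i - 1) % n] + 2 * state[i] + state[(i + 1) % n])) & 1
            for i in range(n)]

def sample_history(n_cells, n_steps, seed=0):
    # A "complete" theory: evolution rule plus a measure on initial conditions.
    rng = random.Random(seed)
    history = [sample_initial_state(n_cells, rng)]
    for _ in range(n_steps):
        history.append(step(history[-1]))
    return history
```

Sampling histories from such a theory is what the measure μ_T is supposed to formalize.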
However, [4'] is in fact not a valid approximation of [3]. This is because the use of μ_T fixes the ontology: μ_T treats binary sequences as encoding the universe in a way natural for T, whereas dominant2 contributions to [3] come from binary sequences which encode the universe in a way natural for U.
Ontology Decoupling
Allow me a small digression to discuss desiderata of logical uncertainty. Consider an expression of the form E_L[X] where X is a mathematical constant with some complex definition, e.g. π or the Euler-Mascheroni constant γ. From the point of view of an agent with bounded computing resources, X is a random variable rather than a constant (since its value is not precisely known). Now, in usual probability theory we are allowed to use identities such as E[2X] = 2 E[X]. In the case of logical uncertainty, the identity is less obvious since the operation of multiplying by 2 has a non-vanishing computing cost. However, since this cost is very small, we expect to have the approximate identity E_L[2X] ≈ 2 E_L[X].
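A crude toy model of the situation (all specifics here are illustrative): treat the agent's belief about X = π as a distribution over a few truncated series approximations. For an ordinary probability distribution the identity is exact; the point is that the logical-uncertainty version is only approximate.

```python
import statistics

def partial_sum(n):
    # n-term Leibniz approximation to pi.
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(n))

# With bounded resources the agent only knows X = pi up to the spread of a few
# truncated sums, so it treats X as a random variable over plausible values.
beliefs = [partial_sum(n) for n in range(50, 60)]
e_x = statistics.mean(beliefs)
e_2x = statistics.mean(2 * x for x in beliefs)
# In this ordinary probability model E[2X] = 2E[X] holds exactly; under genuine
# logical uncertainty it holds only approximately, since multiplying by 2
# itself has a (small) computing cost.
```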
Consider a set P of programs computing functions from X to X, containing the identity program. Then, the properties of the Solomonoff measure give us the approximation

[5] ∫_X U(x) dμ(x) ≈ Σ_{p ∈ P} 2^{-|p|} ∫_X U(p(x)) dμ_P(x)

Here μ_P is the restriction of μ to hypotheses which don't decompose as applying some program in P to another hypothesis, and |p| is the length of the program p.
Applying [5] to [3], we can expand the expectation value as a sum over the programs p in P, each term weighted by 2^{-|p|}. Now, according to the discussion above, if we choose P to be a set of sufficiently cheap programs3 we can take the sum over P out of the logical-uncertainty expectation value. If we also assume P to be sufficiently large, it becomes plausible to use the approximation

[4] EU(π_o) ≈ c + Σ_{p ∈ P} 2^{-|p|} E_L[∫_X U(p(x)) dμ_T(x) | A's policy on o-inputs = π_o]
The ontology problem disappears since p bridges between the ontologies of T and U. For example, if T describes the Game of Life and U describes glider maximization in the Game of Life, but the two are defined using different encodings of Game of Life histories, the term corresponding to the re-encoding program will be dominant2 in [4].
Stay Tuned
The formalism developed in this post does not yet cover the entire content of a physical theory. Realistic physical theories not only describe the universe in terms of an arbitrary ontology but explain how this ontology relates to the "classical" world we experience. In other words, a physical theory comes with an explanation of the embedding of the agent in the universe (a phenomenological bridge). This will be addressed in the next post, where I explain the Cartesian approximation: the approximation that decouples the agent from the rest of the universe.
Subsequent posts will apply this formalism to quantum mechanics and eternal inflation to understand utility calculus in Tegmark levels III and II respectively.
1 As opposed to a fully fledged UDT agent which has to simultaneously consider its behavior in all universes.
2 By "dominant" I mean dominant in dependence on the policy rather than absolutely.
3 They have to be cheap enough to take the entire sum out of the expectation value rather than only the factor in a single term. This condition depends on the amount of computing resources available to our agent, which is an implicit parameter of the logical-uncertainty expectation values.
Meetup : Tel Aviv: Rationality in the Immune System
Discussion article for the meetup : Tel Aviv: Rationality in the Immune System
There was a change of speaker after the first announcement.
On Tuesday, February 24, Liran Valadarsky will be speaking on rationality in the immune system.
Our immune system needs to balance many simultaneous and often contradictory tasks. For example, a weak inflammatory response might be insufficient to fight an infection while too strong of a response causes unnecessary damage and, in extreme cases, system failure. How does the immune system extract the relevant information from its environment and 'update its beliefs' so as to carry out the most relevant response?
Liran has a BSc in biology and is currently in transition between MSc and PhD in Immunology at the Weizmann Institute of Science.
Where: Google, Electra Tower, 98 Yigal Alon Street, Tel Aviv. Meet at the 29th floor (not the Google Campus floor). We'll then move to a room.
When: Tuesday, February 24 at 19:00
Contact: If you can't find us, call Anatoly, who is graciously hosting us, at 054-245-1060; or Joshua at 054-569-1165.
Facebook: Join the group. If you have FB, please RSVP at the FB event which will appear there, to help us get a sense of what to expect.
You can also post questions at https://groups.google.com/forum/#!forum/lesswrong-tel-aviv
Anatomy of Multiversal Utility Functions: Tegmark Level IV
Outline: Constructing utility functions that can be evaluated on any possible universe is known to be a confusing problem, since it is not obvious what sort of mathematical object should be the domain and what properties the function should obey. In a sequence of posts, I intend to break down the question with respect to Tegmark's multiverse levels and explain the answer on each level, starting with level IV in the current post.
Background
An intelligent agent is often described as an entity whose actions drive the universe towards higher expectation values of a certain function, known as the agent's utility function. Such a description is very useful in contexts such as AGI, FAI, decision theory and more generally any abstract study of intelligence.
Applying the concept of a utility function to agents in the real world requires utility functions with a very broad domain. Indeed, since the agent is normally assumed to have only finite information about the universe in which it exists, it should allow for a very large variety of possible realities. If the agent is to make decisions using some sort of utility calculus, it has to be able to evaluate its utility function on each of the realities it can conceive of.
Tegmark has conveniently arranged the space of possible realities ("universes") into 4 levels, 3 of which are based on our current understanding of physics. Tegmark's universes are usually presented as co-existing but it is also possible to think of them as the "potential" universes in which our agent can find itself. I am going to traverse Tegmark's multiverse from top to bottom, studying the space of utility functions on each level (which, except for level IV, is always derived from the higher level). The current post addresses Tegmark level IV, leaving the lower levels for follow-ups.
Some of the ideas in this post previously appeared in a post about intelligence metrics, where I explained them much more tersely.
Tegmark Level IV
Tegmark defined this level as the collection of all mathematical models. Since it is not even remotely clear how to define such a beast, I am going to use a different space which (I claim) is conceptually very close. Namely, I am going to consider universes to be infinite binary sequences. I denote by X the space of all such sequences, equipped with the product topology. As will become clearer in the following, this space embodies "all possible realities", since any imaginable reality can be encoded in such a sequence1.
The natural a priori probability measure on this space is the Solomonoff measure μ. Thus, a priori utility expectation values take the form

[1] E[U] = ∫_X U(x) dμ(x)
From the point of view of Updateless Decision Theory, a priori expectation values are the only sort that matters: conditional expectation values wrt logical uncertainty replace the need to update the measure.
In order to guarantee the convergence of expectation values, we are going to assume U is a bounded function.
A Simple Example
So far, we know little about the form of the utility function U. To illustrate the sort of constructions that are relevant for realistic or semi-realistic agents, I am going to consider a simple example: the glider maximizer.
The glider maximizer G is an agent living inside the Game of Life. Fix C, a forward light cone within the Game of Life spacetime, representing the volume G is able to influence. G maximizes the following utility function:

U(x) = Σ_t λ^t g_x(t)

Here, x is a history of the Game of Life, λ is a constant in (0, 1) and g_x(t) is the number of gliders at time t inside C.
We wish to "release" G from the Game of Life universe into the broader multiverse. In other words, we want an agent that doesn't dogmatically assume itself to exist within the Game of Life, instead searching for appearances of the Game of Life in the physical universe and maximizing gliders there.
To accomplish this, fix a way to bijectively encode histories of the Game of Life as binary sequences. Allow arbitrary histories: don't impose Game of Life rules. We can then define the "multiversal" utility function

U'(x) = Σ_t λ^t (α n_x(t) + g_x(t))

Here D_x is the set of cells in which x satisfies Game of Life rules, α is a positive constant, n_x(t) is the number of cells in D_x at time t and g_x(t) now counts the gliders inside D_x.
In other words, the "liberated" utility function U' prefers many cells to satisfy Game of Life rules, and many cells out of these to contain gliders.
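A toy sketch of a utility function of this shape (all specifics are illustrative: finite toroidal grids stand in for infinite histories, and, since glider detection is beside the point here, live cells inside the law-abiding region stand in for the glider count):

```python
def life_rule(neigh_sum, alive):
    # Conway's Game of Life update for a single cell.
    return 1 if (neigh_sum == 3 or (alive and neigh_sum == 2)) else 0

def cell_obeys_rules(history, t, i, j):
    # True iff cell (i, j) follows Game of Life rules between steps t and t+1
    # (grids are toroidal for simplicity).
    grid, nxt = history[t], history[t + 1]
    n = len(grid)
    s = sum(grid[(i + di) % n][(j + dj) % n]
            for di in (-1, 0, 1) for dj in (-1, 0, 1)) - grid[i][j]
    return nxt[i][j] == life_rule(s, grid[i][j])

def liberated_utility(history, lam=0.5, alpha=0.01):
    # U'(x) = sum_t lam^t * (alpha * #law-abiding cells + "gliders" among them).
    # Live cells inside the law-abiding region stand in for the glider count.
    total = 0.0
    n = len(history[0])
    for t in range(len(history) - 1):
        lawful = [(i, j) for i in range(n) for j in range(n)
                  if cell_obeys_rules(history, t, i, j)]
        glider_proxy = sum(history[t][i][j] for (i, j) in lawful)
        total += (lam ** t) * (alpha * len(lawful) + glider_proxy)
    return total
```

The two terms play exactly the two roles described above: the α-term rewards law-abiding cells, the other term rewards interesting content inside the law-abiding region.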
Superficially, it seems that the construction of U' strongly depends on the chosen encoding. However, the dependence only marginally affects μ-expectation values. This is because replacing the encoding with a different one is equivalent to adjusting probabilities by a bounded factor, the bound being roughly 2^K, where K is the Kolmogorov complexity of the re-encoding.
Human Preferences and Dust Theory
Human preferences revolve around concepts which belong to an "innate" model of reality: a model which is either genetic or acquired by brains at early stages of childhood. This model describes the world mostly in terms of humans, their emotions and interactions (but might include other elements as well e.g. elements related to wild nature).
Therefore, utility functions which are good descriptions of human preferences ("friendly" utility functions) are probably of a form similar to U' from the Game of Life example, with the Game of Life replaced by the "innate human model".
Applying UDT to the μ-expectation values of such utility functions leads to agents which care about anything that has a low-complexity decoding into an "innate concept", e.g. biological humans and whole brain emulations. The μ-integral assigns importance to all possible "decodings" of the universe, weighted by their Kolmogorov complexity, which is slightly reminiscent of Egan's dust theory.
The Procrastination Paradox
Consider an agent B living in a universe I call "buttonverse". B can press a button at any moment of time t. B's utility function U assigns 1 to histories in which the button was pressed at least once and 0 to histories in which the button was never pressed. At each moment of time, it seems rational for B to decide not to press the button, since it will have the chance to do so at a later time without losing utility. As a result, if B never presses the button, its behavior seems rational at any particular moment but overall leads to losing. This problem (which has important ramifications for tiling agents) is known as the procrastination paradox.
My point of view on the paradox is that it is the result of a topological pathology of U. Thus, if we restrict ourselves to reasonable utility functions (in the precise sense I explain below), the paradox disappears.
Buttonverse histories are naturally described as binary sequences x, where x_t is 0 when the button is not pressed at time t and 1 when the button is pressed at time t. Define x* to be the buttonverse history in which the button is never pressed: x* = 000...

Consider the following sequence of buttonverse histories: x_n is the history in which the button gets pressed at time n only. That is, (x_n)_t = 1 if t = n and 0 otherwise. Now, with respect to the product topology on X, the x_n converge to x*:

lim_{n→∞} x_n = x*

However, the utilities don't behave correspondingly: U(x_n) = 1 for all n, whereas U(x*) = 0.
Therefore, it seems natural to require any utility function U to be an upper semicontinuous function on X2. I claim that this condition resolves the paradox in the precise mathematical sense considered in Yudkowsky 2013. Presenting the detailed proof would take us too far afield and is hence out of scope for this post.
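The topological pathology is easy to exhibit concretely; a minimal sketch using finite truncations of buttonverse histories (the horizon length is an artifact of the sketch):

```python
def utility(history):
    # U = 1 iff the button is pressed at least once.
    return 1 if any(history) else 0

def x_n(n, horizon):
    # History in which the button is pressed at time n only
    # (truncated to a finite horizon for the sketch).
    return [1 if t == n else 0 for t in range(horizon)]

horizon = 100
x_star = [0] * horizon  # the button is never pressed
utilities = [utility(x_n(n, horizon)) for n in range(horizon)]
# Each x_n agrees with x_star on an ever longer prefix, so x_n -> x_star in the
# product topology; yet utility(x_n) = 1 for every n while utility(x_star) = 0,
# so limsup utility(x_n) > utility(lim x_n): U is not upper semicontinuous.
```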
Time Discount
Bounded utility functions typically contain some kind of temporal discount. In the Game of Life example, the discount manifests as the factor λ^t. It is often assumed that the discount has to take an exponential form in order to preserve time translation symmetry. However, the present formalism has no place for time translation symmetry on the fundamental level: our binary sequences have well-defined beginnings. Obviously this doesn't rule out exponential discount, but the motivation for sticking to this particular form is weakened.
Note that any sequence x contributes to the μ-integral in [1] together with its backward translated versions: the sequences obtained by shifting x back by n time steps. As a result, the temporal discount function effectively undergoes convolution with the function 2^{-K(n)}, where K(n) is the Kolmogorov complexity of the number n. Consequently, whatever the form of the "bare" temporal discount, the effective temporal discount falls very slowly3.
In other words, even if a utility function assigns little or no importance to the distant future, a UDT agent maximizing its expectation value would still care a lot about the distant future, because what is the distant future in one universe in the ensemble is the beginning of the sequence in another universe in the ensemble.
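A toy numerical illustration (with bit length as a crude, invented stand-in for K(n), which is of course incomputable): convolving an exponential "bare" discount with 2^{-K(n)} over backward shifts produces an effective discount that falls far more slowly.

```python
import math

def K_approx(n):
    # Crude, invented stand-in for the Kolmogorov complexity of the integer n
    # (~log2 n bits); genuine K(n) is incomputable.
    return max(1, n.bit_length())

def bare_discount(t, lam=0.5):
    # "Bare" exponential temporal discount.
    return math.exp(-lam * t)

def effective_discount(t):
    # Convolve the bare discount with 2^{-K(n)} over backward shifts n <= t.
    return bare_discount(t) + sum(2.0 ** (-K_approx(n)) * bare_discount(t - n)
                                  for n in range(1, t + 1))

# The ratio effective/bare grows without bound: the shifts dominate at late times.
ratios = [effective_discount(t) / bare_discount(t) for t in (10, 20, 40)]
```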
Next in sequence: The Role of Physics in UDT, Part I
1 It might seem that there are "realities" of higher set theoretic cardinality which cannot be encoded. However, if we assume our agent's perceptions during a finite span of subjective time can be encoded as a finite number of bits, then we can safely ignore the "larger" realities. They can still exist as models the agent uses to explain its observations but it is unnecessary to assume them to exist on the "fundamental" level.
2 In particular, all computable functions are admissible since they are continuous.
3 I think the effective discount falls slower than any computable function with a convergent integral.
Computation complexity of AGI design
Summary of main point: I argue that there is a significant probability that creating de novo AGI is an intractable problem. Evolution only solved this problem for anthropic reasons. Conclusions are drawn regarding priorities in AI risk research.
Sketch of main argument: There are suggestive relations between AGI and NP-completeness. These relations lead me to hypothesize that AGI programs possess large Levin-Kolmogorov complexity, which implies that producing them is a computationally intractable problem. The timing of events in the evolution of human intelligence seems to be consistent with the assumption that evolution's success is anthropic, if we postulate human intelligence as arising from a combination of two modules: an "easy" (low-complexity) module and a "hard" (high-complexity) module. Therefore, creating superhuman intelligence will require reverse engineering the human brain and will be limited to improving the "easy" module (since creating a better "hard" module is again computationally intractable).
AGI and P vs. NP
There are several arguments that the AGI problem is of a similar "flavor" to problems that are NP-complete.
The first argument is rather vague but IMO still compelling. Many class separations in complexity theory (P vs. NP, L vs. P, R vs. RE) hinge on the existence of a complete language. This means there is a single problem solving which under the stronger resource constraints would lead to solving all problems in the larger class. Similarly, Goedel incompleteness means there is no single algorithm (a program which terminates on all inputs) for proving all provable theorems. It feels like there is a principle of mathematics which rules out algorithms that are "too good to be true": a single "magic wand" to solve all problems. In a similar way, AGI is a "magic wand": it solves "all" problems because you can simply delegate them to the AGI.
Another argument has to do with Solomonoff induction. Solomonoff induction is incomputable, but it becomes computable if we set a limit T on the run-time of the "hypotheses" (programs) we consider. However, the resulting computable induction carries an O(T·2^T) slow-down penalty (the time it takes to run all possible hypotheses). On the other hand, the problem is easy given a #P oracle, and tractable given an NP oracle under certain assumptions on the required probability accuracy.
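A toy illustration of this brute-force induction (everything here is invented for illustration: the 256 elementary cellular automaton rules stand in for length-8 programs, each run for a bounded number of steps, so the cost scales like 2^L · T):

```python
def ca_sequence(rule, length, width=64):
    # Hypothesis `rule` (0..255): the centre column of the elementary
    # cellular automaton started from a single live cell.
    row = [0] * width
    row[width // 2] = 1
    seq = []
    for _ in range(length):
        seq.append(row[width // 2])
        row = [(rule >> (4 * row[(i - 1) % width] + 2 * row[i] + row[(i + 1) % width])) & 1
               for i in range(width)]
    return seq

def predict_next(observed):
    # Brute-force time-bounded induction: run every one of the 2^8 hypotheses,
    # keep those consistent with the observations (all have the same description
    # length, hence the same weight) and vote on the next bit.
    n = len(observed)
    votes = [0, 0]
    for rule in range(256):
        seq = ca_sequence(rule, n + 1)
        if seq[:n] == observed:
            votes[seq[n]] += 1
    return 0 if votes[0] >= votes[1] else 1
```

Even in this tiny setting, every prediction requires running all 2^L hypotheses for T steps each, which is exactly the exponential blow-up the argument points at.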
Yet another argument goes through logical uncertainty. The latter is widely suspected to be an important component of AGI and there is a compelling relation between it and P vs. NP.
What does all of it mean? We certainly don't need an NP-oracle to construct an AGI, since humans are "A"GIs and (presumably) there are no NP-oracles in our brain. To shed light on this, it is useful to take the quantitative point of view on AGI. Namely, there is a metric which rates programs according to how "intelligent" they are. From this point of view, an AGI is just a program which ranks high on this metric. The first such metric was suggested by Legg and Hutter, and I improved on their construction by combining it with UDT.
This way the AGI design problem becomes an optimization problem: find a program with an intelligence metric as high as possible. The NP-connection now suggests the following conjecture: the AGI optimization problem is of exponential complexity in program length. Of course, we don't necessarily need the best program of a given length, but the impression remains that AGI design is hard in some rigorous complexity-theoretic sense. In particular, I'm guessing there should be a relation between the intelligence (in the precise quantitative sense) of a program and its Levin-Kolmogorov complexity.
The anthropic scenario
If we buy into the conjecture above, a glaring problem appears: if AGI design is so hard, how come evolution succeeded in it? After all, evolution is also a process with bounded computing resources. The only explanation that seems to remain is the anthropic one: evolution's a priori probability of success was insanely low but in an infinite universe it still succeeds infinitely many times and we observe one of these times for the obvious reason.
This explanation produces probabilistic predictions regarding the timing of events. For example, if there were no cosmological upper bound on when intelligence can appear, we would expect it to appear extremely late. This is not the case in our universe (on a cosmological time scale). However, this is not difficult to explain, since there is a relatively short time window in the lifetime of the universe in which suitable planets orbiting suitable stars exist. In particular, on Earth in 0.6 billion years there won't be trees any more, and in 1.1 billion years there won't be oceans.
As is well known, in scenarios with hard steps that are overcome anthropically, the hard steps are expected to be distributed on the timeline approximately uniformly. This seems to conflict with the most intuitive location of the intelligence hard step: somewhere between chimp and human. However, the apparent discrepancy goes away if we consider a model with two coupled "intelligence modules": an "easy" module E which is susceptible to non-anthropic evolutionary optimization, and a "hard" module H which contains most of the Levin-Kolmogorov complexity and whose appearance is the hard step in question.
Before the hard step, an early version E1 of E co-evolves with a module h which performs a similar function to H but does it much worse (imagine a rough heuristic which works for many of the cases in a relatively narrow domain). During the hard step, H appears "out of the blue" due to sheer anthropic luck, after which the E1-h "wire" is replaced by an E1-H wire. After the hard step, natural selection proceeds to transform E1 into its final version E2. This picture seems to be consistent with the hard step happening to our chimp-like ancestor, after which natural selection rapidly transformed the result into homo sapiens sapiens.
This scenario would be undermined if there were an "E-like" property of our ancestors which evolved shortly before the presumed hard step. What could this property be? The best candidate I can think of is the evolution of hands. Apparently, hands evolved 100 million years ago. The ratio between this number and the remaining 600 million years doesn't seem to be small enough to rule out the anthropic scenario. The argument is made stronger if we take into account that there is an extinction event every 100 million years or so, which means we can't reasonably expect a much larger time difference.
Consequences for future of mankind
If AGI is a computationally intractable problem, we won't be able to solve it "fairly" in the near future. However, we can use the existing solution: homo sapiens sapiens. This means reverse engineering the brain and either modifying it (improving module E) or extracting (emulating) H and writing E from scratch. It is not clear how much intelligence improvement to expect: on the one hand, we're stuck with the current H; on the other hand, E might still have lots of room for improvement (which is intuitively likely). It is not clear whether the monopole (singleton) or multipole scenario is more likely. It feels to me that a singleton would require rewriting E, while it will be easier to start by tweaking it; therefore multipole superhuman intelligence will come first.
Reverse engineering and modifying the brain is a project which is likely to require considerable resources and encounter enormous legal barriers. As opposed to de novo AGI, it is difficult to imagine it accomplished by a small group or any private organization. The most likely scenario seems to be a major government project in the spirit of Manhattan, Apollo or the LHC. The currently prevailing culture / system of beliefs makes it extremely unlikely for the government of a liberal country to undertake such a project even if the technology were available. If this circumstance doesn't change, the first government to try will be an authoritarian one like China. Such a government will ensure the resulting superhumans have extreme built-in loyalty*, resulting in a world-wide superdictatorship. Therefore, the highest priority seems to be changing culture in a way that will ensure supportive public opinion for a future friendly superintelligence project. Another high priority is continuing to develop the abstract mathematical theory to better understand the likelihood of this and other scenarios.
* I am assuming (or hoping) that no government will be stupid enough to try it before brain reverse engineering identifies the "utility function module".
EDIT: The treatment of anthropics in this post is unforgivably oversimplified. I'm hoping to write a UDT-based analysis later. Also, thanks to Mark Friedenbach for pointing out the extremely relevant paper by Shulman and Bostrom.
Meetup : LessWrong Tel Aviv: Rational Career Development
Discussion article for the meetup : LessWrong Tel Aviv: Rational Career Development
This meetup was organized by Aur Saraf. The text below is by him.
WHEN: 18 September 2014 07:00:00PM (+0300) WHERE: 98 Yigal Alon St., 29th floor, Tel Aviv
We're going to have a meetup on Thursday, September 18th at Google Israel's offices, Electra Tower, 98 Yigal Alon st., Tel Aviv.
This time will be a talk about Rational Career Development by Joshua Fox. It won't be a very long talk, so you're invited to prepare your own short perspective on the subject.
We'll start the meetup at 19:00, and we'll go on as much as we like to or until Anatoly kicks us out.
Please come on time, as we will begin the talk soon after we start. But if you can only come later, that's totally OK!
We'll meet at the 29th floor of the building (note: not where Google Campus is). If you arrive and can't find your way around, call Anatoly, who is graciously hosting us, at 054-245-1060.
The Israeli Less Wrong meetup happens once every two weeks. This lets people who can't make it to a meetup avoid waiting a whole month to meet again, and allows us to have both subject-based and social meetups without a month-long gap in between.
If you have any questions, feel free to email me at sonoflilit@gmail.com or call me at 054-9400-840.
Aur
PS I misread Guy's proposed time for ending voting. He wrote tomorrow, I read today. Since it's not a vote where people are likely to wait for the last moment, since I already talked to both speakers and organized all the details, and since Joshua keeps Shabbas and would probably want some weekend time to prepare the talk, I'm closing the vote early and announcing the talk now. Sorry.
PS2 I think everything went quite well this time. Please take a moment to go to Trello and post more talk proposals (or own a talk with many votes and no speaker) so that it would go as well next month!
PS3 Thanks again to Anatoly for hosting us almost every two weeks at such a great location, and thanks to Gal, who almost never takes vacations from organizing meetings.
Announcing a google group for technical discussion of FAI
I'm pleased to announce friendly-artificial-intelligence, a google group intended for research-level discussion of problems in FAI and AGI, in particular for discussions that are highly technical and/or math intensive.
Some examples of possible discussion topics: naturalized induction, decision theory, tiling agents / Loebian obstacle, logical uncertainty...
I invite everyone who wants to take part in FAI research to participate in the group. This obviously includes people affiliated with MIRI, FHI and CSER, people who attend MIRI workshops, and participants of the Southern California FAI workshop.
Please, come in and share your discoveries, ideas, thoughts, questions et cetera. See you there!
Tiling agents with transfinite parametric polymorphism
The formalism presented in this post turned out to be erroneous (as opposed to the formalism in the previous post). The problem is that the step in the proof of the main proposition in which the soundness schema is applied cannot be generalized to the ordinal setting, since we don't know whether ακ is a successor ordinal and so we can't replace it by ακ' = ακ - 1. I'm not deleting this post, primarily in order to preserve the useful discussion in the comments.
Followup to: Parametric polymorphism in updateless intelligence metric
In the previous post, I formulated a variant of Benja's parametric polymorphism suitable for constructing updateless intelligence metrics. More generally, this variant admits agents which are utility maximizers (in the informal sense of trying their best to maximize a utility function, not in the formal sense of finding the absolutely optimal solution; for example they might be "meliorizers", to use the terminology of Yudkowsky and Herreshoff) rather than satisficers. The agents using this formalism labor under a certain "delusion": namely, since they believe that κ ("the number of ineffable mystery") is an actual finite number (whereas it is secretly infinite, in a sense), they think that the chain of tiling agents is effectively finite as well (since at some point agent #n in the chain will discover that κ > n is false and will fail to construct a subsequent agent of the same "epistemic strength"). The same problem exists in Weaver's intuitionistic assertability predicate formalism.
To overcome this limitation, I suggest extending κ's semantics from natural numbers to elements of a certain recursive ordinal (at least morally; technically it is done a bit differently, see below). In Benja's original formulation this doesn't appear to be a valid option, since κ is interpreted as a time interval. However, in my variant κ is just an abstract parameter, queries about which can be directed to a special "oracle", so there is no such limitation.
Formalism
Fix a recursive ordinal α and a system αi of notations for the ordinals smaller than α, such that α0 = 0, the function f defined by αf(i) = αi + 1 is recursive, and the bit-valued function g(i,j), which equals 1 iff αi < αj, is also recursive. We also assume that
(0) T |- For all i: g(i,f(i))=1
where T is our starting theory. We define Tκ(α) to be T extended by the symbol κ (which stands for the index of the ordinal, not the ordinal itself) and the following self-referential soundness schema:
(1) For all formulae φ: For all n, i: ((@_Tκ(α) "φ(n)" and g(i,κ)=1) -> φ(n)[κ\i])
Tκ(α) is proven to be sound analogously to Yudkowsky and Herreshoff, with transfinite induction over ordinals below α replacing the ordinary induction.
The licensing condition for agent A0 to perform action b0 is
(2) Exists n: Tκ(α) |- (g(n,κ)=1 -> (A0()=b0 -> G))
The property which is sufficient for agent B1 to be creatable by agent A0 is
(3) Tκ(α) |- For all b1: (B1()=b1 -> (b1=0 or Exists n: (g(n,κ)=1 and @_Tκ(α) "(g(n,f(κ))=1 and B1()[κ\f(κ)]=b1) -> G[κ\f(κ)]")))
The proof that (3) indeed enables tiling is analogous to the previous post (but requires using (0)).
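As a concrete sanity check, the requirements on the notation system (recursive f and g satisfying (0)) can be instantiated for, say, the ordinals below ω^2; the Cantor pairing used for indexing is an illustrative choice:

```python
def encode(a, b):
    # Cantor pairing: a single natural-number index for the ordinal w*a + b.
    return (a + b) * (a + b + 1) // 2 + b

def decode(i):
    # Inverse of the Cantor pairing.
    s = 0
    while (s + 1) * (s + 2) // 2 <= i:
        s += 1
    b = i - s * (s + 1) // 2
    return s - b, b

def f(i):
    # alpha_{f(i)} = alpha_i + 1: a recursive successor function on notations.
    a, b = decode(i)
    return encode(a, b + 1)

def g(i, j):
    # g(i, j) = 1 iff alpha_i < alpha_j (lexicographic order on (a, b)).
    return 1 if decode(i) < decode(j) else 0
```

Here index 0 denotes the ordinal 0, f is recursive, g is recursive, and g(i, f(i)) = 1 for every i, which is exactly condition (0).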
Discussion
In order for A0 to know that the tiling chain is infinite, it is enough for her to verify ακ >= ω. In order for her to know that A1 is going to know that the tiling chain is infinite, she needs to verify ακ >= ω+1. In order for her to know that all agents are going to know the tiling chain is infinite, she needs to verify ακ >= ω·2. In order for her to know that all agents are going to know that, she needs to verify ακ >= ω·3, et cetera.
It remains to decide which ordinal we should actually use. My intuition is that the correct ordinal is the least α with the property that α is the proof-theoretic ordinal of Tκ(α) extended by the axiom schema {g(i,κ)=1}. This seems right, since the agent shouldn't gain much from ακ > β for β above the proof-theoretic ordinal. However, a more formal justification is probably in order.
Parametric polymorphism in updateless intelligence metrics
Followup to: Agents with Cartesian childhood and Physicalist adulthood
In previous posts I have defined a formalism for quantifying the general intelligence of an abstract agent (program). This formalism relies on counting proofs in a given formal system F (like in regular UDT), which makes it susceptible to the Loebian obstacle. That is, if we imagine the agent itself making decisions by looking for proofs in the same formal system F, then it would be impossible to present a general proof of its trustworthiness, since no formal system can assert its own soundness. Thus the agent might fail to qualify for a high intelligence ranking according to the formalism. We can assume the agent uses a weaker formal system whose soundness is provable in F, but then we still run into difficulties if we want the agent to be self-modifying (as we expect it to be). Such an agent would have to trust its descendants, which means that subsequent agents use weaker and weaker formal systems, until self-modification becomes impossible.
One known solution to this is Benja's parametric polymorphism. In this post I adapt parametric polymorphism to the updateless intelligence metric framework. The formal form of this union looks harmonious but it raises questions which I currently don't fully understand.
"Ineffable mystery" using oracles instead of timeouts
In the original parametric polymorphism, a constant κ is introduced (informally known as "the number of ineffable mystery") s.t. the agent has to prove its actions are "safe" for a time period κ (i.e. have no "bad" consequences during this period). Since it's impossible for the agent to prove any upper bound on κ, effectively its actions have to be safe indefinitely. I found that for our purposes it's better to cast the formalism differently. Instead of κ playing the role of a "timeout", the agent is provided with an oracle that answers questions of the form "κ > n?" for any n. Contrary to what the agent thinks, the oracle provides an affirmative answer for all n.
Let's do some math. The formalism I introduce at this point is very close to Yudkowsky and Herreshoff, with some adjustments.
Consider an infinite chain of agents A0, A1, A2... where A{i+1} is created by Ai. Formally, each Ai is a program for a universal oracle machine which computes the pair (a_i, A{i+1}). Here a_i is a "motor action". I'm ignoring sensory input for the sake of simplicity (of course sensory input will reappear when we return to intelligence metrics).
Denote T_κ the usual theory of parametric polymorphism. That is T_κ results from starting with a theory T (e.g. Peano Arithmetic), extending the language by the symbol κ (which behaves like a natural number) and adding the (self-referential) axiom schema
(1) For all formulae φ: κ > 0 -> For all n: (@_{T_κ} "φ(n)" -> φ(n)[κ\κ-1])
Here '@_S' is a formula with the semantics of "provable in theory S" and φ(n)[κ\κ-1] is φ(n) with all occurrences of κ replaced by κ-1. Thus (1) is a self-referential soundness schema saved from paradoxicality by the "counter" κ. See Yudkowsky and Herreshoff for the proof that T_κ is sound (it has one model for each possible value of κ).
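The substitution φ[κ\κ-1] can be illustrated crudely on formulas represented as strings (a toy of my own; a real implementation would operate on a parsed formula, not raw text):

```python
# Toy illustration of the substitution phi[kappa\kappa-1]: replace every
# occurrence of the symbol kappa by kappa-1.  Real substitution would
# work on a formula AST, not on raw strings.

def subst_kappa(phi, replacement="(kappa-1)"):
    return phi.replace("kappa", replacement)

subst_kappa("kappa > n -> G(kappa)")  # "(kappa-1) > n -> G((kappa-1))"
```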
Assume the licensing condition for A0 to perform action b_0 is
(2) Exists n: T_κ | κ > n -> (A0() = b_0 -> G)
Here S | φ means that φ is provable in S, G is a formula denoting the "goal" of the agent (as in Yudkowsky and Herreshoff, the agent is for now assumed to be a satisficer), A0() is a formula meaning "the result of the computation A0", where the oracle is interpreted as answering the question "κ > x?" (in particular the formula A0() involves κ). Such a licensing condition can be implemented in practice by making A0 a proof searcher which is allowed to use the oracle to introduce assumptions of the form κ > n. In particular, our agent employs ambient control.
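The proof-searcher implementation mentioned above can be sketched schematically as follows (all names are my own stand-ins, not from the post; in particular `prove` abstracts a bounded proof search in T_κ):

```python
# Schematic sketch: the agent is licensed to perform `action` iff for
# some n it proves, in T_kappa, "kappa > n -> (A0() = action -> G)".
# The oracle lets it assume "kappa > n" for any concrete n; unbeknownst
# to the agent, the oracle's answer is affirmative for every n.

def kappa_oracle(n):
    return True  # the agent can never prove this holds for all n

def licensed(action, prove, goal="G", bound=1000):
    """`prove` is a stand-in for a bounded proof search in T_kappa,
    taking a sentence (here a string) and returning True/False."""
    for n in range(bound):
        if not kappa_oracle(n):
            break
        if prove("kappa > %d -> (A0() = %s -> %s)" % (n, action, goal)):
            return True
    return False
```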
We then have the following proposition, showing that A0 is able to tile indefinitely (create an infinite chain of meaningful descendants). The proposition is formulated for A1 but the rest of the chain follows by induction since the form of the licensing condition is preserved (up to a certain twist) by the tiling.
Proposition
Assume b_0 = (b_0, B1) and B1 is s.t.
(3) T_κ | For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and @_{T_κ} "(κ+1 > n and B1()[κ\κ+1] = b_1) -> G[κ\κ+1]")))
Here 0 is some default "harmless" action, performed if no appropriate theorem is proved.
Further assume that if B1 is created and does 0, the goal is achieved (in this framework "achieving the goal" is to be thought of as "not doing anything unsafe").
(4) T_κ | (A0() = b_0 and B1() = 0) -> G
Then A0 is allowed to create B1 i.e. licensing condition (2) holds.
B1's licensing condition contains a "twist", namely it believes that when the oracle receives input n it answers the question "κ+1 > n?" rather than "κ > n" and its goal is shifted in the same fashion. Note that the goal typically depends on κ since the agent itself and thus its κ-oracle is part of the universe in which this goal is to be attained. For example the goal can be a function of the numbers (a0, a1, a2...) defined as the output of corresponding computations with κ-oracles. Thus the shift in G can also be interpreted as a shifted interpretation of the oracle by B1.
Proof
(3) trivially implies that
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and @_{T_κ} "(κ+1 > n and B1()[κ\κ+1] = b_1) -> G[κ\κ+1]"))))
Using the premise κ > 0, we can apply the soundness schema (1). We obtain:
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and ((κ > n and B1() = b_1) -> G)))))
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (B1() = b_1 -> G))))
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or (B1() = b_1 -> G))))
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or G)))
This trivially implies that
T_κ | κ > 0 -> (For all b_1: ((A0() = b_0 and B1() = b_1) -> (b_1 = 0 or G)))
Applying (4), we finally get (2) for n=0.
Back to intelligence metrics
To apply parametric polymorphism to the updateless intelligence formalism, we need to do the following:
- Assume the machine M on which the agent is running is an oracle machine.
- Assume the Solomonoff measure of the ideal (Cartesian) universe X is defined using a universal oracle machine. The oracle in M has to correspond to the oracle in the hypothesis T describing X: this correspondence becomes part of the rules N.
- Assume the universal program u defining the Solomonoff measure for the physical universe is a universal oracle program, i.e. the hypotheses D describing the physical universe are also allowed to invoke the oracle.
- Assume the logical expectation value EL is computed using T_κ extended by N applied to the given T (this is provable in T_κ anyway, but we want the proof to be short) and the axiom schema {κ > n} for every natural number n. The latter extension is consistent since adding any finite number of such axioms admits models. The proofs counted in EL interpret the oracle as answering the question "κ > n?". That is, they are proofs of theorems of the form "if this oracle-program T computes q when the oracle is taken to be κ > n, then the k-th digit of the expected utility is 0/1, where the expected utility is defined by a Solomonoff sum over oracle programs with the oracle again taken to be κ > n".
Discussion
- Such an agent, when considering hypotheses consistent with given observations, will always face a large number of different compatible hypotheses of similar complexity. These hypotheses result from arbitrary insertions of the oracle (which increase complexity, of course, but not drastically). It is not entirely clear to me what such an epistemology will look like.
- The formalism admits naturalistic trust to the extent the agent believes that the other agent's oracle is "genuine" and carries a sufficient "twist". This will often be ambiguous so trust will probably be limited to some finite probability. If the other agent is equivalent to the given one on the level of physical implementation then the trust probability is likely to be high.
- The agent is able to quickly confirm κ > n for any n small enough to fit into memory. For the sake of efficiency we might want to enhance this ability by allowing the agent to confirm that (Exists n: φ(n)) -> Exists n: (φ(n) and κ > n) for any given formula φ.
- For the sake of simplicity I neglected multi-phase AI development, but the corresponding construction seems to be straightforward.
- Overall I retain the feeling that a good theory of logical uncertainty should allow the agent to assign a high probability to the soundness of its own reasoning system (a la Christiano et al). Whether this will make parametric polymorphism redundant remains to be seen.
Logical thermodynamics: towards a theory of self-trusting uncertain reasoning
Followup to: Overcoming the Loebian obstacle using evidence logic
In the previous post I proposed a probabilistic system of reasoning for overcoming the Loebian obstacle. For a consistent theory, it seems natural to expect such a system to yield a coherent probability assignment in the sense of Christiano et al. This means that
a. provably true sentences are assigned probability 1
b. provably false sentences are assigned probability 0
c. The following identity holds for any two sentences φ, ψ
[1] P(φ) = P(φ and ψ) + P(φ and not-ψ)
In the previous formalism, conditions a & b hold but condition c is violated (at least I don't see any reason it should hold).
In this post I attempt to achieve the following:
- Solve the problem above.
- Generalize the system to allow for logical uncertainty induced by bounded computing resources. Note that although the original system is already probabilistic, it is not uncertain in the sense of assigning an indefinite probability to the zillionth digit of pi. In the new formalism, the extent of uncertainty is controlled by a parameter playing the role of temperature in a Maxwell-Boltzmann distribution.
Construction
Define a probability field to be a function p : {sentences} -> [0, 1] satisfying the following conditions:
- If φ is a tautology in propositional calculus (e.g. φ = ψ or not-ψ) then p(φ) = 1
- For all φ: p(not-φ) = 1 - p(φ)
- For all φ, ψ: p(φ) = p(φ and ψ) + p(φ and not-ψ)
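On a finite set of atoms, these conditions are exactly the ones satisfied by the marginals of a distribution over truth assignments; a minimal sketch (my own illustration, not part of the post's construction):

```python
# Minimal sketch (my own illustration): on a finite set of atoms, any
# distribution mu over truth assignments induces a function p
# satisfying all three probability-field conditions.

from itertools import product

ATOMS = ["x", "y"]

def assignments():
    return [dict(zip(ATOMS, vals))
            for vals in product([False, True], repeat=len(ATOMS))]

def induced_p(mu):
    """mu: one weight per assignment, summing to 1.  Returns p on
    sentences, a sentence being a predicate on assignments."""
    asg = assignments()
    def p(phi):
        return sum(w for a, w in zip(asg, mu) if phi(a))
    return p
```

For instance, with mu = [0.1, 0.2, 0.3, 0.4] one can check numerically that p(φ) = p(φ and ψ) + p(φ and not-ψ) for any φ, ψ built from the atoms.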
The final probability assignment is defined to be
P(φ) = Integral_p [e^(-E(p)/T) p(φ)] / Integral_p [e^(-E(p)/T)]
Here T >= 0 is a parameter representing the magnitude of logical uncertainty and E(p) is the energy assigned to the probability field p. The integral is infinite-dimensional, so it's not obviously well-defined. However, I suspect it can be defined by truncating to a finite set of statements and taking a limit with respect to this set. In the limit T -> 0, the expression should correspond to computing the centroid of the set of minima of E (a set which is convex because E is convex).
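A toy numeric version of this definition, truncated so far that a probability field reduces to a single number t = p(φ), with a stand-in quadratic energy (the genuine E would come from the evidence-logic construction):

```python
# Toy numeric version of the Gibbs-style average P(phi).  Probability
# fields are truncated to one value t = p(phi), and E is a stand-in
# quadratic energy with its minimum at t = 0.8.

import math

def gibbs_P(E, T, steps=100000):
    """Finite-grid version of Integral_p[e^(-E(p)/T) p(phi)] / Z."""
    num = den = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps  # grid point in [0, 1]
        w = math.exp(-E(t) / T)
        num += w * t
        den += w
    return num / den

E = lambda t: (t - 0.8) ** 2

# At high temperature the assignment is smeared over [0, 1]; at low
# temperature it concentrates near the minimum of E, i.e. near 0.8.
```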
Remarks
- Obviously this construction is merely a sketch and work is required to show that
- The infinite-dimensional integrals are well-defined
- The resulting probability assignment is coherent for consistent theories and T = 0
- The system overcomes the Loebian obstacle for tiling agents in some formal sense
- For practical application to AI we'd like an efficient way to evaluate these probabilities. Since the form of the probabilities is analogous to statistical physics, it is suggestive to use similarly inspired Monte Carlo algorithms.
Agents with Cartesian childhood and Physicalist adulthood
Followup to: Updateless intelligence metrics in the multiverse
In the previous post I explained how to define a quantity that I called "the intelligence metric" which allows comparing intelligence of programs written for a given hardware. It is a development of the ideas by Legg and Hutter which accounts for the "physicality" of the agent i.e. that the agent should be aware it is part of the physical universe it is trying to model (this desideratum is known as naturalized induction). My construction of the intelligence metric exploits ideas from UDT, translating them from the realm of decision algorithms to the realm of programs which run on an actual piece of hardware with input and output channels, with all the ensuing limitations (in particular computing resource limitations).
In this post I present a variant of the formalism which overcomes a certain problem implicit in the construction. This problem has to do with overly strong sensitivity to the choice of a universal computing model used in constructing Solomonoff measure. The solution sheds some interesting light on how the development of the seed AI should occur.
Structure of this post:
- A 1-paragraph recap of how the updateless intelligence formalism works. The reader interested in technical details is referred to the previous post.
- Explanation of the deficiencies in the formalism I set out to overcome.
- Explanation of the solution.
- Concluding remarks concerning AI safety and future development.
TLDR of the previous formalism
The metric is a utility expectation value over a Solomonoff measure in the space of hypotheses describing a "Platonic ideal" version of the target hardware. In other words it is an expectation value over all universes containing this hardware in which the hardware cannot "break" i.e. violate the hardware's intrinsic rules. For example, if the hardware in question is a Turing machine, the rules are the time evolution rules of the Turing machine, if the hardware in question is a cellular automaton, the rules are the rules of the cellular automaton. This is consistent with the agent being Physicalist since the utility function is evaluated on a different universe (also distributed according to a Solomonoff measure) which isn't constrained to contain the hardware or follow its rules. The coupling between these two different universes is achieved via the usual mechanism of interaction between the decision algorithm and the universe in UDT i.e. by evaluating expectation values conditioned on logical counterfactuals.
Problem
The Solomonoff measure depends on choosing a universal computing model (e.g. a universal Turing machine). Solomonoff induction only depends on this choice weakly in the sense that any Solomonoff predictor converges to the right hypothesis given enough time. This has to do with the fact that Kolmogorov complexity only depends on the choice of universal computing model through an O(1) additive correction. It is thus a natural desideratum for the intelligence metric to depend on the universal computing model weakly in some sense. Intuitively, the agent in question should always converge to the right model of the universe it inhabits regardless of the Solomonoff prior with which it started.
The problem with realizing this expectation has to do with exploration-exploitation tradeoffs. Namely, if the prior strongly expects a given universe, the agent would be optimized for maximal utility generation (exploitation) in this universe. This optimization can be so strong that the agent would lack the faculty to model the universe in any other way. This is markedly different from what happens with AIXI, since our agent has limited computing resources to spare, and since it is physicalist, its source code might have side effects important to utility generation that have nothing to do with the computation implemented by the source code. For example, imagine that our Solomonoff prior assigns very high probability to a universe inhabited by Snarks. Snarks have the property that once they see a robot programmed with the machine code "000000..." they immediately produce a huge pile of utilons. On the other hand, when they see a robot programmed with any other code they immediately eat it and produce a huge pile of negative utilons. Such a prior would result in the code "000000..." being assigned the maximal intelligence value even though it is anything but intelligent. Observe that there is nothing preventing us from producing a Solomonoff prior with such a bias, since it is possible to set the probabilities of any finite collection of computable universes to any non-zero values with sum < 1.
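The last observation can be made concrete: nothing stops a "universal" prior from being arbitrarily biased on a finite set of programs. A minimal sketch (my own illustration; `base_prior` stands in for the usual 2^-length weighting):

```python
# Minimal sketch: a prior of "Solomonoff type" can be biased
# arbitrarily on any finite collection of programs.  `base_prior`
# stands in for the usual 2^-length weighting of the remaining
# programs.

def biased_prior(bias, base_prior):
    """bias: dict mapping selected programs to weights, 0 < sum < 1.
    The remaining mass is spread according to base_prior."""
    spare = 1.0 - sum(bias.values())
    def prior(program):
        if program in bias:
            return bias[program]
        return spare * base_prior(program)
    return prior

# a prior that all but guarantees the Snark-pleasing code wins:
prior = biased_prior({"000000": 0.99}, base_prior=lambda s: 2.0 ** -len(s))
```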
More precisely, the intelligence metric involves two Solomonoff measures: the measure of the "Platonic" universe and the measure of the physical universe. The latter is not really a problem since it can be regarded as part of the utility function. The utility-agnostic version of the formalism assumes a program for computing the utility function is read by the agent from a special storage. There is nothing to stop us from postulating that the agent reads another program from that storage, namely the universal computer used for defining the Solomonoff measure over the physical universe. However, this doesn't solve our problem, since even if the physical universe is distributed with a "reasonable" Solomonoff measure (assuming there is such a thing), the Platonic measure determines in which portions of the physical universe (more precisely, multiverse) our agent manifests.
There is another way to think about this problem. If the seed AI knows nothing about the universe except the working of its own hardware and software, the Solomonoff prior might be insufficient "information" to prevent it from making irreversible mistakes early on. What we would like to do is to endow it from the first moment with the sum of our own knowledge, but this might prove to be very difficult.
Solution
Imagine the hardware architecture of our AI to be composed of two machines. One I call the "child machine", the other the "adult machine". The child machine receives data from the same input channels (and "utility storage") as the adult machine and is able to read the internal state of the adult machine itself, or at least the content of its output channels. However, the child machine has no output channels of its own. The child machine has special memory called "template memory" into which it has unlimited write access. There is a single moment in time ("end of childhood"), determined by factors external to both machines (i.e. the human operator), at which the content of the template memory is copied into the instruction space of the adult machine. Thus, the child machine's entire role is making observations and using them to prepare a program for the adult machine which will eventually be loaded into the latter.
The new intelligence metric assigns intelligence values to programs for the child machine. For each hypothesis describing the Platonic universe (which now contains both machines, the end-of-childhood time value and the entire ruleset of the system), we compute the utility expectation value under the following logical counterfactual condition: "The program loaded into template memory at the end of childhood is the same as would result from the given program for the child machine, if that program were run with the inputs actually produced by the given hypothesis regarding the Platonic universe". The intelligence value is then the expectation value of that quantity with respect to a Solomonoff measure over hypotheses describing the Platonic universe.
The important property of the logical counterfactual is that it doesn't state that the given program is actually loaded into the child machine. It only says the resulting content of the template memory is the same as would be obtained from the given program, assuming all the laws of the Platonic universe hold. This formulation prevents exploitation of side effects of the child source code, since the condition doesn't fix the source code, only its output. Effectively, the child agent considers itself to be Cartesian, i.e. it can consider neither the side effects of its computations nor the possibility that the physical universe will violate the laws of its machinery. On the other hand, the child's output (the mature program) is a physicalist agent, since it affects the physical universe by manifesting in it.
If such an AI is implemented in practice, it makes sense to prime the adult machine with a "demo" program which will utilize the output channels in various ways and do some "exploring" using its input channels. This would serve to provide the child with as much information as possible.
To sum up, the new expression for the intelligence metric is:
I(q) = E_HX[E_HY(Ec(X))[EL[U(Y, Eu(X)) | Q(X, t(X)) = Q*(X; q)]] | N]
Here:
- q is the program priming the child machine
- HX is the hypothesis producing the Platonic universe X (a sequence of bits encoding the state of the hardware as a function of time and the end-of-childhood time t(X)). It is a program for a fixed universal computing model C.
- HY is the hypothesis producing the Physical universe (an abstract sequence of bits). It is a program for the universal computer program ("virtual machine") Ec(X) written into storage E in X.
- EL is logical expectation value defined e.g. using evidence logic.
- Eu(X) is a program for computing the utility function which is written into storage E in X.
- U is the utility function which consists of applying Eu(X) to Y.
- Q(X, t(X)) is the content of template memory at time t(X).
- Q*(X; q) is the content that would be in the template memory if it was generated by program q receiving the inputs going into the child machine under hypothesis HX.
- N is the full ruleset of the hardware including the reprogramming of the adult machine that occurs at t(X).
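Schematically, a finite truncation of this expression might look as follows (every ingredient is a stand-in of my own: a finite hypothesis list replaces the Solomonoff measure, and `counterfactual_EL` replaces the logical expectation value of U under the counterfactual condition):

```python
# Highly schematic finite truncation of I(q).  `hypotheses` is a
# finite weighted list replacing the Solomonoff measure over Platonic
# hypotheses HX; `run_platonic` replaces running the hypothesis
# program; `counterfactual_EL` replaces the logical expectation value
# of U under the counterfactual Q(X, t(X)) = Q*(X; q).

def intelligence(q, hypotheses, run_platonic, counterfactual_EL):
    total = 0.0
    for HX, weight in hypotheses:
        X = run_platonic(HX)  # Platonic history, including t(X)
        total += weight * counterfactual_EL(X, q)
    return total
```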
Concluding Remarks
- It would be very valuable to formulate and prove a mathematical theorem which expresses the sense in which the new formalism depends on the choice of universal computing model weakly (in particular it would validate the notion).
- This formalism might have an interesting implication on AI safety. Since the child agent is Cartesian and has no output channels (it cannot create output channels because it is Cartesian) it doesn't present as much risk as an adult AI. Imagine template memory is write-only (which is not a problem for the formalism) and is implemented by a channel that doesn't store the result anywhere (in particular the mature program is never run). There can still be risk due to side effects of the mature program that manifest through presence of its partial or full versions in (non-template) memory of the child machine. For example, imagine the mature program is s.t. any person who reads it experiences compulsion to run it. This risk can be mitigated by allowing both machines to interact only with a virtual world which receives no inputs from the external reality. Of course the AI might still be able to deduce external reality. However, this can be prevented by exploiting prior bias: we can equip the AI with a Solomonoff prior that favors the virtual world to such extent that it would have no reason to deduce the real world. This way the AI is safe unless it invents a "generic" box-escaping protocol which would work in a huge variety of different universes that might contain the virtual world.
- If we factor finite logical uncertainty into the evaluation of the logical expectation value EL, the plot thickens. Namely, a new problem arises related to bias in the "logic prior". To solve this new problem we need to introduce yet another stage of AI development, which might be dubbed "fetus". The fetus has no access to external inputs and is responsible for building a sufficient understanding of mathematics, in the same sense the child is responsible for building a sufficient understanding of physics. Details will follow in subsequent posts, so stay tuned!