Proposal: Use logical depth relative to human history as objective function for superintelligence

sbenthall

I attended Nick Bostrom's talk at UC Berkeley last Friday and got intrigued by these problems again. I wanted to pitch an idea here, with the question: Have any of you seen work along these lines before? Can you recommend any papers or posts? Are you interested in collaborating on this angle in further depth?

The problem I'm thinking about (surely naively, relative to y'all) is: What would you want to program an omnipotent machine to optimize?

For the sake of avoiding some baggage, I'm not going to assume this machine is "superintelligent" or an AGI. Rather, I'm going to call it a supercontroller, just something omnipotently effective at optimizing some function of what it perceives in its environment.

As has been noted in other arguments, a supercontroller that optimizes the number of paperclips in the universe would be a disaster. Maybe any supercontroller that was insensitive to human values would be a disaster. What constitutes a disaster? An end of human history. If we're all killed and our memories wiped out to make more efficient paperclip-making machines, then it's as if we never existed. That is existential risk.

The challenge is: how can one formulate an abstract objective function that would preserve human history and its evolving continuity?

I'd like to propose an answer that depends on the notion of logical depth as proposed by C.H. Bennett and outlined in section 7.7 of Li and Vitanyi's An Introduction to Kolmogorov Complexity and Its Applications, which I'm sure many of you have handy. Logical depth is a super fascinating complexity measure that Li and Vitanyi summarize thusly:

Logical depth is the necessary number of steps in the deductive or causal path connecting an object with its plausible origin. Formally, it is the time required by a universal computer to compute the object from its compressed original description.

The mathematics is fascinating and better read in the original Bennett paper than here. Suffice it presently to summarize some of its interesting properties, for the sake of intuition.

"Plausible origins" here are incompressible, i.e. algorithmically random.
As a first pass, the depth D(x) of a string x is the least amount of time it takes to output the string from an incompressible program.
There's a free parameter that has to do with precision that I won't get into here.
A string of length n that consists entirely of 1's and a string of length n of independent random bits are both shallow. The first is shallow because it can be produced by a constant-sized program in time n. The second is shallow because there exists an incompressible program, consisting of the output string plus a constant-sized print function, that produces the output in time n.
An example of a deeper string is the string of length n that for each digit i encodes the answer to the ith enumerated satisfiability problem. Very deep strings can involve diagonalization.
Like Kolmogorov complexity, there is an absolute and a relative version. Let D(x/w) be the least time it takes to output x from a program that is incompressible relative to w.
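To make the two shallow cases above a bit more concrete, here is a toy sketch. It is not a depth computation (depth is incomputable); it only counts output steps for the two kinds of program described. The function names are hypothetical, one loop iteration stands in for one machine step, and Python's seeded pseudo-random bits are merely a stand-in for an algorithmically random string.

```python
import random


def run_ones_program(n):
    """Constant-size 'program': emit n ones, counting one step per symbol."""
    output, steps = [], 0
    for _ in range(n):
        output.append("1")
        steps += 1
    return "".join(output), steps


def run_print_program(literal):
    """'Print' program: its text is the literal plus a constant-size copy
    loop, so the program is about as long as its own output."""
    output, steps = [], 0
    for symbol in literal:
        output.append(symbol)
        steps += 1
    return "".join(output), steps


n = 32
rng = random.Random(0)  # fixed seed so the run is repeatable
# Assumption: pseudo-random bits stand in for an incompressible string.
random_bits = "".join(rng.choice("01") for _ in range(n))

ones, ones_steps = run_ones_program(n)
copied, copy_steps = run_print_program(random_bits)

print(ones_steps, copy_steps)  # both step counts equal n
```

In both cases the running time is linear in the output length, which is the sense in which both strings are shallow. What differs is the size of the program: roughly constant for the string of 1's, roughly n for the random string.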

That's logical depth. Here is the conceptual leap to history-preserving objective functions. Suppose you have a digital representation of all of human society at some time step t, calling this h_t. And suppose you have some representation of the future state of the universe u that you want to build an objective function around. What's important, I posit, is the preservation of the logical depth of human history in its computational continuation in the future.

We have a tension between two values. First, we want there to be an interesting, evolving future. We would perhaps like to optimize D(u).

However, we want that future to be our future. If the supercontroller maximizes logical depth by chopping all the humans up and turning them into better computers and erasing everything we've accomplished as a species, that would be sad. However, if the supercontroller takes human history as an input and then expands on it, that's much better. D(u/h_t) is the logical depth of the universe as computed by a machine that takes human history at time slice t as input.

Working on intuitions here--and your mileage may vary, so bear with me--I think we are interested in deep futures and especially those futures that are deep with respect to human progress so far. As a conjecture, I submit that those will be futures most shaped by human will.

So, here's my proposed objective for the supercontroller, as a function of the state of the universe. The objective is to maximize:

f(u) = D(u/h_t) / D(u)
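For anyone who wants the precision parameter made explicit, here is a sketch of the same objective with the definitions written out as I understand them from Bennett's paper. The symbols s (the significance parameter), T (running time), U (the universal computer) and p* (the shortest program that outputs p) come from that formulation and are not part of the summary above, so please check the original before leaning on the details.

```latex
\begin{align*}
D_s(x)   &= \min \{\, T(p) : U(p) = x,\ |p| - |p^*| < s \,\} \\
D_s(x/w) &= \min \{\, T(p, w) : U(p, w) = x,\ |p| - |(p/w)^*| < s \,\} \\
f_s(u)   &= \frac{D_s(u/h_t)}{D_s(u)}
\end{align*}
```

Here a program p counts as incompressible at level s when it is fewer than s bits longer than its own shortest description, and (p/w)* is the shortest description of p given w.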

I've been rather fast and loose here and expect there to be serious problems with this formulation. I invite your feedback! I'd like to conclude by noting some properties of this function:

It can be updated with observed progress in human history at time t' by replacing h_t with h_t'. You could imagine generalizing this to something that dynamically updated in real time.
This is a quite conservative function, in that it severely punishes computation that does not depend on human history for its input. It is so conservative that it might result in, just to throw it out there, unnecessary militancy against extra-terrestrial life.
There are lots of devils in the details. The precision parameter I glossed over. The problem of representing human history and the state of the universe. The incomputability of logical depth (of course it's incomputable!). My purpose here is to contribute to the formal framework for modeling these kinds of problems. The difficult work, like in most machine learning problems, becomes feature representation, sensing, and efficient convergence on the objective.

Thank you for your interest.

Sebastian Benthall

PhD Candidate

UC Berkeley School of Information

Re: your first point:

As I see it, there are two separate problems. One is preventing catastrophic destruction of humanity (Problem 1). The other is creating utopia (Problem 2). Objective functions that satisfice with respect to Problem 1 may not be solutions to Problem 2. As I read it, the Yudkowsky post you linked to argues for prioritizing Problem 2, whereas my sense of the thrust of Bostrom's argument is that it's critical to solve Problem 1. Maybe you can tell me if I've misunderstood.

Without implicating human values, I'm claiming that the function f(u) = D(u/h_t) / D(u) satisfies Problem 1 (the existential problem). I'm just going to refer to that function as f now.

You seem to have conceded this point. Maybe I've misinterpreted you.

As for solving Problem 2, I think we'd agree that any solutions to the utopia problem will also be solutions to the existence problem (Problem 1). The nice thing about f is that its range is (0,1), so it's easy to compose it with other functions that could weight it more towards a solution to Problem 2.

Re: your second point:

I'm not sure if I entirely follow what you're saying here, so I'm having a hard time understanding exactly the point of disagreement.

Is the point you're making about the unpredictability of the outcome of optimizing for f? Because the abstract patterns favored by f will look like noise relative to physics?

I think there are a couple elaborations worth making.

First, like Kolmogorov complexity, logical depth depends on a universal computer specification. I gather that you are assuming that the universal computer in question is something that simulates fundamental physics. This need not be the case. Depth is computed as the least running time of incompressible programs on the universal computer.

Suppose we were to try to evolve through a computational process a program that outputs a string that represented the ultimate, flourishing potential of humanity. One way to get that is to run the Earth as a physical process for a period of time and get a description of it at the end, selecting only those timelines in its stochastic unfolding in which life on Earth successfully computes itself indefinitely.

If you stop somewhere along the way, like timestep t, then you are going to get a representation that encodes some of the progress towards that teleological end.

(I think there's a rough conceptual analogy to continuations in functional programming here, if that helps)

An important property of logical depth is the Slow Growth Law, proved by Bennett. It says that deep objects cannot be produced quickly from shallow ones, with incompressible programs being the shallowest strings of all. It's not exactly that depth stacks additively, but I'm going to pretend it does for the intuitive argument here (which may be wrong):

If you have the depth of human progress D(h) and the depth of the universe at some future time D(u), then always D(u/h) < D(u) assuming h is deep at all and the computational products of humanity exist at all. But...

ah, I think I've messed up the formula. Let's see... let's have h' be a human slice taken after the time of h.

D(u) > D(u/h') > D(u/h) > D(h) assuming humanity's computational process continues. The more that h' encodes the total computational progress of u, i.e., the higher D(u/h') is relative to D(u)...

Ok, I think I need to modify the formula some. Here's function g:

g(u) = (D(h) + D(u/h)) / D(u)

Does maximizing this function produce better results? Or have I missed your point?

General response: I think you should revise the chances of this working way downwards until you have some sort of toy model where you can actually prove, completely, with no "obvious" assumptions necessary, that this will preserve values or at least the existence of an agent in a world. But I think enough has been said about this already.

Specific response:

Is the point you're making about the unpredictability of the outcome of optimizing for f? Because the abstract patterns favored by f will look like noise relative to physics?

"Looks l...
