Depth-based supercontroller objectives, take 2

sbenthall

Thanks for all the helpful comments and discussion around this post about using logical depth as an objective function for a supercontroller to preserve human existence.

As this is work in progress, I was a bit muddled and stood duly corrected on a number of points. I'm writing to submit a new, clarified proposal, with some comments directed at objections.

§1. Proposed objective function

Maximize g(u), where u is a description of the universe, h is a description of humanity (more on this later) at the time when the objective function is set, and g is defined as:

g(u) = D(u) - D(u/h)

where D(x) is logical depth and D(x/y) is relative logical depth of x and y.

§2. A note on terminology

I don't intend to annoy by saying "objective function" and "supercontroller" rather than "utility function" and "superintelligence." Rather, I am using this alternative language deliberately to scope the problem to a related question that is perhaps more well-defined or possibly easier to solve. If I understand correctly, "utility function" refers to any function, perhaps implicit, that characterizes the behavior of an agent. By "objective function", I mean a function explicitly coded as the objective of some optimization process, or "controller". I gather that a "superintelligence" is an agent that is better than a generic human at myriad tasks. I think this raises a ton of definitional issues, so instead I will talk about a "supercontroller", which is just arbitrarily good at achieving its objective.

Saying that a supercontroller is arbitrarily good at achieving an objective is tricky, since it's possible to define objectives that are impossible to achieve: for example, objective functions that involve incomputable functions, such as ones that require solving the Halting Problem. In general my sense is that computational complexity is overlooked within the "superintelligence" discourse, which is jarring for me since I come from a more traditional AI/machine learning background where computational complexity is at the heart of everything. I gather that it's assumed that a superintelligence will have such effectively unbounded access to computational resources due to its self-modification that complexity is not a limiting factor. It is in that spirit that I propose an incomputable objective function here. My intention is to get past the function definition problem so that work can then proceed to questions of safe approximation and implementation.

§3. Response to general objections

Apparently this community harbors a lot of skepticism towards an easy solution to the problem of giving a supercontroller an objective function that won't kill everybody or create a dystopia. If I am following the thread of argument correctly, much of this skepticism comes from Yudkowsky, for example here. The problem, he asserts, is that superintelligence that does not truly understand human morality could result in a "hyperexistential catastrophe," a fate worse than death.

Leave out just one of these values from a superintelligence, and even if you successfully include every other value, you could end up with a hyperexistential catastrophe, a fate worse than death. If there's a superintelligence that wants everything for us that we want for ourselves, except the human values relating to controlling your own life and achieving your own goals, that's one of the oldest dystopias in the book. (Jack Williamson's "With Folded Hands", in this case.)

After a long discussion of the potential dangers of a poorly written superintelligence utility function, he concludes:

In the end, the only process that reliably regenerates all the local decisions you would make given your morality, is your morality. Anything else - any attempt to substitute instrumental means for terminal ends - ends up losing purpose and requiring an infinite number of patches because the system doesn't contain the source of the instructions you're giving it. You shouldn't expect to be able to compress a human morality down to a simple utility function, any more than you should expect to compress a large computer file down to 10 bits.

The astute reader will anticipate my responses to this objection. There are two.

§3.1 The first is that we can analytically separate the problem of existential catastrophe from hyperexistential catastrophe. Assuming the supercontroller is really very super, then over all possible objective functions F, we can partition the set into those that kill all humans and those that don't. Let's call the set of humanity preserving functions E. Hyperexistentially catastrophic functions will be members of E but still undesirable. Let's hope that either supercontrollers are impossible or that there is some non-empty subset of E that is both existentially and hyperexistentially favorable. These functions don't have to be utopian. You might stub your toe now and then. They just have to be alright. Let's call this subset A.

A is a subset of E is a subset of F.

I am claiming that g is in E, and that's a pretty good place to start if we are looking for something in A.

§3.2 The second response to Yudkowsky's general "source code" objection--that a function that does not contain the source of the instructions given to it will require an infinite number of patches--is that the function g does contain the source of the instructions given to it. That is what the h term is for. Hence, this is not grounds to object to this function.

This is perhaps easy to miss, because the term h has been barely defined. To the extent that it has, it is a description of humanity. To be concrete, let's imagine that it is a representation of the physical state of humanity including its biological makeup--DNA and neural architecture--as well as its cultural and technological accomplishments. Perhaps it contains the entire record of human history up until now. Who knows--we are talking about asymptotic behaviors here.

The point is--and I think you'll agree with me if you share certain basic naturalistic assumptions about ethics--that while not explicitly coding for something like "what's the culminating point of collective, coherent, extrapolated values?", this description accomplishes the more modest task of including in it, somewhere, an encoding of those values as they are now. We might disagree about which things represent values and which represent noise or plain fact. But if we do a thorough job we'll at least make sure we've got them all.

This is a hack, perhaps. But personally I treat the problem of machine ethics with a certain amount of urgency and so am willing to accept something less than perfect.

§4. So why depth?

I am prepared to provide a mathematical treatment of the choice of g as an objective function in another post. Since I expect it gets a little hairy in the specifics, I am trying to troubleshoot it intuitively first to raise the chance that it is worth the effort. For now, I will try to do a better job of explaining the idea in prose than I did in the last post.

§4.1 Assume that change in the universe can be modeled as a computational process, or a number of interacting processes. A process is the operation of general laws of change--modeled as a kind of universal Turing Machine--that starts with some initial set of data--the program--and then operates on that data, manipulating it over discrete moments in time. For any particular program, that process may halt--outputting some data--or it may not. Of particular interest are those programs that basically encode no information directly about what their outcome is. These are the incompressible programs.

Let's look at the representation h. Given all of the incompressible programs P, only some of them will output h. Among these programs are all the incompressible programs that include h at any time stage in their total computational progression, modified with something like, "At time step t, stop here and output whatever you've got!". Let's call the set of all programs from processes that include h in their computational path H. H is a subset of P.

What logical depth does is abstract over all processes that output a string. D(h) is (roughly) the minimum amount of time, over all p in H, for p to output h.

Relative logical depth goes a step further and looks at processes that start with both some incompressible program and some other potentially much more compressible string as input. So let's look at the universe at some future point, u, and the value D(u/h).
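For readers who want the notation pinned down, here is a sketch of the formal definitions this prose is paraphrasing, following Bennett's formulation as I understand it. The significance parameter s, the running time T, the universal machine U, and the minimal program p* are not named elsewhere in this post and are introduced here only for illustration. The rest of the post suppresses s and simply writes D(x) and D(x/y).

```latex
% T(p): running time of program p on the universal machine U
% p^*:  a shortest program producing the same output as p
D_s(x) = \min \{\, T(p) \;:\; U(p) = x,\ |p| - |p^*| < s \,\}

% Relative depth: the program is additionally given w as input
D_s(x/w) = \min \{\, T(p, w) \;:\; U(p, w) = x,\ |p| - |(p/w)^*| < s \,\}

% The objective from section 1, with s suppressed
g(u) = D(u) - D(u/h)
```

Here (p/w)* stands for a shortest program that produces the same output as p when it is given w as input.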

§4.2 Just as an aside to try to get intuitions on the same page: if D(u/h) < D(h), then something has gone very wrong, because the universe is incredibly vast and humanity is a rather small part of it. Even if the only process that created the universe was something in the human imagination (!) this change to the universe would mean that we'd have lost something that the processes that created the human present had worked to create. This is bad news.

The intuition here is that as time goes forward, it would be good if the depth of the universe also went up. Time is computation. A supercontroller that tries to minimize depth will be trying to stop time and that would be very bizarre indeed.

§4.3 The intuition I'm trying to sell you on is that when we talk about caring about human existence, i.e. when trying to find a function that is in E, we are concerned with the continuation of the processes that have resulted in humanity at any particular time. A description of humanity is just the particular state at a point in time of one or more computational processes which are human life. Some of these processes are the processes of human valuation and the extrapolation of those values. You might agree with me that CEV is in H.

§4.4 So consider the supercontroller's choice of two possible future timelines, Q and R. Future Q looks like taking the processes of H and removing some of the 'stop and print here' clauses, and letting them run for another couple thousand years, maybe accelerating them computationally. Future R looks like something very alien. The surface of earth is covered in geometric crystal formations that maximize the solar-powered production of grey goo, which is spreading throughout the galaxy at a fast rate. The difference is that the supercontroller did something different in the two timelines.

We can, for either of these timelines, pick a particular logical depth, say c, and slice the timelines at points q and r respectively such that D(q) = D(r) = c.

Recall our objective function is to maximize g(u) = D(u) - D(u/h).

Which will be higher, g(q) or g(r)?

The D(u) term is the same for each. So we are interested in maximizing the value of - D(u/h), which is the same as minimizing D(u/h)--the depth relative to humanity.

By assumption, the state of the universe at r has overwritten all the work done by the processes of human life. Culture, thought, human DNA, human values, etc. have been stripped to their functional carbon and hydrogen atoms and everything now just optimizes for paperclip manufacturing or whatever. Knowing h is then no help in computing r, so D(r/h) = D(r). Indeed anywhere along timeline R where the supercontroller has decided to optimize for computational power at the expense of existing human processes, g(r) is going to be dropping closer to zero.

Compare with D(q/h). Since q is deep, we know some processes have been continuing to run. By assumption, the processes that have been running in Q are the same ones that have resulted in present-day humanity, only continued. The minimum time way to get to q will be to pick up those processes where they left off and continue to compute them. Hence, q will be shallow relative to h. D(q/h) will be significantly lower than D(q) and so be favored by objective function g.
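To make the comparison concrete, here is a toy sketch in Python. It is not logical depth proper, which is incomputable. It assumes a tiny hand-made universe in which states are nodes in a graph, each edge is one step of computation, and "depth" is just the fewest steps from any seed state. The graph, the state names, and the helper functions are all hypothetical, chosen so that q and r have the same depth c = 5.

```python
from collections import deque

# Hypothetical toy universe: each edge is one step of computation.
# Timeline Q continues the process that passed through h; timeline R does not.
GRAPH = {
    "s1": ["a1"], "a1": ["a2"], "a2": ["h"], "h": ["q1"], "q1": ["q"],
    "s2": ["b1"], "b1": ["b2"], "b2": ["b3"], "b3": ["b4"], "b4": ["r"],
}
SEEDS = {"s1", "s2"}  # stand-ins for incompressible starting programs


def min_steps(starts, target):
    """Fewest steps from any state in `starts` to `target` (multi-source BFS)."""
    seen = set(starts)
    queue = deque((state, 0) for state in starts)
    while queue:
        state, steps = queue.popleft()
        if state == target:
            return steps
        for nxt in GRAPH.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None  # unreachable in this toy universe


def depth(x):
    """Toy D(x): minimum time to reach x from a seed."""
    return min_steps(SEEDS, x)


def relative_depth(x, y):
    """Toy D(x/y): minimum time to reach x when y is also available as input."""
    return min_steps(SEEDS | {y}, x)


def g(u, h="h"):
    """Toy objective g(u) = D(u) - D(u/h)."""
    return depth(u) - relative_depth(u, h)


print(depth("q"), depth("r"))  # 5 5  (same slice depth c)
print(g("q"), g("r"))          # 3 0  (q is favored over r)
```

In this toy, q gets credit for exactly the three steps of work already embodied in h, while r gets none because its shortest path never passes through h.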

§5 But why optimize?

You may object: if the function depends on depth measure D which only depends on the process that produces h and q with minimal computation, maybe this will select for something inessential about humanity and mess things up. Depending on how you want to slice it, this function may fall outside of the existentially preserving set E let alone the hyperexistentially acceptable set A. Or suppose you are really interested only in the continuation of a very specific process, such as coherent extrapolated volition (here, CEV).

To this I propose a variation on the depth measure, D*, which I believe was also proposed by Bennett (though I have to look that up to be sure). Rather than taking the minimum computational time required to produce some representation, D* is a weighted average over the computational time it takes to produce the string. The weights can reflect something like the Kolmogorov complexity of the initial programs/processes. You can think of this as an analog of Solomonoff induction, but through time instead of space.
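A minimal sketch of the averaging idea, under stated assumptions: each program that outputs the string is summarized by a hypothetical (length in bits, running time) pair, and a program's weight is 2^-length, in the style of a Solomonoff prior. This is only one way to read "weighted average". It is not claimed to be Bennett's exact definition, and the numbers are made up.

```python
def min_depth(programs):
    """Plain D: running time of the fastest program in the list."""
    return min(time for _, time in programs)


def averaged_depth(programs):
    """Toy D*: running times averaged with weight 2^-length per program."""
    weights = [2.0 ** -length for length, _ in programs]
    total = sum(weights)
    weighted = sum(w * time for w, (_, time) in zip(weights, programs))
    return weighted / total


# Hypothetical programs that all output the same string:
# a short slow one (3 bits, 10 steps) and a longer fast one (5 bits, 2 steps).
programs = [(3, 10), (5, 2)]

print(min_depth(programs))       # 2
print(averaged_depth(programs))  # 8.4
```

Short programs dominate the average, so a long program that happens to be fast no longer determines the value on its own.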

Consider the supercontroller that optimizes for g*(u) = D*(u) - D*(u/h).

Suppose your favorite ethical process, such as CEV, is in H. h encodes for some amount of computational progress on the path towards completed CEV. By the same reasoning as above, future universes that continue from h on the computational path of CEV will be favored, albeit only marginally, over futures that are insensitive to CEV.

This is perhaps not enough consolation to those very invested in CEV, but it is something. The processes of humanity continue to exist, CEV among them. I maintain that this is pretty good. I.e. that g* is in A.

The physicists' definition of entropy is misaligned with the intuitive definition, because it is affected massively more by micro-scale things like temperature and by mixing, than by macro-scale things like objects and people. This tends to trip people up when they try to take it out of chemistry and physics and bring it anywhere else.

When I look at your function g(u), I notice that it has a very similar flavor. While I'm having a very hard time interpreting what it will actually end up optimizing, my intuition is that it will end up dominated by some irrelevant micro-scale property like temperature. That's the outside view.

On the inside view, I see a few more reasons to worry. This universe's physics is very computationally inefficient, so the shortest computational path from any state A to some other state B will almost certainly bypass it somehow. Furthermore, human brains are very computationally inefficient, so I would expect the shortest computational path to bypass them, too. I don't know what that computational shortcut might be, but I wouldn't expect to like it.

Exploring the properties of logical depth might be interesting, but I don't expect a good utility function out of it.

Your point about physical entropy is noted and a good one.

One reason to think that something like D(u/h) would pick out higher level features of reality is that h encodes those higher-level features. It may be possible to run a simulation of humanity on more efficient physical architecture. But unless that simulation is very close to what we've already got, it won't be selected by g.

You make an interesting point about the inefficiency of physics. I'm not sure what you mean by that exactly, and am not in a position of expertise to say otherwise. However, I think there is a way to get around this problem. Like Kolmogorov complexity, depth has another hidden term in it, the specification of the universal Turing machine that is used, concretely, to measure the depth and size of strings. If depth is defined in terms of a universal machine that is a physics simulator, then there wouldn't be a way to "bypass" physics computationally. That would entail being able to build a computer, within our physics, that would be more efficient than our physics. Tell me if that's not impossible.

Re: brains, I'm suggesting that we encode whatever we think is important about brains in the h term. If brains execute a computational process, then that process will be preserved somehow. It need not be preserved on grey matter exactly. Those brains could be uploaded onto more efficient architecture.

I appreciate your intuitions on this but this function is designed rather specifically to challenge them.

I finally figured out what this does.

It takes h, applies an iterated hashing/key-stretching style function to it, and tiles the universe with the result.

Sorry.

Yeah, something like that. "Make the state of the universe such that it's much easier to compute knowing h than without h" doesn't mean that the computation will use any interesting features of h, it could just be key-stretching.
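For readers who have not met the term: in its simplest form, key stretching just means applying a hash function over and over, so that the result takes many sequential steps to compute from the input. A minimal sketch using SHA-256 from Python's hashlib follows. The function name `stretch`, the stand-in value for h, and the round count are illustrative assumptions, and nothing here claims that this is literally what a g-maximizer would do.

```python
import hashlib


def stretch(h: bytes, rounds: int) -> bytes:
    """Apply SHA-256 to h repeatedly, `rounds` times in sequence."""
    state = h
    for _ in range(rounds):
        state = hashlib.sha256(state).digest()
    return state


# Hypothetical stand-in for the description of humanity.
h = b"a description of humanity"

# Given h, the result is `rounds` hash steps away. Without h there is no
# known shortcut, yet the output reflects none of h's interesting structure.
u = stretch(h, 100_000)
print(u.hex())
```

Each extra round adds sequential work that depends on h only as an opaque seed, which is the worry being raised: the state is easy to reach from h without h's contents mattering in any meaningful way.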

Could you flesh this out? I'm not familiar with key-stretching.

A pretty critical point is whether or not the hashed value is algorithmically random. The depth measure has the advantage of picking over all permissible starting conditions without having to run through each one. So it's not exactly analogous to a brute force attack. So for the moment I'm not convinced on this argument.

Maybe. Can you provide an argument for that?

As stated, that wouldn't maximize g, since applying the hash function once and tiling would cap the universe at finite depth. Tiling doesn't make any sense.

I don't think it's literally tiling. More hash stretching all the way.

I had a hazy sense of that direction of thing being the most likely actual result. Thanks for putting your finger on it for me.

Note: I may have badly misunderstood this, as I am not familiar with the notion of logical depth. Sorry if I have!

I found this post's arguments to be much more comprehensible than your previous ones; thanks so much for taking the time to rewrite them. With that said, I see three problems:

1) '-D(u/h)' optimizes for human understanding of (or, more precisely, human information of) the universe, such that given humans you can efficiently get out a description of the rest of the universe. This also ensures that whatever h is defined as continues to exist. But many (indeed, even almost all) human values aren't about entanglement with the universe. Because h isn't defined explicitly, it's tough for me to state a concrete scenario where this goes wrong. (This isn't a criticism of the definition of h, I agree with your decision not to try to tightly specify it.) But, e.g. it's easy to imagine that humans having any degree of freedom would be inefficient, so people would end up drug-addled, in pods, with videos and audio playing continuously to put lots of carefully selected information into the humans. This strikes me as a poor outcome.

2) Some people (e.g. David Pearce (?) or MTGandP) argue that the best possible outcome is essentially tiled- that rather than have large and complicated beings human-scale or larger, it would be better to have huge numbers of micro-scale happy beings. I disagree, but I'm not absolutely certain, and I don't think we can rule out this scenario without explicitly or implicitly engaging with it.

3) As I understand it, in 3.1 you state that you aren't claiming that g is an optimal objective function, just that it leaves humans alive. But in this case 'h', which was not ever explicitly defined, is doing almost all of the work: g is guaranteed to preserve 'h', which you verbally identified with the physical state of humanity. But because you haven't offered a completely precise definition of humanity here, what the function as described above would preserve is 'a representation of the physical state of humanity including its biological makeup--DNA and neural architecture--as well as its cultural and technological accomplishments'. This doesn't strike me as a significant improvement from simply directly programming in that humans should survive, for whatever definition of humans/humanity selected; while it leaves the supercontroller with different incentives, in neither scenario are said incentives aligned with human morality.

(My intuition regarding g* is even less reliable than my intuition regarding g; but I think all 3 points above still apply.)

Thanks for your thoughtful response. I'm glad that I've been more comprehensible this time. Let me see if I can address the problems you raise:

1) Point taken that human freedom is important. In the background of my argument is a theory that human freedom has to do with the endogeneity of our own computational process. So, my intuitions about the role of efficiency and freedom are different from yours. One way of describing what I'm doing is trying to come up with a function that a supercontroller would use if it were to try to maximize human freedom. The idea is that choices humans make are some of the most computationally complex things they do, and so the representations created by choices are deeper than others. I realize now I haven't said any of that explicitly let alone argued for it. Perhaps that's something I should try to bring up in another post.

2) I also disagree with the morality of this outcome. But I suppose that would be taken as beside the point. Let me see if I understand the argument correctly: if the most ethical outcome is in fact something very simple or low-depth, then this supercontroller wouldn't be able to hit that mark? I think this is a problem whenever morality (CEV, say) is a process that halts.

I wonder if there is a way to modify what I've proposed to select for moral processes as opposed to other generic computational processes.

3) A couple responses:

Oh, if you can just program in "keep humanity alive" then that's pretty simple and maybe this whole derivation is unnecessary. But I'm concerned about the feasibility of formally specifying what is essential about humanity. VAuroch has commented that he thinks that coming up with the specification is the hard part. I'm trying to defer the problem to a simpler one of just describing everything we can think of that might be relevant. So, it's meant to be an improvement over programming in "keep humanity alive" in terms of its feasibility, since it doesn't require solving perhaps impossible problems of understanding human essence.
Is it the consensus of this community that finding an objective function in E is an easy problem? I got the sense from Bostrom's book talk that existential catastrophe was on the table as a real possibility.

I encourage you to read the original Bennett paper if this interests you. I think your intuitions are on point and appreciate your feedback.

Thanks for your response!

1) Hmmm. OK, this is pretty counter-intuitive to me.

2) I'm not totally sure what you mean here. But, to give a concrete example, suppose that the most moral thing to do would be to tile the universe with very happy kittens (or something). CEV, as I understand, would create as many of these as possible, with its finite resources; whereas g/g* would try to create much more complicated structures than kittens.

3) Sorry, I don't think I was very clear. To clarify: once you've specified h, a superset of human essence, why would you apply the particular functions g/g* to h? Why not just directly program in 'do not let h cease to exist'? g/g* do get around the problem of specifying 'cease to exist', but this seems pretty insignificant compared to the difficulty of specifying h. And unlike with programming a supercontroller to preserve an entire superset of human essence, g/g* might wind up with the supercontroller focused on some parts of h that are not part of the human essence- so it doesn't completely solve the definition of 'cease to exist'.

(You said above that h is an improvement because it is a superset of human essence. But we can equally program a supercontroller not to let a superset of human essence cease to exist, once we've specified said superset.)

I enjoyed both this and the previous post. Not the usual computational fare around here, and it's fun to play with new frameworks. I upvoted particularly for incorporating feedback and engaging with objections.

I have a couple of ways in which I'd like to challenge your ideas.

If I'm not mistaken, there are two routes to take in maximizing g. Either you can minimize D(u/h), or you can just drive D(u) through the roof and not damage h too badly. Intuitively, the latter seems to give you a better payoff per joule invested. Let's say that our supercontroller grabs a population of humans, puts them in stasis pods of some kind, and then goes about maximizing entropy by superheating the moon. This is a machine that has done a pretty good job of increasing g(u). As long as the supercontroller is careful to keep D(u/h) from approaching D(u), it can easily ignore that term without negotiating the complexity of human civilization or even human consciousness. That said, I clearly don't understand relative logical depth very well- so maybe D(u/h) approaches D(u), in the case that D(u) increases as h is held constant?

Another very crucial step here is in the definition of humanity, and which processes count as human ones. I'm going to assume that everyone here is a member in good standing of Team Reductionism, so this is not a trivial task. It is called trans humanism, after all, and you are more than willing to abstract away from the fleshy bits when you define 'human'. So what do you keep? It seems plausible, even likely, that we will not be able to define 'humanity' with a precision that satisfies our intuitions until we already have the capacity to create a supercontroller. In this sense your suggestion is hiding the problem it attempts to solve- that is, how to define our values with sufficient rigor that our machines can understand them.

Thanks for your encouraging comments. They are much appreciated! I was concerned that following the last post with an improvement on it would be seen as redundant, so I'm glad that this process has your approval.

Regarding your first point:

Entropy is not depth. If you do something that increases entropy, then you actually reduce depth, because it is easier to get to what you have from an incompressible starting representation. In particular, the incompressible representation that matches the high-entropy representation you have created. So if you hold humanity steady and superheat the moon, you more or less just keep things at D(u) = D(h), with low D(u/h).
You can do better if you freeze humanity and then create fractal grey goo, which is still in the spirit of your objection. Then you have high D(u), D(u/h) is something like D(u) - D(h) except for when the fractal starts to reproduce human patterns out of the sheer vigor of its complexity, in which case I guess D(u/h) would begin to drop...though I'm not sure. This may require a more thorough look at the mathematics. What do you think?

Regarding your second point...

Strictly speaking, I'm not requiring that h abstract away the fleshy bits and capture what is essentially human or transhuman. I am trying to make the objective function agnostic to these questions. Rather, h can include fleshy bits and all. What's important is that it includes at least what is valuable, and that can be done by including anything that might be valuable. The needle in the haystack can be discovered later, if it's there at all. Personally, I'm not a transhumanist. I'm an existentialist; I believe our existence precedes our essence.

That said I think this is a clever point with substance to it. I am, in fact, trying to shift our problem-solving attention to other problems. However, I am trying to turn attention to more tractable and practical questions.

One simple one is: how can we make better libraries for capturing human existence, so that a supercontroller could make use of as much data as possible as it proceeds?

Another is: given that the proposed objective function is in fact impossible to compute, but (if the argument is ultimately successful) also given that it points in the right direction, what kinds of processes/architectures/algorithms would approximate a g-maximizing supercontroller? Since we have time to steer in the right direction now, how should we go about it?

My real agenda is that I think that there are a lot of pressing practical questions regarding machine intelligence and its role in the world, and that the "superintelligence" problem is a distraction except that it can provide clearer guidelines of how we should be acting now.

Culture, thought, human DNA, human values, etc. have been stripped to their functional carbon and hydrogen atoms and everything now just optimizes for paperclip manufacturing or whatever. D(u/r) = D(u)

I contest this derivation. Whatever process produced humanity did so in such a way that humanity produced an unsafe supercontroller. This may mean that whatever the supercontroller is optimized for, it's part of the process that produced humanity, and so it does not make g(u,h) go to zero.

Of course, without a concrete model, it's impossible to say for certain.

So, the key issue is whether or not the representations produced by the paperclip optimizer could have been produced by other processes. If there is another process that produces the paperclip-optimized representations more efficiently than going through the process of humanity, then that process dominates the calculation of D(r).

In other words, for this objection to make sense, it's not enough for humanity to have been sufficient for the R scenario. It must be necessary for producing R, or at least necessary to result in it in the most efficient possible way.

What are your criteria for a more concrete model than what has been provided?

Addressing part of the assumptions: While it's assumed that a superintelligence has access to Enough Resources, or at least enough to construct more for itself and thus proceed rapidly toward a state of Enough Resources, the programmers of the superintelligence do not. This is very important when you consider that h needs to be present as input to the superintelligence before it can take action. So the programmers must provide something that compresses to h at startup. And that's a very difficult problem; if we could correctly determine what-all was needed for a full specification of humanity, we'd be a substantial way toward solving the complexity of value problem. So even if this argument works (and I don't think I trust it), it still wouldn't deal with the problem adequately.

I see, that's interesting. So you are saying that while the problem as scoped in §2 may take a function of arbitrary complexity, there is a constraint in the superintelligence problem I have missed, which is that the complexity of the objective function has certain computational limits.

I think this is only as extreme a problem as you say in a hard takeoff situation. In a slower takeoff situation, inaccuracies due to missing information could be corrected on-line as computational capacity grows. This is roughly business-as-usual for humanity---powerful entities direct the world according to their current best theories; these are sometimes corrected.

It's interesting that you are arguing that if we knew what information to include in a full specification of humanity, we'd be making substantial progress towards the value problem. In §3.2 I argued that the value problem need only be solved with a subset of the full specification of humanity. The fullness of that specification was desirable just because it makes it less likely that we'll be missing the parts that are important to value.

If, on the other hand, you are right and the full specification of humanity is important to solving the value problem--something I'm secretly very sympathetic to--then

(a) we need a supercomputer capable of processing the full specification in order to solve the value problem, so unless there is an iterative solution here the problem is futile and we should just accept that The End Is Nigh, or else try, as I've done, to get something Close Enough and hope for slow takeoff, and

(b) the solution to the value problem is going to be somewhere down the computational path from h and is exactly the sort of thing that would be covered in the scope of g*.

It would be a very nice result, I think, if the indirect normativity problem or CEV or whatever could be expressed in terms of the depth of computational paths from the present state of humanity for precisely this reason. I don't think I've hit that yet exactly but it's roughly what I'm going for. I think it may hinge on whether the solution to the value problem is something that involves a halting process, or whether really it's just to ensure the continuation of human life (i.e. as a computational process). In the latter case, I think the solution is very close to what I've been proposing.

While I would agree that not all portions of h are needed to solve the value problem, I think it's very plausible that it would take all of h to be certain that you'd solved the value problem. As in, you couldn't know that you had included everything important unless you knew that you had everything unimportant as well.

Also, I don't think I'm sympathetic to the idea that a slow takeoff buys you time to correct things. How would you check for inaccuracies? You don't have a less-flawed version to compare things to; if you did, you'd be using that version. Some inaccuracies will be large and obvious, but that's rarely, if ever, going to catch the kinds of errors that lead to hyperexistential catastrophe, and will miss many existential catastrophes.

"Close Enough and hope for slow takeoff"

On the one hand "close enough" is adequate for horseshoes, but probably not good enough for THE FATE OF THE UNIVERSE (grabs algebra nerd by lapels and shakes vigorously)

On the other hand, supergeniuses like Ben Goertzel have suggested that a takeoff might follow a "semi-hard" trajectory. While others have suggested "firm takeoff" (Voss), and even "tumescent takeoff"

Like most of humanity, I'll start getting concerned when the computers finally beat us in chess. (...off camera whispering)

On a more serious note, the superhuman AI that polices this site just had a most unwelcome message for me: You are trying to eject messages too fast. Rehydrate and try again in 3 minutes. The machines! ...they're takin' over! They're already layin' down the law!

I'm not sure we can "algebra" our way out of this dilemma. I think that we need to sit up and take notice that "liberal democracy" ("libertarian democracy," since I'm using the term "liberal" like Hayek did, and not like the conscious movement to hijack the term) dramatically outperforms state collectivist totalitarianism. During the time period when the USA was less totalitarian and had more of the features of "liberal democracy" than what we currently do, or did at the country's beginning, we performed better (more stated happiness, more equality under the law, more wealth generation, more immigrants revealing a preference for living here, etc.).

So why mention governments in a discussion of intelligent control? Because governments are how we choose to govern (or destroy), at the largest human scale. As such, they represent the best large-scale systems humans can set up, in accordance with human nature.

So, superhuman synthetic intelligence should build upon that. How? Well, we should make superhumanly-intelligent "classical liberals" that are fully equipped with mirror neurons. There should be many of them, and they should be designed to (1) protect their lives using the minimum force necessary to do so (2) argue about what course of action is best once their own lives have been preserved. If they possess mirror neurons and exposure to such thinkers as Hayek, it won't be hard to prevent them from destroying the world and all humans ---they will have a natural predisposition toward protecting and expanding life and luxury.

The true danger is that thinkers from LessWrong mistakenly believe they have designed reasonably intelligent FAI, and they build only ONE.

Lack of competition consolidates power, and with it, tendencies toward corruption.

I don't know if Bayes and Algebra mastery can teach a human being this lesson. Perhaps one needs to read "Lord of the Rings" or something similar, and perhaps algebra masters need to read something that causes all other variables to be multiplied by decimal percentages below the teens, and the "absolute power corrupts absolutely" variable needs to be ranked high and multiplied by 1.

There is wisdom in crowds (of empaths, using language alone). That said: Humans developed as societies with statistical distributions of majority empath conformists to minority "pure" sociopaths. Technology changes that, and rather suddenly. Dissenters can be found out and eliminated or discredited. Co-conspirators can be given power and prestige. Offices can be protected, and conformists can be catered to. Critics can be bought off.

It's a big world, with a lot of scary things that never get mentioned on LessWrong. My feeling is: there is no "one size fits all" smartest being.

Every John Galt you create is likely to be very imperfect in some way. No matter how general his knowledge. Even with the figures Kurzweil uses, he could be a smart Randian objectivist spacecraft designer, or a smart Hayekian liberal gardener, and even with all of human knowledge at its fingertips that wouldn't account for character, preference, or what details the synthetic mind chose to master. It might master spacecraft building, but spend all its time designing newer and more complex gardens, restaurants, and meals.

Emergence is messy. Thing clusters are messy. ...And hierarchical.

A superintelligence will likely derive its highest values the way we do: by similar "goal networks" in the same general direction "outvoting" one another (or, "hookers and blow" may intervene as a system-crashing external stimulus).

In any case, I'd rather have several such brains, rather than only one.

Thanks for your thoughtful response. I'm glad that I've been more comprehensible this time. Let me see if I can address the problems you raise:

I wonder if there is a way to modify what I've proposed to select for moral processes as opposed to other generic computational processes.

3) A couple responses:

Oh, if you can just program in "keep humanity alive" then that's pretty simple and maybe this whole derivation is unnecessary. But I'm concerned about the feasibility of formally specifying what is essential about humanity. VAuroch has commented that he thinks that coming up with the specification is the hard part. I'm trying to defer the problem to a simpler one of just describing everything we can think of that might be relevant. So, it's meant to be an improvement over programming in "keep humanity alive" in terms of its feasibility, since it doesn't require solving perhaps impossible problems of understanding human essence.
Is it the consensus of this community that finding an objective function in E is an easy problem? I got the sense from Bostrom's book talk that existential catastrophe was on the table as a real possibility.

I encourage you to read the original Bennett paper if this interests you. I think your intuitions are on point and appreciate your feedback.

Randaly

Thanks for your response!

1) Hmmm. OK, this is pretty counter-intuitive to me.

2) I'm not totally sure what you mean here. But, to give a concrete example, suppose that the most moral thing to do would be to tile the universe with very happy kittens (or something). CEV, as I understand it, would create as many of these as possible with its finite resources, whereas g/g* would try to create much more complicated structures than kittens.

3) Sorry, I don't think I was very clear. To clarify: once you've specified h, a superset of human essence, why would you apply the particular functions g/g* to h? Why not just directly program in 'do not let h cease to exist'? g/g* do get around the problem of specifying 'cease to exist', but this seems pretty insignificant compared to the difficulty of specifying h. And unlike with programming a supercontroller to preserve an entire superset of human essence, g/g* might wind up with the supercontroller focused on some parts of h that are not part of the human essence, so it doesn't completely solve the definition of 'cease to exist'. (You said above that h is an improvement because it is a superset of human essence. But we can equally program a supercontroller not to let a superset of human essence cease to exist, once we've specified said superset.)