Moloch: optimisation, "and" vs "or", information, and sacrificial ems
Go read Yvain/Scott's Meditations On Moloch. It's one of the most beautiful, disturbing, poetic looks at the future that I've ever seen.
Go read it.
Don't worry, I can wait. I'm only a piece of text, my patience is infinite.
De-dum, de-dum.
You sure you've read it?
Ok, I believe you...
Really.
I hope you wouldn't deceive an innocent and trusting blog post? You wouldn't be monster enough to abuse the trust of a being as defenceless as a constant string of ASCII symbols?
Of course not. So you'd have read that post before proceeding to the next paragraph, wouldn't you? Of course you would.
Academic Moloch
Ok, now to the point. The "Moloch" idea is very interesting, and, at the FHI, we may try to do some research in this area (naming it something more respectable/boring, of course, something like "how to avoid stable value-losing civilization attractors").
The project hasn't started yet, but a few caveats to the Moloch idea have already occurred to me. First of all, an optimisation process doesn't have to trample everything we value into the mud. That is likely to happen with an AI's motivation, but it isn't an inevitable feature of optimisation processes in general.
One way of seeing this is the difference between "or" and "and". Take the democratic election optimisation process. It's clear, as Scott argues, that this optimises badly in some ways. It encourages appearance over substance, some types of corruption, etc... But it also optimises along some positive axes, with some clear, relatively stable differences between the parties that reflect some voters' preferences, and punishment for particularly inept behaviour from leaders (I might argue that the main benefit of democracy is not the final vote between the available options, but the filtering out of many pernicious options because they'd never be politically viable). The question is whether these two strands of optimisation can be traded off against each other, or whether a minimum of each is required. So can we make a campaign that is purely appearance-based, without any substantive position ("or": a maximum on one axis is enough), or do you need a minimum of substance and a minimum of appearance to buy off different constituencies ("and": you need some achievements on all axes)? And no, I'm not interested in discussing current political examples.
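To make the distinction concrete, here's a minimal toy sketch (the functions, thresholds, and numbers are all invented for illustration, not a model of any real election):

```python
def succeeds_or(appearance, substance, threshold=0.8):
    # "or": a high enough score on any single axis is enough
    return max(appearance, substance) >= threshold

def succeeds_and(appearance, substance, minimum=0.3):
    # "and": some minimum achievement is needed on every axis
    return min(appearance, substance) >= minimum

# A pure-appearance campaign wins under "or" but fails under "and":
print(succeeds_or(0.9, 0.0))   # True
print(succeeds_and(0.9, 0.0))  # False
```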
Another example Scott gave was of the capitalist optimisation process, and how it in theory matches customers' and producers' interests, but could go very wrong:
Suppose the coffee plantations discover a toxic pesticide that will increase their yield but make their customers sick. But their customers don't know about the pesticide, and the government hasn't caught up to regulating it yet. Now there's a tiny uncoupling between "selling to [customers]" and "satisfying [customers'] values", and so of course [customers'] values get thrown under the bus.
This effect can be combated to some extent with extra information. If the customers (or journalists, bloggers, etc...) know about this, then the coffee plantations will suffer. "Our food is harming us!" isn't exactly a hard story to publicise. This certainly doesn't work in every case, but increased information is something that technological progress would bring, and this needs to be considered when asking whether optimisation processes will inevitably tend to a bad equilibrium as technology improves. An accurate theory of nutrition, for instance, would have great positive impact if its recommendations could be measured.
Finally, Zack Davis's poem about the em stripped of (almost all) humanity got me thinking. The end result of that process is tragic for two reasons: first, the em retains enough humanity to have curiosity, only to get killed for it; and second, that em was once human. If the em were entirely stripped of human desires, the situation would be less tragic. And if the em were further constructed by a process that didn't destroy any humans, this would be even more desirable. Ultimately, if the economy could be powered by entities developed non-destructively from humans, and which were clearly not conscious or suffering themselves, this would be no different from powering the economy with the non-conscious machines we use today. This might happen if certain pieces of a human-em could be extracted, copied and networked into an effective, non-conscious entity. In that scenario, humans and human-ems could be the capital owners, and the non-conscious modified ems could be the workers. The connection with the Moloch argument is that certain nightmare scenarios could, in some circumstances, be adjusted to much better outcomes with a small amount of coordination.
The point of the post
The reason I posted this is to get people's suggestions about ideas relevant to a "Moloch" research project, and what they thought of the ideas I'd had so far.
"All natural food" as an constrained optimisation problem
A look at all natural foods through the lenses of Bayesianism, optimisation, and friendly utility functions.
How should we consider foods that claim to be "all natural"? Or, since that claim is a cheap signal, foods that have few ingredients, all of them easy to recognise and all "natural"? Or "GM free"?
From the logical point of view, the case is clear: valuing these foods is nothing more than the appeal-to-nature fallacy. Natural products include many pernicious things (such as tobacco, hemlock, belladonna, countless parasites, etc...). And the difference between natural and non-natural isn't obvious: synthetic vitamin C is identical to the "natural" molecule, and gene modifications are just advanced forms of selective breeding.
But we're not just logicians, we're Bayesians. So let's make a few prior assumptions:
- There are far more possible products in the universe that are bad to eat than are good.
- Products that humans have been consuming for generations are much more likely to be good to eat than a random product.
Now let's see the food industry as optimising along a few axes (there's a toy sketch after this list):
- Cost. This should be low.
- Immediate consumer satisfaction (including taste, appearance, texture, general well-being for a week or so). This should be high.
- Long term damage to the consumer's health. This should be low.
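Here is the promised toy sketch (every product name and number is invented): if the market can only observe cost and immediate satisfaction, the optimisation will happily sacrifice the unmeasured long-term-health axis - which is exactly where the "consumed for generations" prior earns its keep.

```python
# Toy model: the industry picks whichever product maximises what the
# market can observe (satisfaction minus cost); long-term damage is
# invisible to the optimisation.
products = [
    # (name, cost, immediate_satisfaction, long_term_damage)
    ("traditional recipe", 0.5, 0.6, 0.1),
    ("novel additive mix", 0.2, 0.9, 0.8),
]

def observed_score(item):
    name, cost, satisfaction, damage = item
    return satisfaction - cost  # damage never enters the score

winner = max(products, key=observed_score)
print(winner[0], winner[3])  # "novel additive mix" wins despite 0.8 damage
```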
Thoughts and problems with Eliezer's measure of optimization power
Back in the day, Eliezer proposed a method for measuring the optimization power (OP) of a system S. The idea is to get a measure of how small a target the system can hit:
You can quantify this, at least in theory, supposing you have (A) the agent or optimization process's preference ordering, and (B) a measure of the space of outcomes - which, for discrete outcomes in a finite space of possibilities, could just consist of counting them - then you can quantify how small a target is being hit, within how large a greater region.
Then we count the total number of states with equal or greater rank in the preference ordering to the outcome achieved, or integrate over the measure of states with equal or greater rank. Dividing this by the total size of the space gives you the relative smallness of the target - did you hit an outcome that was one in a million? One in a trillion?
Actually, most optimization processes produce "surprises" that are exponentially more improbable than this - you'd need to try far more than a trillion random reorderings of the letters in a book, to produce a play of quality equalling or exceeding Shakespeare. So we take the log base two of the reciprocal of the improbability, and that gives us optimization power in bits.
For example, assume there were eight equally likely possible states {X0, X1, ... , X7}, and S gives them utilities {0, 1, ... , 7}. Then if S can make X6 happen, there are two states better or equal to its achievement (X6 and X7), hence it has hit a target filling 1/4 of the total space. Hence its OP is log2 4 = 2. If the best S could manage is X4, then it has only hit half the total space, and has an OP of only log2 2 = 1. Conversely, if S reached the perfect X7, 1/8 of the total space, then it would have an OP of log2 8 = 3.
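As a minimal sketch (mine, not Eliezer's), this calculation can be written out directly for equally likely discrete states:

```python
from math import log2

def optimization_power(utilities, achieved):
    # OP in bits: log2(total states / states at least as good as the one
    # achieved), assuming all states are equally likely.
    at_least_as_good = sum(1 for u in utilities if u >= achieved)
    return log2(len(utilities) / at_least_as_good)

utilities = list(range(8))               # states X0..X7 with utilities 0..7
print(optimization_power(utilities, 6))  # 2.0 bits: X6 is in the top quarter
print(optimization_power(utilities, 4))  # 1.0 bits: X4 is in the top half
print(optimization_power(utilities, 7))  # 3.0 bits: X7 is the top eighth
```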
How to measure optimisation power
As every school child knows, an advanced AI can be seen as an optimisation process - something that hits a very narrow target in the space of possibilities. The Less Wrong wiki entry proposes some measure of optimisation power:
One way to think mathematically about optimization, like evidence, is in information-theoretic bits. We take the base-two logarithm of the reciprocal of the probability of the result. A one-in-a-million solution (a solution so good relative to your preference ordering that it would take a million random tries to find something that good or better) can be said to have log2(1,000,000) = 19.9 bits of optimization.
This doesn't seem a fully rigorous definition - what exactly is meant by a million random tries? Also, it measures how hard it would be to come up with that solution, but not how good that solution is. An AI that comes up with a solution that is ten thousand bits more complicated to find, but that is only a tiny bit better than the human solution, is not one to fear.
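To put made-up numbers on that worry: the bit measure grows with the rarity of the solution under random search, not with its utility.

```python
from math import log2

# Invented numbers: a one-in-a-million human solution versus an AI solution
# that is 2^10000 times rarer under random search but barely better.
human_bits = log2(1_000_000)             # ~19.93 bits of optimisation
ai_bits = log2(1_000_000 * 2 ** 10_000)  # ~10019.93 bits

human_utility, ai_utility = 100.0, 100.1  # invented utilities

print(ai_bits - human_bits)        # ~10000 extra bits of "power"...
print(ai_utility - human_utility)  # ...for a 0.1 gain in utility
```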
Other potential measurements could be any of the metrics I suggested in the reduced impact post, used in reverse: to measure large deviations from the status quo, not small ones.
Anyway, before I reinvent the coloured wheel, I just wanted to check whether there was a fully defined, agreed-upon measure of optimisation power.
Limits on self-optimisation
Disclaimer: I am a physicist, and in the field of computer science my scholarship is weak. It may be that what I suggest here is well known, or perhaps just wrong.
Abstract: Deciding whether two arbitrary Turing machines have the same output for all inputs is equivalent to solving the Halting Problem. To optimise a function, it is necessary to prove that the optimised version always has the same output as the unoptimised version, which is impossible in general for Turing machines. However, real computers have finite input spaces.
Context: FOOM, Friendliness, optimisation processes.
Consider a computer program which modifies itself in an attempt to optimise for speed. A modification to some algorithm is *proper* if it results, for all inputs, in the same output; it is an optimisation if it results in a shorter running time on average for typical inputs, and a *strict* optimisation if it results in a shorter running time for all inputs.
A Friendly AI, optimising itself, must ensure that it remains Friendly after the modification; it follows that it can only make proper modifications. (When calculating a CEV it may make improper modifications, since the final answer for "How do we deal with X" may change in the course of extrapolating; but for plain optimisations the answer cannot change.)
For simplicity we may consider that the output of a function can be expressed as a single bit; the extension to many bits is obvious. However, in addition to '0' and '1' we must consider that the response to some input can be "does not terminate". The task is to prove that two functions, which we may consider as Turing machines, have the same output for all inputs.
Now, suppose you have a Turing machine that takes as input two arbitrary Turing machines and their respective tapes, and outputs "1" if the two input machines have the same output, and "0" otherwise. Then, by making one of the inputs a Turing machine which is known not to terminate - one that executes an infinite loop - you can solve the Halting Problem: the checker outputs "1" exactly when the other machine also fails to terminate, and "0" exactly when it halts. Therefore, such a machine cannot exist: you cannot build a Turing machine to prove, for arbitrary input machines, that they have the same output.
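To spell out the reduction as a sketch (nothing here is implementable - `same_output` is exactly the decider the argument rules out, so it is passed in as a hypothetical parameter):

```python
def loop_forever(tape):
    # A machine known not to terminate on any input.
    while True:
        pass

def halts(machine, tape, same_output):
    # `same_output(m1, t1, m2, t2)` is the hypothetical decider: returns
    # True iff m1 on t1 and m2 on t2 have the same output, with
    # "does not terminate" counting as an output (as above). Since
    # `loop_forever` never produces an output, agreement with it means
    # `machine` on `tape` doesn't terminate either.
    return not same_output(machine, tape, loop_forever, tape)
```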
It seems to follow that you cannot build a fully general proper-optimisation detector.
However, "arbitrary Turing machines" is a strong claim, in fact stronger than we require. No physically realisable computer is a true Turing machine, because it cannot have infinite storage space, as the definition requires. The problem is actually the slightly easier (that is, not *provably* impossible) one of making a proper-optimisation detector for the space of possible inputs to an actual computer, which is finite though very large. In practice we may limit the input space still further by considering, say, optimisations to functions whose input is two 64-bit numbers, or something. Even so, the brute-force solution of running the functions on all possible inputs and comparing is already rather impractical.