How should negative externalities be handled? (Warning: politics)
Politics ahead! Read at your own risk, mind killers, etc. Let all caveats be well and thoroughly emptored.
It seems reasonably clear to me that, from a computational perspective, functional central planning is not practically possible. Resource allocation among many agents looks an awful lot like an exponential time problem, and the world market is quite an efficient approximation. In the real world, markets, regulated to preclude blackmail, theft, and slavery, will tend to provide a better approximation of "correct" resource allocation between free agents than a central resource allocation algorithm could plausibly achieve without a tremendous, invasive amount of information about the desires of every market participant, and quite a lot of computing power (within a few orders of magnitude of the combined computational budget of the human species).
It would be naive to say that we'd need exactly the computational power of the human species in order to achieve it: we can imagine how we might optimize the resource allocation scheme by quite a lot. Populations are (at least somewhat) compressible, in that there are a number of groups of individual people who optimize for similar things, allowing you to save on simulating all of them. Additionally, a decent chunk of human neurological and intellectual activity is not dedicated to economic optimization of any kind, which saves you some computing time there as well. And, of course, humans are not rational, and the homunculi representing them in the optimized market simulation could be, giving them substantially more bang for their cognitive buck - we can imagine, for instance, that this market simulation would not sink billions of dollars into lotteries each year! It may also be that the behavior of the market itself, on some level, is lawful, and a sufficiently intelligent agent could find general-case solutions that are less expensive than market simulation.
Still, though, the amount of information and raw processing power needed to pull off central planning competitive with the market approximation seems to be out of our reach for the time being. As a result of this, and a few other factors, my own politics tend to lean Libertarian / minarchist, and I'm aware that there is some of this sentiment in circulation on this site, though generally not explicitly. I'm trying to refine my beliefs surrounding some of the sticky issues in Libertarian philosophy (mostly related to children and extreme policy cases), and I thought I'd ask LW what they thought about one issue in particular.
I have been wondering whether there are any interventions in the economy that can have a positive expected value. I honestly don't know if this is the case. Put another way, the question is whether there are any characteristic behaviors of markets that are undesirable in some sense and can be corrected by the application of an external law. Furthermore, such corrections must not be profitable for any single participant or plausibly-sized coalition of participants to make on their own: they must be good for the market as a whole, or require regulatory power to implement.
An obvious example of this sort of thing is the tragedy of the commons and negative externalities. The most pressing case study would be climate change: the science suggests, fairly firmly, that human CO2 emissions are causing long-term shifts in global climate. How disastrous these shifts will actually be is less well settled, but there is at least a reasonable probability that they will be fairly unpleasant in the long term. Personally, I feel that we are likely to run into much bigger problems much sooner than the 50-200 year timescales on which these disasters seem to be expected. However, were this not the case, I find that I'm not quite sure how my ideal government, run by a few thousand much smarter and better-informed copies of me, ought to respond to the issue. I don't know what I think the ideal policy for dealing with these sorts of externalities is, and I thought I'd ask for LessWrong's thoughts on the matter.
In my own mind, I think that as light a touch as possible is probably desirable. Law is a very blunt instrument, and crude legislation like a carbon tax could easily have its own serious negative consequences (driving industry to countries that simply don't care about CO2 emissions, for example). However, actions like subsidizing and partially deregulating nuclear power plants could help a lot by making coal-fired power plants noncompetitive. We could also declare a policy of slowly withdrawing any government involvement in overseas oil acquisition, which would drive up the price of petroleum products and make electric cars a more appealing alternative. However, I don't know if there would be horrifying consequences to any of these actions. This is the underlying problem: I am not as smart as the market, and guessing its moods is not something that I, or any human, is going to be very good at. However, it seems clear that some intervention is necessary in this sort of case. Rock, hard place, you are here.
Thoughts?
A Series of Increasingly Perverse and Destructive Games
Related to: Higher Than the Most High
The linked post describes a game in which (I fudge a little) Omega comes to you and two other people and asks each of you to tell him an integer. The person who names the largest integer is allowed to leave. The other two are killed.
This got me thinking about variations on the same concept, and here's what I've come up with, taking that game to be GAME0. The results are sort of a fun time-waster, and they bring up some interesting issues. For your enjoyment...
THE GAMES:
GAME1: Omega kidnaps and sedates you and two strangers (all competent programmers). You each awake in a room with instructions printed on the wall explaining the game, and a computer with an operating system and a programming-language compiler, but no internet. Food, water, and toiletries are provided, but no external communication. The participants are allowed to write programs on the computer in a language that supports arbitrarily large numerical values. The programs are taken by Omega and run on a hypercomputer in finite time (this hypercomputer can resolve the halting problem and infinite loops; programs that do not eventually halt return no output). The person who wrote the program with the largest output is allowed to leave. The others are instantly and painlessly killed. In the event of a tie, everyone dies. If your program returns no output, that is taken to be zero.
GAME2: Identical to GAME1, except that each program you write has to take two inputs, which will be the text of the other players' programs (assume they're all written in the same language). The rewards for outputting the largest number apply normally.
GAME3: Identical to GAME2, except that while you are sedated, Omega painlessly and imperceptibly uploads you. Additionally, the instructions on the wall now specify that your program must take four inputs: blackbox functions which represent the uploaded minds of all three players, plus a simulation of the room you're in, indistinguishable from the real thing. We'll assume that players can't modify or interpret the contents of their opponents' brains. The room function takes a string argument (which controls the text printed on the wall) and outputs whatever number the simulated person's program returns.
In each of these games, which program should you write if you wish to survive?
SOME DISCUSSION OF STRATEGY:
GAME1: Clearly, the trivial strategy (implement the Ackermann function or a similar fast-growing function and generate some large integer) gives no better than random results: it's the bare-minimum strategy everyone will employ, and your ranking, without knowledge of your opponents, is entirely up to chance (that is, to how long you're willing to sit there typing nines for your Ackermann argument).
A few alternatives for your consideration:
1: If you are aware of an existence hypothesis (say, a number with some property which is not conclusively known to exist and could be any integer), write a program that brute-force tests integers until it arrives at one matching the requirements, and use this as the argument for your rapidly-growing function. While it may never return any output, if it does, the output will be an integer, and the expected value goes towards infinity.
2: Write a program that generates all programs shorter than length n, and finds the one with the largest output. Then make a separate stab at your own non-meta winning strategy. Take the length of the program you produce, tetrate it for safety, and use that as your length n. Return the return value of the winning program.
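Alternative 1 above can be sketched as follows; the predicate here is a trivially decidable stand-in of my own invention, whereas a genuine entry would use an open existence hypothesis, under which the search might never halt:

```python
def first_satisfying(predicate):
    """Brute-force search over the positive integers. Halts only if a
    witness exists; on the hypercomputer, a non-halting run outputs zero."""
    n = 1
    while not predicate(n):
        n += 1
    return n

# Toy stand-in property; a real entry would test something genuinely
# open, e.g. "n is an odd perfect number".
witness = first_satisfying(lambda n: n > 100 and n % 7 == 0)
print(witness)  # 105 -- feed this to your fast-growing function
```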
On the whole, though, this game is simply not all that interesting in a broader sense.
GAME2: This game has its own amusing quirks (primarily that it could probably actually be played in real life on a non-hypercomputer); however, most of its salient features are also present in GAME3, so I'm going to defer discussion to that. I'll only say that the obvious strategy (sum the outputs of the other two players' programs and return that) leads to an infinite recursive trawl and never halts if everyone takes it. This holds true for any simple strategy of adding to or multiplying the outputs of your opponents' programs by some constant.
GAME3: This game is by far the most interesting. For starters, this game permits acausal negotiation between players (by parties simulating and conversing with one another). Furthermore, anthropic reasoning plays a huge role, since the player is never sure if they're in the real world, one of their own simulations, or one of the simulations of the other players.
Players can negotiate, barter, or threaten one another, they can attempt to send signals to their simulated selves (to indicate that they are in their own simulation and not somebody else's). They can make their choices based on coin flips, to render themselves difficult to simulate. They can attempt to brute-force the signals their simulated opponents are expecting. They can simulate copies of their opponents who think they're playing any previous version of the game, and are unaware they've been uploaded. They can simulate copies of their opponents, observe their meta-strategies, and plan around them. They can totally ignore the inputs from the other players and play just the level one game. It gets very exciting very quickly. I'd like to see what strategy you folks would employ.
And, as a final bonus, I present GAME4: In GAME4, there is no Omega, and no hypercomputer. You simply take a friend, chloroform them, and put them in a concrete room with the instructions for GAME3 on the wall and a Linux computer not plugged into anything. You leave them there for a few months, working on their program, and watch what happens to their psychology. You win when they shrink down into a dead-eyed, terminally paranoid, and entirely insane shell of their former self. This is the easiest game.
Happy playing!
Inferring Values from Imperfect Optimizers
One approach to constructing a Friendly artificial intelligence is to create a piece of software that looks at large amounts of evidence about humans, and attempts to infer their values. I've been doing some thinking about this problem, and I'm going to talk about some approaches and problems that have occurred to me.
In a naive approach, we might define the problem like this: take some unknown utility function U, and plug it into a mathematically clean optimization process O (like AIXI). Then look at your data set, take the information about the inputs and outputs of humans, and find the simplest U that best explains human behavior.
Unfortunately, this won't work. The best possible match for U is one that models not just those elements of human utility we're interested in, but also all the details of our broken, contradictory optimization process. The U we derive through this process will optimize for confirmation bias, scope insensitivity, hindsight bias, the halo effect, our own limited intelligence and inefficient use of evidence, and just about everything else that's wrong with us. Not what we're looking for.
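A toy illustration of the failure, under assumptions of my own (a one-dimensional choice task, and scope insensitivity as the single bias): when candidate utility functions are scored by how well they predict observed behavior, the one that also models the bias wins.

```python
import math

# Hypothetical observations: valuations a "human" assigns to saving
# n lives; the pattern is scope-insensitive (roughly logarithmic).
observed = {10: 1.0, 100: 2.0, 1000: 3.0}

def fit_error(u):
    """Squared error of a candidate utility function against behavior."""
    return sum((u(n) - v) ** 2 for n, v in observed.items())

linear = lambda n: n / 400.0        # the utility we'd *want* to recover
biased = lambda n: math.log10(n)    # utility that also bakes in the bias

# Naive inference prefers the biased model -- it explains the data better.
assert fit_error(biased) < fit_error(linear)
```

The "simplest U that best explains human behavior" is the biased one, which is exactly the problem.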
Okay, so let's try putting a bandaid on it - let's go back to our original problem setup. However, we'll take our original O, and use all of the science on cognitive biases at our disposal to handicap it. We'll limit its search space, saddle it with a laundry list of cognitive biases, cripple its ability to use evidence, and in general make it as human-like as we possibly can. We could even give it akrasia by implementing hyperbolic discounting of reward. Then we'll repeat the original process to produce U'.
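The hyperbolic-discounting handicap, at least, is easy to state precisely. A minimal sketch (the reward sizes and delays are arbitrary illustrations): hyperbolic discounters show preference reversals, choosing the larger-later reward when viewed from a distance but the smaller-sooner one up close, which is a reasonable operationalization of akrasia.

```python
def hyperbolic(reward, delay, k=1.0):
    """Hyperbolic discounting: subjective value falls as 1/(1 + k*delay)."""
    return reward / (1.0 + k * delay)

# Choice: 10 utilons soon vs. 15 utilons two steps later.
# Viewed 10 steps in advance, the larger-later reward wins...
assert hyperbolic(15, 12) > hyperbolic(10, 10)
# ...but at the moment of decision, the smaller-sooner one wins.
assert hyperbolic(10, 0) > hyperbolic(15, 2)
```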
If we plug U' into our AI, the result will be that it will optimize like a human who had suddenly been stripped of all the kinds of stupidity that we programmed into our modified O. This is good! Plugged into a solid CEV infrastructure, this might even be good enough to produce a future that's a nice place to live. However, it's not quite ideal. If we miss a cognitive bias, then it'll be incorporated into the learned utility functions, and we may never be rid of it. What would be nice would be if we could get the AI to learn about cognitive biases, exhaustively, and update in the future if it ever discovered a new one.
If we had enough time and money, we could do this the hard way: acquire a representative sample of the human population, and pay them to perform tasks with simple goals under tremendous surveillance, and have the AI derive the human optimization process from the actions taken towards a known goal. However, if we assume that the human optimization process can be defined as a function over the state of the human brain, we should not trust the completeness of any such process learned from less data than the entropy of the human brain, which is on the order of tens of petabytes of extremely high quality evidence. If we want to be confident in the completeness of our model, we may need more experimental evidence than it is really practical to accumulate. Which isn't to say that this approach is useless - if we can hit close enough to the mark, then the AI may be able to run more exhaustive experimentation later and refine its own understanding of human brains to be closer to the ideal.
But it'd really be nice if our AI could do unsupervised learning to figure out the details of human optimization. Then we could simply dump the internet into it, and let it grind away at the data and spit out a detailed, complete model of human decision-making, from which our utility function could be derived. Unfortunately, this does not seem to be a tractable problem. It's possible that some insight could be gleaned by examining outliers with normal intelligence but deviant utility functions (I am thinking specifically of sociopaths), but it's unclear how much insight can be produced by these methods. If anyone has suggestions for a more efficient way of going about it, I'd love to hear them. As it stands, it might be possible to get enough information from this to supplement a supervised learning approach - the closer we get to a perfectly accurate model, the higher the probability of Things Going Well.
Anyways, that's where I am right now. I just thought I'd put up my thoughts and see if some fresh eyes see anything I've been missing.
Cheers,
Niger
AI "Boxing" and Utility Functions
So, I had this idea the other day when I was thinking about how to safely conduct research on potentially-FOOM-capable AI software. I'd like to sketch it out briefly and then get feedback on it.
So, this started out with the idea that an AI based on AIXI is, in some sense, safer than a fully functional AI, due to the existence of the anvil problem. Because AIXI can't conceive of its own nonexistence, it has no preference ordering over its own mortality, and won't (shouldn't) resist any attempt to shut it down. In other words, if AIXI starts to FOOM undesirably out of control, you actually can go pull the plug on it without fuss. Unfortunately, in terms of safety, the anvil problem gives AIXI a number of other undesirable properties: both third parties and the AI itself can modify its utility function at any time, for any reason, which is very unstable. However, I think a similar idea might be useful for reducing (though not eliminating) the existential risks posed by powerful optimization processes in the near term.
Say you have a piece of AI software ω, with an unFriendly instantaneous utility function, {maximize U}. You would like to use ω for some industrial application (say, manufacturing paperclips), but you're concerned about it FOOMing and resulting in human extinction. You decide to 'box' the AI, but, having read up on the subject, you are worried about it outsmarting you if you try to disable it.
So, you replace your original utility function U with a modified version U', defined in terms of U:

U' = { max(U)   if ω exists
     { 0        if ω does not exist
U' has several useful qualities. The agent will resist modifications to its utility function, while not resisting attempts to turn it off. It is entirely indifferent to its own existence. As a result, if it began to FOOM undesirably, stopping it would be fairly trivial. Furthermore, the AI would have no incentive to deceive us, so it'd be fairly easy to keep an eye on.
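As a sketch of the construction (the function names and the explicit existence flag are my own illustrative assumptions; actually grounding an agent's "exists" predicate is, of course, the hard part):

```python
def make_shutdown_indifferent(u):
    """Wrap a utility function U into U': identical to U while the agent
    exists, constant zero once it does not."""
    def u_prime(outcome, agent_exists):
        return u(outcome) if agent_exists else 0.0
    return u_prime

paperclip_u = lambda outcome: outcome["paperclips"]
u_prime = make_shutdown_indifferent(paperclip_u)

# While running, U' ranks outcomes exactly as U does...
assert u_prime({"paperclips": 5}, True) > u_prime({"paperclips": 1}, True)
# ...but all shut-down outcomes score identically, so there is nothing
# to gain by resisting (or engineering) shutdown.
assert u_prime({"paperclips": 5}, False) == u_prime({"paperclips": 1}, False)
```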
It should be noted that this is not in any way a final solution to the FAI problem. If the AI FOOMs quickly enough that we all die or get brain-hacked before we realize something's amiss, it could still all go very poorly. But that seems unlikely: I would guess that it takes at least a day or two to go from a boxed human-level AI to something strongly superhuman. Unfortunately, for this to work, everyone has to use it, which leaves a lot of leftover existential risk from people using AIs without stable utility functions, cranks who think unFriendly AI will discover universal morality, and people who prematurely think they've figured out a good Friendly utility function.
That said, something like this could help to gain more time to develop a proper FAI, and would be relatively simple to sell other developers on. SI or a similar organization could even develop a standardized, cross-platform open-source software package for utility functions with all of this built in, and distribute it to wannabe strong-AI developers.
Are there any obvious problems with this idea that I'm missing? If so, can you think of any ways to address them? Has this sort of thing been discussed in the past?