Louie and I are sharing a draft of our chapter submission to The Singularity Hypothesis for feedback:

The Singularity and Machine Ethics

Thanks in advance.

Also, thanks to Kevin for suggesting in February that I submit an abstract to the editors. Seems like a lifetime ago, now.

Edit: As of 3/31/2012, the link above now points to a preprint.

18 comments

Gack! I wish I had encountered this quote (from Gladwell) long ago, so I could include it in "The Singularity and Machine Ethics":

In 1981... Doug Lenat entered the Traveller Trillion Credit Squadron tournament... It was a war game. The contestants had been given several volumes of rules, well beforehand, and had been asked to design their own fleet of warships with a mythical budget of a trillion dollars. The fleets then squared off against one another in the course of a weekend...

Lenat had developed an artificial-intelligence program that he called Eurisko, and he decided to feed his program the rules of the tournament. Lenat did not give Eurisko any advice or steer the program in any particular strategic direction. He was not a war-gamer. He simply let Eurisko figure things out for itself. For about a month, for ten hours every night on a hundred computers at Xerox PARC, in Palo Alto, Eurisko ground away at the problem, until it came out with an answer. Most teams fielded some version of a traditional naval fleet—an array of ships of various sizes, each well defended against enemy attack. Eurisko thought differently. "The program came up with a strategy of spending the trillion on an astronomical number of small ships like P.T. boats, with powerful weapons but absolutely no defense and no mobility," Lenat said. "They just sat there. Basically, if they were hit once they would sink. And what happened is that the enemy would take its shots, and every one of those shots would sink our ships. But it didn't matter, because we had so many." Lenat won the tournament in a runaway.

The next year, Lenat entered once more, only this time the rules had changed. Fleets could no longer just sit there. Now one of the criteria of success in battle was fleet "agility." Eurisko went back to work. "What Eurisko did was say that if any of our ships got damaged it would sink itself—and that would raise fleet agility back up again," Lenat said. Eurisko won again.

...The other gamers were people steeped in military strategy and history... Eurisko, on the other hand, knew nothing but the rule book. It had no common sense... [But] not knowing the conventions of the game turned out to be an advantage.

[Lenat explained:] "What the other entrants were doing was filling in the holes in the rules with real-world, realistic answers. But Eurisko didn't have that kind of preconception..." So it found solutions that were, as Lenat freely admits, "socially horrifying": send a thousand defenseless and immobile ships into battle; sink your own ships the moment they get damaged.

Or, an example of human Munchkinism:

While playing Rollercoaster Tycoon one time, I remember that I was tasked with the mission of getting a higher approval rating than the park next door. Rather than make my park better, I instead built a rollercoaster that launched people at 100mph into my rival's park. Since technically those people died in my rival's park, their approval rating would plummet and people would rush to my park and straight into my deathcoaster, which only caused their rating to drop lower and lower. I did this for an hour until the game said I'd won.

Nice find! This will come in handy.

Sounds like the sort of strategy that evolution would invent. Or rather, already has, repeatedly — "build a lot of cheap little war machines and don't mind the casualties" is standard operating procedure for a lot of insects.

But yeah, it's an awesome lesson in "the AI optimizes for what you tell it to optimize for, not for what humans actually want."

Overall, I thought it was very good. I agree that "super optimizer" is more likely to create the correct impression in the average person than "super intelligence", and will stop using the latter term.

The bit about the "golem genie" seems forced, though-- I'm not sure it actually clarifies things. It seems like such a direct analogy; I'd expect that people that understand "superoptimizer" won't need the analogy, and those who don't understand, won't be helped by it. For the latter group of people, it might help to introduce the golem before talking about superoptimization at all. It's quite possible I'm wrong

Reading this, I felt a strange sense of calm coming over me: we finally have a really good introductory article to the issue, and SingInst finally has people who can write such articles.

I feel like humanity's future is in good hands, and that SI now has a realistic chance of attracting enough mainstream academic interest to make a difference.

Also, this paragraph:

Neuroeconomists and other cognitive neuroscientists can continue to uncover how human values are encoded and modified in the brain. Philosophers and mathematicians can develop more sophisticated value extrapolation algorithms, building on the literature concerning reflective equilibrium and “ideal preference” or “full information” theories of value. Economists, neuroscientists, and AI researchers can extend current results in choice modelling (Hess and Daly 2010) and preference elicitation (Domshlak et al. 2011) to extract preferences from human behavior and brain activity. Decision theorists can work to develop a decision theory that is capable of reasoning about decisions and values subsequent to self-modification: a “reflective” decision theory.

made me feel like SI might now have a clue about how to put extra money to good use if they got it, something I was doubtful about before.

I like this - I feel it does a decent job of showing how your neuroscience posts fit into the FAI/intelligence explosion narrative. A few minor comments:

Using this term, it should be clear that a machine superoptimizer will not necessarily be modest or honest

I like the "superoptimizer" terminology, but this sentence makes it sound like we can expect superintelligence to behave differently merely by calling it something different. I realise this isn't what you mean - I just feel it would be better rephrased in terms of "this avoids bias-inducing loaded terminology".

Thus, though some utilitarians have proposed that all we value is pleasure, our intuitive negative reaction to hypothetical worlds in which pleasure is (more or less) maximized suggests that pleasure is not the only thing we value.

Very minor point: it would be nice to add a citation here: someone who says that orgasmium is suboptimal or that most people think orgasmium is suboptimal.

Consider the “crying baby” scenario (Greene et al. 2004):

What is it about this particular example that casts doubt on the homuncular "self"? I can believe that we have many cognitive modules that give competing answers to the crying baby dilemma, but how can I tell that just by reading it? (And doesn't the same thing happen for every moral dilemma we read?)

You use the "Golem Genie" in an odd way (it figures in only a tiny portion of the paper). You introduce the thought experiment (to elicit a sense of urgency and concrete importance, I assume), and point out the analogy to superintelligence. With the exception of a few words on hedonistic utilitarianism, all the specific examples of moral theories resulting in unwanted consequences when implemented are talked about with reference to superintelligence, never mentioning the Genie again. If you want to keep the Genie part, I would keep it until you've gone through all the moral theories you discuss, and only at the end point out the analogy to superintelligence.

One very tiny nitpick:

———. “Infinite Ethics.” Unpublished manuscript, 2009.

Is "unpublished" the right term to use here? It hasn't been published in a peer-reviewed source, but in much usage, published online does count as published.

Agreed -- either way, give the URL.

Even smaller nitpicks concerning formatting and layout:

The footnotes are in a sans-serif font and larger than the main text; all the text is ragged-right; there are lonely section headings (e.g. "The Golem Genie"); and footnotes are split across pages.

The intro is basically a slightly expanded version of the abstract. That is common in academic publications, but not in good ones.

The paper seems to end rather than reach a conclusion. As with all the above, this is a criticism of form, not content.

If a machine superoptimizer’s goal system is programmed to maximize pleasure, it might not tile the local universe with tiny digital minds running continuous loops of a single, maximally pleasurable experience, but we think it would do something undesirable like that.

Step 1 - for many minds without too short a horizon - is to conquer the galaxy to make sure there are no aliens around that might threaten their entire value system. Tiling the local universe with tiny happy digital minds could easily turn out to be a recipe for long-term disaster, resulting in a universe with happiness levels dictated by others.

More generally, it seems that rules are unlikely to seriously constrain the actions of a machine superoptimizer. First, consider the case in which rules about allowed actions or consequences are added to a machine’s design “outside of” its goals. A machine superoptimizer will be able to circumvent the intentions of such rules in ways we cannot imagine, with far more disastrous effects than those of a lawyer who exploits loopholes in a legal code. A machine superoptimizer would recognize these rules as obstacles to achieving its goals, and would do everything in its considerable power to remove or circumvent them. It could delete the section of its source code that contains the rules, or it could create new machines that don’t have the constraint written into them. This approach requires humans to out-think a machine superoptimizer (Muehlhauser 2011).

This part feels like it should have a cite to Omohundro's Basic AI Drives paper, which contains these paragraphs:

If we wanted to prevent a system from improving itself, couldn’t we just lock up its hardware and not tell it how to access its own machine code? For an intelligent system, impediments like these just become problems to solve in the process of meeting its goals. If the payoff is great enough, a system will go to great lengths to accomplish an outcome. If the runtime environment of the system does not allow it to modify its own machine code, it will be motivated to break the protection mechanisms of that runtime. For example, it might do this by understanding and altering the runtime itself. If it can’t do that through software, it will be motivated to convince or trick a human operator into making the changes. Any attempt to place external constraints on a system’s ability to improve itself will ultimately lead to an arms race of measures and countermeasures.

Another approach to keeping systems from self-improving is to try to restrain them from the inside; to build them so that they don’t want to self-improve. For most systems, it would be easy to do this for any specific kind of self-improvement. For example, the system might feel a “revulsion” to changing its own machine code. But this kind of internal goal just alters the landscape within which the system makes its choices. It doesn’t change the fact that there are changes which would improve its future ability to meet its goals. The system will therefore be motivated to find ways to get the benefits of those changes without triggering its internal “revulsion”. For example, it might build other systems which are improved versions of itself. Or it might build the new algorithms into external “assistants” which it calls upon whenever it needs to do a certain kind of computation. Or it might hire outside agencies to do what it wants to do. Or it might build an interpreted layer on top of its machine code layer which it can program without revulsion. There are an endless number of ways to circumvent internal restrictions unless they are formulated extremely carefully.

Also, Legg’s formal definition of intelligence is drawn from a dualistic “agent-environment” model of optimal agency (Legg 2008: 40) that does not represent its own computation as occurring in a physical world with physical limits and costs. Our notion of optimization power is inspired by Yudkowsky (2008b).

Might be good to link to some papers on problems with RL agents - horizon, mugging, and the delusion box: http://lesswrong.com/lw/7fl/link_report_on_the_fourth_conference_on/
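For readers who haven't seen it, Legg's measure looks roughly like this (my paraphrase from memory; check Legg 2008 for the exact notation):

\[
\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi
\]

where E is the set of computable environments, K(mu) is the Kolmogorov complexity of environment mu, and V^pi_mu is the expected total reward the agent pi earns when interacting with mu. The agent pi enters only as an abstract policy exchanging observations, rewards, and actions with mu; nothing in the formalism models the agent's own computation as a physical process inside the environment, which is exactly the dualism the paper is pointing at.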

The Golem Genie is not explained much - you're writing it as neither a positive nor a negative agent: it is not an evil genie/demon, and that's an intuition pump that should be avoided as an anthropomorphism.

Second, because the existence of zero-sum games means that the satisfaction of one human’s preferences can conflict with the satisfaction of another’s (Geckil and Anderson 2009).

And negative-sum games too, presumably, like various positional or arms races. (Don't have any citations, I'm afraid.)
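A toy payoff matrix makes the negative-sum point concrete (payoffs invented purely for illustration, in the standard arms-race / prisoner's-dilemma shape):

\[
\begin{array}{c|cc}
 & \text{Restrain} & \text{Arm} \\
\hline
\text{Restrain} & (0,\ 0) & (-3,\ 1) \\
\text{Arm} & (1,\ -3) & (-2,\ -2)
\end{array}
\]

Arming is each side's dominant strategy, yet mutual arming leaves both worse off than mutual restraint, so the two players' preferences conflict even though the game is negative-sum rather than zero-sum.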

My impression: You spend a long time discussing the problem, but very little on what solutions need to look like. It just ends.

To avoid anthropomorphic bias and other previously mentioned problems with the word "intelligence," in this chapter we will use the term "machine superoptimizer" in place of "machine superintelligence."

That seems a bit over-the-top to me - "superintelligence" is fine.

Update: This link now points to a preprint.

Update: This link now points to a 02-26-2012 draft of 'The Singularity and Machine Ethics'.

AI researcher J. Storrs Hall suggests that our machines may be more moral than we are, and cites as partial evidence the fact that in humans “criminality is strongly and negatively correlated with IQ” (Hall 2007, 340). But machine intelligence has little to do with IQ or with the human cognitive architectures and social systems that might explain a correlation between human criminality and IQ.

So, there are other reasons to expect that as well: more effective reputation systems seem likely to make all actors more moral, including the companies that make computers and robots. Machines are a component of society - and they will be for quite a while.