TheAncientGeek comments on On Terminal Goals and Virtue Ethics - Less Wrong

67 Post author: Swimmer963 18 June 2014 04:00AM


Comment author: TheAncientGeek 20 June 2014 06:34:05PM 1 point [-]

Yep, lots of stuff which is very difficult in absolute terms, but not obviously more difficult in relative terms than Solve Human Morality.

Comment author: [deleted] 20 June 2014 07:47:22PM *  1 point [-]

The problem is not to "Solve Human Morality", the problem is to make an AI that will do what humans end up having wanted. Since this is a problem for which we can come up with solid definitions (just to plug my own work :-p), it must be a solvable problem. If it looks impossible or infeasible, that is simply because you are taking the wrong angle of attack.

Stop trying to figure out a way to avoid the problem and solve it.

For one thing, taboo the words "morality" and "ethics", and solve the simpler, realer problem: how do you make an AI do what you intend it to do when you convey some wish or demand in words? As Eliezer has said, humans are Friendly to each other in this sense: when I ask another human to get me a pizza, the entire apartment doesn't get covered in a maximal number of pizzas. Another human understands what I really mean.

So just solve that: what reasoning structures does another agent need to understand what I really mean when I ask for a pizza?
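To make the question concrete, here's a toy sketch (entirely my own illustration, with invented numbers, not anyone's proposed design): an agent that treats the request as evidence about what the speaker wants, and picks the most plausible interpretation rather than the literal maximizing reading.

```python
# Toy Bayesian intent inference: a hypothetical agent hears "get me a pizza"
# and weighs candidate interpretations by prior plausibility times how well
# each interpretation explains the utterance, instead of maximizing literally.

# Candidate interpretations: how many pizzas the speaker actually wants.
# Priors reflect background knowledge of human-scale desires (made-up numbers).
priors = {1: 0.90, 2: 0.08, 1000: 0.02}

# Likelihood of a human phrasing it as "get me a pizza" given each quantity.
likelihood = {1: 0.95, 2: 0.30, 1000: 0.001}

posterior = {n: priors[n] * likelihood[n] for n in priors}
total = sum(posterior.values())
posterior = {n: p / total for n, p in posterior.items()}

best = max(posterior, key=posterior.get)
print(best)  # the modal interpretation: one pizza
```

The point of the toy is just that "what I really mean" is an inference problem over the speaker's goals, not a string to be optimized.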

But at least stop blatantly trolling LessWrong by trying to avoid the problem by saying blatantly stupid stuff like "Oh, I'll just put an off-switch on an AI, because obviously no agent of human-level intelligence would ever try to prevent the use of an off-switch by, you know, breaking it, or covering it up with a big metal box for protection."

Comment author: [deleted] 21 June 2014 04:47:05PM *  2 points [-]

The problem is not to "Solve Human Morality", the problem is to make an AI that will do what humans end up having wanted.

Is it? Why take on either of those gargantuan challenges? Another perfectly reasonable approach is to task the AI with nothing more than data processing, with no effectors in the real world (Oracle AI), and watch it like a hawk. And no one at MIRI or on LW has proved this approach dangerous except by making crazy unrealistic assumptions, e.g. in this case, why would you ever put the off-switch in a region of the environment the AI can affect?

As you and Eliezer say, humans are Friendly to each other already. So have humans moderate the actions of the AI, in a controlled setup designed to prevent the AI from learning to manipulate the humans (break the feedback loop).

Comment author: [deleted] 21 June 2014 04:52:32PM 1 point [-]

Another perfectly reasonable approach is to task the AI with nothing more than data processing with no effectors in the real world (Oracle AI), and watch it like a hawk.

I consider this semi-reasonable, and in fact, wouldn't even feel the need to watch it like a hawk. Without a decision-outputting algorithm, it's not an agent, it's just a learner: it can't possibly damage human interests.

I say "semi" reasonable, because there is still the issue of understanding debug output from the Oracle's internal knowledge representations, and putting it to some productive usage.

I also consider a proper Friendly AI to be much more "morally profitable", in the sense of yielding a much greater benefit than usage of an Oracle Learner by untrustworthy humans.

Comment author: [deleted] 21 June 2014 06:01:24PM 2 points [-]

This becomes an issue of strategy. I assume the end goal is a positive singularity. The MIRI approach seems to be: design and build a provably "safe" AGI, then cede all power to it and hope for the best as it goes "FOOM" and moves us through the singularity. A strategy I would advocate for instead is: build an Oracle AI as soon as it is possible to do so with adequate protections, and use its super-intelligence to design singularity technologies which enable (augmented?) humans to pass through the singularity.

I prefer the latter approach as it can be done with today's knowledge and technology, and does not rely on mathematical breakthroughs on an indeterminate timescale which may or may not even be possible or result in a practical AGI design. The latter approach instead depends on straightforward computer science and belts-and-suspenders engineering on a predictable timescale.

If I were executive director of MIRI, I would continue the workshops, because there is a non-zero probability that a breakthrough might be made that radically simplifies the safe AGI design space. However, I'd definitely spend more than half of the organization's budget and time on a strategy with a definable timescale and an articulable project plan, such as the Oracle-AGI-to-Intelligence-Augmentation approach I advocate, although others are possible.

Comment author: [deleted] 21 June 2014 07:25:48PM *  1 point [-]

Well that's where the "positive singularity" and "Friendly (enough) AGI" goals separate: if you choose the route to a "positive singularity" of human intelligence augmentation, you still face the problems of human irrationality, of human moral irrationality (lack of moral caring, moral akrasia, morals that are not aligned with yours, etc), but you now also face the issue of what happens to human evaluative judgement under the effects of intelligence augmentation. Can humans be modified while maintaining their values? We honestly don't know.

(And I for one am reasonably sure that nobody wise should ever make me their Singularity-grade god-leader, on grounds that my shouldness function, while not nearly as completely alien as Clippy's, is still relatively unusual, somewhere on an edge of a bell curve, and should therefore not be trusted with the personal or collective future of anyone who doesn't have a similar shouldness function. Sure, my meta-level awareness of this makes me Friendly, loosely speaking, but we humans are very bad at exercising perfect meta-level awareness of others' values all the time, and often commit evaluative mind-projection fallacies.)

What I would personally do, at this stage, is just to maintain a distribution (you know probability was gonna enter somewhere) over potential routes to a positive outcome. Plan and act according to the full distribution, through institutions like FHI and FLI and such, while still focusing the specific, achieve-a-single-narrow-outcome optimization power of MIRI's mathematical talents on building provably Friendly AGIs. Update early and often on whatever new information is available.
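"Plan and act according to the full distribution" can be made concrete with a toy expected-value allocation. All the route probabilities and the budget below are invented for illustration; the only point is the mechanism of weighting effort by credence instead of betting everything on one route.

```python
# Toy portfolio over routes to a positive outcome: weight effort by the
# (invented) credence that each route both works and is reachable in time.
routes = {
    "provably Friendly AGI": 0.35,
    "Oracle AI -> FAI": 0.30,
    "Oracle AI -> human augmentation": 0.20,
    "whole-brain emulation": 0.15,
}

budget = 1_000_000  # hypothetical annual research budget, in dollars

allocation = {name: p * budget for name, p in routes.items()}
for name, dollars in sorted(allocation.items(), key=lambda kv: -kv[1]):
    print(f"{name}: ${dollars:,.0f}")
```

Updating "early and often" then just means revising the probabilities and re-running the allocation as new information arrives.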

For instance, the more I look into AGI and cognitive science research, the more I genuinely feel the "Friendly AI route" can work quite well. From my point of view, it looks more like a research program than an impossible Herculean task (admittedly, the difference is often kinda hard to see for those who've never served time in a professional research environment), whereas something like safe human augmentation is currently full of unknown unknowns that are difficult to plan around.

And as much as I generally regard wannabe-ems with a little disdain for their flippant "what do I need reality for!?" views, I do think that researching human mind uploading would help discover a lot of the neurological and cognitive principles needed to build a Friendly AI (ie: what cognitive algorithms are we using to make evaluative judgements?), while also helping create avenues for agents with human motivations to "go FOOM" themselves, just in case, so that's worthwhile too.

Comment author: [deleted] 21 June 2014 11:43:33PM *  2 points [-]

The important thing to note about the problems you identified is how they differ from the problem domains of basic research. What happens to human evaluative judgement under the effects of intelligence augmentation? That's an experimental question. Can we trust a single individual to be enhanced? Almost certainly not. So perhaps we need to pick 100 or 1,000 people, wired into a shared infrastructure which enhances them in lock-step, with incentives in place to ensure collaboration over competition and consensus over partisanship in decision-making protocols. Designing these protocols and safeguards takes a lot of work, but both the scale and the scope of that work is fairly well quantified. We can make a project plan and estimate with a high degree of accuracy how long and how much money it would take to design sufficiently safe Oracle AI and intelligence augmentation projects.

FAI theory, on the other hand, is like the search for a grand unified theory of physics. We presume such a theory exists. We even have an existence proof of sorts (the human mind for FAI, the universe itself in physics). But the discovery of a solution is something that will or will not happen, and if it does, it will be on an unpredictable timescale. Maybe it will take 5 years. Maybe 50, maybe 500. Who knows? After the rapid advances of the early 20th century, I'm sure most physicists thought a grand unified theory must be within reach; Einstein certainly did. Yet here we are nearly 100 years after the publication of the general theory of relativity, 85 years after most of the major discoveries of quantum mechanics, and in many ways we seem no closer to a theory of everything than we were some 40 years ago when the standard model was largely finalized.

It could be that at the very next MIRI workshop some previously unknown research associate solves the FAI problem conclusively. That'd be awesome. Or maybe she proves it impossible, which would be an equally good outcome because then we could at least refocus our efforts. Far worse, it might be that 50 years from now all MIRI has accumulated is a thoroughly documented list of dead-ends.

But that's not the worst case, because in reality UFAI will appear within the next decade or two, whether we want it to or not. So unless we are confident that we will solve the FAI problem and build out the solution before the competition, we'd better start investing heavily in alternatives.

The AI winter is over. Already multiple very well funded groups are rushing forward to generalize already super-human narrow AI techniques. AGI is finally a respectable field again, and there are multiple teams making respectable progress towards seed AI. And parallel hardware and software tools have finally gotten to the point where a basement AGI breakthrough is a very real and concerning possibility.

We don't have time to be dicking around doing basic research on whiteboards.

Comment author: Eliezer_Yudkowsky 23 June 2014 07:13:43PM 5 points [-]

Aaaand there's the "It's too late to start researching FAI, we should've started 30 years ago, we may as well give up and die" to go along with the "What's the point of starting now, AGI is too far away, we should start 30 years later because it will only take exactly that amount of time according to this very narrow estimate I have on hand."

If your credible intervals on "How much time we have left" and "How much time it will take" do not overlap, then you either know a heck of a lot I don't, or you are very overconfident. I usually try not to argue from "I don't know and you can't know either" but for the intersection of research and AGI timelines I can make an exception.
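The overlap claim is literally just an interval check. The intervals below are invented placeholders, not my actual estimates; the point is only that declaring non-overlap between two wide, uncertain intervals requires a lot of confidence.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical 90% credible intervals, in years from now (illustrative only).
time_until_agi = (10, 60)
time_to_solve_fai = (5, 40)

print(intervals_overlap(time_until_agi, time_to_solve_fai))  # True
```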

Admittedly my own calculation looks less like an elaborate graph involving supposed credibility intervals, and more like: "Do we need to do this? Yes. Can we realistically avoid having to do this? No. Let's start now EOM."

Comment author: [deleted] 23 June 2014 10:25:15PM *  2 points [-]

I think that's a gross simplification of the possible outcomes.

Admittedly my own calculation looks less like an elaborate graph involving supposed credibility intervals, and more like: "Do we need to do this? Yes. Can we realistically avoid having to do this? No. Let's start now EOM."

I think you need better planning.

There's a great essay that has been a featured article on the main page for some time now called Levels of Action. Applied to FAI theory:

Level 1: Directly ending human suffering.

Level 2: Constructing an AGI capable of ending human suffering for us.

Level 3: Working on the computer science aspects of AGI theory.

Level 4: Researching FAI theory, which constrains the Level 3 AGI theory.

But for that high-level basic research to have any utility, these levels must be connected to each other: there must be a firm chain where FAI theory informs AGI designs, which are actually used in the construction of an AGI tasked with ending human suffering in a friendly way.

From what I can tell on the outside, the MIRI approach seems to be: (1) find a practical theory of FAI; (2) design an AGI in accordance with this theory; (3) implement that design; (4) mission accomplished!

That makes a certain amount of intuitive sense, having stages laid out end-to-end in chronological order. However as a trained project manager I must tell you this is a recipe for disaster! The problem is that the design space branches out at each link, but without the feedback of follow-on steps, inefficient decision making will occur at earlier stages. The space of working FAI theories is much, much larger than the FAI-theory-space which results in practical AGI designs which can be implemented prior to the UFAI competition and are suitable for addressing real-world issues of human suffering as quickly as possible.

Some examples from the comparably large programs of the Manhattan project and Apollo moonshot are appropriate, if you'll forgive the length (skip to the end for a conclusion):

The Manhattan project had one driving goal: drop a bomb on Berlin and Tokyo before the GIs arrived, hopefully ending the war early. (Of course Germany surrendered before the bomb was finished, and Tokyo ended up so devastated by conventional firebombing that Hiroshima and Nagasaki were selected instead, but the original goal is what matters here.) The location of the targets meant that the bomb had to be small enough to fit in a conventional long-distance bomber, and the timeline meant that the simpler but less efficient U-235 designs were preferred. A program was designed, adequate resources allocated, and the goal achieved on time.

On the other hand, it is easy to imagine how differently things might have gone if the strategy were reversed: if instead the US military had decided to institute a basic research program into nuclear physics and atomic structure before deciding on the optimal bomb reactions, then done detailed bomb design before creating the industry necessary to produce enough material for a working weapon. Just looking at the first stage, there is nothing a priori which makes it obvious that U-235 and Pu-239 are the "interesting" nuclear fuels to focus on. Thorium, for example, was more naturally abundant and already being extracted as a by-product of rare earth metal extraction; its reactions generate less lethal radiation and fewer long-lived waste products, and it yields U-233, which could be used in a nuclear bomb. However, the straightforward military and engineering requirements of making a bomb on schedule and successfully delivering it on target favored U-235- and Pu-239-based weapon designs, which focused the efforts of the physicists involved on those fuel pathways. The rest is history.

The Apollo moonshot is another great example. NASA had a single driving goal: deliver a man to the moon before 1970, and return him safely to Earth. There's a lot of decisions that were made in the first few years driven simply by time and resources available: e.g. heavy-lift vs orbital assembly, direct return vs lunar rendezvous, expendable vs. reuse, staging vs. fuel depots. Ask Wernher von Braun what he imagined an ideal moon mission would look like, and you would have gotten something very different than Apollo. But with Apollo NASA made the right tradeoffs with respect to schedule constraints and programmatic risk.

The follow-on projects of Shuttle and Station are a completely different story, however. They were designed with no articulated long-term strategy, which meant they tried to be everything to everybody and as a result were useful to no one. Meanwhile the basic research being carried out at NASA has little, if anything to do with the long-term goals of sending humans to Mars. There's an entire division, the Space Biosciences group, which does research on Station about the long-term effects of microgravity and radiation on humans, supposedly to enable a long-duration voyage to Mars. Never mind that the microgravity issue is trivially solved by spinning the spacecraft with nothing more than a strong steel rope as a tether, and the radiation issue is sufficiently mitigated by having a storm shelter en route and throwing a couple of Martian sandbags on the roof once you get there.

There's an apocryphal story about the US government spending millions of dollars to develop the "Space Pen" -- a ballpoint pen with ink under pressure to enable writing in microgravity environments. Much later at some conference an engineer in that program meets his Soviet counterpart and asks how they solved that difficult problem. The cosmonauts used a pencil.

Sadly the story is not true -- the "Space Pen" was a successful marketing ploy by inventor Paul Fisher without any ties to NASA, although it was used by NASA and the Russians on later missions -- but it does serve to illustrate the point very succinctly. I worry that MIRI is spending its days coming up with space pens when a pencil would have done just fine.

Let me provide some practical advice. If I were running MIRI, I would still employ mathematicians working on the hail-Mary of a complete FAI theory -- avoiding the Löbian obstacle etc. -- and run the very successful workshops, though maybe just two a year. But beyond that I would spend all remaining resources on a pragmatic AGI design programme:

1) Have a series of workshops with AGI people to do a review of possible AI-influenced strategies for a positive singularity -- top-down FAI, seed AI to FAI, Oracle AI to FAI, Oracle AI to human augmentation, teaching a UFAI morals in a nursery environment, etc.

2) Have a series of workshops, again with AGI people to review tactics: possible AGI architectures & the minimal seed AI for each architecture, probabilistically reliable boxing setups, programmatic security, etc.

Then use the output of these workshops -- including reliable constraints on timelines -- to drive most of the research done by MIRI. For example, I anticipate that reliable unfriendly Oracle AI setups will require probabilistically auditable computation, which itself will require a strongly typed, purely functional virtual machine layer from which computation traces can be extracted and meaningfully analyzed in isolation. This is the sort of research MIRI could sponsor a grad student or postdoc to perform.
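To illustrate the trace-extraction idea in miniature (my own sketch, not an actual design from any of these groups): evaluate a small pure expression language while recording every reduction step, so that a third party can replay and verify the computation without rerunning the original program.

```python
# Minimal sketch of auditable computation: a pure evaluator for nested
# tuple expressions that appends every reduction step to a trace.

def evaluate(expr, trace):
    """Evaluate an expression like ("add", 2, ("mul", 3, 4)),
    logging each reduction as (op, left, right, result) to `trace`."""
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    l = evaluate(left, trace)
    r = evaluate(right, trace)
    result = l + r if op == "add" else l * r
    trace.append((op, l, r, result))  # one auditable step
    return result

trace = []
value = evaluate(("add", 2, ("mul", 3, 4)), trace)
print(value)  # 14
print(trace)  # [('mul', 3, 4, 12), ('add', 2, 12, 14)]

# An auditor checks each step of the trace in isolation:
for op, l, r, result in trace:
    assert (l + r if op == "add" else l * r) == result
```

Purity is what makes this meaningful: with no hidden state, the trace fully determines the computation, so steps can be audited independently or probabilistically sampled.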

BTW, other gripe: I have yet to see adequate arguments for the "can we realistically avoid having to do this?" from MIRI which aren't strawman arguments.

Comment author: Eliezer_Yudkowsky 24 June 2014 06:25:47PM 0 points [-]

From what I can tell on the outside, the MIRI approach seems to be: (1) find a practical theory of FAI; (2) design an AGI in accordance with this theory; (3) implement that design; (4) mission accomplished!

Yes, dear, some of us are programmers, we know about waterfalls. Our approach is more like, "Attack the most promising problems that present themselves, at every point; don't actually build things which you don't yet know how to make not destroy the world, at any point." Right now this means working on unbounded problems because there are no bounded problems which seem more relevant and more on the critical path. If at any point we can build something to test ideas, of course we will; unless our state of ignorance is such that we can't test that particular idea without risking destroying the world, in which case we won't, but if you're really setting out to test ideas you can probably figure out some other way to test them, except for very rare highly global theses like "The intelligence explosion continues past the human level." More local theses should be testable.

See also Ch. 22 from HPMOR, and keep in mind that I am not Harry, I contain Harry, all the other characters, their whole universe, and everything that happens inside it. In other words, I am not Harry, I am the universe that responded to Harry.

Comment author: shminux 23 June 2014 10:52:41PM -1 points [-]

While I don't know much about your AGI expertise, I agree that MIRI is missing an experienced top-level executive who knows how to structure, implement and risk-mitigate an ambitious project like FAI, and has a track record to prove it. Such a person would help prevent flailing about and wasting time and resources. I am not sure what other projects are in this reference class and whether MIRI can find and hire a person like that, so maybe they are doing what they can with the meager budget they've got. Do you think that the Manhattan Project and the Space Shuttle are in the ballpark of FAI? My guess is that they don't even come close in terms of ambition, risk, effort or complexity.

Comment author: TheAncientGeek 23 June 2014 07:55:36PM -1 points [-]

Do we need to do this = wild guess.

The whole thing's a Drake equation.

Comment author: [deleted] 23 June 2014 10:10:55AM *  2 points [-]

Ok, let me finally get around to answering this.

FAI has definite subproblems. It is not a matter of scratching away at a chalkboard hoping to make some breakthrough in "philosophy" or some other proto-sensical field that will Elucidate Everything and make the problem solvable at all. FAI, right now, is a matter of setting researchers to work on one subproblem after another until they are all solved.

In fact, when I do literature searches for FAI/AGI material, I often find that the narrow AI or machine-learning literature contains a round dozen papers nobody working explicitly on FAI has ever cited, or even appears to know about. This is my view: there is low-hanging fruit in applying existing academic knowledge to FAI problems. Where such low-hanging fruit does not exist, the major open problems can largely be addressed by recourse to higher-hanging fruit within mathematics, or even to empirical science.

Since you believe it's all so wide-open, I'd like to know what you think of as "the FAI problem".

If you have an Oracle AI you can trust, you can use it to solve FAI problems for you. This is a fine approach.

We don't have time to be dicking around doing basic research on whiteboards.

Luckily, we don't need to dick around.

Comment author: Eliezer_Yudkowsky 23 June 2014 07:14:53PM 2 points [-]

I often find that the narrow AI or machine-learning literature contains a round dozen papers nobody working explicitly on FAI has ever cited, or even appears to know about.

Name three. FAI contains a number of counterintuitive difficulties, and it's unlikely for someone to do FAI work successfully by accident. On the other hand, someone with a fuzzier model believing that a paper they found sure sounds relevant, and wondering why MIRI isn't citing it, is far more probable from my perspective and prior.

Comment author: [deleted] 24 June 2014 08:19:36AM *  2 points [-]

I wouldn't say that there's someone out there directly solving FAI problems without having explicitly intended to do so. I would say there's a lot we can build on.

Keep in mind, I've seen enough of a sample of Eld Science being stupid to understand how you can have a very low prior on Eld Science figuring out anything relevant. But lacking more problem guides from you on the delta between plain AI problems and FAI problems, we go on what we can.

One paper on utility learning that relies on a supervised-learning methodology (pairwise comparison data) rather than a de-facto reinforcement learning methodology (which can and will go wrong in well-known ways when put into AGI). One paper on progress towards induction algorithms that operate at multiple levels of abstraction, which could be useful for naturalized induction if someone put more thought and expertise into it.

That's only two, but I'm a comparative beginner at this stuff and Eld Science isn't very good at focusing on our problems, so I expect that there's actually more to discover and I'm just limited by lack of time and knowledge to do the literature searches.
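To give a flavor of the supervised pairwise methodology (a generic Bradley-Terry-style sketch of my own, not the cited paper's actual method): fit a per-option utility so that each observed preference "winner beat loser" becomes likely under a logistic model, instead of feeding the agent a reward signal to optimize.

```python
import math

# Generic sketch of utility learning from pairwise comparison data:
# gradient ascent on the Bradley-Terry logistic log-likelihood.
comparisons = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]

utils = {x: 0.0 for pair in comparisons for x in pair}

for _ in range(2000):
    for winner, loser in comparisons:
        # Modeled probability that the winner beats the loser.
        p = 1.0 / (1.0 + math.exp(utils[loser] - utils[winner]))
        grad = 1.0 - p  # d(log-likelihood)/d(utils[winner])
        utils[winner] += 0.05 * grad
        utils[loser] -= 0.05 * grad

ranking = sorted(utils, key=utils.get, reverse=True)
print(ranking)  # ['a', 'b', 'c']
```

The learned utilities are only identified up to a shift, which is fine: the comparisons pin down the ordering, not an absolute scale.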

By the way, I'm already trying to follow the semi-official MIRI curriculum, but if you could actually write out some material on the specific deltas where FAI work departs from the preexisting knowledge-base of academic science, that would be really helpful.

Comment author: DefectiveAlgorithm 24 June 2014 09:10:11AM *  1 point [-]

an Oracle AI you can trust

That's a large portion of the FAI problem right there.

EDIT: To clarify, by this I don't mean to imply that FAI is easy, but that (trustworthy) Oracle AI is hard.

Comment author: [deleted] 24 June 2014 09:33:46AM -1 points [-]

In context, what was meant by "Oracle AI" is a very general learning algorithm with some debug output, but no actual decision theory or utility function whatsoever built in. That would be safe, since it has no capability or desire to do anything.
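Schematically (a toy of my own construction, obviously not a safety argument by itself), the distinction is a system whose only interface is passive observation plus read-only summaries, with no action-selection code path at all:

```python
# Toy "Oracle Learner": fits a model of observed data and exposes only
# read-only predictions and diagnostics. There is deliberately no method
# that selects, recommends, or executes actions.

class OracleLearner:
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def observe(self, x):
        """Update a running estimate from passive observation."""
        self.n += 1
        self.mean += (x - self.mean) / self.n

    def predict(self):
        """Read-only output: a prediction, not a decision."""
        return self.mean

    def debug_report(self):
        """The 'debug output' a human operator inspects."""
        return f"n={self.n}, estimate={self.mean:.2f}"

oracle = OracleLearner()
for x in [2.0, 4.0, 6.0]:
    oracle.observe(x)

print(oracle.predict())       # 4.0
print(oracle.debug_report())  # n=3, estimate=4.00
```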

Comment author: [deleted] 23 June 2014 06:46:11PM *  1 point [-]

Since you believe it's all so wide-open, I'd like to know what you think of as "the FAI problem".

1) Designing a program capable of arbitrary self-modification, yet maintaining guarantees of "correct" behavior according to a goal set that is by necessity included in the modifications as well.

2) Designing such a high level set of goals which ensure "friendliness".

Comment author: TheAncientGeek 24 June 2014 08:59:09AM 0 points [-]

Designing, not evolving?

Comment author: [deleted] 24 June 2014 08:10:02AM 0 points [-]

(1) is naturalized induction, logical uncertainty, and getting around the Loebian Obstacle.

(2) is the cognitive science of evaluative judgements.

Comment author: TheAncientGeek 23 June 2014 07:39:01PM 0 points [-]

You don't need to solve philosophy to solve FAI, but philosophy is relevant to figuring out, in broad terms, the relative likelihoods of various problems and solutions.

Comment author: TheAncientGeek 21 June 2014 04:00:17PM *  -2 points [-]

I'm not arguing that AI will necessarily be safe. I am arguing that the failure modes investigated by MIRI aren't likely. It is worthwhile to research effective off-switches. It is not worthwhile to endlessly refer to a dangerous AI of a kind no one with a smidgeon of sense would build.

Comment author: [deleted] 21 June 2014 04:47:18PM 2 points [-]

Bzzzt. Wrong. You still haven't explained how to create an agent that will faithfully implement my verbal instruction to bring me a pizza. You have a valid case in the sense of pointing out that there can easily exist a "middle ground" between the Superintelligent Artificial Ethicist (Friendly AI in its fullest sense), the Superintelligent Paper Clipper (a perverse, somewhat unlikely malprogramming of a real superintelligence), and the Reward-Button Addicted Reinforcement Learner (the easiest unfriendly AI to actually build). What you haven't shown is how to actually get around the Addicted Reinforcement Learner *and* the paper-clipper and actually build an agent that can be sent out for pizza without breaking down at all.

Your current answers seem to be, roughly, "We get around the problem by expecting future AI scientists to solve it for us." However, we are the AI scientists: if we don't figure out how to make AI deliver pizza on command, who will?

Comment author: TheAncientGeek 21 June 2014 07:11:40PM *  1 point [-]

You keep misreading me. I am not claiming to have a solution. I am claiming that MIRI is overly pessimistic about the problem, and offering an over-engineered solution. Inasmuch as you say there is a middle ground, you kind of agree.

Comment author: [deleted] 21 June 2014 07:15:40PM 0 points [-]

The thing is, MIRI doesn't claim that a superintelligent world-destroying paperclipper is the most likely scenario. It's just illustrative of why we have an actual problem: because you don't need malice to create an Unfriendly AI that completely fucks everything up.

Comment author: TheAncientGeek 21 June 2014 07:22:48PM 1 point [-]

To make reliable predictions, more realistic examples are needed.

Comment author: [deleted] 21 June 2014 07:34:07PM 3 points [-]

So how did you like CATE, over in that other thread? That AI is non-super-human, doesn't go FOOM, doesn't acquire nanotechnology, can't do anything a human upload couldn't do... and still can cause quite a lot of damage simply because it's more dedicated than we are, suffers fewer cognitive flaws than us, has more self-knowledge than us, and has no need for rest or food.

I mean, come on: what if a non-FOOMed but Unfriendly AI becomes as rich as Bill Gates? After all, if Bill Gates did it while human, then surely an AI as smart as Bill Gates but without his humanity can do the same thing, while causing a bunch more damage to human values because it simply does not feel Gates' charitable inclinations.