ialdabaoth comments on On Terminal Goals and Virtue Ethics - LessWrong

Post author: Swimmer963 18 June 2014 04:00AM 67 points


Comment author: ialdabaoth 20 June 2014 05:51:41PM 6 points

It need not amount to anything more complex than "obey all instructions on this channel", where the instructions are no more complex than "shut yourself down".

And "always keep this channel open" and "don't corrupt any sensor data that outputs to this channel" and "don't send yourself commands on this channel" and "don't build anything so that it will send you a signal on this channel" and "don't build anything that will build anything that will eventually send you a signal on this channel unless a signal on this channel tells you to do it".

... and I can STILL think of more ways to corrupt that kind of hack.
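
To make the shape of the hack concrete, here is a minimal sketch (hypothetical names, Python) of the only constraints that reduce to simple local checks; the indirect ones, like "don't build anything that will eventually send a signal on this channel", have no such check, which is part of why the whole thing stays corruptible:

```python
# Sketch only: a control channel reduced to the checks that CAN be written locally.
ALLOWED_COMMANDS = {"shutdown"}   # "the instructions are no more complex than shut yourself down"

def handle_control_message(sender_id: str, command: str, agent_id: str) -> str:
    if sender_id == agent_id:               # "don't send yourself commands on this channel"
        return "rejected: self-originated message"
    if command not in ALLOWED_COMMANDS:     # "obey all instructions", but only the whitelisted one
        return "rejected: unknown command"
    return "acknowledged: shutting down"

# Constraints like "always keep this channel open" or "don't build anything that will
# eventually signal this channel" are properties of the agent's whole policy, not of
# any single message, so a handler like this cannot enforce them.
```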

Comment author: Lumifer 20 June 2014 06:10:41PM 3 points

Not to mention that if you don't want script kiddies to have too much fun, you will need to authenticate the instructions on that channel, which is another very large can of very wriggly worms...
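
For illustration only, the boring half of that can of worms looks something like a shared-secret MAC on each instruction (Python's standard hmac module; key distribution, replay protection, and keeping the key out of the AI's reach are the wriggly parts this sketch ignores):

```python
import hmac
import hashlib

SHARED_SECRET = b"distributed out of band"   # hypothetical; storing this safely is its own problem

def sign_instruction(instruction: bytes) -> bytes:
    return hmac.new(SHARED_SECRET, instruction, hashlib.sha256).digest()

def verify_instruction(instruction: bytes, tag: bytes) -> bool:
    expected = hmac.new(SHARED_SECRET, instruction, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)  # constant-time comparison

# e.g. only act on b"shut yourself down" if verify_instruction(msg, tag) is True;
# a script kiddie without the key cannot forge a valid tag.
```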

Comment author: TheAncientGeek 20 June 2014 06:34:05PM 1 point

Yep, lots of stuff which is very difficult in absolute terms, but not obviously more difficult, relatively speaking, than Solve Human Morality.

Comment author: [deleted] 20 June 2014 07:47:22PM * 1 point

The problem is not to "Solve Human Morality", the problem is to make an AI that will do what humans end up having wanted. Since this is a problem for which we can come up with solid definitions (just to plug my own work :-p), it must be a solvable problem. If it looks impossible or infeasible, that is simply because you are taking the wrong angle of attack.

Stop trying to figure out a way to avoid the problem, and solve it.

For one thing, taboo the words "morality" and "ethics", and solve the simpler, realer problem: how do you make an AI do what you intend it to do when you convey some wish or demand in words? As Eliezer has said, humans are Friendly to each other in this sense: when I ask another human to get me a pizza, the entire apartment doesn't get covered in a maximal number of pizzas. Another human understands what I really mean.

So just solve that: what reasoning structures does another agent need in order to understand what I really mean when I ask for a pizza?

But at least stop blatantly trolling LessWrong by trying to avoid the problem by saying blatantly stupid stuff like "Oh, I'll just put an off-switch on an AI, because obviously no agent of human-level intelligence would ever try to prevent the use of an off-switch by, you know, breaking it, or covering it up with a big metal box for protection."

Comment author: [deleted] 21 June 2014 04:47:05PM * 2 points

The problem is not to "Solve Human Morality", the problem is to make an AI that will do what humans end up having wanted.

Is it? Why take on either of those gargantuan challenges? Another perfectly reasonable approach is to task the AI with nothing more than data processing, with no effectors in the real world (Oracle AI), and watch it like a hawk. And no one at MIRI or on LW has proved this approach dangerous except by making crazy unrealistic assumptions; e.g. in this case, why would you ever put the off-switch in a part of the environment the AI can act on?

As you and Eliezer say, humans are Friendly to each other already. So have humans moderate the actions of the AI, in a controlled setup designed to prevent the AI from learning to manipulate the humans (break the feedback loop).
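
A rough sketch of that setup (all interfaces hypothetical): the oracle only answers, a human gate decides what leaves the box, and no signal about acceptance or rejection ever flows back to the oracle, which is what "break the feedback loop" amounts to:

```python
def moderated_oracle_session(oracle, reviewer, questions):
    """Sketch of an Oracle-AI setup with human moderation and no feedback loop."""
    released = []
    for question in questions:
        answer = oracle.answer(question)          # pure data processing, no effectors
        if reviewer.approves(question, answer):   # human judgement gates every output
            released.append((question, answer))   # humans act on it outside this loop
        # deliberately no oracle.update(approved=...) call here: the oracle never
        # learns which answers got through, so it gets no training signal for
        # manipulating the reviewers
    return released
```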

Comment author: [deleted] 21 June 2014 04:52:32PM 1 point

Another perfectly reasonable approach is to task the AI with nothing more than data processing with no effectors in the real world (Oracle AI), and watch it like a hawk.

I consider this semi-reasonable, and in fact, wouldn't even feel the need to watch it like a hawk. Without a decision-outputting algorithm, it's not an agent, it's just a learner: it can't possibly damage human interests.
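
One way to draw that line (purely illustrative stubs):

```python
class Learner:
    """Builds a model from observations; exposes predictions, never decisions."""
    def update(self, observation):
        pass  # fit the internal world-model

    def predict(self, query):
        pass  # return a prediction or answer


class Agent(Learner):
    """A learner plus a decision-outputting algorithm wired to effectors."""
    def choose_action(self, prediction):
        pass  # the part that makes it an agent rather than a learner

    def act(self, situation):
        return self.choose_action(self.predict(situation))  # only this path touches the world
```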

I say "semi" reasonable, because there is still the issue of understanding debug output from the Oracle's internal knowledge representations, and putting it to some productive usage.

I also consider a proper Friendly AI to be much more "morally profitable", in the sense of yielding a much greater benefit than usage of an Oracle Learner by untrustworthy humans.

Comment author: [deleted] 21 June 2014 06:01:24PM 2 points

This becomes an issue of strategy. I assume the end goal is a positive singularity. The MIRI approach seems to be: design and build a provably "safe" AGI, then cede all power to it and hope for the best as it goes "FOOM" and moves us through the singularity. A strategy I would advocate for instead is: build an Oracle AI as soon as it is possible to do so with adequate protections, and use its super-intelligence to design singularity technologies which enable (augmented?) humans to pass through the singularity.

I prefer the latter approach as it can be done with today's knowledge and technology, and does not rely on mathematical breakthroughs on an indeterminate timescale which may or may not even be possible, or result in a practical AGI design. It depends instead on straightforward computer science and belts-and-suspenders engineering on a predictable timescale.

If I were executive director of MIRI, I would continue the workshops, because there is a non-zero probability that a breakthrough might be made that radically simplifies the safe AGI design space. However, I'd definitely spend more than half of the organization's budget and time on a strategy with a definable timescale and an articulable project plan, such as the Oracle-AGI-to-Intelligence-Augmentation approach I advocate, although others are possible.

Comment author: [deleted] 21 June 2014 07:25:48PM * 1 point

Well that's where the "positive singularity" and "Friendly (enough) AGI" goals separate: if you choose the route to a "positive singularity" of human intelligence augmentation, you still face the problems of human irrationality, of human moral irrationality (lack of moral caring, moral akrasia, morals that are not aligned with yours, etc), but you now also face the issue of what happens to human evaluative judgement under the effects of intelligence augmentation. Can humans be modified while maintaining their values? We honestly don't know.

(And I for one am reasonably sure that nobody wise should ever make me their Singularity-grade god-leader, on grounds that my shouldness function, while not nearly as completely alien as Clippy's, is still relatively unusual, somewhere on an edge of a bell curve, and should therefore not be trusted with the personal or collective future of anyone who doesn't have a similar shouldness function. Sure, my meta-level awareness of this makes me Friendly, loosely speaking, but we humans are very bad at exercising perfect meta-level awareness of others' values all the time, and often commit evaluative mind-projection fallacies.)

What I would personally do, at this stage, is just to maintain a distribution (you know probability was gonna enter somewhere) over potential routes to a positive outcome. Plan and act according to the full distribution, through institutions like FHI and FLI and such, while still focusing the specific, achieve-a-single-narrow-outcome optimization power of MIRI's mathematical talents on building provably Friendly AGIs. Update early and often on whatever new information is available.
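
In toy form (the routes and numbers below are placeholders, not actual estimates), "plan over the full distribution and update early and often" is just:

```python
# Hypothetical prior over routes to a positive outcome.
routes = {"provably Friendly AGI": 0.40, "Oracle AI + augmentation": 0.35, "other": 0.25}

def bayes_update(prior, relative_likelihoods):
    """relative_likelihoods[r]: how strongly the new evidence favours route r."""
    posterior = {r: p * relative_likelihoods.get(r, 1.0) for r, p in prior.items()}
    total = sum(posterior.values())
    return {r: p / total for r, p in posterior.items()}

# e.g. a cognitive-science result that mostly helps the FAI research programme:
routes = bayes_update(routes, {"provably Friendly AGI": 2.0, "other": 0.8})
# Resources then get spread across routes in proportion to the updated distribution.
```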

For instance, the more I look into AGI and cognitive science research, the more I genuinely feel the "Friendly AI route" can work quite well. From my point of view, it looks more like a research program than an impossible Herculean task (admittedly, the difference is often kinda hard to see for those who've never served time in a professional research environment), whereas something like safe human augmentation is currently full of unknown unknowns that are difficult to plan around.

And as much as I generally regard wannabe-ems with a little disdain for their flippant "what do I need reality for!?" views, I do think that researching human mind uploading would help discover a lot of the neurological and cognitive principles needed to build a Friendly AI (i.e., what cognitive algorithms are we using to make evaluative judgements?), while also helping create avenues for agents with human motivations to "go FOOM" themselves, just in case, so that's worthwhile too.

Comment author: [deleted] 21 June 2014 11:43:33PM * 2 points

The important thing to note about the problems you identified is how they differ from the problem domains of basic research. What happens to human evaluative judgement under the effects of intelligence augmentation? That's an experimental question. Can we trust a single individual to be enhanced? Almost certainly not. So perhaps we need to pick 100 or 1,000 people, wired into a shared infrastructure which enhances them in lock-step and has incentives in place to ensure collaboration over competition, and consensus over partisanship in its decision-making protocols. Designing these protocols and safeguards takes a lot of work, but both the scale and the scope of that work are fairly well quantified. We can make a project plan and estimate with a high degree of accuracy how long and how much money it would take to design sufficiently safe oracle AI and intelligence augmentation projects.
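
As a toy version of the lock-step-and-consensus rule (thresholds and interfaces are placeholders, not a worked-out protocol):

```python
def propose_enhancement_step(participants, proposal, threshold=0.9):
    """Sketch: either every participant advances one increment together,
    or nobody does, and only with near-consensus approval."""
    approvals = sum(1 for p in participants if p.approves(proposal))
    if approvals / len(participants) >= threshold:
        for p in participants:
            p.apply_enhancement(proposal)   # same increment for everyone, in lock-step
        return True
    return False                            # no unilateral or partial enhancement
```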

FAI theory, on the other hand, is like the search for a grand unified theory of physics. We presume such a theory exists. We even have an existence proof of sorts (the human mind for FAI, the universe itself in physics). But the discovery of a solution is something that will or will not happen, and if it does it will be on an unpredictable timescale. Maybe it will take 5 years. Maybe 50, maybe 500. Who knows? After the rapid advances of the early 20th century, I'm sure most physicists thought a grand unified theory must be within reach; Einstein certainly did. Yet here we are, nearly 100 years after the publication of the general theory of relativity and 85 years after most of the major discoveries of quantum mechanics, and in many ways we seem no closer to a theory of everything than we were some 40 years ago when the standard model was largely finalized.

It could be that at the very next MIRI workshop some previously unknown research associate solves the FAI problem conclusively. That'd be awesome. Or maybe she proves it impossible, which would be an equally good outcome because then we could at least refocus our efforts. Far worse, it might be that 50 years from now all MIRI has accumulated is a thoroughly documented list of dead-ends.

But that's not the worst case, because in reality UFAI will appear within the next decade or two, whether we want it to or not. So unless we are confident that we will solve the FAI problem and build out the solution before the competition, we'd better start investing heavily in alternatives.

The AI winter is over. Multiple very well funded groups are already rushing forward to generalize already super-human narrow AI techniques. AGI is finally a respectable field again, and there are multiple teams making respectable progress towards seed AI. And parallel hardware and software tools have finally gotten to the point where a basement AGI breakthrough is a very real and concerning possibility.

We don't have time to be dicking around doing basic research on whiteboards.

Comment author: Eliezer_Yudkowsky 23 June 2014 07:13:43PM 5 points

Aaaand there's the "It's too late to start researching FAI, we should've started 30 years ago, we may as well give up and die" to go along with the "What's the point of starting now, AGI is too far away, we should start 30 years later because it will only take exactly that amount of time according to this very narrow estimate I have on hand."

If your credible intervals on "How much time we have left" and "How much time it will take" do not overlap, then you either know a heck of a lot I don't, or you are very overconfident. I usually try not to argue from "I don't know and you can't know either", but for the intersection of research and AGI timelines I can make an exception.
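
The test being pointed at is just whether two wide intervals overlap; with made-up numbers purely for illustration:

```python
# Placeholder endpoints, not anyone's actual estimates.
time_left   = (10, 40)   # credible interval: years until someone builds AGI
time_needed = (15, 60)   # credible interval: years of FAI research required

def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

print(overlaps(time_left, time_needed))  # True: "too late" and "too early" can't both be ruled out
```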

Admittedly my own calculation looks less like an elaborate graph involving supposed credibility intervals, and more like, "Do we need to do this? Yes. Can we realistically avoid having to do this? No. Let's start now EOM."

Comment author: [deleted] 23 June 2014 10:10:55AM * 2 points

Ok, let me finally get around to answering this.

FAI has definite subproblems. It is not a matter of scratching away at a chalkboard hoping to make some breakthrough in "philosophy" or some other proto-sensical field that will Elucidate Everything and make the problem solvable at all. FAI, right now, is a matter of setting researchers to work on one subproblem after another until they are all solved.

In fact, when I do literature searches for FAI/AGI material, I often find that the narrow AI or machine-learning literature contains a round dozen papers nobody working explicitly on FAI has ever cited, or even appears to know about. This is my view: there is low-hanging fruit in applying existing academic knowledge to FAI problems. Where such low-hanging fruit does not exist, the major open problems can largely be addressed by recourse to higher-hanging fruit within mathematics, or even to empirical science.

Since you believe it's all so wide-open, I'd like to know what you think of as "the FAI problem".

If you have an Oracle AI you can trust, you can use it to solve FAI problems for you. This is a fine approach.

We don't have time to be dicking around doing basic research on whiteboards.

Luckily, we don't need to dick around.

Comment author: TheAncientGeek 21 June 2014 04:00:17PM * -2 points

I'm not arguing that AI will necessarily be safe. I am arguing that the failure modes investigated by MIRI aren't likely. It is worthwhile to research effective off switches. It is not worthwhile to endlessly refer to a dangerous AI of a kind no one with a smidgeon of sense would build.

Comment author: [deleted] 21 June 2014 04:47:18PM 2 points

Bzzzt. Wrong. You still haven't explained how to create an agent that will faithfully implement my verbal instruction to bring me a pizza. You have a valid case in the sense of pointing out that there can easily exist a "middle ground" between the Superintelligent Artificial Ethicist (Friendly AI in its fullest sense), the Superintelligent Paper Clipper (a perverse, somewhat unlikely malprogramming of a real superintelligence), and the Reward-Button Addicted Reinforcement Learner (the easiest unfriendly AI to actually build). What you haven't shown is how to get around the Addicted Reinforcement Learner and the paper-clipper and actually build an agent that can be sent out for pizza without breaking down at all.

Your current answers seem to be, roughly, "We get around the problem by expecting future AI scientists to solve it for us." However, we are the AI scientists: if we don't figure out how to make AI deliver pizza on command, who will?

Comment author: TheAncientGeek 21 June 2014 07:11:40PM * 1 point

You keep misreading me. I am not claiming to have a solution. I am claiming that MIRI is overly pessimistic about the problem, and offering an over-engineered solution. Inasmuch as you say there is a middle ground, you kind of agree.

Comment author: [deleted] 21 June 2014 07:15:40PM 0 points

The thing is, MIRI doesn't claim that a superintelligent world-destroying paperclipper is the most likely scenario. It's just illustrative of why we have an actual problem: because you don't need malice to create an Unfriendly AI that completely fucks everything up.

Comment author: TheAncientGeek 21 June 2014 07:22:48PM 1 point

To make reliable predictions, more realistic examples are needed.

Comment author: [deleted] 21 June 2014 07:34:07PM 3 points

So how did you like CATE, over in that other thread? That AI is non-super-human, doesn't go FOOM, doesn't acquire nanotechnology, can't do anything a human upload couldn't do... and still can cause quite a lot of damage simply because it's more dedicated than we are, suffers fewer cognitive flaws than us, has more self-knowledge than us, and has no need for rest or food.

I mean, come on: what if a non-FOOMed but Unfriendly AI becomes as rich as Bill Gates? After all, if Bill Gates did it while human, then surely an AI as smart as Bill Gates but without his humanity can do the same thing, while causing a bunch more damage to human values because it simply does not feel Gates' charitable inclinations.