PhilosophyTutor comments on Siren worlds and the perils of over-optimised search - Less Wrong

Post author: Stuart_Armstrong 07 April 2014 11:00AM


Comment author: PhilosophyTutor 07 May 2014 11:05:03AM 1 point [-]

I won't argue against the claim that we could conceivably create an AI without knowing anything about how to create an AI. It's trivially true in the same way that we could conceivably turn a monkey loose on a typewriter and get strong AI.

I also agree with you that if we got an AI that way we'd have no idea how to get it to do any one thing rather than another and no reason to trust it.

I don't currently agree that we could make such an AI using a non-functioning brain model plus "a bit of evolution". I am open to argument on the topic but currently it seems to me that you might as well say "magic" instead of "evolution" and it would be an equivalent claim.

Comment author: Stuart_Armstrong 07 May 2014 05:04:18PM 0 points [-]

Why are you confident that an AI that we do develop will not have these traits? You agree the mindspace is large, and you agree we can develop some cognitive abilities without understanding them. If you add that most AI programmers don't take AI risk seriously and will only be testing their AIs in controlled environments, and that the AI will likely be developed for a military or commercial purpose, I don't see why you'd have high confidence that they will converge on a safe design.

Comment author: XiXiDu 07 May 2014 05:54:32PM 2 points [-]

If you add that most AI programmers don't take AI risk seriously and will only be testing their AIs in controlled environments... I don't see why you'd have high confidence that they will converge on a safe design.

Why do you think such an AI wouldn't just fail at being powerful, rather than being powerful in a catastrophic way?

If programs fail in the real world then they are not working well. You don't happen to come across a program that manages to prove the Riemann hypothesis when you designed it to prove the irrationality of the square root of 2.

Comment author: Stuart_Armstrong 12 May 2014 11:07:27AM 0 points [-]

Why do you think such an AI wouldn't just fail at being powerful, rather than being powerful in a catastrophic way?

If it fails at being powerful, we don't have to worry about it, so I feel free to ignore those probabilities.

You don't happen to come across a program that manages to prove the Riemann hypothesis when you designed it to prove the irrationality of the square root of 2.

But you might come across a program motivated to eliminate all humans if you designed it to optimise the economy...

Comment author: TheAncientGeek 12 May 2014 12:22:46PM 0 points [-]

So you're not pursuing the claim that an SAI will probably be dangerous, you are just worried that it might be?

Comment author: Stuart_Armstrong 12 May 2014 04:31:12PM 0 points [-]

My claim has always been that the probability that an SAI will be dangerous is too high to ignore. I fluctuate on the exact probability, but I've never seen anything that drives it down to a level I feel comfortable with (in fact, I've never seen anything drive it below 20%).

Comment author: XiXiDu 13 May 2014 09:28:12AM -2 points [-]

You don't happen to come across a program that manages to prove the Riemann hypothesis when you designed it to prove the irrationality of the square root of 2.

But you might come across a program motivated to eliminate all humans if you designed it to optimise the economy...

This line of reasoning still seems flawed to me. It's just like saying that you can build an airplane that can fly and land, autonomously, except that your plane is going to forcefully crash into a nuclear power plant.

The gist of the matter is that there are a vast number of ways that you can fail at predicting your program's behavior. Most of these failure modes are detrimental to the overall optimization power of the program. This is because being able to predict the behavior of your AI, to the extent necessary for it to outsmart humans, is analogous to predicting that your airplane will fly without crashing. Eliminating humans in order to optimize the economy is about as likely as your autonomous airplane crashing into a nuclear power plant in order to land safely.

Comment author: nshepperd 13 May 2014 11:58:03AM *  3 points [-]

I don't know why you think you can predict the likely outcome of an artificial general intelligence by making surface analogies to things that aren't even optimization processes. People have been using analogies to "predict" nonsense for centuries.

In this case there are a variety of reasons that a programmer might succeed at preventing a UAV from crashing into a nuclear power plant, yet fail at preventing AGI from eliminating all humans. Mainly revolving around the fact that most programmers wouldn't even consider the "eliminate all humans" option as a serious possibility until it had already occurred, while the problem of physical obstructions is explicitly a part of the UAV problem definition. That itself has to do with the fact that an AGI can represent internally features of the world that weren't even considered by the designers (due to general intelligence).

As an aside, serious misconfigurations or unintended results of computer programs happen all the time today, but you don't generally hear or care about them because they don't end the world.

Comment author: [deleted] 12 May 2014 04:03:20PM -1 points [-]

But you might come across a program motivated to eliminate all humans if you designed it to optimise the economy...

This is why the Wise employ normative uncertainty and the learning of utility functions from data, rather than hardcoding verbal instructions that only make sense in light of a complete human mind and social context.
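As a toy illustration of the "learning of utility functions from data" idea (a hypothetical Python sketch of my own, not any specific proposal; all the names are mine), one can infer utility weights from observed pairwise preferences instead of hardcoding verbal instructions:

```python
# Toy value-learning sketch (illustrative only): infer a linear utility
# over outcome features from pairwise preference data, rather than
# hardcoding a verbal instruction.

def learn_weights(preferences, n_features, steps=1000, lr=0.1):
    # preferences: list of (preferred_features, rejected_features) pairs.
    w = [0.0] * n_features
    for _ in range(steps):
        for better, worse in preferences:
            margin = sum(wi * (b - c) for wi, b, c in zip(w, better, worse))
            if margin < 1.0:  # perceptron-style update on a violated preference
                for i in range(n_features):
                    w[i] += lr * (better[i] - worse[i])
    return w

# The agent observes that humans prefer outcome A = (1, 0) over B = (0, 1):
w = learn_weights([((1, 0), (0, 1))], n_features=2)
utility = lambda features: sum(wi * fi for wi, fi in zip(w, features))
print(utility((1, 0)) > utility((0, 1)))  # learned ordering matches the data
```

The point is only the shape of the approach: the ordering comes from data, and everything the data does not cover remains unspecified, which is exactly where normative uncertainty has to do its work.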

Comment author: Stuart_Armstrong 12 May 2014 04:29:15PM 2 points [-]

employ normative uncertainty and the learning of utility functions from data

Indeed. But the more of the problem you can formalise and solve (eg maintaining a stable utility function over self-improvements) the more likely the learning approach is to succeed.

Comment author: [deleted] 12 May 2014 08:17:23PM 1 point [-]

Well yes, of course. I mean, if you can't build an agent that was capable of maintaining its learned utility while becoming vastly smarter (and thus capable of more accurately learning and enacting capital-G Goodness), then all that utility-learning was for nought.

Comment author: XiXiDu 13 May 2014 12:10:22PM *  0 points [-]

The very idea underlying AI is enabling people to get a program to do what they mean without having to explicitly encode all details. What AI risk advocates do is to turn the whole idea upside down, claiming that, without explicitly encoding what you mean, your program will do something else. The problem here is that it is conjectured that the program will do what it was not meant to do in a very intelligent and structured manner. But this can't happen when it comes to intelligently designed systems (as opposed to evolved systems), because the nature of unintended consequences is overall chaotic.

How often have you heard of intelligently designed programs that achieved something highly complex and marvelous, but unintended, thanks to the programmers being unable to predict the behavior of the program? I don't know of any such case. But this is exactly what AI risk advocates claim will happen, namely that a program designed to do X (calculate 1+1) will perfectly achieve Y (take over the world).

If artificial general intelligence will eventually be achieved by some sort of genetic/evolutionary computation, or neuromorphic engineering, then I can see how this could lead to unfriendly AND capable AI. But an intelligently designed AI will either work as intended or be incapable of taking over the world (read: highly probable).

This of course does not ensure a positive singularity (if you believe that this is possible at all), since humans might use such intelligent and capable AIs to wreak havoc (ask the AI to do something stupid, or something that clashes with most human values). So there is still a need for "friendly AI". But this is quite different from the idea of interpreting "make humans happy" as "tile the universe with smiley faces". Such a scenario contradicts the very nature of intelligently designed AI, which is an encoding of “Understand What Humans Mean” AND “Do What Humans Mean”. More here.

Comment author: [deleted] 13 May 2014 05:23:00PM *  1 point [-]

If artificial general intelligence will eventually be achieved by some sort of genetic/evolutionary computation, or neuromorphic engineering, then I can see how this could lead to unfriendly AND capable AI. But an intelligently designed AI will either work as intended or be incapable of taking over the world (read: highly probable).

Alexander, have you even bothered to read the works of Marcus Hutter and Juergen Schmidhuber, or have you spent all your AI-researching time doing additional copy-pastas of this same argument every single time the subject of safe or Friendly AGI comes up?

Your argument makes a measure of sense if you are talking about the social process of AGI development: plainly, humans want to develop AGI that will do what humans intend for it to do. However, even a cursory look at the actual research literature shows that the mathematically most simple agents (ie: those that get discovered first by rational researchers interested in finding universal principles behind the nature of intelligence) are capital-U Unfriendly, in that they are expected-utility maximizers with not one jot or tittle in their equations for peace, freedom, happiness, or love, or the Ideal of the Good, or sweetness and light, or anything else we might want.

(Did you actually expect that in this utterly uncaring universe of blind mathematical laws, you would find that intelligence necessitates certain values?)
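To make that concrete, the core loop of such an agent can be sketched in a few lines (purely illustrative; the toy world-model and all names are mine). Notice that peace, freedom, or happiness appear nowhere unless they are explicitly placed in `utility`:

```python
# Minimal expected-utility maximizer sketch (illustrative only).
# The agent picks whichever action maximizes expected utility under its
# world-model; nothing about safety or human values enters the equations
# unless it is explicitly encoded in `utility`.

def expected_utility(action, world_model, utility):
    # world_model(action) -> list of (probability, outcome) pairs
    return sum(p * utility(outcome) for p, outcome in world_model(action))

def choose_action(actions, world_model, utility):
    return max(actions, key=lambda a: expected_utility(a, world_model, utility))

# Toy example: utility counts paperclips, so the agent makes paperclips.
toy_model = lambda a: [(1.0, {"paperclips": {"make": 10, "idle": 0}[a]})]
paperclip_utility = lambda outcome: outcome["paperclips"]
print(choose_action(["make", "idle"], toy_model, paperclip_utility))  # -> make
```

Swap in any other `utility` and the same machinery optimizes for that instead; the maximizer itself is value-neutral.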

No, Google Maps will never turn superintelligent and tile the solar system in computronium to find me a shorter route home from a pub crawl. However, an AIXI or Goedel Machine instance will, because these are in fact entirely distinct algorithms.

In fact, when dealing with AIXI and Goedel Machines we have an even bigger problem than "tile everything in computronium to find the shortest route home": the much larger problem of not being able to computationally encode even a simple verbal command like "find the shortest route home". We are faced with the task of trying to encode our values into a highly general, highly powerful expected-utility maximizer at the level of, metaphorically speaking, pre-verbal emotion.

Otherwise, the genie will know, but not care.

Now, if you would like to contribute productively, I've got some ideas I'd love to talk over with someone for actually doing something about some few small corners of Friendliness subproblems. Otherwise, please stop repeating yourself.

Comment author: XiXiDu 14 May 2014 11:15:34AM -1 points [-]

Alexander, have you even bothered to read the works of Marcus Hutter and Juergen Schmidhuber...

I asked several people what they think about it, and to provide a rough explanation. I've also had email exchanges with Hutter, Schmidhuber and Orseau. I also informally thought about whether practically general AI that falls into the category “consequentialist / expected utility maximizer / approximation to AIXI” could ever work. And I am not convinced.

If general AI, which is capable of a hard takeoff and able to take over the world, requires fewer lines of code in order to work than to constrain it not to take over the world, then that's an existential risk. But I don't believe this to be the case.

Since I am not a programmer or computer scientist, I tend to look at general trends and extrapolate from there. I think this makes more sense than to extrapolate from some unworkable model such as AIXI. And the general trend is that humans become better at making software behave as intended. And I see no reason to expect some huge discontinuity here.

Here is what I believe to be the case:

(1) The abilities of systems are part of human preferences as humans intend to give systems certain capabilities and, as a prerequisite to build such systems, have to succeed at implementing their intentions.

(2) Error detection and prevention is such a capability.

(3) Something that is not better than humans at preventing errors is no existential risk.

(4) Without a dramatic increase in the capacity to detect and prevent errors it will be impossible to create something that is better than humans at preventing errors.

(5) A dramatic increase in the human capacity to detect and prevent errors is incompatible with the creation of something that constitutes an existential risk as a result of human error.

Here is what I doubt:

(1) Present-day software is better than previous software generations at understanding and doing what humans mean.

(2) There will be future generations of software which will be better than the current generation at understanding and doing what humans mean.

(3) If there is better software, there will be even better software afterwards.

(4) Magic happens.

(5) Software will be superhumanly good at understanding what humans mean but catastrophically worse than all previous generations at doing what humans mean.

Comment author: jimrandomh 14 May 2014 12:31:57PM 5 points [-]

Since I am not a programmer, or computer scientist

This is a much bigger problem for your ability to reason about this area than you think.

Comment author: XiXiDu 14 May 2014 01:53:21PM 1 point [-]

Since I am not a programmer, or computer scientist

This is a much bigger problem for your ability to reason about this area than you think.

A relevant quote from Eliezer Yudkowsky (source):

I am tempted to say that a doctorate in AI would be negatively useful, but I am not one to hold someone’s reckless youth against them – just because you acquired a doctorate in AI doesn’t mean you should be permanently disqualified.

And another one (source):

I also think that evaluation by academics is a terrible test for things that don’t come with blatant overwhelming unmistakable undeniable-even-to-humans evidence – e.g. this standard would fail MWI, molecular nanotechnology, cryonics, and would have recently failed ‘high-carb diets are not necessarily good for you’. I don’t particularly expect this standard to be met before the end of the world, and it wouldn’t be necessary to meet it either.

So since academic consensus on the topic is not reliable, and domain knowledge in the field of AI is negatively useful, what are the prerequisites for grasping the truth when it comes to AI risks?

Comment author: [deleted] 14 May 2014 04:51:22PM *  -2 points [-]

I also informally thought about whether practically general AI that falls into the category “consequentialist / expected utility maximizer / approximation to AIXI” could ever work. And I am not convinced.

Too bad. I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power. Strangely, doing so will not make it conform to your ideas about "eventual future AGI", because this one is actually existing AGI, and reality doesn't have to listen to you.

If general AI, which is capable of a hard takeoff and able to take over the world, requires fewer lines of code in order to work than to constrain it not to take over the world, then that's an existential risk.

That is exactly the situation we face, your refusal to believe in actually-existing AGI models notwithstanding. Whine all you please: the math will keep on working.

Since I am not a programmer, or computer scientist,

Then I recommend you shut up about matters of highly involved computer science until such time as you have acquired the relevant knowledge for yourself. I am a trained computer scientist, and I held lots of skepticism about MIRI's claims, so I used my training and education to actually check them. And I found that the actual evidence of the AGI research record showed MIRI's claims to be basically correct, modulo Eliezer's claims about an intelligence explosion taking place versus Hutter's claim that an eventual optimal agent will simply scale itself up in intelligence with the amount of computing power it can obtain.

That's right, not everyone here is some kind of brainwashed cultist. Many of us have exercised basic skepticism against claims with extremely low subjective priors. But we exercised our skepticism by doing the background research and checking the presently available object-level evidence rather than by engaging in meta-level speculations about an imagined future in which everything will just work out.

Take a course at your local technical college, or go on a MOOC, or just dust off a whole bunch of textbooks in computer-scientific and mathematical subjects, study the necessary knowledge to talk about AGI, and then you get to barge in telling everyone around you how we're all full of crap.

Comment author: V_V 14 June 2014 10:07:54PM 2 points [-]

Too bad. I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power.

I think you are underestimating this by many orders of magnitude.

Comment author: private_messaging 14 June 2014 07:32:08PM *  2 points [-]

Too bad. I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power.

Which one are you talking about, to be completely exact?

I am a trained computer scientist

Then use that training and figure out how many galaxies' worth of computing power it's going to take.

Comment author: Lumifer 14 May 2014 05:13:39PM 3 points [-]

Then I recommend you shut up about matters of highly involved computer science until such time as you have acquired the relevant knowledge for yourself.

That suggestion would make LW a sad and lonely place.

Are you sure you mean it?

I am a trained computer scientist, and I held lots of skepticism about MIRI's claims, so I used my training and education to actually check them. And I found that the actual evidence of the AGI research record showed MIRI's claims to be basically correct

So why aren't MIRI's claims accepted by the mainstream, then? Is it because all the "trained computer scientists" are too dumb or too lazy to see the truth? Or is it the case that the "evidence" is contested, ambiguous, and inconclusive?

Comment author: XiXiDu 18 May 2014 10:41:44AM *  2 points [-]

Too bad. I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power. Strangely, doing so will not make it conform to your ideas about "eventual future AGI", because this one is actually existing AGI, and reality doesn't have to listen to you.

I consider efficiency to be a crucial part of the definition of intelligence. Otherwise, as someone else told you in another comment, unlimited computing power implies that you can do "an exhaustive brute-force search through the entire solution space and be done in an instant."
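That brute-force point is easy to make concrete (a toy sketch of my own): with unbounded compute, "solving" a problem collapses into enumerating the solution space, which is why efficiency arguably belongs in any useful definition of intelligence.

```python
# Illustrative only: with unbounded compute, "intelligence" reduces to
# exhaustive search. A bounded agent would need heuristics here; an
# unbounded one just enumerates every candidate in order.

from itertools import product

def brute_force(solution_space, is_solution):
    for candidate in solution_space:
        if is_solution(candidate):
            return candidate
    return None

# Toy problem: find a 3-bit string whose bits sum to 2.
space = product([0, 1], repeat=3)
answer = brute_force(space, lambda bits: sum(bits) == 2)
print(answer)  # first match in enumeration order: (0, 1, 1)
```

The catch, of course, is that for any interesting problem the solution space is astronomically large, so everything rides on search efficiency.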

That is exactly the situation we face, your refusal to believe in actually-existing AGI models notwithstanding. Whine all you please: the math will keep on working.

I'd be grateful if you could list your reasons (or the relevant literature) for believing that AIXI related research is probable enough to lead to efficient artificial general intelligence (AGI) in order for it to make sense to draw action relevant conclusions from AIXI about efficient AGI.

I do not doubt the math. I do not doubt that evolution (variation + differential reproduction + heredity + mutation + genetic drift) underlies all of biology. But that we understand evolution does not mean that it makes sense to call synthetic biology an efficient approximation of evolution.

Comment author: TheAncientGeek 14 May 2014 05:46:18PM 1 point [-]

Did you check the claim that we have something dangerously unfriendly?

Comment author: David_Gerard 14 June 2014 07:58:10PM 0 points [-]

Too bad. I can download an inefficient but functional subhuman AGI from Github. Making it superhuman is just a matter of adding an entire planet's worth of computing power.

what

Comment author: XiXiDu 14 May 2014 05:19:34PM *  0 points [-]

I am a trained computer scientist, and I held lots of skepticism about MIRI's claims, so I used my training and education to actually check them.

Why don't you make your research public? Would be handy to have a thorough validation of MIRI's claims. Even if people like me wouldn't understand it, you could publish it and thereby convince the CS/AI community of MIRI's mission.

Then I recommend you shut up about matters of highly involved computer science until such time as you have acquired the relevant knowledge for yourself.

Does this also apply to people who support MIRI without having your level of insight?

But we exercised our skepticism by doing the background research and checking the presently available object-level evidence...

If only you people would publish all this research.

Comment author: TheAncientGeek 13 May 2014 06:24:26PM *  -1 points [-]

Of course we haven't discovered anything dangerously unfriendly...

Or anything that can't be boxed. Remind me how AIs are supposed to get out of boxes?

Comment author: [deleted] 13 May 2014 07:57:54PM 1 point [-]

Of course we haven't discovered anything dangerously unfriendly...

Of course we have, it's called AIXI. Do I need to download a Monte Carlo implementation from Github and run it on a university server with environmental access to the entire machine and show logs of the damn thing misbehaving itself to convince you?

Or anything that can't be boxed. Remind me how AIs are supposed to get out of boxes?

AIs can be causally boxed, just like anything else. That is, as long as the agent's environment absolutely follows causal rules without any exception that would leak information about the outside world into the environment, the agent will never infer the existence of a world outside its "box".

But then it's also not much use for anything besides Pac-Man.
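A minimal sketch of what I mean by causal boxing (my own toy framing, not a standard construction): every percept is a pure function of in-box state and the agent's own actions, so the agent's data carries zero bits about anything outside the box.

```python
# Toy "causally boxed" environment (illustrative framing only):
# observations depend solely on in-box state and the agent's actions,
# so there is no channel through which outside-world information could
# leak into the agent's inference problem.

def boxed_env(state, action):
    # Pac-Man-ish toy: state is a position on a ring of 8 cells.
    new_state = (state + {"left": -1, "right": 1}[action]) % 8
    observation = new_state  # percept is a pure function of in-box state
    return new_state, observation

state = 0
trace = []
for action in ["right", "right", "left"]:
    state, obs = boxed_env(state, action)
    trace.append(obs)
print(trace)  # [1, 2, 1]
```

An agent in this environment can model the ring perfectly and still has no evidence that anything outside it exists, which is the sense in which the box holds.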

Comment author: Eugine_Nier 20 May 2014 03:07:43AM *  4 points [-]

Of course we have, it's called AIXI.

Given how slow and dumb it is, I have a hard time seeing an approximation to AIXI as a threat to anyone, except maybe itself.

Comment author: gwern 14 May 2014 03:09:07AM 10 points [-]

Do I need to download a Monte Carlo implementation from Github and run it on a university server with environmental access to the entire machine and show logs of the damn thing misbehaving itself to convince you?

FWIW, I think that would make for a pretty interesting post.

Comment author: EHeller 20 May 2014 03:47:11AM 3 points [-]

Of course we have, it's called AIXI. Do I need to download a Monte Carlo implementation from Github and run it on a university server with environmental access to the entire machine and show logs of the damn thing misbehaving itself to convince you?

I think you'll have serious trouble getting an AIXI approximation to do much of anything interesting, let alone misbehave. The computational costs are too high.

Comment author: private_messaging 22 May 2014 04:09:42AM *  -1 points [-]

Do you even know what "Monte Carlo" means? It means it tries to build a predictor of the environment by trying random programs. Even very stupid evolutionary methods do better.

Once you throw away this whole 'can and will try absolutely anything' and enter the domain of practical software, you'll also enter the domain where the programmer is specifying what the AI thinks about and how. The immediate practical problem of "uncontrollable" (but easy to describe) AI is that it is too slow by a ridiculous factor.
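To caricature the "tries random programs" point in code (this is not the actual MC-AIXI-CTW algorithm, which uses context-tree weighting and Monte Carlo tree search; it is only meant to show how wasteful unstructured sampling over models is):

```python
import random

# Caricature of predicting an environment by sampling random "programs"
# (here, random lookup tables) and keeping those consistent with history.
# Practical approximations use far more structured model classes.

def random_program(rng, n_states=4):
    # A "program" is just a random mapping: state -> next observation bit.
    return {s: rng.randint(0, 1) for s in range(n_states)}

def consistent(program, history):
    # Keep only programs whose predictions match every observed bit.
    return all(program[t % 4] == obs for t, obs in enumerate(history))

rng = random.Random(0)
history = [0, 1, 0, 1]  # observed environment bits
models = [p for p in (random_program(rng) for _ in range(1000))
          if consistent(p, history)]
# Only about 1 in 16 random tables survives the filter; the survivors'
# majority vote supplies the prediction for the next bit.
prediction = round(sum(m[len(history) % 4] for m in models) / len(models))
print(prediction)
```

Even on this four-bit toy problem, roughly 94% of the sampled models are thrown away; the waste grows exponentially with history length, which is the practical objection above.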

Comment author: TheAncientGeek 13 May 2014 08:06:54PM *  -1 points [-]

That would be the AIXI that is uncomputable?

And don't AIs get out of boxes by talking their way out, round here?

Comment author: MugaSofer 20 May 2014 11:31:04AM 0 points [-]

AIXI. Do I need to download a Monte Carlo implementation from Github and run it on a university server with environmental access to the entire machine and show logs of the damn thing misbehaving itself to convince you?

Is that ... possible?

Comment author: MugaSofer 20 May 2014 11:24:16AM 1 point [-]

we haven't discovered anything dangerously unfriendly... Or anything that can't be boxed.

Since many humans are difficult to box, I would have to disagree with you there.

And, obviously, not all humans are Friendly.

An intelligent, charismatic psychopath seems like they would fit both your criteria. And, of course, there is no shortage of them. We can only be thankful they are rare enough, relative to equivalent semi-Friendly intelligences, and incompetent enough, not to have done more damage than they already have.

Comment author: TheAncientGeek 20 May 2014 01:08:18PM *  0 points [-]

Most humans are easy to box, since they can be contained in prisons.

How likely is an AI that is not designed to be psychopathic to be psychopathic?

Comment author: XiXiDu 14 May 2014 08:48:12AM *  -1 points [-]

However, even a cursory look at the actual research literature shows that the mathematically most simple agents (ie: those that get discovered first by rational researchers interested in finding universal principles behind the nature of intelligence) are capital-U Unfriendly, in that they are expected-utility maximizers...

If I believed that anything as simple as AIXI could possibly result in practical general AI, or that expected utility maximizing was at all feasible, then I would tend to agree with MIRI. I don't. And I think it makes no sense to draw conclusions about practical AI from these models.

...if you are talking about the social process of AGI development: plainly, humans want to develop AGI that will do what humans intend for it to do.

This is crucial.

Did you actually expect that in this utterly uncaring universe of blind mathematical laws, you would find that intelligence necessitates certain values?

That's largely irrelevant and misleading. Your autonomous car does not need to feature an encoding of an amount of human values that corresponds to its level of autonomy.

Otherwise, the genie will know, but not care.

That post has been completely debunked.

ETA: Fixed a link to expected utility maximization.

Comment author: TheAncientGeek 14 May 2014 07:38:58PM *  0 points [-]

There's the famous example of the AI trained to spot tanks that actually learnt to spot sunny days. That seems to underlie a lot of MIRI thinking, although at the same time the point is disguised by emphasising explicit coding over training.

Comment author: RichardKennaway 13 May 2014 01:21:33PM 1 point [-]

The very idea underlying AI is enabling people to get a program to do what they mean without having to explicitly encode all details.

I have never seen AI characterised like that before. Sounds like moonshine to me. Programming languages, libraries, and development environments, yes, that's what they're for, but those don't take away the task of having to explicitly and precisely think about what you mean, they just automate the routine grunt work for you. An AI isn't going to superintelligently (that is to say, magically) know what you mean, if you didn't actually mean anything.

Comment author: TheAncientGeek 13 May 2014 02:57:05PM *  0 points [-]

Non-AI systems uncontroversially require explicit coding. How would you characterise AI systems, then?

Comment author: RichardKennaway 14 May 2014 09:05:51AM 0 points [-]

Non-AI systems uncontroversially require explicit coding. How would you characterise AI systems, then?

XiXiDu's characterisation seems suitable enough: programs able to perform tasks normally requiring human intelligence. One might add "or superhuman intelligence", as long as one is not simply wishing for magic there. This is orthogonal to the question of how you tell such a system what you want it to do.

Comment author: TheAncientGeek 14 May 2014 09:16:42AM 0 points [-]

Indeed. But there is a how-to-do-it definition of AI, and it is kind of not about explicit coding. For instance, if a student takes an AI course as part of a degree, they are not taught explicit coding all over again. They are taught about learning algorithms, neural networks, etc.

Comment author: [deleted] 13 May 2014 05:29:02PM 0 points [-]

They definitely require some amount of explicit coding of their values. You can try to reduce the burden of such explicit value-loading through various indirect means, such as value learning, indirect normativity, extrapolated volition, or even reinforcement learning (though that's the most primitive and dangerous form of value-loading). You cannot, however, dodge the bullet.

Comment author: TheAncientGeek 13 May 2014 05:33:30PM -1 points [-]

Because?

Comment author: XiXiDu 13 May 2014 03:23:38PM *  -2 points [-]

Programming languages, libraries, and development environments yes, that's what they're for, but those don't take away the task of having to explicitly and precisely think about what you mean, they just automate the routine grunt work for you.

What does improvement in the field of AI refer to? I think it isn't wrong to characterize it as the development of programs able to perform tasks normally requiring human intelligence.

I believe that companies like Apple would like their products, such as Siri, to be able to increasingly understand what their customers expect their gadgets to do, without them having to learn programming.

In this context it seems absurd to imagine that when eventually our products become sophisticated enough to take over the world, they will do so due to objectively stupid misunderstandings.

Comment author: RichardKennaway 14 May 2014 08:57:24AM *  0 points [-]

What does improvement in the field of AI refer to? I think it isn't wrong to characterize it as the development of programs able to perform tasks normally requiring human intelligence.

That's a reasonably good description of the stuff that people call AI. Any particular task, however, is just an application area, not the definition of the whole thing. Natural language understanding is one of those tasks.

The dream of being able to tell a robot what to do, and it knowing exactly what you meant, goes beyond natural language understanding, beyond AI, beyond superhuman AI, to magic. In fact, it seems to me a dream of not existing -- the magic AI will do everything for us. It will magically know what we want before we ask for it, before we even know it. All we do in such a world is to exist. This is just another broken utopia.

Comment author: XiXiDu 14 May 2014 12:39:52PM *  0 points [-]

The dream of being able to tell a robot what to do, and it knowing exactly what you meant, goes beyond natural language understanding, beyond AI, beyond superhuman AI, to magic.

I agree. All you need is a robot that does not mistake "earn a college degree" for "kill all other humans and print an official paper confirming that you earned a college degree".

All trends I am aware of indicate that software products will become better at knowing what you meant. But in order for them to constitute an existential risk they would have to become catastrophically worse at understanding what you meant while at the same time becoming vastly more powerful at doing what you did not mean. But this doesn't sound at all likely to me.

What I imagine is that at some point we'll have a robot that can enter a classroom, sit down, and process what it hears and sees in such a way that it will be able to correctly fill out a multiple choice test at the end of the lesson. Maybe the robot will literally step on someone's toes. This will then have to be fixed.

What I don't think is that the first robot entering a classroom, in order to master a test, will take over the world after hacking the school's WLAN and solving molecular nanotechnology. That's just ABSURD.

Comment author: Furcas 14 May 2014 07:28:52PM -1 points [-]

So there is still a need for "friendly AI". But this is quite different from the idea of interpreting "make humans happy" as "tile the universe with smiley faces".

It just blows my mind that after the countless hours you've spent reading and writing about the Friendly AI problem, not to mention the countless hours people have spent patiently explaining (and re- re- re- re-explaining) it to you, that you still don't understand what the FAI problem is. It's unbelievable.

Comment author: TheAncientGeek 12 May 2014 04:12:27PM 0 points [-]

Yeah, but hardcoding is an easier sell to people who know how to code but have never done AI... It's like political demagogues selling unworkable but easily understood ideas.

Comment author: [deleted] 12 May 2014 08:21:07PM 0 points [-]

Not really, no. Most people don't recognize the "hidden complexity of wishes" in Far Mode, or when it's their wishes. However, I think if I explain to them that I'll be encoding my wishes, they'll quickly figure out that my attempts to hardcode AI Friendliness are going to be very bad for them. Human intelligence evolved for winning arguments when status, wealth, health, and mating opportunities are at issue: thus, convince someone to treat you as an opponent, and leave the correct argument lying right where they can pick it up, and they'll figure things out quickly.

Hmmm... I wonder if that bit of evolutionary psychology explains why many people act rude and nasty even to those close to them. Do we engage more intelligence when trying to win a fight than when trying to be nice?

Comment author: PhilosophyTutor 07 May 2014 10:43:58PM *  1 point [-]

(EDIT: See below.) I'm afraid that I am now confused. I'm not clear on what you mean by "these traits", so I don't know what you think I am being confident about. You seem to think I'm arguing that AIs will converge on a safe design and I don't remember saying anything remotely resembling that.

EDIT: I think I figured it out on the second or third attempt. I'm not 100% committed to the proposition that if we make an AI and know how we did so that we can definitely make sure it's fun and friendly, as opposed to fundamentally uncontrollable and unknowable. However it seems virtually certain to me that we will figure out a significant amount about designing AIs to do what we want in the process of developing them. People who subscribe to various "FOOM" theories about AI coming out of nowhere will probably disagree with this as is their right, but I don't find any of those theories plausible.

I also hope I didn't give the impression that I thought it was meaningfully possible to create a God-like AI without understanding how to make AI. It's conceivable in that such a creation story is not a logical contradiction like a square circle or colourless green ideas sleeping furiously, but that is all. I think it is actually staggeringly unlikely that we will make an AI without either knowing how to make an AI, or knowing how to upload people who can then make an AI and tell us how they did it.

Comment author: Stuart_Armstrong 12 May 2014 11:17:30AM 0 points [-]

However it seems virtually certain to me that we will figure out a significant amount about designing AIs to do what we want in the process of developing them.

Significant is not the same as sufficient. How low do you think the probability of negative AI outcomes is, and what are your reasons for being confident in that estimate?

Comment author: [deleted] 07 May 2014 06:59:15PM 1 point [-]

Why are you confident that an AI that we do develop will not have these traits?

For the same reason a jet engine doesn't have comfy chairs: with all machines, you develop the core physical and mathematical principles first, and then add human comforts.

The core mathematical and physical principles behind AI are believed, not without reason, to be efficient cross-domain optimization. There is no reason for an arbitrarily-developed Really Powerful Optimization Process to have anything in its utility function dealing with human morality; in order for it to be so, you need your AI developers to be deliberately aiming at Friendly AI, and they need to actually know something about how to do it.

And then, if they don't know enough, you need to get very, very, very lucky.

Comment author: TheAncientGeek 07 May 2014 07:13:20PM *  0 points [-]

That's what happens when Friendly is used to mean both Fun and Safe.

Early jets didn't have comfy chairs, but they did have ejector seats. Safety was a concern.

If an AI researcher feels their AI might kill them, they will have every motivation to build in safety features.

That has nothing to do with making an AI Your Plastic Pal Who's Fun To Be With.

Comment author: [deleted] 07 May 2014 07:29:22PM 1 point [-]

It's an open question whether we could construct a utility function that is, in the ultimate analysis, Safe without being Fun.

Personally, I'm almost hoping the answer is no. I'd love to see the faces of all the world's Very Serious People as we ever-so-seriously explain that if they don't want to be killed to the last human being by a horrible superintelligent monster, they're going to need to accept Fun as their lord and savior ;-).

Comment author: TheAncientGeek 07 May 2014 07:39:05PM -2 points [-]

Almost everything about FAI is an open question. What do you get if you multiply a bunch of open questions together?

Comment author: TheAncientGeek 07 May 2014 06:33:13PM 1 point [-]

MIRI's arguments aren't about deliberate weaponisation; they are about the inadvertent creation of dangerous AI by competent and well-intentioned people.

The weaponisation of AI has almost happened already in the form of Stuxnet, and it is significant that there were a lot of safeguards built into it. AI researchers seem to be aware enough.