I'd hardly call this an "uninformed perspective on AI risk" — I read it as more of a parody of sci-fi tropes about AI rebellion, and, in any case, probably not an attempt to comment on what scenarios are plausible as real futures.
(Zach Weiner is actually a very smart guy, and I'd bet that he'd have no trouble grasping the real issues.)
I wouldn't be surprised at all if he were already well aware of the issues. It's a bit silly to assume that when authors of fiction (be it novels, movies, games or webcomics) make something "non-realistic", it's because they're stupid and not because they're optimizing plot or "understandability" or humor or brevity or their message ... I found Eliezer's writings on how "Probably the artist did not even think to ask whether an alien perceives human females as attractive" a bit unkind in that way too.
This shows again that people are generally aware of potential risks but either do not take them seriously or don't see why risks from AI are the rule rather than an exception.
That is a bit of an old chestnut around here. It is like saying "the rule" for computer software is to crash or go into an infinite loop. If you actually look at the computer software available, it behaves quite differently. Expecting the real world to present you with a random sample from the theoretically-possible options often doesn't make any sense at all.
So rather than making people aware that there are risks you have to tell them what are the risks.
That link would appear to need its own warning. It too talks about "blindly pulling an arbitrary mind from a mind design space". No sensible model of the run up to superintelligence looks very much like that.
Yes, let's be careful here.
The AIs that might actually exist in the future are those that had their origins in human-designed computer programs. Right? If an AI exists in 2050, then it was designed by something designed by ... something designed by a human.
Is this really a random sample of all possible minds? I find it conceivable that human-designed AIs are a narrower subset of all the things that could be defined as minds. Maybe, to some extent, any "thinking machine" a human designs will have some features in common with the human mind (because we tend to anthropomorphize our own creations.)
The claim that "Any human-made AI is unlikely to share human values" requires evidence that human values are hard to transmit, compared to, say, features resembling human faces, which we instinctively build into mechanical objects.
Pretty much everyone I know can doodle a set of features that are recognizably a schematic of a human face; they can even doodle a variety of sets of features that are recognizably different schematics of different expressions.
A great many people can reliably produce two-dimensional representations that are not just schematics but recognizable as individual human faces in specific contexts.
Even I, with relatively little training or talent, can do this tolerably well.
By contrast, I don't know many people who can reliably capture representations of human values.
That certainly seems to me evidence that human values are harder to transmit than features resembling human faces.
Safety features represent the human value of not getting hurt. Car air bags represent the desire not to die in motor vehicle accidents. Planing down wood represents the desire not to get splinters. Fixing the floorboards helps with not falling down. It seems as though artefacts that encode human values are commonplace and fairly easy to create.
Isolated instrumental values, certainly... agreed. (I could quibble about your examples, but that's beside the point.)
I had understood SarahC to mean "human values" in a more comprehensive/coherent sense, but perhaps I misunderstood.
Most artefacts today don't need many human values to function reasonably safely. The liquidiser needs to know to stop spinning when you take the lid off - but that's about it.
Cars make good examples of a failure to program in human values. Cars respect the values of both drivers and pedestrians poorly. They are too stupid to know how to behave.
As cars get smarter, their understanding of driver and pedestrian values seems likely to improve dramatically. In this case, one of the primary functions of the machine brains will be to better respect human values. Drivers do not like to crash into other vehicles or pedestrians any more than they like being lost. It is also a case of the programming being relatively complex and difficult.
There's a reasonable case to be made that this kind of thing is the rule - rather than the exception. In which case, any diverting of funds away from rapidly developing machine intelligence would fairly directly cause harm.
Absolutely agreed that whatever instrumental human values that we think about explicitly enough to encode into our machines (like not killing passengers or pedestrians while driving from point A to point B), or that are implicit enough in the task itself that optimizing for performing that task will necessarily implement those values as well (like not crashing and exploding between A and B) will most likely be instantiated in machine intelligence as we develop it.
Agreed that if that's the rule rather than the exception -- that is, if all or almost all of the things we care about are either things we understand explicitly or things that are implicit in the tasks we attempt to optimize -- then building systems that attempt to optimize those things, with explicit safety features, is likely to alleviate more suffering than it causes.
I did mean a more comprehensive/coherent sense. Here's my thinking.
Fallacy: "If it's a super-intelligent machine, the very nature of intelligence means it must be wise and good and therefore it won't kill us all."
Rejoinder: "Wait, that's totally not true. Lots of minds could be very powerful at thinking without valuing anything that we value. And that could kill us all. A paperclip maximizer would be a disaster -- but it would still be an intelligence."
Rejoinder to the rejoinder: "Sure, Clippy is a mind, and Clippy is deadly, but are humans likely to produce a Clippy? When we build AIs, we'll probably build them as models of how we think, and so they'll probably resemble us in some ways. If we built AIs, and the alien race of Vogons also built AIs, I'd bet our AIs would probably be a little bit more like us, relatively speaking, and the Vogon AIs would probably be a little more like Vogons. We're not drawing at random from mindspace, we're drawing from the space of minds that humans are likely to build (on purpose or by accident). Doesn't mean that our AIs won't be dangerous, but they're not necessarily going to be as alien as the first rejoinder thinks."
Sure, it seems plausible that an AI developed by humans will on average end up in an at-least-marginally different region of mindspace than an AI developed by nonhumans.
And an AI designed to develop new pharmaceuticals will on average end up in an at-least-marginally different region of mindspace than one designed to predict stock market behavior. Sure.
None of that implies safety, as far as I can tell.
I think these are two important points. In particular, the idea of "mind design space" deserves some serious criticism. (I mentioned one point here.) It assumes a particular representation (one bit representing each possible binary feature of a mind), but reasoning about a different representation might lead to very different conclusions.
That is a bit of an old chestnut around here. It is like saying "the rule" for computer software is to crash or go into an infinite loop. If you actually look at the computer software available, it behaves quite differently. Expecting the real world to present you with a random sample from the theoretically-possible options often doesn't make any sense at all.
Of course it doesn't make sense, but that isn't the argument.
Most computer programs more or less work after a lot of time is spent debugging. The problem is that once a program is bug-free enough to get into the subspace of mind designs that are capable of 'FOOM', it has to work exactly right on the first try. Keep in mind that mind design space itself is a small target surrounded by crashes and infinite loops.
The idea isn't that we'd be throwing darts with a necessarily uniform distribution over the dartboard, and that we'd better quit forever because the eternal payoff calculation comes out negative. The idea is that if an inner bullseye wins big, but an outer bullseye kills everybody, you don't play until you're really, really, really good.
The way software development usually works is with lots of testing. You use a test harness to restrain the program - and then put it through its paces.
The idea that we won't be able to do that with machine intelligence seems like one of the more screwed-up ideas to come out of the SIAI to me.
The most often-cited justification is the AI box experiments - which are cited as evidence that you can't safely restrain a machine intelligence - since it will find a way to escape.
This does not seem like a credible position to me. You don't build your test harness out of humans. The AI box experiments seem to have low relevance to this problem.
The forces on the outside will include many humans and machines. Together they will be able to construct pretty formidable prisons with configurable safety levels.
Obviously, we would need to avoid permanent setbacks - but apart from those we don't really have to "get it right first time". Many possible problems can be recovered from. Also, it doesn't mean that we won't be able to test and rehearse. We will be able to do those things.
Test harnesses might turn out to be very useful, but this isn't a trivial task, and I don't think the development and use of such harnesses can be taken for granted. It's not just that it must be safely contained, but that it also has to be able to interact with the outside world in a manner that can't be dangerous, but is still informative enough to decide whether it's friendly; this seems hard.
The original subject of disagreement was "is AI failure the rule or exception?". This isn't a precisely specified question, but it just seemed like you were arguing that the "most minds are unfriendly" argument is not important because it is either irrelevant or universally understood and accounted for. I think that this argument is not universally understood among those that might design an AI and that failure to understand this would also result in the AI not being placed in a suitably secure test harness.
It's not just that it must be safely contained, but that it also has to be able to interact with the outside world in a manner that can't be dangerous, but is still informative enough to decide whether it's friendly; this seems hard.
Restore it to factory settings between applications of the test suite.
Not remembering what your actions were should make it "pretty tricky" to link those actions to their consequences.
Making the prisons is the more challenging part of the problem - IMO.
Obviously, we would need to avoid permanent setbacks - but apart from those we don't really have to "get it right first time". Many possible problems can be recovered from. Also, it doesn't mean that we won't be able to test and rehearse. We will be able to do those things.
That is, you agree with them that there must be 0 unrecoverable errors, but you think the set of errors that are unrecoverable is much smaller than they do?
This shows again that people are generally aware of potential risks but either do not take them seriously or don't see why risks from AI are the rule rather than an exception.
Possible third category: people who take them seriously and understand why they're the rule, but believe that anyone who has a chance at AI will likely have the same understanding and approach their work accordingly.
(i.e.: "SIAI just publishes shallow papers dressing up basic concepts about AI in technical language, while spreading a fearmongering caricature of reckless AI developers" or something like that.)
So rather than making people aware that there are risks you have to tell them what are the risks.
...and afterward, convince them that they didn't already know what you told them, or failing that, at least convince them that it's possible for smart people to deny it.
Edit: Inaccurate statement trimmed.
Edit: timtyler is one advocate of the "come on, AI developers know what they're doing!" view.
I wouldn't say that - and I don't think I was saying anything like that in the referenced comments.
I do think that humans accidentally dropping the reins is pretty unlikely - though not so unlikely that we don't need to concern ourselves with the possibility.
Of course, even without accidents, there is plenty for us to be concerned about.
Not really related, but you should probably link to the original source (SMBC in this case) rather than the blog that reposted it. In particular because in this case the red-button-comic is left out. :)
Just in case people do not know: every SMBC has a "hidden" panel seen by hovering over the red button underneath the comic.
If you were amused by the comic, the red-button panel is worth checking out, too.
A (highly intelligent) friend of mine posted a link to this on Facebook tagged with "Reason #217 why the singularity isn't that big of a thing." I'm wondering if there's a concise way to correct him without a link to Less Wrong.
I'm wondering if there's a concise way to correct him without a link to Less Wrong.
"Webcomics aren't real"?
(Normally I'd link to "Generalization From Fictional Evidence", but that seems to sum up the basic point if you don't want to link to LW...)
This brand-new post from Michael Anissimov strikes me as just the right thing.
(It's highly relevant to this thread in general -- anyone who doesn't already read his blog should check it out.)
Yes, I only saw it after posting this one. It is one of a few recently emerged resources that satisfy my idea of a concise and easy roundup. Until now you had to read many papers or all of the sequences to be able to implicitly conclude that you should support the SIAI. I don't think that approach works very well. If people are in doubt after reading the following resources, then you can still tell them to read LW:
Someone should put the above content together and maybe add a few more good arguments, like this one:
An AI might go from infrahuman to transhuman in less than a week? But a week is 10^49 Planck intervals - if you just look at the exponential scale that stretches from the Planck time to the age of the universe, there's nothing special about the timescale that 200Hz humans happen to live on, any more than there's something special about the numbers on the lottery ticket you bought.
If we're talking about a starting population of 2GHz processor cores, then any given AI that FOOMs at all, is likely to FOOM in less than 10^15 sequential operations or more than 10^19 sequential operations, because the region between 10^15 and 10^19 isn't all that wide a target. So less than a week or more than a century, and in the latter case that AI will be trumped by one of a shorter timescale.
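For anyone who wants to check the arithmetic in that quote, here is a rough sketch. The Planck time value (about 5.39e-44 seconds) and the reading of "2GHz" as 2e9 sequential operations per second are my assumptions, not something stated in the quote, and the variable names are just illustrative.

```python
# Back-of-the-envelope check of the timescales in the quoted argument.
# Assumptions (not from the quote): Planck time ~5.391e-44 s, and a 2 GHz
# core performing 2e9 sequential operations per second.
PLANCK_TIME_S = 5.391e-44
CLOCK_HZ = 2e9
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def planck_intervals(seconds: float) -> float:
    """Number of Planck intervals that fit into the given duration."""
    return seconds / PLANCK_TIME_S


def ops_to_seconds(sequential_ops: float) -> float:
    """Wall-clock seconds needed for that many sequential operations."""
    return sequential_ops / CLOCK_HZ


# One week expressed in Planck intervals (the quote says about 10^49).
week_in_planck = planck_intervals(SECONDS_PER_WEEK)

# Lower edge of the quoted range: 10^15 sequential operations, in days.
days_for_1e15 = ops_to_seconds(1e15) / SECONDS_PER_DAY

# Upper edge of the quoted range: 10^19 sequential operations, in years.
years_for_1e19 = ops_to_seconds(1e19) / SECONDS_PER_YEAR

print(f"one week           ~ {week_in_planck:.2e} Planck intervals")
print(f"10^15 ops at 2 GHz ~ {days_for_1e15:.1f} days")
print(f"10^19 ops at 2 GHz ~ {years_for_1e19:.0f} years")
```

Under those assumptions, 10^15 operations comes out a bit under a week and 10^19 operations comes out well over a century, which matches the "less than a week or more than a century" framing.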
Now only very little remains and I got much of what I asked for :-)
I deleted my redundant post of this image. The only comment was an upvoted request to delete. :)
Here is another example of an outsider perspective on risks from AI. I think such examples can serve as a way to fathom the inferential distance between the SIAI and its target audience, so that the SIAI can fine-tune its material and general approach accordingly.
via sentientdevelopments.com
This shows again that people are generally aware of potential risks but either do not take them seriously or don't see why risks from AI are the rule rather than an exception. So rather than making people aware that there are risks you have to tell them what are the risks.