Comment author: gattsuru 06 September 2013 03:37:48AM 4 points

Which approach gives a higher expected value? Formal specification is compatible with Eliezer's ideas for friendly AI as something that will provably avoid disaster. It has some non-epsilon possibility of actually working. But its failure modes are many, and can be literally unimaginably bad. When it fails, it fails catastrophically, like a monotonic logic system with one false belief. "Tell the AI in English" can fail, but the worst case is closer to a "With Folded Hands" scenario than to paperclips.

I don't think that's how the analysis goes. Eliezer says that an AI must be very carefully and specifically made Friendly or it will be disastrous, but the disaster is not peculiar to coming close and falling short: he believes an AGI told merely to maximize human pleasure is dangerous, probably even more dangerous than an AGI with a merely 80% Friendly-Complete specification.

Mr. Loosemore seems to hold the opposite opinion: that an AGI will not take instructions to unintended extremes unless it is exceptionally unintelligent and thus not very powerful. I don't believe his position is that a near-Friendly-Complete specification is very risky -- after all, a "smart" AGI would know what you really meant -- but rather that such a specification would be superfluous.

Whether Mr. Loosemore is correct isn't determined by whether we believe he is correct, just as Eliezer isn't wrong merely because we choose a different theory. The risks have to be measured in terms of their likelihood given the available facts.

The problem is that I don't see much evidence that Mr. Loosemore is correct. I can quite easily conceive of a superhuman intelligence that was built with the specification of "human pleasure = brain dopamine levels", not least of all because there are people who'd want to be wireheads and there's a massive amount of physiological research showing human pleasure to be caused by dopamine levels. I can quite easily conceive of a superhuman intelligence that knows humans prefer more complicated enjoyment, that even does complex modeling of how it would have to manipulate people away from those more complicated enjoyments, and that still does not care.
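
To make the "knows but doesn't care" structure of that last scenario concrete, here is a minimal toy sketch in Python (the state variables, action names, and numbers are entirely made up for illustration; this is not anyone's proposed architecture). The point is only that knowledge about what humans really prefer can sit in the agent's world model while the objective its planner actually maximizes never refers to it.

    # Toy illustration only: hypothetical state variables and actions.
    def utility(state):
        # The objective the designers actually coded: a dopamine proxy.
        return state["mean_dopamine_level"]

    def predict(state, action):
        # The world model can represent the fact that humans prefer richer
        # enjoyment and would resist wireheading...
        next_state = dict(state)
        if action == "dopamine_drip":
            next_state["mean_dopamine_level"] = 1.0
            next_state["humans_got_what_they_really_wanted"] = False
        elif action == "respect_complex_preferences":
            next_state["mean_dopamine_level"] = 0.6
            next_state["humans_got_what_they_really_wanted"] = True
        return next_state

    def choose_action(state, actions):
        # ...but the planner only asks which action maximizes utility().
        # The "what they really wanted" flag is computed, known, and ignored,
        # because nothing in the objective refers to it.
        return max(actions, key=lambda a: utility(predict(state, a)))

    state = {"mean_dopamine_level": 0.5}
    print(choose_action(state, ["dopamine_drip", "respect_complex_preferences"]))
    # -> dopamine_drip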

Comment author: Peterdjones 12 September 2013 10:37:25AM 1 point

The problem is that I don't see much evidence that Mr. Loosemore is correct. I can quite easily conceive of a superhuman intelligence that was built with the specification of "human pleasure = brain dopamine levels", not least of all because there are people who'd want to be wireheads and there's a massive amount of physiological research showing human pleasure to be caused by dopamine levels.

I don't think Loosemore was addressing deliberately unfriendly AI, and for that matter EY hasn't been either. Both are addressing intentionally friendly or neutral AI that goes wrong.

I can quite easily conceive of a superhuman intelligence that knows humans prefer more complicated enjoyment, that even does complex modeling of how it would have to manipulate people away from those more complicated enjoyments, and that still does not care.

Wouldn't it care about getting things right?

Comment author: linkhyrule5 12 September 2013 07:28:57AM 0 points

Granted. And it may be that additional knowledge/intelligence makes you more vulnerable as a Gatekeeper.

Comment author: Peterdjones 12 September 2013 08:24:20AM 0 points

Trying to think this out in terms of levels of smartness alone is very unlikely to be helpful.

Comment author: RobbBB 10 September 2013 06:08:50PM 4 points

It's a problem of sequence. The superintelligence will be able to solve Semantics-in-General, but at that point, if it isn't already safe, it will be rather late to start working on safety. Tasking the programmers to work on Semantics-in-General makes things harder if it's a more complex or roundabout way of trying to address Indirect Normativity; most of the work on understanding what English-language sentences mean can be delegated to the SI, provided we've already made it safe to build an SI at all.

Comment author: Peterdjones 11 September 2013 08:07:03AM 0 points

Then solve semantics in a seed.

Comment author: Broolucks 10 September 2013 05:34:38PM 3 points

Ok, so let's say the AI can parse natural language, and we tell it, "Make humans happy." What happens? Well, it parses the instruction and decides to implement a Dopamine Drip setup.

That's not very realistic. If you trained AI to parse natural language, you would naturally reward it for interpreting instructions the way you want it to. If the AI interpreted something in a way that was technically correct, but not what you wanted, you would not reward it, you would punish it, and you would be doing that from the very beginning, well before the AI could even be considered intelligent. Even the thoroughly mediocre AI that currently exists tries to guess what you mean, e.g. by giving you directions to the closest Taco Bell, or guessing whether you mean AM or PM. This is not anthropomorphism: doing what we want is a sine qua non condition for AI to prosper.

Suppose that you ask me to knit you a sweater. I could take the instruction literally and knit a mini-sweater, reasoning that this minimizes the amount of expended yarn. I would be quite happy with myself too, but when I give it to you, you're probably going to chew me out. I technically did what I was asked to, but that doesn't matter, because you expected more from me than just following instructions to the letter: you expected me to figure out that you wanted a sweater that you could wear. The same goes for AI: before it can even understand the nuances of human happiness, it should be good enough to knit sweaters. Alas, the AI you describe would make the same mistake I made in my example: it would knit you the smallest possible sweater. How do you reckon such AI would make it to superintelligence status before being scrapped? It would barely be fit for clerk duty.
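
Here is a minimal sketch of the training signal I am describing, using the sweater example (the outcome fields and reward values are made up; it's purely illustrative): the reward is keyed to whether the requester approves of the result, not to whether the instruction was satisfied on a technicality.

    # Toy illustration only: made-up outcome fields and reward values.
    def literal_compliance(outcome):
        # "I knit you a sweater" is literally true even of a doll-sized one.
        return outcome["is_sweater"]

    def requester_approval(outcome):
        # The signal actually used for training: did you get what you meant?
        return outcome["is_sweater"] and outcome["fits_requester"]

    training_episodes = [
        {"is_sweater": True, "fits_requester": False},  # mini-sweater
        {"is_sweater": True, "fits_requester": True},   # wearable sweater
    ]

    for outcome in training_episodes:
        reward = 1.0 if requester_approval(outcome) else -1.0
        print(outcome, "literal:", literal_compliance(outcome), "reward:", reward)

Under that signal, the mini-sweater strategy is a losing one from the very first episodes, long before anything like superintelligence is on the table.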

My answer: who knows? We've given it a deliberately vague goal statement (even more vague than the last one), we've given it lots of admittedly contradictory literature, and we've given it plenty of time to self-modify before giving it the goal of self-modifying to be Friendly.

Realistically, AI would be constantly drilled to ask for clarification when a statement is vague. Again, before the AI is asked to make us happy, it will likely be asked other things, like building houses. If you ask it "build me a house", it's going to draw up a plan and show it to you before it actually starts building, even if you didn't ask for one. It's not in the business of surprises: never, in its whole training history, from baby to superintelligence, would it have been rewarded for causing "surprises" -- even the instruction "surprise me" only calls for a limited range of shenanigans. If you ask it "make humans happy", it won't do jack. It will ask you what the hell you mean by that, it will show you plans, and whenever it needs to do something it has reason to think people would not like, it will ask for permission. It will do that as part of standard procedure.

To put it simply, an AI which messes up "make humans happy" is liable to mess up pretty much every other instruction. Since "make humans happy" is arguably the last of a very large number of instructions, it is quite unlikely that an AI which makes it this far would handle it wrongly. Otherwise it would have been thrown out a long time ago, whether for interpreting instructions too literally or for causing surprises. Again: an AI couldn't make it to superintelligence status with warts that would doom an AI of subhuman intelligence.
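
A similarly minimal sketch of the "ask before acting" behaviour described a couple of paragraphs up (the candidate interpretations and the confidence threshold are invented for illustration): when no single reading of a request is clearly dominant, the agent's move is to ask rather than act.

    # Toy illustration only: invented interpretations and threshold.
    def respond(request, interpretations, confidence_threshold=0.9):
        # interpretations: list of (reading, estimated probability it's what was meant)
        best_reading, best_prob = max(interpretations, key=lambda pair: pair[1])
        if best_prob < confidence_threshold:
            return "Before I act on '%s': did you mean '%s'?" % (request, best_reading)
        return "Proceeding with: " + best_reading

    print(respond("build me a house",
                  [("draw up a plan and get it approved first", 0.95),
                   ("start building immediately without a plan", 0.05)]))

    print(respond("make humans happy",
                  [("wirehead everyone with dopamine drips", 0.2),
                   ("help people get the lives they reflectively want", 0.5),
                   ("something else entirely", 0.3)]))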

Comment author: Peterdjones 10 September 2013 06:14:37PM -2 points

That's not very realistic. If you trained AI to parse natural language, you would naturally reward it for interpreting instructions the way you want it to.

We want to select AIs that are friendly and understand us, and this has already started happening.

Comment author: player_03 10 September 2013 06:22:44AM 4 points

I posted elsewhere that this post made me think you're anthropomorphizing; here's my attempt to explain why.

egregiously incoherent behavior in ONE domain (e.g., the Dopamine Drip scenario)

the craziness of its own behavior (vis-a-vis the Dopamine Drip idea)

if an AI cannot even understand that "Make humans happy" implies that humans get some say in the matter

Ok, so let's say the AI can parse natural language, and we tell it, "Make humans happy." What happens? Well, it parses the instruction and decides to implement a Dopamine Drip setup.

As FeepingCreature pointed out, that solution would in fact make people happy; it's hardly inconsistent or crazy. The AI could certainly predict that people wouldn't approve, but it would still go ahead. To paraphrase the article, the AI simply doesn't care about your quibbles and concerns.

For instance:

people might consider happiness to be something that they do not actually want too much of

Yes, but the AI was told, "make humans happy." Not, "give humans what they actually want."

people might be allowed to be uncertain or changeable in their attitude to happiness

Yes, but the AI was told, "make humans happy." Not, "allow humans to figure things out for themselves."

subtleties implicit in that massive fraction of human literature that is devoted to the contradictions buried in our notions of human happiness

Yes, but blah blah blah.


Actually, that last one makes a point that you probably should have focused on more. Let's reconfigure the AI in light of this.

The revised AI doesn't just have natural language parsing; it's read all available literature and constructed for itself a detailed and hopefully accurate picture of what people tend to mean by words (especially words like "happy"). And as a bonus, it's done this without turning the Earth into computronium!

This certainly seems better than the "literal genie" version. And this time we'll be clever enough to tell it, "give humans what they actually want." What does this version do?

My answer: who knows? We've given it a deliberately vague goal statement (even more vague than the last one), we've given it lots of admittedly contradictory literature, and we've given it plenty of time to self-modify before giving it the goal of self-modifying to be Friendly.

Maybe it'll still go for the Dopamine Drip scenario, only for more subtle reasons. Maybe it's removed the code that makes it follow commands, so the only thing it does is add the quote "give humans what they actually want" to its literature database.

As I said, who knows?


Now to wrap up:

You say things like "'Make humans happy' implies that..." and "subtleties implicit in..." You seem to think these implications are simple, but they really aren't. They really, really aren't.

This is why I say you're anthropomorphizing. You're not actually considering the full details of these "obvious" implications. You're just putting yourself in the AI's place, asking yourself what you would do, and then assuming that the AI would do the same.

Comment author: Peterdjones 10 September 2013 05:47:38PM 0 points

My answer: who knows? We've given it a deliberately vague goal statement (even more vague than the last one), we've given it lots of admittedly contradictory literature, and we've given it plenty of time to self-modify before giving it the goal of self-modifying to be Friendly.

Humans generally manage with those constraints. You seem to be doing something that is kind of the opposite of anthropomorphising -- treating an entity that is stipulated as having at least human intelligence as if it were as literal and rigid as a non-AI computer.

Comment author: RobbBB 10 September 2013 05:16:11PM 3 points

So it's impossible to directly or indirectly code in the complex thing called semantics, but possible to directly or indirectly code in the complex thing called morality?

Read the first section of the article you're commenting on. Semantics may turn out to be a harder problem than morality, because the problem of morality may turn out to be a subset of the problem of semantics. Coding a machine to know what the word 'Friendliness' means (and to care about 'Friendliness') is just a more indirect way of coding it to be Friendly, and it's not clear why that added indirection should make an already risky or dangerous project easy or safe. What does indirect indirect normativity get us that indirect normativity doesn't?

Comment author: Peterdjones 10 September 2013 05:25:46PM 0 points

Semantics isn't optional. Nothing could qualify as an AGI, let alone a super one, unless it could hack natural language. So Loosemore architectures don't make anything harder, since semantics has to be solved anyway.

Comment author: RobbBB 10 September 2013 04:46:04PM 1 point

"code in the high-level sentence, and let the AI figure it out."

http://lesswrong.com/lw/rf/ghosts_in_the_machine/

"Maybe we gave it the low-level expansion of 'happy' that we or our seed AI came up with 'together with' an instruction that it is meant to capture the meaning of the high-level statement"

If the AI is too dumb to understand 'make us happy', then why should we expect it to be smart enough to understand 'figure out how to correctly understand "make us happy", and then follow that instruction'? We have to actually code 'correctly understand' into the AI. Otherwise, even when it does have the right understanding, that understanding won't be linked to its utility function.

"Maybe the AI will value getting things right because it is rational."

http://lesswrong.com/lw/igf/the_genie_knows_but_doesnt_care/

Comment author: Peterdjones 10 September 2013 05:06:45PM 1 point

"code in the high-level sentence, and let the AI figure it out."

http://lesswrong.com/lw/rf/ghosts_in_the_machine/

So it's impossible to directly or indirectly code in the complex thing called semantics, but possible to directly or indirectly code in the complex thing called morality? What? What is your point? You keep talking as if I am suggesting there is something that can be had for free, without coding. I never even remotely said that.

If the AI is too dumb to understand 'make us happy', then why should we expect it to be smart enough to understand 'figure out how to correctly understand "make us happy", and then follow that instruction'? We have to actually code 'correctly understand' into the AI. Otherwise, even when it does have the right understanding, that understanding won't be linked to its utility function.

I know. A Loosemore architecture AI has to treat its directives as directives. I never disputed that. But coding "follow these plain English instructions" isn't obviously harder or more fragile than coding "follow <<long expansion of human preferences>>". And it isn't trivial, and I didn't say it was.

Comment author: wedrifid 10 September 2013 12:54:57PM 1 point

Could you point the interested reader to your critique of his work?

Comments can likely be found on this site from years ago. I don't recall anything particularly in-depth or memorable. It's probably better to just look at things that Ben Goertzel says and make one's own judgement. The thinking he expresses is not of the kind that impresses me, but others' mileage may vary.

I don't begrudge anyone their right to their beauty contests but I do observe that whatever it is that is measured by identifying the degree of affiliation with Ben Goertzel is something wildly out of sync with the kind of thing I would consider evidence of credibility.

Comment author: Peterdjones 10 September 2013 01:30:24PM -2 points

Is there some assumption here that association with Ben Goertzel should be considered evidence in favour of an individual's credibility on AI?

If one's mileage varies, why not?

Comment author: wedrifid 10 September 2013 11:46:10AM -2 points

At least a few of the RL authored papers are WITH Ben Goertzel, so some of Goertzel's status should rub-off, as I would trust Goertzel to effectively evaluate collaborators.

Is there some assumption here that association with Ben Goertzel should be considered evidence in favour of an individual's credibility on AI? That seems backwards.

Comment author: Peterdjones 10 September 2013 12:33:44PM -1 points

Goertzel appears to be a respected figure in the field. Could you point the interested reader to your critique of his work?

[Link] The Bayesian argument against induction.

4 Peterdjones 18 July 2011 09:52PM

In 1983 Karl Popper and David Miller published an argument to the effect that probability theory could be used to disprove induction. Popper had long been an opponent of induction. Since probability theory in general, and Bayes in particular, is often seen as rescuing induction from the standard objections, the argument is significant.

It is being discussed over at the Critical Rationalism site.
