Comment author: Eliezer_Yudkowsky 09 September 2013 05:25:56PM 5 points [-]

XiXiDu wasn't attempting or requesting anonymity - his LW profile openly lists his true name - and Alexander Kruel is someone with known problems (and a blog openly run under his true name) whom RobbBB might not know offhand was the same person as "XiXiDu" although this is public knowledge, nor might RobbBB realize that XiXiDu had the same irredeemable status as Loosemore.

I would not randomly out an LW poster for purposes of intimidation - I don't think I've ever looked at a username's associated private email address. Ever. Actually I'm not even sure offhand if our registration process requires/verifies that or not, since I was created as a pre-existing user at the dawn of time.

I do consider RobbBB's work highly valuable and I don't want him to feel disheartened by mistakenly thinking that a couple of eternal and irredeemable semitrolls are representative samples. Due to Civilizational Inadequacy, I don't think it's possible to ever convince the field of AI or philosophy of anything even as basic as the Orthogonality Thesis, but even I am not cynical enough to think that Loosemore or Kruel are representative samples.

Comment author: Juno_Watt 22 January 2014 12:01:57PM *  -1 points [-]

Alexander Kruel is someone with known problems

Hmmm...

Comment author: Yvain 14 March 2009 12:34:00AM *  40 points [-]

A lot of dojos preserve to some degree the social standards of Eastern countries where the sensei's sensei came from. And in Eastern countries, it's much less acceptable to try to question your teacher, or change things, or rock the boat, or show any form of weakness. I taught school in Japan for a while, and the first thing I learned was that naively asking "Any questions?" or "Any opinions on this?" or even "Anyone not understand?" was a waste of time.

Western cultures are a lot better at this, but not ideal. There's still pressure not to be the one person who asks all the questions all the time, and there's pressure not to say anything controversial out of the blue because you lose more status if you're wrong than you gain if you're right. I think part of the problem is that there really are dumb or egotistical people who, if given the chance will protest that they know a much better way to do everything and will waste the time of everyone else, and our society's decided to .make a devil's bargain to keep them under control.

The best solution to this is to found a new culture, live isolated from the rest of the world for a century developing different cultural norms, and then start the rationality dojo there. Of possible second-best solutions:

  • My Favorite Liar. Tell people that you're going to make X deliberately incorrect statements every training session and they've got to catch them.

  • Clickers. One of my lecturers uses these devices sort of like remote controls. You can input information into them and it gets sent wirelessly and anonymously to the lecturer's laptop. The theory is that if he says "Raise your hand if you don't understand this" or even "...if you disagree with this", no one will, but if he says "Enter whether or not you understand this into your clicker" he may get three or four "don't understand" responses. Anonymous suggestion boxes are a low-tech form of the same principle.

  • I always found the concept of Crocker's Rules very interesting. I also remember hearing of a community (wish I could remember which) in which it was absolutely forbidden to give negative feedback under certain circumstances, and the odd social dynamics that created. In a dojo-like setting, there might be situations when either of these two rules could be ritually enacted - for example, a special Crocker Hat, such that anyone wearing that hat was known to be under Crocker's Rules, and a special No Negative Feedback Hat (but with a flashier name, like White Crane Hat of Social Invincibility), which someone could wear when questioning the master or something and be absolutely immune to any criticism.

Comment author: Juno_Watt 28 September 2013 01:09:16PM 0 points [-]

My Favorite Liar. Tell people that you're going to make X deliberately incorrect statements every training session and they've got to catch them.

I can think of only one example of someone who actually did this, and that was someone generally classed a a mystic.

Comment author: RobbBB 12 September 2013 05:36:28PM *  1 point [-]

That problem has got to be solved somehow at some stage, because something that couldn't pass a Turing Test is no AGI.

Not so! An AGI need not think like a human, need not know much of anything about humans, and need not, for that matter, be as intelligent as a human.

To see this, imagine we encountered an alien race of roughly human-level intelligence. Would a human be able to pass as an alien, or an alien as a human? Probably not anytime soon. Possibly not ever.

(Also, passing a Turing Test does not require you to possess a particularly deep understanding of human morality! A simple list of some random things humans consider right or wrong would generally suffice.)

Why is that a problem? Is anyone suggesting AGI can be had for free?

The problem I'm pointing to here is that a lot of people treat 'what I mean' as a magical category. 'Meaning' and 'language' and 'semantics' are single words in English, which masks the complexity of 'just tell the AI to do what I mean'.

Ok. NL is hard. Everyone knows that. But its got to be solved anyway.

Nope!

Yeah. But it wouldn't be an AGI or an SI if it couldn't pass a TT.

It could certainly be an AGI! It couldn't be an SI -- provided it wants to pass a Turing Test, of course -- but that's not a problem we have to solve. It's one the SI can solve for itself.

A problem which has been solved over and over by humans.

No human being has ever created anything -- no system of laws, no government or organization, no human, no artifact -- that, if it were more powerful, would qualify as Friendly. In that sense, everything that currently exists in the universe is non-Friendly, if not outright Unfriendly.

Humans don't need to be loaded apriori with what makes other humans happy, they only need to know general indicators, like smiles and statements of approval.

All or nearly all humans, if they were more powerful, would qualify as Unfriendly.

Moreover, by default, relying on a miscellaneous heap of vaguely moral-sounding machine learning criteria will lead to the end of life on earth. 'Smiles' and 'statements of approval' are not adequate roadmarks, because those are stimuli the SI can seize control of in unhumanistic ways to pump its reward buttons.

"Intelligence on its own does not imply Friendliness."

That is an open question.

No, it isn't. And this is a non sequitur. Nothing else in your post calls orthogonality into question.

Comment author: Juno_Watt 13 September 2013 08:57:36AM *  1 point [-]

Not so! An AGI need not think like a human, need not know much of anything about humans, and need not, for that matter, be as intelligent as a human.

Is that a fact? No, it's a matter of definition. It's scarecely credible you are unaware that a lot of people think the TT is critical to AGI.

The problem I'm pointing to here is that a lot of people treat 'what I mean' as a magical category.

I can't see any evidence of anyone invlolved in these discussions doing that. It looks like a straw man to me.

Ok. NL is hard. Everyone knows that. But its got to be solved anyway.

Nope!

An AI you can't talk to has pretty limited usefulness, and it has pretty limited safety too, since you don;t even have the option of telling it to stop, or expaling to it why you don;t like what it is doing. Oh, and isn't EY assumign that an AGi will have NLP? After all, it is supposed to be able to talk its way out of the box.

It's one the SI can solve for itself.

It can figure out semantics for itslef. Values are a subsert of semantics...

No human being has ever created anything -- no system of laws, no government or organization, no human, no artifact -- that, if it were more powerful, would qualify as Friendly. I

Wherer do you get this stuff from? Modern societies, with their complex legal and security systems are much less violent than ancient socieites. To take ut one example.

All or nearly all humans, if they were more powerful, would qualify as Unfriendly.

Gee. Then I guess they don't have an architecutre with a basic drive to be friendly.

'Smiles' and 'statements of approval' are not adequate roadmarks, because those are stimuli the SI can seize control of in unhumanistic ways to pump its reward buttons.

Why don't humans do that?

No, it isn't.

Uh-huh. MIRI has settled that centuries-aold quesiton for once and all has it?

And this is a non sequitur.

It can't be a non-sequitur, since it is not an arguemnt but a statement of fact.

Nothing else in your post calls orthogonality into question.

So? It wasn't relevant anywhere else.

Comment author: JGWeissman 16 May 2012 05:13:20PM 2 points [-]

If an agent with goal G1 acquires sufficient "philosophical ability", that it concludes that goal G is the right goal to have, that means that it decided that the best way to achieve goal G1 is to pursue goal G. For that to happen, I find it unlikely that goal G is anything other than a clarification of goal G1 in light of some confusion revealed by the "philosophical ability", and I find it extremely unlikely that there is some universal goal G that works for any goal G1.

Comment author: Juno_Watt 12 September 2013 04:36:31PM -3 points [-]

Let G1="Figure out the right goal to have"

Comment author: JGWeissman 16 May 2012 06:31:24PM 4 points [-]

One major element of philosophical reasoning seems to be a distaste for and tendency to avoid arbitrariness.

If an agent has goal G1 and sufficient introspective access to know its own goal, how would avoiding arbirtrariness in its goals help it achieve goal G1 better than keeping goal G1 as its goal?

I suspect we humans are driven to philosophize about what our goals ought to be by our lack of introspective access, and that searching for some universal goal, rather than what we ourselves want, is a failure mode of this philosophical inquiry.

Comment author: Juno_Watt 12 September 2013 04:30:16PM -1 points [-]

If an agent has goal G1 and sufficient introspective access to know its own goal, how would avoiding arbirtrariness in its goals help it achieve goal G1 better than keeping goal G1 as its goal?

Avoiding arbitrariness is useful to epistemic rationality and therefore to instrumental rationality. If an AI has rationality as a goal it will avoid arbitrariness, whether or not that assists with G1.

Comment author: ArisKatsaris 05 September 2013 05:04:48PM *  7 points [-]

But how could a seed AI be able to make itself superhuman powerful if it did not care about avoiding mistakes such as autocoreccting "meditating" to "masturbating"?

As Robb said you're confusing mistake in the sense of "The program is doing something we don't want to do" with mistake in the sense of "The program has wrong beliefs about reality".

I suppose a different way of thinking about these is "A mistaken human belief about the program" vs "A mistaken computer belief about the human". We keep talking about the former (the program does something we didn't know it would do), and you keep treating it as if it's the latter.

Let's say we have a program (not an AI, just a program) which uses Newton's laws in order to calculate the trajectory of a ball. We want it to calculate this in order to have it move a tennis racket and hit the ball back. When it finally runs, we observe that the program always avoids the ball rather than hit it back. Is it because it's calculating the trajectory of the ball wrongly? No, it calculates the trajectory very well indeed, it's just that an instruction in the program was wrongly inserted so that the end result is "DO NOT hit the ball back".

It knows what the "trajectory of the ball" is. It knows what "hit the ball" is. But it's program is "DO NOT hit the ball" rather than "hit the ball". Why? Because of a human mistaken belief on what the program would do, not the program's mistaken belief.

Comment author: Juno_Watt 12 September 2013 04:07:56PM -4 points [-]

And you are confusing self-improving AIs with conventional programmes.

Comment author: RobbBB 05 September 2013 04:23:35PM *  13 points [-]

But how could a seed AI be able to make itself superhuman powerful if it did not care about avoiding mistakes such as autocoreccting "meditating" to "masturbating"?

Those are only 'mistakes' if you value human intentions. A grammatical error is only an error because we value the specific rules of grammar we do; it's not the same sort of thing as a false belief (though it may stem from, or result in, false beliefs).

A machine programmed to terminally value the outputs of a modern-day autocorrect will never self-modify to improve on that algorithm or its outputs (because that would violate its terminal values). The fact that this seems silly to a human doesn't provide any causal mechanism for the AI to change its core preferences. Have we successfully coded the AI not to do things that humans find silly, and to prize un-silliness before all other things? If not, then where will that value come from?

A belief can be factually wrong. A non-representational behavior (or dynamic) is never factually right or wrong, only normatively right or wrong. (And that normative wrongness only constrains what actually occurs to the extent the norm is one a sufficiently powerful agent in the vicinity actually holds.)

Maybe that distinction is the one that's missing. You're assuming that an AI will be capable of optimizing for true beliefs if and only if it is also optimizing for possessing human norms. But, by the is/ought distinction, there is no true beliefs about the physical world that will spontaneously force a being that believes it to become more virtuous, if it didn't already have a relevant seed of virtue within itself.

Comment author: Juno_Watt 12 September 2013 04:06:29PM -2 points [-]

Those are only 'mistakes' if you value human intentions. A grammatical error is only an error because we value the specific rules of grammar we do; it's not the same sort of thing as a false belief (though it may stem from, or result in, false beliefs).

You will see a grammatical error as a mistake if you value grammar in general, or if you value being right in general.

A self-improving AI needs a goal. A goal of self-improvement alone would work. A goal of getting things right in general would work too, and be much safer, as it would include getting our intentions right as a sub-goal.

Comment author: nshepperd 05 September 2013 12:31:27PM 6 points [-]

GAI: It will never do what it was programmed to do and always remove or bypass its intended limitations in order to pursue unintended actions such as taking over the universe.

GAI is a program. It always does what it's programmed to do. That's the problem—a program that was written incorrectly will generally never do what it was intended to do.

FWIW, I find your statements 3,4,5 also highly objectionable, on the grounds that you are lumping a large class of things under the blank label "errors". Is an "error" doing something that humans don't want? Is it doing something the agent doesn't want? Is it accidentally mistyping a letter in a program, causing a syntax error, or thinking about something heuristically and coming to the wrong conclusion, then making carefully planned decision based on that mistake? Automatic proof systems don't save you if you what you think you need to prove isn't actually what you need to prove.

Comment author: Juno_Watt 12 September 2013 03:17:33PM -4 points [-]

GAI is a program. It always does what it's programmed to do. That's the problem—a program that was written incorrectly will generally never do what it was intended to do.

So self-correcting software is impossible. Is self improving software possible?

Comment author: ArisKatsaris 05 September 2013 10:25:21AM *  10 points [-]

Present-day software is better than previous software generations at understanding and doing what humans mean.

http://www.buzzfeed.com/jessicamisener/the-30-most-hilarious-autocorrect-struggles-ever
No fax or photocopier ever autocorrected your words from "meditating" to "masturbating".

Software will be superhuman good at understanding what humans mean but catastrophically worse than all previous generations at doing what humans mean.

Every bit of additional functionality requires huge amounts of HUMAN development and testing, not in order to compile and run (that's easy), but in order to WORK AS YOU WANT IT TO.

I can fully believe that a superhuman intelligence examining you will be fully capable of calculating "what you mean" "what you want" "what you fear" "what would be funniest for a buzzfeed artcle if I pretended to misunderstand your statement as meaning" "what would be best for you according to your values" "what would be best for you according to your cat's values" "what would be best for you according to Genghis Khan's values" .

No program now cares about what you mean. You've still not given any reason for the future software to care about "what you mean" over all those other calculation either.

Comment author: Juno_Watt 12 September 2013 03:13:26PM -3 points [-]

You've still not given any reason for the future software to care about "what you mean" over all those other calculation either.

Software that cares what you mean will be selected for by market forces.

Comment author: nshepperd 05 September 2013 12:17:29PM 15 points [-]

Present day software is a series of increasing powerful narrow tools and abstractions. None of them encode anything remotely resembling the values of their users. Indeed, present-day software that tries to "do what you mean" is in my experience incredibly annoying and difficult to use, compared to software that simply presents a simple interface to a system with comprehensible mechanics.

Put simply, no software today cares about what you want. Furthermore, your general reasoning process here—define some vague measure of "software doing what you want", observe an increasing trend line and extrapolate to a future situation—is exactly the kind of reasoning I always try to avoid, because it is usually misleading and heuristic.

Look at the actual mechanics of the situation. A program that literally wants to do what you mean is a complicated thing. No realistic progression of updates to Google Maps, say, gets anywhere close to building an accurate world-model describing its human users, plus having a built-in goal system that happens to specifically identify humans in its model and deduce their extrapolated goals. As EY has said, there is no ghost in the machine that checks your code to make sure it doesn't make any "mistakes" like doing something the programmer didn't intend. If it's not programmed to care about what the programmer wanted, it won't.

Comment author: Juno_Watt 12 September 2013 03:11:47PM -1 points [-]

Present day software may not have got far with regard to the evaluative side of doing what you want, but the XiXiDu's point seems to be that it is getting better at the semantic side. Who was it who said the value problem is part of the semantic problem?

View more: Next