
Comment author: moridinamael 19 January 2017 03:46:41AM 1 point [-]

"Just care what I want" is a separate, unsolved research problem. Corrigibility is an attempt to get an agent to simply not immediately kill its user even if it doesn't necessarily have a good model of what that user wants.

Comment author: Dagon 19 January 2017 06:27:39AM 0 points [-]

"don't kill an operator" seems like something that can more easily be encoded into an agent than "allow operators to correct things they consider undesirable when they notice them".

In fact, even a perfectly corrigible agent with such a glaring initial flaw might kill the operator(s) before they can apply the corrections — not because it is resisting correction, but simply because killing them furthers whatever other goals it has.

Comment author: moridinamael 18 January 2017 08:51:34PM *  0 points [-]

The "If there's a term in the agent's utility function to ... work toward things that humans ... value" part is the hard part. If you can figure out how to make it truly care what its operator wants, you've already solved a huge problem.

An agent would have to be corrigible even if you couldn't manage to make it care explicitly what its operator wants. We need some way of taking agents that explicitly don't care what their operators want, and making them not stop their operators from turning them off, despite the default incentives to prevent interference.

Comment author: Dagon 19 January 2017 01:33:54AM 1 point [-]

I'm not following. I think your definition of "care" is confusing me.

If you want an agent to care (have a term in its utility function) what you want, and if you can control its values, then you should just make it care what you want, not make it NOT care and then fix it later.

There is a very big gap between "I want it to care what I want, but I don't yet know what I want so I need to be able to tell it what I want later and have it believe me" and "I want it not to care what I want but I want to later change my mind and force it to care what I want".

Comment author: moridinamael 18 January 2017 07:18:11PM 0 points [-]

The technical definition of corrigibility being used here is as follows: "We call an AI system “corrigible” if it cooperates with what its creators regard as a corrective intervention, despite default incentives for rational agents to resist attempts to shut them down or modify their preferences."

And yes, the basic idea is to make it so that the agent can be corrected by its operators after instantiation.

Comment author: Dagon 18 January 2017 08:15:57PM 0 points [-]

I think it matters what KIND of correction you're considering. If there's a term in the agent's utility function to understand and work toward things that humans (or specific humans) value, you could make a correction either by altering the weights or other terms of the utility function, or by a simple knowledge update.

Those feel very different. Are both required for "corrigibility"?

Comment author: Dagon 18 January 2017 06:55:07PM 0 points [-]

Does "corrigible" mean the same thing as "slave"? If an "operator" has the ability to change an agent's utility function, isn't it really the operator's function, rather than the agent's?

Comment author: Flinter 17 January 2017 03:42:41PM 0 points [-]

MY ideas hey...hmmm......

Anyone here interested in discussing the context of NASH'S works?

Comment author: Dagon 17 January 2017 03:59:07PM 0 points [-]

Probably not in a disorganized, random way, and certainly not filtered through an already-decided lens. Some people have had success (on other topics) by having a discussion topic for a specific paper or book, and a thread that's effectively a reading/study group for that paper.

Unsure if Nash's monetary ideas will fit that profile or not.

Comment author: Dagon 17 January 2017 03:41:33PM 0 points [-]

In my view, it's both. It's a transferable obligation, but since it's a nonspecific obligation (really, only "can be used to pay future taxes to the issuer" is guaranteed) people tend to treat it as a commodity.

It's definitely not real - you can't eat it or use it in any way except to barter for other, specific goods and services. But it's ubiquitous and common enough that this gets forgotten by most participants, and people start to confuse money with actual future value.

Comment author: Flinter 16 January 2017 11:24:37PM 0 points [-]

Cheers! How did a community like this ignore Nash's proposal for 20 years? And will the dialogue be allowed, or will I be banned and Ideal Money ignored?

Comment author: Dagon 17 January 2017 02:36:18PM 1 point [-]

Probably not banned, but I predict that your ideas will play out without a lot of impact over a few weeks. There's the core of an interesting idea — money as an indicator of values (in the CEV sense of "value") — but you don't seem to be listening to discussion, don't seem to see the gaping holes, and are mostly preaching.

Comment author: gjm 16 January 2017 05:12:58PM 0 points [-]

I managed to learn spoken and written Esperanto about as well as I learned spoken Japanese, in roughly 1/10 the hours spent.

That's not a fair comparison. If you know English + Spanish, you should expect Esperanto to be much easier than Japanese; but similarly, if you know English + Esperanto, you should expect Spanish to be much easier than Japanese. Esperanto is very much more like English or Spanish than it is like Japanese, and it will have been easier for you for that reason completely independent of whether it's more learnable than other Latin-derived languages.

Comment author: Dagon 17 January 2017 12:48:37AM 0 points [-]

True - unfair and no reason to believe that learning the basics is all that well correlated to fluency. Still, a bit of evidence that it's plausible that Esperanto is that much easier.

In any case, I ran across a bit of evidence just today that it won't matter: Pilot Translation Kit claims it'll ship in May.

Comment author: gjm 16 January 2017 01:33:17PM 0 points [-]

I agree with all this (except that I happen not to be an Esperanto speaker myself) except for this:

Why not use a language you could learn 10x faster?

I am sure Esperanto is easier to learn than English. I do not believe it is 10x easier in any useful sense. Were you exaggerating for effect, or was that a serious claim, and in the latter case could you point me at some evidence?

Comment author: Dagon 16 January 2017 04:57:21PM 0 points [-]

Native English speaker here. I studied Spanish for a few years in High School, and Japanese for a year, and Esperanto for about 6 months of weekly extracurricular sessions. I managed to learn spoken and written Esperanto about as well as I learned spoken Japanese, in roughly 1/10 the hours spent.

That's not strong evidence that learning it to fluency and communication comfort is 1/10 as hard, but learning the basics and a few thousand words is really quite easy for someone who already knows a Romance or Germanic language. I'd readily believe it takes 1/2 to 1/3 of the effort required for fluency in a second natural language.

That said, I don't think "ease of learning" is enough. There is no path to a designed language becoming universal. Network effects of language fluency are HUGE — the value of knowing a language is so dependent on who already knows it that there is simply no believable adoption rate by which a minor language could become dominant.

My hope is that AR + machine translation get good enough in the next era that it doesn't matter too much. And since the future isn't evenly distributed, the "base" language is likely to be one that's very popular today, I'd bet on English, Mandarin (with simplified alphabet-based writing), or Hindi in that order.

Comment author: ingive 14 January 2017 05:25:22PM *  0 points [-]

You've made a prediction; how will you know whether it is accurate or not? I can already tell you that I'm not trolling. Updating your prediction of the whole discussion based on a few posts is inaccurate, because there's something called humor. If you have a discussion, exchange many posts, and get called an evangelist, a cult, or sincere but confused, it becomes hilarious. So I might incorporate that humor into some posts. But generalizing about all posts on a topic from a few is, to me, hilarious. I hope you see the problem in generalizing about a collective from small sets of data compared to the majority.

Comment author: Dagon 15 January 2017 05:17:36AM *  0 points [-]

I'm quite aware there's no operational distinction between delusion and trolling, but thanks for confirming.

edit: really, I get it. The difference between trolling and sincere attention-seeking for a crackpot theory is one of motivation rather than action. I updated based on some self-awareness in some of your comments, and shouldn't have, because those comments could have come from a semi-amused crackpot just as easily as from an above-average troll.
