Comment author: Kaj_Sotala 01 March 2016 07:55:57AM 1 point [-]

My recent paper touches upon preference aggregation a bit in section 8, BTW, though it's mostly focused on the question of figuring out a single individual's values. (Not sure how relevant that is for your comments, but thought maybe a little.)

Comment author: halcyon 05 March 2016 09:58:10PM 0 points [-]

Thanks, I'll look into it.

(And all my ranting still didn't address the fundamental difficulty: There is no rational way to choose from among different projections of values held by multiple agents, projections such as Rawlsianism and utilitarianism.)

Comment author: Kaj_Sotala 26 February 2016 05:39:02PM *  1 point [-]

From the CEV paper:

Different classes of satisfactory initial definitions may fall into different self-consistent attractors for optimal definitions of volition. Or they may all converge to essentially the same endpoint. A CEV might survey the “space” of initial dynamics and self-consistent final dynamics, looking to see if one alternative obviously stands out as best; extrapolating the opinions humane philosophers might have of that space. But if there are multiple, self-consistent, satisficing endpoints, each of them optimal under their own criterion—okay. Whatever. As long as we end up in a Nice Place to Live.

And yes, the programmers’ choices may have a huge impact on the ultimate destiny of the human species. Or a bird, chirping in the programmers’ window. Or a science fiction novel, or a few lines spoken by a character in an anime, or a webcomic. Life is chaotic, small things have large effects. So it goes.

Which you could sum up as "CEV doesn't get around that problem, it treats it as irrelevant - the point isn't to find a particular good solution that's unique and totally non-arbitrary, it's just to find even one of the good solutions. If arbitrary reasons shift us from Good World #4 to Good World #36, who cares as long as they both really are good worlds".

Comment author: halcyon 28 February 2016 08:43:52PM *  0 points [-]

Although what if we told each party to submit goals rather than non-goal preferences? If the AI has access to a model specifying which actions lead to which consequences, then it can search for the actions that maximize the number of goals fulfilled, regardless of which party submitted them. Alternatively, it could take a Rawlsian approach: for each candidate sequence of actions, find the party that would have the fewest goals fulfilled, and pick the sequence that maximizes that party's count, etc. That seems quite conceivable to me. You could then add heuristics that constrain the search space, and so on. The parties can also submit non-goal preferences in addition to goals, if they have any.
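Here is a toy sketch of the two selection rules just described. It assumes the consequence model is simply a lookup table from candidate plans to the goals each plan would fulfil, and that the search is exhaustive. The plan names, goals and parties are hypothetical illustrations, not anything proposed in the thread:

```python
from typing import Callable, Dict, FrozenSet, Set

# Hypothetical consequence model: the goals each candidate plan would fulfil.
CONSEQUENCES: Dict[str, FrozenSet[str]] = {
    "plan_a": frozenset({"chocolate", "park", "quiet"}),
    "plan_b": frozenset({"chocolate", "music"}),
    "plan_c": frozenset({"park", "music"}),
}

# Goals submitted by each party (illustrative data only).
GOALS: Dict[str, Set[str]] = {
    "alice": {"chocolate", "park", "quiet"},
    "bob": {"music"},
}


def fulfilled(party: str, plan: str) -> int:
    """Number of this party's goals that the plan would fulfil."""
    return len(GOALS[party] & CONSEQUENCES[plan])


def utilitarian_score(plan: str) -> int:
    """Total goals fulfilled, regardless of which party submitted them."""
    return sum(fulfilled(p, plan) for p in GOALS)


def rawlsian_score(plan: str) -> int:
    """Goals fulfilled for whichever party fares worst under this plan."""
    return min(fulfilled(p, plan) for p in GOALS)


def best_plan(score: Callable[[str], int]) -> str:
    """Exhaustive search over plans; max() gives ties to the first plan listed."""
    return max(CONSEQUENCES, key=score)


if __name__ == "__main__":
    print("utilitarian:", best_plan(utilitarian_score))
    print("rawlsian:", best_plan(rawlsian_score))
```

With this data, the utilitarian rule picks `plan_a`: it fulfils three goals, all of them Alice's. The Rawlsian rule rejects `plan_a` because Bob gets nothing. `plan_b` and `plan_c` then tie under the Rawlsian rule, and `max()` settles the tie by listing order. That tie-break is itself an arbitrary choice.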

In that light, it seems to me that the problem was inferring goals from a set of preferences that were not purely non-goal preferences but were actually presented with some unspecified goals in mind. E.g., one party wanted chocolate, but said "I want to go to the store" instead. If that was the source of the original problem, then we can see why we might need an AI to solve it, since it calls for some lightweight mind reading. Of course, a CEV-implementing AI would have to be a mind reader anyway, since we don't really know what our ultimate goals are, given everything we could know about reality.

This still does not guarantee basic morality, but parties should at least recognize some of their ultimate goals in the end result. They might still grumble about the result not being exactly what they wanted, but we can at least scold them for lacking a spirit of compromise.

All this presupposes that enough of our actions can be reduced to ultimate goals that can be discovered, and I don't think this process guarantees we will be satisfied with the results. For example, this might erode personal freedom to an unpleasant degree. If we would choose to live in some world X if we were wiser and nicer than we are, then it doesn't necessarily follow that X is a Nice Place to Live as we are now. Changing ourselves to reach that level of niceness and wisdom might require unacceptably extensive modifications to our actual selves.

Comment author: Kaj_Sotala 26 February 2016 05:45:29PM 1 point [-]

What is the current replacement for CEV anyway?

There isn't really one as far as I know; "The Value Learning Problem" discusses some of the questions involved, but seems to mostly be at the point of defining the problem rather than trying to answer it. (This seems appropriate to me; trying to answer the problem at this point seems premature.)

Comment author: halcyon 28 February 2016 03:32:58PM 1 point [-]

Thanks. That makes sense to me.

Comment author: halcyon 28 February 2016 03:21:21PM *  1 point [-]

The real difficulty is that when you combine two sets of preferences, each of which makes sense on its own, you can get a set of preferences that makes no sense whatsoever: http://plato.stanford.edu/entries/economics/#5.2 https://www.google.com/search?q=site%3Aplato.stanford.edu+social+choice&ie=utf-8&oe=utf-8
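The standard illustration of this is the Condorcet paradox. The sketch below assumes three hypothetical voters and uses pairwise majority rule as the aggregation method, which is just one possible method among several:

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

# Three hypothetical voters; each ranking runs from most to least preferred
# and is, on its own, perfectly transitive.
RANKINGS: List[List[str]] = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def majority_winner(rankings: List[List[str]], x: str, y: str) -> str:
    """Winner of a pairwise majority vote (assumes an odd number of voters, so no ties)."""
    votes_for_x = sum(r.index(x) < r.index(y) for r in rankings)
    return x if 2 * votes_for_x > len(rankings) else y


def majority_edges(rankings: List[List[str]]) -> Set[Tuple[str, str]]:
    """All (winner, loser) pairs produced by pairwise majority rule."""
    edges = set()
    for x, y in combinations(rankings[0], 2):
        w = majority_winner(rankings, x, y)
        edges.add((w, y if w == x else x))
    return edges


def condorcet_winner(rankings: List[List[str]]) -> Optional[str]:
    """An option that beats every other option by majority, if one exists."""
    edges = majority_edges(rankings)
    options = rankings[0]
    for x in options:
        if all((x, y) in edges for y in options if y != x):
            return x
    return None


if __name__ == "__main__":
    print(sorted(majority_edges(RANKINGS)))
    print(condorcet_winner(RANKINGS))
```

Every individual ranking is transitive. Even so, the majority relation comes out as A > B, B > C and C > A. That is a cycle, so no option beats all the others, and some further choice is needed to pick an outcome.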

There is no easy way to resolve this problem. There is also no known method that takes such an inconsistent set of preferences as input and produces a consistent set as output, such that each contributing party would recognize the output as furthering any of their original goals. Some arbitrary choice is needed to break such inconsistencies, and cases without unanimous agreement arise so often that, in practice, there would be a large component of arbitrariness every single time CEV tries to arrive at a uniform set of preferences by extrapolating the volitions of multiple agents into the future.

This doesn't mean the problem is unresolvable, just that it's an AI problem in its own right. But given these problems, wouldn't it be better to pick whichever Nice Place to Live is the safest to reach instead of bothering with CEV? I say this because I'm not sure a Nice Place to Live can be defined in terms of CEV, i.e. as any CEV-approved output. Because of the preference aggregation problem, I'm not certain that a world that is provably CEV-abiding also provably avoids flagrant immorality. Two moral frameworks, when aggregated by a non-smart algorithm, might give rise to an immoral framework, so I'm not sure the essence of the problem is resolved just by CEV as explained in the paper.

Comment author: ChristianKl 24 February 2016 12:46:30AM 1 point [-]

I think that's on the list of MIRI open research problems.

Comment author: halcyon 24 February 2016 02:24:17PM 0 points [-]

Interesting. In that case, would you say an AI that provably implements CEV's replacement is, for that reason, provably Friendly? That is, AIs implementing CEV's replacement form an analytical subset of Friendly AIs? What is the current replacement for CEV anyway? Having some technical material would be even better. If it's open to the public, then I'd like to understand how EY proposes to install a general framework similar to CEV at the "initial dynamic" stage that can predictably generate a provably Friendly AI without explicitly modeling the target of its Friendliness.

Comment author: Manfred 24 February 2016 01:11:09AM 1 point [-]

Any proofs will be like assuming that some laws of aerodynamics hold over some range of conditions, and then proving that a certain plane design will fly. That of course runs into trouble, because we don't know the equivalent of aerodynamics either.

Comment author: halcyon 24 February 2016 02:12:25PM 0 points [-]

That would seem to be the best possible solution, but I have never heard aeroplane engineers claim that their designs are "provably airworthy". If you take the aeroplane design approach, then isn't "provably Friendly" a somewhat misleading claim to make? That is especially so when you're talking about pushing conditions to extremes that you yourself admit are beyond your powers of prediction.

The aeroplane equivalent would be designing a plane so powerful that its flight changes the atmospheric conditions of the entire planet. The plane would then rely on a complicated assembly of gyroscopes, or something similar, to keep flying in a straight line. However, if you cannot predict which specific changes the plane's flight will make, how can you claim to prove that this particular assembly of gyroscopes is sufficient to keep the plane on the preplanned path? On the other hand, suppose you can prove which specific changes relevant to its flight the plane will make. Then you already have a mathematical definition of the target atmosphere at a resolution fine enough to design such an assembly.

Does MIRI think it can come up with an equivalent mathematical model of humanity with respect to AI?

Comment author: Lumifer 23 February 2016 09:05:15PM 1 point [-]

That's the reason EY came up with the concept of CEV -- Coherent Extrapolated Volition.

Comment author: halcyon 23 February 2016 10:49:42PM *  2 points [-]

The SEP says that preferences cannot be aggregated without additional constraints on how the aggregation is to be done, and the end result changes depending on things like the order of aggregation, so these additional constraints take on the quality of arbitrariness. How does CEV get around that problem?
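One standard way to see order dependence is sequential pairwise voting over a cyclic profile. The first two options face off, and the survivor meets the next option on the agenda. The sketch below uses three hypothetical voters. It is an illustrative example, not one taken from the SEP article:

```python
from functools import reduce
from typing import List

# Three hypothetical voters with individually transitive rankings
# (most preferred first) whose majority relation is cyclic.
RANKINGS: List[List[str]] = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def majority(x: str, y: str) -> str:
    """Winner of a pairwise majority vote (odd voter count, so no ties)."""
    votes_for_x = sum(r.index(x) < r.index(y) for r in RANKINGS)
    return x if 2 * votes_for_x > len(RANKINGS) else y


def sequential_winner(agenda: List[str]) -> str:
    """Pit the first two options against each other, then the survivor against the next, and so on."""
    return reduce(majority, agenda)


if __name__ == "__main__":
    for agenda in (["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]):
        print(agenda, "->", sequential_winner(agenda))
```

The three agendas produce three different winners (C, A and B respectively). So whoever fixes the order of aggregation effectively picks the outcome, even though the voters' preferences never change.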

Comment author: halcyon 23 February 2016 08:52:13PM *  0 points [-]

I have a question: It seems to me that Friendliness is a function of more than just an AI. To determine whether an AI is Friendly, it would seem necessary to answer the question: Friendly to whom? If that question is unanswered, then "Friendly" seems like an unsaturated function like "2+". In the LW context, the answer to that question is probably something along the lines of "humanity". However, wouldn't a mathematical definition of "humanity" be too complex to let us prove that some particular AI is Friendly to humanity? Even if the answer to "To whom?" is "Eliezer Yudkowsky", even that seems like it would be a rather complicated proof to say the least.

Comment author: halcyon 29 May 2015 01:56:16AM 2 points [-]

I wonder if this objection to MIRI's project has been made yet: EY recognizes that placing present-day humans in an environment reached by CEV would be immoral, right? Doesn't this call into question the desirability of instant salvation? Perhaps what is really desirable is reaching the CEV state, but only gradually. Otherwise, we might never reach our CEV state, and we arguably do want to reach it eventually. We can still have a Friendly AI, but perhaps its role should be to slowly guide us to the CEV state while making sure we don't get into deep trouble in the meantime. E.g., we shouldn't be maimed for life as the result of an instant's inattention.

Tarski's truth sentences and MIRI's AI

1 halcyon 09 August 2014 07:28PM

(Disclaimer: I have no training in or detailed understanding of these subjects. I first heard of Tarski from the Litany of Tarski, and then I Googled him.)

In his paper The Semantic Conception of Truth, Tarski says that he analyzes the claim, '"Snow is white" is true if and only if snow is white' as being expressed in two different languages. The whole claim in single quotes is expressed in a metalanguage, while "snow is white" is in another language.
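Schematically, the quoted claim is an instance of Tarski's "convention T". The rendering below assumes a truth predicate $\mathrm{True}_L$ that belongs to the metalanguage and applies to sentences of the object language $L$. It writes $\ulcorner p \urcorner$ for the metalanguage's name of the sentence $p$; this notation is illustrative, not Tarski's own:

```latex
% Schema: for every sentence p of the object language L
\mathrm{True}_L(\ulcorner p \urcorner) \;\leftrightarrow\; p

% The instance discussed above
\mathrm{True}_L(\ulcorner \text{snow is white} \urcorner) \;\leftrightarrow\; \text{snow is white}
```

The left-hand side mentions the sentence by name and is stated in the metalanguage. The right-hand side simply uses the sentence.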

For Tarski's proof to succeed, it is (if I understood him correctly) both necessary and sufficient for the metalanguage to be logically richer than the other language in certain ways. According to Tarski, it is difficult to say in general what these ways are without actually following his very involved technical proof.

If I remember correctly, this implies that the two languages cannot be identical. Tarski seems to be of the opinion that for a given language satisfying specific conditions, concepts of truth, synonymy, meaning, etc. can be defined for it in a metalanguage that is richer than it in logical devices, establishing a hierarchy of truth defining languages.
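Here is a rough sketch of that hierarchy. It assumes each language in it is rich enough for Tarski's undefinability result to apply (for instance, it can express basic arithmetic). The indexing is illustrative:

```latex
L_0 \subset L_1 \subset L_2 \subset \cdots

% For each n: truth for L_n can be expressed one level up, but not in L_n itself
\mathrm{True}_{L_n} \text{ is expressible in } L_{n+1} \text{ but not in } L_n
```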

My main question is, since MIRI aims to mathematically prove Friendliness in recursively self-improving AI, is "essential richness" in language handling ability something we should expect to see increasing in the class of AIs MIRI is interested in, or is that unnecessary for MIRI's purposes? I understand that semantically defining truth and meaning may not be important either way. My principal motive is curiosity.
