The point of the proof is that if there is an established procedure that takes as input people's stated utilities about certain choices, and outputs a Pareto outcome, then it must be possible to game it by lying. The motivations of the players aren't taken into account once their preferences are stated.
Rather than X or Y succeeding at gaming it by lying, however, it seems that a disinterested, objective procedure that selects by Pareto optimality and symmetry would then output a (0.6, 0.6) outcome in both cases, causing a -0.35 utility loss for the liar in the first case and a -0.1 utility loss for the liar in the second.
Is there a direct reason that such an established procedure would be influenced by a perceived (0.95, 0.4) option to not choose an X=Y Pareto outcome? (If this is confirmed, then indeed my current position is mistaken. <nods>)
He can only lie about how much he values the point - not about how much the other player values it.
I may be missing something: for Figure 5, what motivation does Y have to go along with the perceived choice (0.95, 0.4), given that in this situation Y does not possess the information - possessed (and true) in the previous situation - that '(0.95, 0.4)' is actually (0.95, 0.95)?
In Figure 2, (0.6, 0.6) appears symmetrical and Pareto optimal to X. In Figure 5, (0.6, 0.6) appears symmetrical and Pareto optimal to Y. In Figure 2, X has something to gain by choosing/{allowing the choice of} (0.95, 0.4) over (0.6, 0.6) and Y has something to gain by choosing/{allowing the choice of} (0.95, 0.95) over (0.6, 0.6), but in Figure 5, while X has something to gain by choosing/{allowing the choice of} (0.6, 0.4) over (0.5, 0.5), Y has nothing to gain by choosing/{allowing the choice of} (0.95, 0.4) over (0.6, 0.6).
Is there a rule(/process) that I have overlooked?
Going through the setup again, it seems as though in the first situation (0.95, 0.95) would be chosen while looking to X as though Y was charitably going with (0.95, 0.4) instead of insisting on the symmetrical (0.6, 0.6), and that in the second situation Y would insist on the seemingly-symmetrical-and-(0.6, 0.6) (0.4, 0.6) instead of going along with X's desired (0.6, 0.4) or even the actually-symmetrical (0.5, 0.5) (since that would appear {non-Pareto optimal}/{Pareto suboptimal} to Y).
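The Pareto-dominance comparisons being discussed can be sketched in a few lines. This is only an illustrative check over the pure outcomes quoted in these comments (mixed outcomes are ignored here); `dominates` and `pareto_front` are hypothetical helper names, not anything from the original figures:

```python
def dominates(a, b):
    """True if outcome a Pareto-dominates b: at least as good for
    both players and strictly better for at least one."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b

def pareto_front(outcomes):
    """Pure outcomes not dominated by any other pure outcome."""
    return [o for o in outcomes if not any(dominates(p, o) for p in outcomes)]

# Utility pairs (x, y) as quoted above.
perceived = [(0.6, 0.6), (0.95, 0.4)]
print(pareto_front(perceived))  # -> [(0.6, 0.6), (0.95, 0.4)]: neither dominates

# With Y's genuine utility, (0.95, 0.95) dominates (0.6, 0.6) outright.
print(dominates((0.95, 0.95), (0.6, 0.6)))  # -> True
```

Among pure outcomes, (0.6, 0.6) and (0.95, 0.4) are incomparable, which is why the reply below has to appeal to mixed (weighted) outcomes to show that (0.6, 0.6) is not actually on the Pareto frontier.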
In practice (though the clothing may have an unrelated advantage), the clothing one wears has no effect on the validity of the logical arguments used in reasoning/debate.
The purpose of the clothing is to make people aware of the dangers of cultishness, even though wearing identical clothing, all else equal, encourages cultishness. All else is not equal: it is a worthwhile cost to bring the issue to the fore and force people to compensate by thinking non-cultishly (not counter-cultishly).
A novice rationalist approached the master Ougi and said, "Master, I worry that our rationality dojo is... well... a little cultish."
"That is a grave concern," said Ougi.
The novice waited a time, but Ougi said nothing more.
So the novice spoke up again: "I mean, I'm sorry, but having to wear these robes, and the hood - it just seems like we're the bloody Freemasons or something."
"Ah," said Ougi, "the robes and trappings."
Note how Ougi waited for the novice to explain himself: Ougi wanted to know whether the thought patterns or the clothing were causing the concern.
There is no direct relationship between clothing and probability theory, but there is a relationship that goes through the human. Human beliefs are influenced by social factors.
'The student, more-or-less enlightened by this':
The student learned only what was nearly explicitly said, that there is no direct relationship between clothing and probability theory.
The student failed to learn the lesson about cultishness. He is only said to have reached the rank of grad student, not master, unlike the student in the other koan. Bouzo "would only discuss rationality while wearing a clown suit." Only when wearing a clown suit - this is cultish countercultishness. To avoid giving the impression that understanding or cultishness have to do with mystic clothing, he never tried to increase understanding while in mystic clothing, lest that cause cultishness - oops for him.
What causes cultishness depends deeply on the audience, what meta-level of contrarianism each person is at, whether they will bristle at or be swept along by cultishness when wearing the same uniform, etc.
This is different from the first koan because the student is not a role model. One should not assume that the characters in a leader's koan are role models and think of how to justify their behavior. Instead, one must independently ask if it makes sense for Bouzo to only discuss rationality while wearing a clown suit in the context of the story. Like in most contexts, the answer to that question is "no, that's silly."
A very interesting perspective: Thank you!
I think that there is an actual impossibility lurking here.
Imagine the situation in which the pupil is fixed in this view "To learn from my teacher, it suffices to learn his words off by heart." and never does anything more. The teacher notices the problem, and tries to help the pupil by telling him that it is not enough to learn the words, you must understand the meaning, and put the teachings into practice.
The pupil is grateful for the teaching and writes it in his notebook: "To learn from my teacher, it is not enough to learn the words; I must understand the meaning, and put the teachings into practice." It is only twenty-four words, many of them short. The pupil puts the teaching into his spaced repetition software, and is soon word-perfect, although he continues to ignore the meaning of the teachings he memorizes.
What more can the teacher do? Can words point beyond words? Obviously yes, but if the pupil looks at the finger and not at the moon, is there anything that the teacher can say that will get through to the pupil? It may be that it is actually impossible for the teacher to help the pupil until the pupil is changed by the impact of external events.
I think that I have understood the first koan. I am still mystified by the second koan. I suspect that it relates to things that Eliezer has experienced or seen. Eliezer has examples of followership in mind and the koans would be clear in context.
' I am still mystified by the second koan.': The novice associates {clothing types which past cults have used} with cults, and fears that his group's use of these clothing types suggests that the group may be cultish.
In practice (though the clothing may have an unrelated advantage), the clothing one wears has no effect on the validity of the logical arguments used in reasoning/debate.
The novice fears a perceived connection between the clothing and cultishness (where cultishness is taken to be a state of faith over rationality, or in any case irrationality). The master reveals the lack of effect of clothing on the subjects under discussion with the extreme example of the silly hat, pointing out the absurdity of wearing it affecting one's ability to effectively use probability theory (or any practical use of rationality for that matter).
This is similar to the first koan, {in which}/{in that} what matters is whether the (mental/conceptual) tools actually /work/ and yield useful results.
The student, more-or-less enlightened by this, takes it to heart and serves as an example to others by always discussing important concepts in absurd clothing, to get across to his own students(, others whom he interacts with, et cetera) that the clothing someone wears has nothing to do with the validity/accuracy of their ideas.
(Or, at least, that's my interpretation.)
Edit: A similar way of describing this may be to imagine that the novice is treating clothing-cult correlation as though it were causation, and the master points out, with use of absurdity, that there cannot be clothing->cult causation for the same reason that there cannot be silly_hat->comprehension causation. (What counts is the usefulness of the hammer, the validity of the theories used, rather than unrelated things which coincide with them.)
Depending on the cost, it at least seems to be worth knowing about. If one doesn't have it then one can be assured on that point, whereas if one does have it then one at least has appropriate grounds on which to second-guess oneself.
(I have been horrified in the past by tales of {people who may or may not have inherited a dominant gene for definite early disease-related death} who all refused to be tested, thus dooming themselves to lives of fear and uncertainty. If they were going to have entirely healthy lives, then they would have lived in fear and uncertainty instead of being able to enjoy them; and if they were going to die early, then they would have lived in fear and uncertainty (and stressful, gradually-increasing denial/acceptance) rather than quickly getting used to the idea, resetting their baseline, getting their loose ends in order and living as appropriate for their expected remaining lifespan. Whether or not one does (or can do) anything about one's state doesn't change the fact that having more information about oneself can (in most circumstances?) only be helpful.)
'I haven't seen a post on LW about the grue paradox, and this surprised me since I had figured that if any arguments would be raised against Bayesian LW doctrine, it would be the grue problem.':
If of relevance, note http://lesswrong.com/lw/q8/many_worlds_one_best_guess/ .
'The second AI helped you more, but it constrained your destiny less.': A very interesting sentence. <nods>
On other parts, I note that the commitment to a range of possible actions can be seen as larger-scale than the commitment to a single action, even before the choice of which one is taken.
A particular situation that comes to mind, though:
Person X does not know of person Y, but person Y knows of person X. Y has an emotional (or other) stake in a tiebreaking vote that X will make; Y cannot be present on the day to observe the vote, but sets up a simple machine to detect what vote is made and fire a projectile through the head of X if X makes one vote rather than another (nothing happening otherwise).
Let it be given that in every universe in which X votes that certain way, X is immediately killed as a result. It can also safely be assumed that in those universes Y is arrested for murder.
In a certain universe, X votes the other way, but the machine is later discovered. No direct interference with X has taken place, but Y, who set up the machine (pointed at X's head, X's continued life unknowingly dependent on X's vote), is presumably guilty of a felony of some sort (which one, though, I wonder?).
Regardless of motivation, to have committed to potentially carrying out a certain thing against X is treated as similarly serious to in fact having carried it out (or attempted to carry it out).
(This, granted, may focus on a concept within the above article without addressing the entire issue of planning another entity's life.)
I did a similar experiment on myself when I went on an organized trip to Israel. When we stopped at the Wailing (Western) Wall, I decided to test out my rationality. As you know, you're supposed to write down a wish on a piece of paper and put it in the wall, i.e. another way of praying. I decided to write down "I wish my family would die in 2 weeks," and put it in the wall, to see if I could do it.
To my surprise, I did feel a bit weird, a little anxious, but after a while I was fine. It is hard to overcome the emotions induced by our biases, but it can be done with practice.
Just curious, would anyone not write the note (that I wrote)? Assuming you'd be compensated for your effort to write it and put it in the wall.
Thought 1: If hypothetically one's family were going to die in an accident or otherwise (for valid causal, wish-unrelated reasons), the added mental/emotional effect on oneself would be something to avoid in the first place. Given that one is fallible, one can never assert absolute knowledge of non-causality (direct or indirect), and that near-infinitesimal consideration could haunt one. Compare this possibility to the ease, normally, of taking other routes and thus avoiding that risk entirely.
...other thoughts are largely on the matter of integrity... respect and love felt for family members, thus not wishing to badmouth them or officially express hope for their death even given that neither they nor anyone else could hear it... hmm.
Pragmatically, one could cite a concern regarding taken behaviours influencing ease of certain thoughts: I do not particularly want to become someone who can more easily write a request that my family members die.
There are various things that I might wish that I would not carry out if I had the power to directly (and secretly) do so, but generally if doing such a thing I would prefer to wish for something I actually wanted (/would carry out if I had the power to do so myself), on the off-chance that some day if I do such to the knowledge of another the other is inclined to help me reach it in some way.
Given the existence of compensation, there is yet the question of what compensation would be sufficient to make me do something that made me feel sullied. Incidentally, I note that there are many things which would make others feel sullied that I would do with no discomfort at all.
...a general practice of acting in a consistent way... a perception of karma not as something which operates outside normal causality, but instead similar-to-luck just those parts of normal causality that one cannot be aware of... ah, I've reached the point of redundancy were I to continue typing.
To check, does 'in order for it to be safe' refer to 'safe from the perspectives of multiple humans', compared to 'safe from the perspective of the value-set source/s'? If so, possibly tautologous. If not, then I likely should investigate the point in question shortly.
Both. I meant, in order for the AI not to (very probably) paperclip us.
Another example that comes to mind regarding a conflict of priorities: 'If your brain were this much more advanced, you would find this particular type of art the most sublime thing you'd ever witnessed, and would want to fill your harddrive with its genre. I have thus done so, even though to you, who own the harddrive and can't appreciate it, it consists of uninteresting squiggles, and it has overwritten all the books and video files that you were lovingly storing.'
Our (or someone else’s) volitions are extrapolated in the initial dynamic. The output of this CEV may recommend that we ourselves are actually transformed in this or that way. However, extrapolating volition does not imply that the output is not for our own benefit!
Speaking in a very loose sense for the sake of clarity: “If you were smarter, looking at the real world from the outside what actions would you want taking in the real world?” is the essential question – and the real world is one in which the humans that exist are not themselves coherently-extrapolated beings. The question is not “If a smarter you existed in the real world, what actions would it want taking in the real world?”
See the difference?
Digression: If such an entity acts according to a smarter-me's will, does the theoretically-existing smarter-me then necessarily 'exist', as simulated/interpreted by the entity?
Hopefully the AI’s simulations of people are not sentient! It may be necessary for the AI to reduce the accuracy of its computations, in order to ensure that this is not the case.
Again, Eliezer discusses this in the document on CEV which I would encourage you to read if you are interested in the subject.
CEV document: I have at this point somewhat looked at it, but indeed I should ideally find time to read through it and think it through more thoroughly. I am aware that the sorts of questions I think of have very likely already been thought of by those who have spent many more hours thinking about the subject than I have, and am grateful that the time has been taken to answer the specific thoughts that come to mind as initial reactions.
Reaction to the difference-showing example (simplified by the assumption that a sapient smarter-me is assumed to not exist in any form), in two examples:
Case 1: I hypothetically want enough money to live in luxury (and achieve various other goals) without effort (and hypothetically lack the mental ability to bring this about easily). Extrapolated, a smarter me looking at this real world from the outside would be a separate entity from me, have nothing in particular to gain from making my life easier in such a way, and so not take actions in my interests.
Case 2: A smarter-me watching the world from outside may hold a significantly different aesthetic sense than the normal me in the world, and may act to rearrange the world in such a way as to be most pleasing to that me watching from outside. This being done, in theory resulting in great satisfaction and pleasure of the watcher, the problem remains that the watcher does not in fact exist to appreciate what has been done, and the only sapient entities involved are the humans which have been meddled with for reasons which they presumably do not understand, are not happy about, and plausibly are not benefited by.
I note that a lot in fact hinges on the hypothetical benevolence of the smarter-me, and on the assumption/hope/trust that it would after all not act in particularly negative ways toward the existent humans. Given a certain degree of selfishness, though, one can probably assume a range of hopefully-at-worst-neutral significant actions which I personally would probably want to carry out, but which I certainly wouldn't want carried out without anyone pulling the strings in fact benefiting from what was being done.
...hmm, those can be summed up as 'The smarter-me wouldn't aid my selfishness!' and 'The smarter-me would act selfishly in ways which don't benefit anyone, since it isn't sapient!'. There might admittedly be a lot of non-selfishness carried out, but that seems like a quite large variation from the ideal behaviour desired by the client-equivalent. I can understand the throwing-out of individual selfishness for something based on a group and created for the sake of humanity in general, but the taking of selfish actions for a (possibly conglomerate) watcher who does not in fact exist (in terms of what is seen) seems as though it remains to be addressed.
...I also find myself wondering whether a smarter-me would want to have arrays built to make itself even smarter, and backup computers for redundancy created in various places, each able to simulate its full sapience if necessary, resulting in the creation of hardware running a sapient smarter-me even though the decision-making smarter-me who decided to do so wasn't in fact sapient/{in existence}... though, arguably, that also wouldn't be too bad in terms of absolute results... hmm.
(0.6, 0.6) is not Pareto optimal. The "equal Pareto outcome" is the point (19/31, 19/31), which is about (0.62, 0.62). This is a mixed outcome: the weighted sum of (0, 1) and (0.95, 0.4) with weights 11/31 and 20/31. In reality, for y's genuine utility, this would be 11/31(0, 1) + 20/31(0.95, 0.95) = (19/31, 30/31), giving y a utility of about 0.97, greater than the 0.95 he would have got otherwise.
<checks the 19/31 through y = (-0.6*20/19)x + 1; nods>
<nods with realisation at the selected x=y=19/31 point corresponding to a different location in the true depiction>
(Assuming that it stays on the line of 'what is possible' - in any case a higher Y than otherwise - but finding it then by holding X constant: 1 - (19/31)*(1/19) = 30/31, yes...)
I confess I do not understand the significance of the terms 'mixed outcome' and 'weighted sum' in this context; I do not see how the numbers 11/31 and 20/31 have been obtained; and I do not presently see how the same effect can apply in the second situation, in which the relative positions of the symmetric point and its (Pareto?) lines have not been shifted. But I now see how in the first situation the point selected can be favourable for Y! (This represents convincing of the underlying concept that I was doubtful of.) Thank you very much for the time taken to explain this to me!
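For what it's worth, the 11/31 and 20/31 weights fall out of requiring the mixed outcome to be symmetric (x = y). A quick check with exact fractions - a sketch assuming only the outcome values quoted in the comments above:

```python
from fractions import Fraction

# Mix (0, 1) and (0.95, 0.4) with weights (1 - w) and w, choosing w so
# that the mixed outcome is symmetric (x = y):
#   0.95*w = (1 - w) + 0.4*w  =>  1.55*w = 1  =>  w = 20/31.
w = Fraction(1) / Fraction("1.55")
assert w == Fraction(20, 31)        # weight on (0.95, 0.4)
assert 1 - w == Fraction(11, 31)    # weight on (0, 1)

x = Fraction("0.95") * w                       # x-utility of the mix
y_stated = (1 - w) * 1 + Fraction("0.4") * w   # y's stated utility
y_true = (1 - w) * 1 + Fraction("0.95") * w    # y's genuine utility
print(x, y_stated, y_true)  # 19/31 19/31 30/31
```

So the stated mix lands on (19/31, 19/31), while under y's genuine utility the same mix is worth 30/31 (about 0.97) to y, matching the figures in the explanation above.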