My confidence bounds were 75% and 98% for defect, so my estimate was diametrically opposed to yours. If the admittedly low sample size of these comments is any indication, we were both way off.
Why do you think most would cooperate? I would expect this demographic to do a consequentialist calculation, and find that an isolated cooperation has almost no effect on expected value, whereas an isolated defection almost quadruples expected value.
Nice job on the survey. I loved the cooperate/defect problem, with calibration questions.
I defected, since a quick expected value calculation makes it the overwhelmingly obvious choice (assuming no communication between players, which I am explicitly violating right now). Judging from the comments, it looks like my calibration lower bound is going to be way off.
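For what it's worth, here is the shape of the calculation I mean. Everything numeric below (the payoff rule, the player count, the defector multiplier) is an assumption for illustration, not the survey's actual prize structure; the point is only that my cooperation changes the pot by about 1/N, while my defection multiplies my own expected winnings:

    # Toy model only: the payoff rule, player count, and multiplier below are
    # assumptions for illustration, not the survey's actual rules.
    N = 1000            # assumed number of respondents
    base_prize = 60.0   # assumed prize scale (dollars)
    k = 4.0             # assumed defector multiplier

    def expected_value(my_choice, n_other_cooperators):
        """My expected winnings, given my choice and everyone else's."""
        n_coop = n_other_cooperators + (1 if my_choice == "C" else 0)
        pot = base_prize * n_coop / N              # pot scales with cooperation
        multiplier = 1.0 if my_choice == "C" else k
        return (1.0 / N) * multiplier * pot        # 1/N chance of being drawn

    others = 700  # assumed number of cooperators among the other respondents
    print(expected_value("C", others))   # ~0.042
    print(expected_value("D", others))   # ~0.168 -- roughly k times larger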
I agree that the statement is not crystal clear. It makes it possible to confuse the (change in the average) with the (average of the change).
Mathematically speaking, we represent our beliefs as a probability distribution on the possible outcomes, and change it upon seeing the result of a test (possibly for every outcome). The statement is that “if we average the possible posterior probability distributions weighted by how likely they are, we will end up with our original probability distribution.”
If that were not the case, it would imply that we could predict in advance which way the evidence would push us, which would mean we were failing to use information we already have.
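A quick numerical check of the statement, with made-up numbers for the prior and for the test's likelihoods:

    # Two hypotheses and a binary test outcome; all numbers are made up.
    prior = {"H1": 0.3, "H2": 0.7}
    likelihood = {                 # P(outcome | hypothesis)
        "pos": {"H1": 0.9, "H2": 0.2},
        "neg": {"H1": 0.1, "H2": 0.8},
    }

    def posterior(outcome):
        unnorm = {h: prior[h] * likelihood[outcome][h] for h in prior}
        z = sum(unnorm.values())               # P(outcome)
        return {h: p / z for h, p in unnorm.items()}, z

    # Average the posteriors, weighted by how likely each outcome is.
    avg = {h: 0.0 for h in prior}
    for outcome in likelihood:
        post, p_outcome = posterior(outcome)
        for h in prior:
            avg[h] += p_outcome * post[h]

    print(avg)   # recovers the prior, up to float rounding: ~{'H1': 0.3, 'H2': 0.7}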
For the moment, I'm going to strike the comment from the post. I don't want to ascribe a viewpoint to VincentYu that he doesn't actually hold.
I added a section called "Deciding how to decide" that (hopefully) deals with this issue appropriately. I also amended the conclusion and added you to the acknowledgements.
I'm not sure why it got moved: maybe not central to the thesis of LW, or maybe not high enough quality. I'm going to add some discussion of counter-arguments to the limit method. Maybe that will make a difference.
I noticed that the discussion picked up when it got moved, and I learned some useful stuff from it, so I'm not complaining.
Ok, I think I've got it. I'm not familiar with VNM utility, and I'll make sure to educate myself.
I'm going to edit the post to reflect this issue, but it may take me some time. It is clear (now that you point it out) that we can think of the ill-posedness coming from our insistence that the solution conform to aggregative utilitarianism, and it may be possible to sidestep the paradox if we choose another paradigm of decision theory. Still, I think it's worth working as an example, because, as you say, AU is a good general standard, and many readers will be familiar with it. At the minimum, this would be an interesting finite AU decision problem.
Thanks for all the time you've put into this.
I would like to include this issue in the post, but I want to make sure I understand it first. Tell me if this is right:
...It is possible mathematically to represent a countably infinite number of immortal people, as well as the process of moving them between spheres. Further, we should not expect a priori that a problem involving such infinities would have a solution equivalent to those solutions reached by taking infinite limits of an analogous finite problem. Some confusion arises when we introduce the concept of “utility” to determine which of the two configurations is preferable.
The final section has been edited to reflect the concerns of some of the commenters.
Thanks to whoever moved this to Discussion. From the FAQ, I wasn't sure where to put it. This is better, in retrospect.
Thanks! Do you guys want to copy edit my journal papers? ;)
You're completely right! As stated, the problem is ill posed, i.e. it has no unique solution, so we didn't solve it.
Instead, we solved a similar problem by introducing a new parameter, \alpha. It was useful because we gained a mathematical description that works for very large n and s, and which matches our intuition about the problem.
It is important to recognize, as you point out, that taking limits does not solve the problem. It just elucidates why we can't solve it as stated.
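To illustrate with a deliberately stripped-down stand-in (this f is mine, not the post's utility function): once two quantities both go to infinity, the "answer" depends entirely on the relationship you assume between them, which is exactly the information \alpha supplies.

    # Toy stand-in, not the utility function from the post. The point:
    # "n -> infinity and s -> infinity" does not pin down the limit of
    # f(n, s); you also have to say how s grows relative to n, which is
    # the information alpha adds.
    def f(n, s):
        return n - s

    n = 10**6
    for alpha in (0.5, 1.0, 2.0):
        s = alpha * n
        print(alpha, f(n, s))   # +500000, 0, -1000000: three different "answers"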
I agree that it's a lot to cover, but I wanted to work a full example. We talk a lot on LW about decision analysis and paradoxes in the abstract, but I'm coming from a math/physics background, and it's much more helpful for me to see concrete examples. I assume some other people feel the same way.
Self-referential problems would be an interesting area to study, but I'm not familiar with the techniques. I suspect you're right, though.
Fixed. Thanks for reading so closely. It's amazing how many little mistakes can survive after 10 read-throughs.
By the way, are you talking about this meme, or is there another problem with monkeys and bananas?
Great problem, thanks for mentioning it!
I think both "how many balls did you put in the vase as T->\infty" and "how many balls have been destroyed as T->\infty" have well-defined answers. It's just a fallacy to assume that the "total number of balls in the vase as T->\infty" is equal to the difference between these quantities in their limits.
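Assuming the standard version of the supertask (at step t, balls 10t-9 through 10t go in and ball t comes out), here's the distinction in code:

    # Assumed setup: at step t, balls 10t-9 .. 10t are added and ball t is removed.
    def state_after(T):
        added = 10 * T
        removed = T
        in_vase = len(set(range(1, added + 1)) - set(range(1, removed + 1)))
        return added, removed, in_vase

    for T in (10, 1000, 100000):
        print(state_after(T))   # both counts diverge; 9*T balls remain at step T

    # Yet every individual ball k is removed at step t = k, so no particular
    # ball is left "at infinity": the limit of the difference of the counts
    # is not the number of balls in the limiting state of the vase.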
My parents stopped me from skipping a grade, and apart from a few math tricks, we didn't work on additional material at home. I fell into a trap of "minimum effort for maximum grade," and got really good at guessing the teacher's password. The story didn't change until graduate school, when I was unable to meet the minimum requirements without working, and that eventually led me to seek out fun challenges on my own.
I now have a young son of my own, and will not make the same mistake. I'm going to make sure he expects to fail sometimes, and that I praise his efforts to go beyond what's required. No idea if it will work.
MrMind explains in better language below.
The plots were done in Mathematica 9, and then I added the annotations in PowerPoint, including the dashed lines. I had to combine two color functions for the density plot, since I wanted to highlight the fact that the line s=n represented indifference. Here's the code:
    (* f1[n, s] > 0 below the indifference line s = n (for r = 1) *)
    r = 1; ua = 1; ub = -1;
    f1[n_, s_] := (n*s - r*s^2)*(ua - ub);
    Show[
     DensityPlot[-f1[n, s], {n, 0, 20}, {s, 0, 20},
      ColorFunction -> "CherryTones", Frame -> False, PlotRange -> {-1000, 0}],
     DensityPlot[f1[n, s], {n, 0, 20}, {s, 0, 20},
      ColorFunction -> "BeachColors", Frame -> False, PlotRange -> {-1000, 0}]
    ]
No, I mean a function whose limit doesn't equal its defined value at infinity. As a trivial example, I could define a utility function to be 1 for all real numbers in [-inf,+inf) and 0 for +inf. The function could never actually be evaluated at infinity, so I'm not sure what it would mean, but I couldn't claim that the limit was giving me the "correct" answer.
Thanks for clearing up the countability. It's clear that there are some cases where taking limits will fail (like when the utility is discontinuous at infinity), but I don't have an intuition about how that issue is related to countability.
In the above example, the number of people and the number of days they live were uncountable, if I'm not mistaken. The take-home message is that you do not get an answer if you just evaluate the problem for sets like that, but you might if you take a limit.
Conclusions that involve infinity don't map uniquely onto finite solutions because they don't supply enough information. Above, "infinite immortal people" refers to a concept that encapsulates three different answers. We had to invent a new parameter, alpha, which was not supplied in the original problem statement.
Here is some clarification from Zinsser himself (ibid.):
..."Who am I writing for? It's a fundamental question, and it has a fundamental answer: You're writing for yourself. Don't try to visualize the great mass audience. There is no such audience - every reader is a different person.
This may seem to be a paradox. Earlier I warned that the reader is... impatient... . Now I'm saying you must write for yourself and not be gnawed by worry over whether the reader is tagging along. I'm talking about two different issues. One is craft, the other is attitude.
On Writing Well, by William Zinsser
Every word should do useful work. Avoid cliché. Edit extensively. Don’t worry about people liking it. There is more to write about than you think.
It makes no sense to call something “true” without specifying prior information. That would imply that we could never update on evidence, which we know not to be the case for statements like “2 + 3 = 5.” Much of the confusion comes from different people meaning different things by the proposition “2 + 3 = 5,” which we can resolve as usual by tabooing the symbols.
Consider the propositions:
A =“The next time I put two sheep and three sheep in a pen, I will end up with five sheep in the pen.”
B = “The universe works as if in all cases, combining two of s...
As usual, I'm late to the discussion.
The probability that a counterfactual is true should be handled with the same probabilistic machinery we always use. Once the set of prior information is defined, it can be computed as usual with Bayes. The confusing point seems to be that the prior information is contrary to what actually occurred, but there's no reason this should be different than any other case with limited prior information.
For example, suppose I drop a glass above a marble floor. Define:
sh = “my glass shattered”
f = “the glass fell to the floor und...
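To make that concrete, reading the truncated premise as "the glass fell unimpeded onto the marble floor" (my guess), and with every probability below invented for the example:

    # All numbers invented; f is read as "the glass fell unimpeded onto the
    # marble floor" (my guess at the truncated premise), X is the rest of the
    # prior information.
    p_f = 0.9                 # P(f | X): it usually falls unimpeded once dropped
    p_sh_given_f = 0.95       # P(sh | f, X): marble is unkind to glassware
    p_sh_given_not_f = 0.05   # P(sh | ~f, X): caught, or it lands on a rug

    # Ordinary marginalization gives the pre-drop probability of shattering:
    p_sh = p_sh_given_f * p_f + p_sh_given_not_f * (1 - p_f)

    # The counterfactual question -- "how likely was shattering, had it fallen
    # unimpeded, even though in fact I caught it?" -- is just P(sh | f, X):
    # the contrary-to-fact premise goes into the prior information like any
    # other conditioning statement.
    print(p_sh, p_sh_given_f)   # ~0.86, 0.95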
I'm confused about why this problem is different from other decision problems.
Given the problem statement, this is not an acausal situation. No physics is being disobeyed - Kramers-Kronig still works, relativity still works. It's completely reasonable that my choice could be predicted from my source code. Why isn't this just another example of prior information being appropriately applied to a decision?
Am I dodging the question? Does EY's new decision theory account for truly acausal situations? If I based my decision on the result of, say, a radioactive decay experiment performed after Omega left, could I still optimize?
Ha - thanks. Fixed. But I guess if other people want to Skype in from around the world, they're welcome to.
Yes, we are running on corrupted hardware at about 100 Hz, and I agree that defining broad categories to make first-cut decisions is necessary.
But if we were designing a morality program for a super-intelligent AI, we would want to be as mathematically consistent as possible. As shminux implies, we can construct pathological situations that exploit the particular choice of discontinuities to yield unwanted or inconsistent results.
I think it would be possible to have an anti-Occam prior if the total complexity of the universe is bounded.
Suppose we list integers according to an unknown rule, and we favor rules with high complexity. Given the problem statement, we should take an anti-Occam prior to determine the rule given the list of integers. It doesn't diverge because the list has finite length, so the complexity is bounded.
Scaling up, the universe presumably has a finite number of possible configurations given any prior information. If we additionally had information that led us to take an anti-Occam prior, it would not diverge.
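Here's roughly what I have in mind, with a made-up complexity measure over a deliberately finite set of rules; the exp(+complexity) weighting only normalizes because the set (and hence the complexity) is bounded:

    import math

    # Made-up rule set with made-up complexities (say, description lengths).
    rules = {
        "add 1 each time": 3,
        "add the previous two entries": 6,
        "alternate +2 and *3, skipping primes": 12,
    }

    # Anti-Occam prior: weight grows with complexity. This normalizes only
    # because the rule set is finite; over all possible rules it would diverge.
    weights = {r: math.exp(c) for r, c in rules.items()}
    z = sum(weights.values())
    prior = {r: w / z for r, w in weights.items()}
    print(prior)   # most of the mass sits on the most complex rule; sums to 1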
I'm also looking for a discussion of the symmetry related to conservation of probability through Noether's theorem. A quick Google search only finds quantum mechanics discussions, which relate it to spatial invariances, etc.
If there's no symmetry, it's not a conservation law. Surely someone has derived it carefully. Does anyone know where?
The idea that the utility should be continuous is mathematically equivalent to the idea that an infinitesimal change on the discomfort/pain scale should give an infinitesimal change in utility. If you don't use that axiom to derive your utility function, you can have sharp jumps at arbitrary pain thresholds. That's perfectly OK - but then you have to choose where the jumps are.
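Here's a toy version of the pathology, with an invented pain scale and an invented per-person disutility that jumps at an arbitrary threshold:

    # Invented for illustration: pain in [0, 1] (dust speck ~ 0.001,
    # torture ~ 1), with per-person disutility jumping at a chosen threshold
    # instead of varying continuously.
    THRESHOLD = 0.5

    def disutility(pain, n_people):
        per_person = 1.0 if pain < THRESHOLD else 1_000_000.0
        return per_person * n_people

    # Two experiences that are almost identical in pain land on opposite
    # sides of the jump, so the aggregate treats them as wildly different:
    print(disutility(0.4999, 1000))   # 1,000
    print(disutility(0.5001, 1))      # 1,000,000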
I think that in physics we would deal with this as a mapping problem. John's and Mary's beliefs about the planet live in different spaces, and we need to pick a basis on which to project them in order to compare them. We use language as the basis. But then when we try to map between concepts, we find that the problem is ill posed: it doesn't have a unique solution because the maps are not all 1:1.
Nice job writing the survey - fun times. I kind of want to hand it out to my non-LW friends, but I don't want to corrupt the data.
Thanks, I'll check it out.
Bravo, Eliezer. Anyone who says the answer to this is obvious is either WAY smarter than I am, or isn't thinking through the implications.
Suppose we want to define Utility as a function of pain/discomfort on the continuum of [dust speck, torture] and including the number of people afflicted. We can choose whatever desiderata we want (e.g. positive real valued, monotonic, commutative under addition).
But what if we choose as one desideratum, "There is no number n large enough such that Utility(n dust specks) > Utility(50 yrs torture)." What doe...
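Here is one guess at what such a function could look like (my own construction, not a claim about the "right" answer): keep it monotonic in n but bounded, so that extra specks contribute geometrically less and the total never reaches the torture disutility.

    # Illustrative only; disutilities in arbitrary units.
    U_TORTURE = 1.0e6       # assumed disutility of 50 years of torture

    def speck_disutility(n, per_speck=1.0, bound=1.0e5):
        # Monotonic in n but bounded by `bound` < U_TORTURE, so no n makes
        # n dust specks outweigh the torture.
        return bound * (1 - (1 - per_speck / bound) ** n)

    for n in (1, 10**6, 10**12):
        print(n, speck_disutility(n))   # creeps toward 1e5, never past it

    # The price of that desideratum: aggregation is no longer simple addition,
    # since the trillionth speck counts for far less than the first.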
Also, I suggest you read Torture vs Dust Specks. I found it to be very troubling, and would love to talk about it at the meeting.
Is this the same as Jaynes' method for construction of a prior using transformation invariance on acquisition of new evidence?
Does conservation of expected evidence always uniquely determine a probability distribution? If so, it should eliminate a bunch of extraneous methods of construction of priors. For example, you would immediately know if an application of MaxEnt was justified.
That thought occurred to me too, and then I decided that EY was using "entropy" as "the state to which everything naturally tends." But I think it's possible to usefully extend the metaphor anyway.
There are more possible cultish microstates than non-cultish microstates, because there are fewer logically consistent explanations for a phenomenon than logically inconsistent ones. In each non-cultish group, rational argument and counter-argument should naturally push the group toward one describing observed reality. By contrast, cultish groups can fill up the rest of concept-space.
You can't remember whether or not bleggs exist in real life.
Maybe this is covered in another post, but I'm having trouble cramming this into my brain, and I want to make sure I get this straight:
Consider a thingspace. We can divide the thingspace into any number of partially-overlapping sets that don’t necessarily span the space. Each set is assigned a word, and the words are not unique.
Our job is to compress mental concepts in a lossy way into short messages to send between people, and we do so by referring to the words. Inferences drawn from the message have associated uncertainties that depend on the characteristics of the sets.
That's very helpful, thanks. I'm trying to shove everything I read here into my current understanding of probability and estimation. Maybe I should just read more first.
There are a couple things I still don't understand about this.
Suppose I have a bent coin, and I believe that P(heads) = 0.6. Does that belief pay rent? Is it a "floating belief?" It is not, in principle, falsifiable. It's not a question of measurement accuracy in this case (unless you're a frequentist, I guess). But I can gather some evidence for or against it, so it's not uninformative either. It is useful to have something between grounded and floating beliefs to describe this belief.
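For concreteness, here's how I would track that belief: a Beta-Bernoulli model with an assumed prior centered on 0.6. No finite number of flips strictly falsifies "P(heads) = 0.6," but the evidence does move the belief, which is why it feels like it sits between grounded and floating.

    # Assumed prior: Beta(6, 4), centered on P(heads) = 0.6.
    a, b = 6.0, 4.0

    flips = "HTTHTTTHTT"     # made-up data: 3 heads, 7 tails
    for outcome in flips:
        if outcome == "H":
            a += 1
        else:
            b += 1

    print(a / (a + b))   # posterior mean 0.45: pulled toward the data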
Second, when LWers talk about beliefs, or "the map,"...
I think this is the kind of causal loop he has in mind. But a key feature of the hypothesis is that you can't predict what's meant to happen. In that case, he's equally good at predicting any outcome, so it's a perfectly uninformative hypothesis.
Is this what CFAR is trying to do?
I would be interested to hear what other members of the community think about this. I accidentally found Bayes after being trained as a physicist, which is not entirely unlike traditional rationality. But I want to teach my brother, who doesn't have any science or rationality background. Has anyone had success with starting at Bayes and going from there?
I jest, but the sense of the question is serious. I really do want to teach the people I'm close to how to get started on rationality, and I recognize that I'm not perfect at it either. Is there a serious conversation somewhere on LW about being an aspiring rationalist living in an irrational world? Best practices, coping mechanisms, which battles to pick, etc?
My mother's husband professes to believe that our actions have no influence on the way in which we die, but that "if you're meant to die in a plane crash and avoid flying, then a plane will end up crashing into you!" for example.
After explaining how I would expect that belief to constrain experience (like how it would affect plane crash statistics), as well as showing that he himself was demonstrating his unbelief every time he went to see a doctor, he told me that you "just can't apply numbers to this," and "Well, you shouldn't tempt fate."
My question to the LW community is this: How do you avoid kicking people in the nuts all of the time?
Think of them as 3-year-olds who won't grow up until after the Singularity. Would you kick a 3-year-old who made a mistake?
I agree with you, a year and a half late. In fact, the idea can be extended to EY's concept of "floating beliefs," webs of code words that are only defined with respect to one another, and not with respect to evidence. It should be noted that if at any time, a member of the web is correlated in some way with evidence, then so is the entire web.
In that sense, it doesn't seem like wasted effort to maintain webs of "passwords," as long as we're responsible about updating our best guesses about reality based on only those beliefs that are evidence-related. In the long term, given enough memory capacity, it should speed our understanding.
Unless I misunderstand, this story is a parable. EY is communicating with a handwaving example that the effectiveness of a code doesn't depend on the alphabet used. In the code used to describe the plate phenomenon, “magic” and “heat conduction” are interchangeable symbols which formally carry zero information, since the coder doesn't use them to discriminate among cases.
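One way to cash out "formally carry zero information": if the professor gives the same label no matter which side of the plate turns out to be hot, the mutual information between label and outcome is zero bits. Toy numbers:

    import math

    # Toy assumption: two equally likely cases, and the same answer for both.
    p_case = {"near side hotter": 0.5, "far side hotter": 0.5}
    p_label_given_case = {"near side hotter": {"heat conduction": 1.0},
                          "far side hotter":  {"heat conduction": 1.0}}

    # Mutual information I(case; label), in bits.
    p_label = {}
    for c, pc in p_case.items():
        for lab, pl in p_label_given_case[c].items():
            p_label[lab] = p_label.get(lab, 0.0) + pc * pl

    mi = 0.0
    for c, pc in p_case.items():
        for lab, pl in p_label_given_case[c].items():
            joint = pc * pl
            if joint > 0:
                mi += joint * math.log2(joint / (pc * p_label[lab]))

    print(mi)   # 0.0 bits: the label does not discriminate between the cases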
I’m sincerely confused as to why comments center on the motivations of the students and the professor. Isn't that irrelevant? Or did EY mean for the discussion to go this way? Does it matter?
It's true: if you're optimizing for altruism, cooperation is clearly better.
I guess it's not really a "dilemma" as such, since the optimal solution doesn't depend at all on what anyone else does. If you're trying to maximize EV, defect. If you're trying to maximize other people's EV, cooperate.