Comment author: twanvl 20 July 2014 08:51:54PM 0 points

This game is exactly equivalent to the standard one where player one chooses from (A,B,C) and player two chooses from (X,Y), with the payoffs for (A,X) and for (A,Y) equal to (3,0). When deciding what to play, player two can ignore the case where player one chooses A, since the payoffs are the same in that case.

And as others have said, the pure strategy (A,X) is a Nash equilibrium.

Comment author: Squark 25 February 2014 09:55:29AM *  1 point

I'm probably explaining it poorly in the post. P_0 is not just a function of statements in F. P_0 is a probability measure on the space of truth assignments, i.e. functions {statements in F} -> {true, false}. This probability measure is defined by making the truth value of each statement an independent random variable with a 50/50 distribution.

P_D is produced from P_0 by imposing the condition "there is no contradiction of length <= D" on the truth assignment, i.e. we set the probability of all truth assignments that violate the condition to 0 and renormalize the probabilities of all other assignments. In other words, P_D(s) = #{D-consistent truth assignments in which s is assigned true} / #{D-consistent truth assignments}.

Technicality: There is an infinite number of statements so there is an infinite number of truth assignments. However there is only a finite number of statements that can figure in contradictions of length <= D. Therefore all the other statements can be ignored (i.e. assumed to have independent probabilities of 1/2 like in P_0). More formally, the sigma-algebra of measurable sets on the space of truth assignments is generated by sets of the form {truth assignment T | T(s) = true} and {truth assignment T | T(s) = false}. The set of D-consistent truth assignments is in this sigma algebra and has positive probability w.r.t. our measure (as long as F is D-consistent) so we can use this set to form a conditional probability measure.
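The construction above can be checked by brute force on a toy language. The sketch below is my own illustration, not code from the post: the "statements" are just two atoms and their negations, and a "contradiction" is modeled simply as assigning both a statement and its negation true.

```python
from itertools import product

# Toy statements: two atoms and their negations (hypothetical stand-ins
# for the statements of F).
statements = ["a", "not_a", "b", "not_b"]

def is_d_consistent(t):
    # Illustrative stand-in for "no contradiction of length <= D":
    # a statement and its negation may not both be assigned true.
    return not ((t["a"] and t["not_a"]) or (t["b"] and t["not_b"]))

# P_0: each statement is independently true with probability 1/2, i.e.
# the uniform measure over all truth assignments.
all_assignments = [dict(zip(statements, vals))
                   for vals in product([True, False], repeat=len(statements))]

# P_D: condition P_0 on D-consistency by discarding the violating
# assignments and renormalizing (counting suffices, since P_0 is uniform).
consistent = [t for t in all_assignments if is_d_consistent(t)]

def P_D(s):
    return sum(1 for t in consistent if t[s]) / len(consistent)
```

In this toy case P_D("a") comes out to 1/3 rather than 1/2: conditioning on consistency is exactly what introduces correlations between statements.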

Comment author: twanvl 25 February 2014 03:07:41PM 0 points

Thanks, that cleared things up.

Comment author: Squark 24 February 2014 08:21:58PM 1 point

"there are no contradictions of length <= D" is not a statement in F, it is a statement about truth assignments. I'm evaluating the probability that s is assigned "true" by the random truth assignment under the condition that this truth assignment is free of short contradictions.

Comment author: twanvl 24 February 2014 09:53:50PM 1 point

Right, but P_0(s) is defined for statements s in F. Then suddenly you talk about P_0(s | there is no contradiction of length <= D), but the thing between parentheses is not a statement in F. So, what is the real definition of P_D? And how would I compute it?

Comment author: twanvl 24 February 2014 02:04:52PM 1 point

P_D(s) := P_0(s | there are no contradictions of length <= D).

You have not actually defined what P_0(a | b) means. The usual definition would be P_0(a | b) = P_0(a & b) / P_0(b). But then, by definition of P_0, P_0(a & b) = 0.5 and P_0(b) = 0.5, so P_0(a | b) = 1. Also, the statement "there are no contradictions of length <= D" is not even a statement in F.

Comment author: twanvl 19 February 2014 11:58:11AM 1 point

As I understand it, the big difference between Bayesian and frequentist methods is in what they output. A frequentist method gives you a single prediction $z_t$, while a Bayesian method gives you a probability distribution over the predictions, $p(z_t)$. If your immediate goal is to minimize a known (or approximable) loss function, then frequentist methods work great. If you want to combine the predictions with other things as part of a larger whole, then you really need to know the uncertainty of your prediction, and ideally you need the entire distribution.

For example, when doing OCR, you have some model of likely words in a text, and a detector that tells you what character is present in an image. To combine the two, you would use the probability of the image containing a certain character and multiply it by the probability of that character appearing at this point in an English sentence. Note that I am not saying that you need to use a fully Bayesian model to detect characters, just that you somehow need to estimate your uncertainty and be able to give alternative hypotheses.
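As a sketch of that combination step (all numbers and names here are hypothetical, not taken from any particular OCR system): multiply the detector's per-character probabilities by the language model's prior and renormalize.

```python
# Hypothetical detector output P(char | image) for an ambiguous glyph,
# and a hypothetical language-model prior P(char | context).
detector = {"o": 0.55, "0": 0.40, "c": 0.05}
language = {"o": 0.70, "0": 0.01, "c": 0.29}

# Posterior is proportional to detector probability times language prior.
unnormalized = {ch: detector[ch] * language[ch] for ch in detector}
z = sum(unnormalized.values())
posterior = {ch: p / z for ch, p in unnormalized.items()}

best = max(posterior, key=posterior.get)
```

The detector alone is nearly torn between "o" and "0", but the context prior settles it; this only works because the detector reports a distribution (with alternative hypotheses) rather than a single best guess.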

In summary, combining multiple models is where Bayesian reasoning shines. You can easily paste multiple models together and expect to get a sensible result. On the other hand, for getting the best result efficiently, state of the art frequentist methods are hard to beat. And as always, the best thing is to combine the two as appropriate.

Comment author: twanvl 16 January 2014 03:59:51PM 1 point

I am not convinced by the problematic example in the "Scientific Induction in Probabilistic Mathematics" writeup. Let's say that there are n atoms ϕ(1)..ϕ(n). If you don't condition, then because of symmetry, all consistent sets S drawn from the process have equal probability. So the prior on S is uniform and the probability of ϕ(i) is therefore 1/2, by

P(ϕ(i)) = ∑{S} 1[ϕ(i)∈S] * P(S)

The probability of ϕ(i) in a consistent set S drawn from the process is exactly 1/2 for all i; this must be true by symmetry, because μ(x)=μ(¬x). Now what you should do to condition on some statement X is simply throw out the sets S which don't satisfy that statement, i.e.

P(ϕ(i)|X) = ∑{S} 1[ϕ(i)∈S] * P(S) * 1[X(S)] / ∑{S} P(S) * 1[X(S)]

Since the prior on S was uniform, it will still be uniform on the restricted set after conditioning. So

P(ϕ(i)|X) = ∑{S} 1[ϕ(i)∈S] * 1[X(S)] / ∑{S} 1[X(S)]

This should just be 90% in the example where X is "90% of the ϕ are true".
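This counting argument is easy to verify by brute force for a small n (the script below is my own check, with n = 10, so that "90% of the ϕ are true" means exactly 9 true):

```python
from itertools import product

n = 10
assignments = list(product([True, False], repeat=n))  # uniform prior over S

def X(S):
    # X = "exactly 90% of the atoms are true"
    return sum(S) == 9

# Condition by throwing out the assignments that don't satisfy X.
conditioned = [S for S in assignments if X(S)]

# P(phi(1) | X), by counting over the (still uniform) conditioned sets.
p = sum(1 for S in conditioned if S[0]) / len(conditioned)
```

By symmetry the same 9/10 comes out for every ϕ(i).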

The mistake in the writeup is to directly define P(S|X) in an inconsistent way.

To avoid drowning in notation, let's consider a simpler example with the variables a, b and c. We will first pick a or ¬a uniformly, then b or ¬b, and finally c or ¬c. Then we try to condition on X="exactly one of a,b,c is true". You obviously get prior probabilities P(S) = 1/8 for all consistent sets.

If you condition the right way, you get P(S) = 1/3 for the sets with one true atom, and P(S) = 0 for the other sets. So then

P(a|X) = P(a|{a,¬b,¬c})P({a,¬b,¬c}|X) + P(a|{¬a,b,¬c})P({¬a,b,¬c}|X) + P(a|{¬a,¬b,c})P({¬a,¬b,c}|X)
= 1/3

What the writeup does instead is first pick a or ¬a uniformly. If it picks a, we know that b and c are false; if it picks ¬a, we continue. The uniform choice of a is akin to saying that

P({a,b,c}|X) = P(a) * P({b,c}|a,X).

But that first term should be P(a|X), not P(a)!
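The a, b, c example can likewise be checked by enumerating all 8 assignments (again my own check, not code from the writeup):

```python
from itertools import product

# All 8 truth assignments to (a, b, c); the prior over them is uniform.
assignments = list(product([True, False], repeat=3))

# Condition on X = "exactly one of a, b, c is true".
conditioned = [S for S in assignments if sum(S) == 1]

# P(a | X): a is true in exactly one of the three surviving assignments.
p_a = sum(1 for S in conditioned if S[0]) / len(conditioned)
```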

Comment author: twanvl 17 January 2014 02:59:51PM 0 points

After writing this I realize that there is a much simpler prior on finite sets S of consistent statements: simply have a prior over all sets of statements, and keep only the consistent ones. If your language is chosen such that it contains X if and only if it also contains ¬X, then this is equivalent to choosing a truth value for each basic statement, and a uniform prior over these valuations would work fine.

Comment author: Viliam_Bur 19 November 2013 11:55:27AM *  28 points

People probably need two kinds of communities -- let's call them "feelings-oriented community" and "outcome-oriented community" (or more simply "home" and "work", but that has some misleading connotations).

A "feelings-oriented community" is a community of people who meet because they enjoy being together and feel safe with each other. The examples are a functional family, a church group, friends meeting in a pub, etc.

An "outcome-oriented community" is a community that has an explicit goal, and people genuinely contribute to making that goal happen. The examples are a business company, an NGO, a Toastmasters meetup, etc.

The important part is what really happens inside the members' heads, not what they pretend to do. For example, you could have an NGO with twelve members, where two of them want to have the work done, but the remaining ten only come to socialize. Of course, even those ten will verbally support the explicit goals of the organization, but they will be much more relaxed about timing, care less about verifying the outcomes, etc. For them, the explicit goals are merely a source of identity and a pretext to meet people professing similar values; for them, the community is the real goal. If they had a magic button which would instantly solve the problem, making the organization obviously obsolete, they wouldn't push it. The people who are serious about the goal would love to see it completed as soon as possible, so they can move to some other goals. (I have seen a similar tension in a few organizations, and the usual solution seems to be the serious members forming an "organization within an organization", keeping the other ones around them for social and other purposes.)

As an evolutionary just-so story, we have a tribe composed of many different people, and within the tribe we have a hunters group, containing the best hunters. Members of the tribe are required to follow the norms of the tribe. Hunters must be efficient in their jobs. But hunters don't become a separate tribe... they go hunting for a while, and then return back to their original tribe. The tribe membership is for life, or at least for a long time; it provides safety and fulfills the emotional needs. Each hunting expedition is a short-termed event; it requires skills and determination. If a hunter breaks his legs, he can no longer be a hunter; but he still remains a member of his tribe.

I think a healthy way of living should be modelled like this; on two layers. To have a larger tribe based on shared values (rationality and altruism), and within this tribe a few working groups, both long-term (MIRI, CFAR) and short-term (organizers of the next meetup). Of course it could be a few overlapping tribes (the rationalists, the altruists), but the important thing is that you keep your social network even if you stop participating in some specific project -- otherwise we get either cultish pressure (you have to remain hard-working on our project even if you no longer feel so great about it, or you lose your whole social network) or inefficiency (people remain formally members of the project, but lately barely any work gets done, and the more active ones are warned not to rock the boat). Joining or leaving a project should not be motivated or punished socially.

Perhaps acknowledging this difference is one of the differences between a standard religion and a cult. The cult is a society and a workforce in one: if you stop working, your former friends throw you overboard, because now you are just a burden to them. For a less connotationally sensitive example, consider an average job: you may think of your colleagues as your friends, but if you leave the job, how many of them will you keep regular contact with? In contrast, a regular church just asks you to come to Sunday prayers, gives you some keywords and a few relatively simple rules. If this level of participation is ideal for you, welcome, brother or sister! And if you want more, feel free to join some higher-commitment group within the church. You choose the level of your participation, and you can change it during your life. For a non-religious example, in a good neighborhood you could have similar relations with your neighbors: some of you have the same jobs, some of you have the same hobby, some of you participate in a local short-term project; but you know each other and you will remain neighbors for years.

Actually, something like this is already naturally happening with LW: there are people who merely procrastinate on the LW website, and there are people who join some of the organizations mentioned here. The only problem is that the virtual community of LW readers is... virtual. Unless you live near each other, you can't have a beer together every week, can't go together for a trip or a vacation, can't together create an environment for your children where they will naturally internalize your values, can't help each other solve their random problems.

It would be great to have a LW village, where some people would work on effective altruism, others would work on building artificial intelligence, yet others would develop a rationality curriculum, and some would be too busy with their personal issues to do any of this now... but everyone would know that this is a village where good and sane people live, where cool things happen, and whichever of these good and real goals I will choose to prioritize, it's still a community where I belong. [EDIT: Actually, it would be great to have a village where 5% or 10% of people would be the LW community. Connotationally, it's not about being away from other people, but about being with my people.]

Comment author: twanvl 20 November 2013 05:48:36PM 1 point

Do you have any evidence for your claim that people need these two layers? As far as I can tell, this is just something for which you can make up a plausible-sounding story.

there are people who merely procrastinate on the LW website, and there are people who join some of the organizations mentioned here

There is a (multidimensional) continuum of people on LW. It is not as black and white as you make it out to be.

In response to comment by V_V on Bayesianism for Humans
Comment author: ygert 29 October 2013 11:55:47AM *  9 points

You should expect that, on average, a test will leave your beliefs unchanged.

Emphasis mine.

When I shake the box, my belief that the coin landed heads is 50%. When I look inside, my belief changes, yes, but to one of two options of equal probability: 0% (I see it came out tails) or 100% (I see it came out heads).

It is trivial to see that my expected posterior belief is 0% * 1/2 + 100% * 1/2 = 50%, or in other words, it's exactly equal to my prior belief.

Comment author: twanvl 30 October 2013 01:30:16PM 4 points

The question is whether 'change' signifies only a magnitude or also a direction. The average magnitude of the change in belief when doing an experiment is larger than zero. But the average of the change as a vector quantity, indicating the difference between belief after and before the test, is zero.

If you drive your car to work and back, then the average velocity of your trip is 0, but the average speed is positive.
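The coin example above makes the distinction concrete (a small check of my own):

```python
# A fair coin: prior belief in heads is 0.5; looking moves the belief to
# 1.0 (heads) or 0.0 (tails), each with probability 1/2.
prior = 0.5
posteriors = [1.0, 0.0]

# The signed change (a "velocity") averages to zero...
avg_signed = sum(p - prior for p in posteriors) / len(posteriors)

# ...while the magnitude of the change (a "speed") averages to 0.5.
avg_magnitude = sum(abs(p - prior) for p in posteriors) / len(posteriors)
```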

Comment author: jamesf 26 October 2013 07:37:04AM *  2 points

The market provides a continuous and generally valid test of engineering principles. I think it's more scientific than peer review, in the most meaningful sense of the word "science".

Comment author: twanvl 29 October 2013 03:26:04PM 0 points

Not all engineering is about developing products to sell to consumers. Engineers also design bridges and rockets. I don't think these are subject to the open market in any meaningful sense.
