All of taiyo's Comments + Replies

taiyo40

Thank you for this info. I've signed up. I think this flipped my mood from gloomy to happy.

Incidentally, this is the second study I've signed up for via the web. The first is the Good Judgement Project which has been a fun exercise so far.

taiyo10

I think minesweeper makes a nice analogy with many of the ideas of epistemic rationality espoused in this community. At a basic level, it demonstrates how probabilities are subjectively objective -- our state of information (the board state) is what determines the probability of a mine under an unknown square but that there really is only one correct set of mine probabilities. However, we also run quickly into the problem of bounded cognition. In this situation we resort to heuristics. Of course, heuristics are of varying quality, and it is possible w... (read more)

taiyo30

K. S. Van Horn gives a few lines describing the derivation in his PT:TLoS errata. I don't understand why he does step 4 there -- it seems to me to be irrelevant. The two main facts which are needed are steps 2-3 and step 5: the sum of a geometric series and the Taylor series expansion around y = S(x). Hopefully that is a good hint.

Nitpicking with his errata, 1/(1-z) = 1 + z + O(z^2) for all z is wrong since the interval of convergence for the RHS is (-1,1). This is not important to the problem since the z here will be z = exp(-q) which is less than 1 since q is positive.
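The convergence point is easy to check numerically: for z = exp(-q) with q positive, z lies in (0,1), so the geometric series really does sum to 1/(1-z). A quick sketch (the helper name is mine, not from the errata):

```python
import math

def geometric_partial_sum(z, n_terms):
    """Partial sum 1 + z + z^2 + ... + z^(n_terms - 1)."""
    return sum(z**k for k in range(n_terms))

q = 2.0                      # any positive q
z = math.exp(-q)             # 0 < z < 1, inside the interval of convergence
exact = 1.0 / (1.0 - z)
approx = geometric_partial_sum(z, 50)
print(abs(exact - approx) < 1e-12)  # the partial sums converge to 1/(1-z)
```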

0Morendil
Is there anything more to getting 2.53 than just rearranging things around? I'm not sure I really understand where we get the left-hand side from.
3Soki
It is not very important, but since you mentioned it : The interval of convergence of the Taylor series of 1/(1-z) at z=0 is indeed (-1,1). But "1/(1-z) = 1 + z + O(z^2) for all z" does not make sense to me. 1/(1-z) = 1 + z + O(z^2) means that there is an M such as |1/(1-z) - (1 + z)| is no greater that M*z^2 for every z close enough to 0. It is about the behavior of 1/(1-z) - (1 + z) when z tends toward 0, not when z belongs to (-1,1).
0Morendil
Indeed, thanks!
taiyo70

I would like to share some interesting discussion on a hidden assumption used in Cox's Theorem (this is the result which states that what falls out of the desiderata is a probability measure).

First, some criticism of Cox's Theorem -- a paper by Joseph Y. Halpern published in the Journal of AI Research. Here he points out an assumption which is necessary to arrive at the associative functional equation:

F(x, F(y,z)) = F(F(x,y), z) for all x,y,z

This is (2.13) in PT:TLoS

Because this equation was derived by using the associativity of the conjunction opera... (read more)
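For concreteness, the product function F(x, y) = xy -- which is, up to regraduation, the solution Cox's theorem singles out -- satisfies the associative functional equation. A quick numeric spot-check (my sketch, not from Halpern's paper):

```python
import random

def F(x, y):
    # The product rule: one solution of F(x, F(y, z)) = F(F(x, y), z)
    return x * y

random.seed(0)
for _ in range(100):
    x, y, z = (random.random() for _ in range(3))
    assert abs(F(x, F(y, z)) - F(F(x, y), z)) < 1e-12
print("associativity holds for the product rule")
```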

taiyo00

Ah OK. You're right. I guess I was taking the 'extension of logic' thing a little too far there. I had it in my head that ({any prop} | {any contradiction}) = T since contradictions imply anything. Thanks.

0Cyan
That's legit so far as it goes -- it's just that every proposition is also false at the same time, since every proposition's negation is also true, and the whole enterprise goes to shit. There's no point in trying to extend logic to uncertain propositions when you can prove anything.
taiyo00

Yeah. My solution is basically the same as yours. Setting A=B=C makes F(T,T) = T. But setting A=B AND C -> ~A makes F(T,T) = F (warning: unfortunate notation collision here).

2Cyan
Given C -> ~A, ({any proposition} | AC) is undefined. That's why I couldn't follow your argument all the way.
taiyo20

Yeah. A total derivative. The way I think about it is the dv thing there (jargon: a differential 1-form) eats a tangent vector in the y-z plane. It spits out the rate of change of the function in the direction of the vector (scaled appropriately with the magnitude of the vector). It does this by looking at the rate of change in the y-direction (the dy stuff) and in the z-direction (the dz stuff) and adding those together (since after taking derivatives, things get nice and linear).
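That picture can be checked numerically with a hypothetical example function f(y, z) = y^2 * z: the 1-form f_y dy + f_z dz, fed a small displacement in the y-z plane, approximates the actual change in f to first order:

```python
def f(y, z):
    return y**2 * z          # an arbitrary smooth example function

def df(y, z, dy, dz):
    # Total derivative: rate of change in each coordinate direction, added linearly
    f_y = 2 * y * z          # partial derivative in y
    f_z = y**2               # partial derivative in z
    return f_y * dy + f_z * dz

y, z = 1.5, 2.0
dy, dz = 1e-6, 2e-6          # a small tangent vector in the y-z plane
actual = f(y + dy, z + dz) - f(y, z)
linear = df(y, z, dy, dz)
print(abs(actual - linear) < 1e-9)   # agreement up to second-order terms
```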

I'm not too familiar with the functional equation business either. I... (read more)

0Morendil
I'm totally stuck on getting 2.50 from 2.48, would appreciate a hint.
4Soki
I could not figure out why alpha > 0 neither and it seems wrong to me too. But this does not look like a problem. We know that J is an increasing function because of 2-49. So in 2-53, alpha and log(x/S(x)) must have the same sign, since the remaining of the right member tends toward 0 when q tends toward + infinity. Then b is positive and I think it is all that matters. However, if alpha = 0, b is not defined. But if alpha=0 then log(x/S(x))=0 as a consequence of 2-53, so x/S(x)=1. There is only one x that gives us this since S is strictly decreasing. And by continuity we can still get 2-56.
taiyo00

I did not go through the 9 remaining cases, but I did think about one...

Suppose (AB|C) = F[(A|BC) , (B|AC)]. Compare A=B=C with (A = B) AND (C -> ~A).

Re 2-7: Yep, chain rule gets it done. By the way, took me a few minutes to realize that your citation "2-7" refers to a line in the pdf manuscript of the text. The numbering is different in the hardcopy version. In particular, it uses periods (e.g. equation 2.7) instead of dashes (e.g. equation 2-7), so as long as we're all consistent with that, I don't suppose there will be much confusion.

1Cyan
Not sure what you're getting at. To rule out (AB|C) = F[(A|BC) , (B|AC)], set A = B and let A's plausibility given C be arbitrary. Let T represent the (fixed) plausibility of a tautology. Then we have (A|BC) = (B|AC) = T (because A = B) (AB|C) = F(T, T) = constant But (AB|C) is arbitrary by hypothesis, so (AB|C) = F[(A|BC) , (B|AC)] is not useful. ETA: Credit where it's due: page 13, point 4 of Kevin S. Van Horne's guide to Cox's theorem (warning: pdf).
3Morendil
OK, thanks. I'm able to follow a fair bit of what's going on here; the hard portions for me are when Jaynes gets some result without saying which rule or operation justifies it - I suppose it's obvious to someone familiar with calculus, but when you lack these background assumptions it can be very hard to infer what rules are being used, so I can't even find out how I might plug the gaps in my knowledge. (Definitely "deadly unk-unk" territory for me.) (Of course "follow" isn't the same thing at all as "would be able to get similar results on a different but related problem". I grok the notion of a functional equation, and I can verify intermediate steps using a symbolic math package, but Jaynes' overall strategy is obscure to me. Is this a common pattern, taking the derivative of a functional equation then integrating back?) The next bit where I lose track is 2.22. What's going on here, is this a total derivative?
4Kazuo_Thow
Could we standardize on using the whole-book-as-one-PDF version, at least for the purposes of referencing equations? ETA: So far I've benefited from checking the relevant parts of Kevin Van Horn's unofficial errata pages before (and often while) reading a particular section.
taiyo20

> Jaynes discusses a "tricky point" with regard to the difference between the everyday meaning of the verb "imply" and its logical meaning; are there other differences between the formal language of logic and everyday language?

In formal logic, the disjunction "or" is inclusive -- "A or B" is true when at least one of A and B is true, including the case where both are. In everyday language, "or" is typically exclusive -- "A or B" is meant to exclude the possibility that A and B are both true.
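The difference shows up directly in a truth table: Python's `or` is the inclusive disjunction of formal logic, while `!=` on booleans behaves as exclusive or. A small illustration:

```python
import itertools

# Compare inclusive "or" (logic) with exclusive or (everyday usage)
print("A      B      inclusive  exclusive")
for A, B in itertools.product([True, False], repeat=2):
    print(f"{A!s:6} {B!s:6} {(A or B)!s:10} {(A != B)!s}")
# The rows differ only when A and B are both true:
# inclusive gives True, exclusive gives False.
```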

taiyo30

I'm not claiming that working from the definition of derivative is the best way to present the topic. But it is certainly necessary to present the definition if calculus is being taught in a math course. Part of doing math is being rigorous. Doing derivatives without the definition is just calling on a black box.

On the other hand, once one has the intuition for the concept in hand through more tangible things like pictures, graphs, velociraptors, etc., the definition falls out so naturally that it ceases to be something which is memorized and becomes something that can be produced "on the fly".

2wedrifid
A definition is a black box (that happens to have official status). The process I describe above leads, when managed with foresight, to an intuitive way to produce a definition. Sure, it may not include the slogan "brought to you by apostrophe, the letters LIM and an arrow" but you can go on to tell them "this is how impressive mathematcians say you should write this stuff that you already understand" and they'll get it. I note that some people do learn best by having a black box definition shoved down their throats while others learn best by building from a solid foundation of understanding. Juggling both types isn't easy.
taiyo60

I teach calculus often. Students don't get hung up on mechanical things like (x^3)' = 3x^2. They instead get hung up on what

f'(x) = lim_{h -> 0} [f(x+h) - f(x)] / h

has to do with the derivative as a rate of change or as a slope of a tangent line. And from the perspective of a calculus student who has gone through the standard run of American school math, I can understand. It does require a level up in mathematical sophistication.
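The connection students struggle with can at least be seen numerically: for f(x) = x^3, the difference quotient inside the limit approaches the slope 3x^2 as h shrinks. A small sketch:

```python
def f(x):
    return x**3

def difference_quotient(x, h):
    # The expression inside the limit: [f(x+h) - f(x)] / h
    return (f(x + h) - f(x)) / h

x = 2.0
for h in [1e-1, 1e-3, 1e-5]:
    print(difference_quotient(x, h))   # approaches f'(2) = 3 * 2**2 = 12
```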

-1wedrifid
That's the problem. See that bunch of symbols? That isn't the best way to teach stuff. It is like trying to teach them math while speaking a foreign language (even if technically we are saving the greek till next month). To teach that concept you start with the kind of picture I was previously describing, have them practice that till they get it then progress to diagrams that change once in the middle, etc. Perhaps the students here were prepared differently but the average student started getting problems with calculus when it reached a point slightly beyond what you require for the basic physics we were talking about here. ie. they would be able to do 1. and but have no chance at all with 2:
taiyo00

I'm not sure about all the details, but I believe that there was a small kerfuffle a few decades ago over a suggestion to change the apex of U.S. "school mathematics" from calculus to a sort of discrete math for programming course. I cannot remember what sort of topics were suggested though. I do remember having the impression that the debate was won by the pro-calculus camp fairly decisively -- of course, we all see that school mathematics hasn't changed much.

taiyo30

Probability theory as extended logic.

I think it can be presented in a manner accessible to many (Jaynes' PT:TLoS is not accessible to many).

taiyo60

Jaynes references Polya's books on the role of plausible reasoning in mathematical investigations. The three volumes are How to Solve It and the two volumes of Mathematics and Plausible Reasoning. They are all really fun and interesting books which kind of give a glimpse of the cognitive processes of a successful mathematician.

Particularly relevant to Jaynes' discussion of weak syllogisms and plausibility is a section of Vol. 2 of Mathematics and Plausible Reasoning which gives many other kinds of weak syllogisms. Things like: "A is analogous to B, B ... (read more)

taiyo30

Along with the distinction between causal and logical connections, when considering the conditional premise of the syllogisms (if A then B), Jaynes warns us to distinguish between those conditional statements of a purely formal character (the material conditional ) and those which assert a logical connection.

It seems to me that the weak syllogisms only "do work" when the conditional premise is true due to a logical connection between antecedent and consequent. If no such connection exists, or rather, if our mind cannot establish such a connectio... (read more)

taiyo10

Upon further study, I disagree with myself here. It does seem like entropy as a measurement of uncertainty in probability distributions does more or less fall out of the Cox Polya desiderata. I guess that 'common sense' one is pretty useful!

taiyo30

I wonder if Jaynes' statement is really true? Here is an example that is on my mind because I'm reading the (thus far) awesome book The Making of the Atomic Bomb. Apologies if I get details wrong:

In the 1930s, there was a lot of work done on neutron bombardment of uranium. At some point, Fermi fired slow moving neutrons at uranium and got a bunch of interesting reaction products that he concluded were most plausibly transuranic elements. I believe he came to this conclusion because the models of the day discounted the hypothesis that a slow moving neut... (read more)

taiyo00

I think this is not so important, but it is helpful to think about nonetheless. I guess the first step is to define what is meant by 'Bayesian'. In my original comment, I took one necessary condition to be that a Bayesian gadget is one which follows from the Cox-Polya desiderata. It might be better to define it to be one which uses Bayes' Theorem. I think in either case, Maxent fails to meet the criteria.

Maxent produces the distribution on the sample space which maximizes entropy subject to any known constraints which presumably come from data. If there ... (read more)

1taiyo
Upon further study, I disagree with myself here. It does seem like entropy as a measurement of uncertainty in probability distributions does more or less fall out of the Cox Polya desiderata. I guess that 'common sense' one is pretty useful!
taiyo20

I think Jaynes more or less defines 'Bayesian methods' to be those gadgets which fall out of the Cox-Polya desiderata (i.e. probability theory as extended logic). Actually, this can't be the whole story given the following quote on page xxiii:

"It is true that all 'Bayesian' calculations are included automatically as particular cases of our rules; but so are all 'frequentist' calculations. Nevertheless, our basic rules are broader than either of these."

In any case, Maximum entropy gives you the pre-Bayesian ensemble (I got that word from here) w... (read more)

2gwern
IIRC, this was my understanding of Jaynes's position on maxent: 1. the Cox-Polya desiderata say that multiple allowed derivations of a problem ought to all lead to the same answer 2. if we consider a list of identifiers about which we know nothing, and we ask whether the first one is more likely than the nth one, then we should answer that they are equal, because if we say either greater than or less than, we could shuffle the list and get a contradictory answer. By induction, we ought to say that all members of the list are equiprobable, which only allows entries to be 1/n probable. 3. hence, we get the Principle of Indifference. (Points 1-3 are my version of chapter 2 or 3, IIRC.) 4. Maxent is just the same idea, abstract and applied to non-list thingies. (I haven't actually gotten this far, but it seems like the obvious next step.) The arguments seem to me to be as Bayesian as anything in his building up of Bayesian methods from the Cox-Polya criteria.
3Morendil
Jaynes recommends MaxEnt for situations when "the Bayesian apparatus", consisting of "a model, a sample space, hypothesis space, prior probabilities, sampling distribution" is not yet available, and only a sample space can be defined.
taiyo00

I'm (still) in!

I live in Davis, California, USA which is about an hour from the Bay Area.

taiyo00

The link to the pdf version seems to be missing in the original post.

taiyo00

I'm enthusiastically in.

taiyo100

My name is Taiyo Inoue. I am 32, male, a father of a 1 year old son, married, and a math professor. I enjoy playing the acoustic guitar (American primitive fingerpicking), playing games, and soaking up the non-poisonous bits of the internet.

I went through 12 years of math study without ever really learning that probability theory is the ultimate applied math. I played poker for a bit during the easy money boom for fun and hit on basic probability theory which the 12 year old me could have understood, but I was ignorant of the Bayesian framework for epi... (read more)

4[anonymous]
I'm just realizing this myself; probability theory is epistemology.
taiyo50

Hi.

Any comments I've made have been in the last few months. I've been lurking on this site since its inception.

taiyo00

I have no problem letting decomposition refer to details of mind that can be adjusted independently of others. I can imagine such things. But I do not know if such things actually exist in my mind.

I have a math background so I tend to think a bit like that. Here's a silly analogy: consider a linear transformation on a vector space. Sometimes there are invariant subspaces of the vector space called eigenspaces, but sometimes there are not. In this case, you cannot just analyze the effect of a linear transformation by examining a smaller subspace of th... (read more)
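The eigenspace analogy can be made concrete with 2x2 examples: a diagonal map has real eigenvalues (and invariant axes you can analyze separately), while a 90-degree rotation has only a complex pair, so no real line is left invariant. A quick sketch using the characteristic polynomial (the helper is mine):

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] via the characteristic polynomial."""
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

print(eigenvalues_2x2(2, 0, 0, 3))    # diagonal map: real eigenvalues, invariant axes
print(eigenvalues_2x2(0, -1, 1, 0))   # 90-degree rotation: complex pair, no real eigenline
```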

taiyo70

I'd like to offer what I think might be objections to this post. When I imagine myself as a non-reductionist and non-materialist reading this post (I am, in fact, neither of these things), I believe I find myself unconvinced by this thought experiment. I suppose I'm not sure convincing this hypothetical me is the goal... nonetheless here are my hypothetical objections:

  1. When I introspect on my thought processes, I am using my mind. I might imagine that I can isolate a "specks of consciousness" just as you ask me to do, but this is a fact about

... (read more)
1bogus
There is a big difference between decomposing consciousness into its constituent fragments, vs. simply identifying the degrees of freedom in your conscious experience. Drilling down into your introspection and identifying the smallest specks of experience which you can think of as being details of your consciousness is doing the latter. You can go about modifying any of these tiny details more or less independently, but your consciousness is still a self-contained object. It's only after you isolate a single electron that you can talk about what its conscious experience would look and feel like.
taiyo40

There is a sign problem when iev(A,B) is defined. You mention that you can get the mutual information of A and B by taking -log_2 of probabilistic evidence pev(A,B) = P(AB) / [P(A)P(B)], but this creates an extra negative sign:

-log_2(pev(A,B)) = -log_2[ P(AB) / (P(A)P(B)) ]
                 = -[ log_2 P(AB) - (log_2 P(A) + log_2 P(B)) ]
                 = -log_2 P(AB) + log_2 P(A) + log_2 P(B)
                 = inf(AB) - inf(A) - inf(B).
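The sign bookkeeping can be verified on a toy pair of events (the probabilities below are made up for illustration):

```python
import math

# Toy probabilities for events A and B
P_AB = 0.3
P_A = 0.5
P_B = 0.4

def inf(p):
    """Information content, -log_2 P."""
    return -math.log2(p)

pev = P_AB / (P_A * P_B)             # probabilistic evidence

lhs = -math.log2(pev)
rhs = inf(P_AB) - inf(P_A) - inf(P_B)
print(abs(lhs - rhs) < 1e-12)        # the chain of equalities checks out
```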

taiyo230

I donated 100 USD to the general fund.

I am a lurker -- always taking and never giving. This might change, but perhaps not. In any case, this opportunity is an effective way for me to give back to a community that has given me so much. Thank you.

I donated $500 to the general fund (thought a little about whether to give directly to the paper that describes how many lives per dollar could be saved by a friendly singularity, then I decided SIAI is a better judge of where to put the money. If a good massage pillow for Eliezer and/or Marcello improves the advent of FAI by a couple of weeks, it is money well spent :) ) I donated via my brother in California. Amounts to around 3% of my after-tax yearly income. (Indian rupee ratio is 46 to 1, really pinches)

Reasons for donating - Karmically (universal cos... (read more)