I think minesweeper makes a nice analogy with many of the ideas of epistemic rationality espoused in this community. At a basic level, it demonstrates how probabilities are subjectively objective -- our state of information (the board state) is what determines the probability of a mine under an unknown square, yet there really is only one correct set of mine probabilities. However, we also run quickly into the problem of bounded cognition. In this situation we resort to heuristics. Of course, heuristics are of varying quality, and it is possible w...
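To make the "only one correct set of mine probabilities" point concrete, here is a minimal sketch that enumerates every mine layout consistent with a board state. The four-square board, the clue format, and the name `mine_probabilities` are made up for illustration, and the sketch assumes every consistent layout is equally likely, which ignores any constraint on the total mine count.

```python
from fractions import Fraction
from itertools import product

def mine_probabilities(squares, clues):
    """Return P(mine) for each unknown square.

    clues is a list of (group_of_squares, mine_count) pairs: each revealed
    number says exactly mine_count mines lie among that group.
    Assumption: every layout consistent with the clues is equally likely.
    """
    consistent = []
    for bits in product((0, 1), repeat=len(squares)):
        layout = dict(zip(squares, bits))
        if all(sum(layout[s] for s in group) == count for group, count in clues):
            consistent.append(layout)
    if not consistent:
        raise ValueError("no mine layout satisfies the clues")
    total = len(consistent)
    return {s: Fraction(sum(layout[s] for layout in consistent), total)
            for s in squares}

# Hypothetical board: one clue says exactly one mine among a, b;
# another says exactly one mine among b, c, d.
squares = ["a", "b", "c", "d"]
clues = [(("a", "b"), 1), (("b", "c", "d"), 1)]
probs = mine_probabilities(squares, clues)
for square, p in probs.items():
    print(square, p)
```

Brute-force enumeration is exponential in the number of unknown squares, which is exactly where bounded cognition (and heuristics) come in.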
Lovely. Thanks.
K. S. Van Horn gives a few lines describing the derivation in his PT:TLoS errata. I don't understand why he does step 4 there -- it seems to me to be irrelevant. The two main facts which are needed are in steps 2-3 and step 5: the sum of a geometric series and the Taylor series expansion around y = S(x). Hopefully that is a good hint.
A nitpick with his errata: the claim that 1/(1-z) = 1 + z + O(z^2) for all z is wrong, since the interval of convergence for the RHS is (-1,1). This is not important to the problem, since the z here is z = exp(-q), which lies between 0 and 1 because q is positive.
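A quick numerical check of the convergence point, as a rough sketch: for z = exp(-q) with q > 0 the partial sums of the geometric series approach 1/(1-z), while for z outside (-1,1) they do not. The helper name `geometric_partial_sum` and the value q = 0.5 are arbitrary choices for illustration, not taken from the errata.

```python
import math

def geometric_partial_sum(z, n_terms):
    """Sum of z**k for k = 0, 1, ..., n_terms - 1."""
    return sum(z ** k for k in range(n_terms))

q = 0.5                      # any positive q gives 0 < z < 1
z = math.exp(-q)
closed_form = 1.0 / (1.0 - z)
approx = geometric_partial_sum(z, 200)
print(closed_form, approx)   # the two values agree closely

# Outside the interval of convergence the partial sums grow without bound,
# even though 1/(1-z) is still a perfectly finite number.
print(geometric_partial_sum(2.0, 10), 1.0 / (1.0 - 2.0))
```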
I would like to share some interesting discussion on a hidden assumption used in Cox's Theorem (this is the result which states that what falls out of the desiderata is a probability measure).
First, some criticism of Cox's Theorem -- a paper by Joseph Y. Halpern published in the Journal of AI Research. Here he points out an assumption which is necessary to arrive at the associative functional equation:
F(x, F(y,z)) = F(F(x,y), z) for all x,y,z
This is (2.13) in PT:TLoS.
Because this equation was derived by using the associativity of the conjunction opera...
Ah OK. You're right. I guess I was taking the 'extension of logic' thing a little too far there. I had it in my head that ({any prop} | {any contradiction}) = T since contradictions imply anything. Thanks.
Yeah. My solution is basically the same as yours. Setting A=B=C makes F(T,T) = T. But setting A=B AND C -> ~A makes F(T,T) = F (warning: unfortunate notation collision here).
Yeah. A total derivative. The way I think about it is the dv thing there (jargon: a differential 1-form) eats a tangent vector in the y-z plane. It spits out the rate of change of the function in the direction of the vector (scaled appropriately with the magnitude of the vector). It does this by looking at the rate of change in the y-direction (the dy stuff) and in the z-direction (the dz stuff) and adding those together (since after taking derivatives, things get nice and linear).
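Here is a small sketch of that "eats a tangent vector" picture. The function v(y, z) = y^2 z, the base point, and the tangent vector are all hypothetical choices for illustration. The 1-form is applied to the vector by adding the y-part and the z-part, and the result is compared with a finite-difference rate of change along the same vector.

```python
def v(y, z):
    # Hypothetical example function, chosen only for illustration.
    return y * y * z

def dv(y, z, tangent):
    """Apply dv = (dv/dy) dy + (dv/dz) dz at (y, z) to a tangent vector."""
    dy, dz = tangent
    dv_dy = 2 * y * z   # partial derivative of v with respect to y
    dv_dz = y * y       # partial derivative of v with respect to z
    return dv_dy * dy + dv_dz * dz

point = (1.0, 2.0)
tangent = (3.0, -1.0)
exact = dv(point[0], point[1], tangent)

# Rate of change of v when moving a small step h along the tangent vector.
h = 1e-6
moved = v(point[0] + h * tangent[0], point[1] + h * tangent[1])
finite_difference = (moved - v(*point)) / h
print(exact, finite_difference)
```

Doubling the tangent vector doubles the output, which is the "scaled appropriately with the magnitude" part.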
I'm not too familiar with the functional equation business either. I...
I did not go through the 9 remaining cases, but I did think about one...
Suppose (AB|C) = F[(A|BC) , (B|AC)]. Compare A=B=C with (A = B) AND (C -> ~A).
Re 2-7: Yep, chain rule gets it done. By the way, took me a few minutes to realize that your citation "2-7" refers to a line in the pdf manuscript of the text. The numbering is different in the hardcopy version. In particular, it uses periods (e.g. equation 2.7) instead of dashes (e.g. equation 2-7), so as long as we're all consistent with that, I don't suppose there will be much confusion.
> Jaynes discusses a "tricky point" with regard to the difference between the everyday meaning of the verb "imply" and its logical meaning; are there other differences between the formal language of logic and everyday language?
In formal logic, the disjunction "or" is inclusive -- "A or B" is true even when A and B are both true. In everyday language, "or" is typically exclusive -- "A or B" is meant to exclude the possibility that A and B are both true.
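The difference shows up in exactly one row of the truth table. A minimal sketch (the function names are my own labels):

```python
from itertools import product

def inclusive_or(a, b):
    # The "or" of formal logic: true when at least one operand is true.
    return a or b

def exclusive_or(a, b):
    # The usual everyday "or": true when exactly one operand is true.
    return a != b

for a, b in product((True, False), repeat=2):
    print(a, b, inclusive_or(a, b), exclusive_or(a, b))
```

The two only disagree when A and B are both true.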
I'm not claiming that working from the definition of derivative is the best way to present the topic. But it is certainly necessary to present the definition if the calculus is being taught in a math course. Part of doing math is being rigorous. Doing derivatives without the definition is just calling on a black box.
On the other hand, once one has the intuition for the concept in hand through more tangible things like pictures, graphs, velociraptors, etc., the definition falls out so naturally that it ceases to be something which is memorized and is something that can be produced ``on the fly''.
I teach calculus often. Students don't get hung up on mechanical things like (x^3)' = 3x^2. They instead get hung up on what
$$f'(x) = \lim_{h \to 0} \dfrac{f(x+h) - f(x)}{h}$$
has to do with the derivative as a rate of change or as a slope of a tangent line. And from the perspective of a calculus student who has gone through the standard run of American school math, I can understand. It does require a level up in mathematical sophistication.
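One way to connect the limit definition to the mechanical rule is to watch the difference quotient for x^3 at x = 2 approach 3x^2 = 12 as h shrinks. A minimal sketch, with helper names of my own choosing:

```python
def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def cube(x):
    return x ** 3

# As h shrinks, the secant slopes approach the tangent slope 3 * 2**2 = 12.
for h in (1e-1, 1e-3, 1e-5):
    print(h, difference_quotient(cube, 2.0, h))
```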
I'm not sure about all the details, but I believe that there was a small kerfuffle a few decades ago over a suggestion to change the apex of U.S. ``school mathematics'' from calculus to a sort of discrete math for programming course. I cannot remember what sort of topics were suggested though. I do remember having the impression that the debate was won by the pro-calculus camp fairly decisively -- of course, we all see that school mathematics hasn't changed much.
Probability theory as extended logic.
I think it can be presented in a manner accessible to many (Jaynes's PT:TLoS is not accessible to many).
Jaynes references Polya's books on the role of plausible reasoning in mathematical investigations. The three volumes are How to Solve It and the two volumes of Mathematics and Plausible Reasoning. They are all really fun and interesting books which give a glimpse of the cognitive processes of a successful mathematician.
Particularly relevant to Jaynes' discussion of weak syllogisms and plausibility is a section of Vol. 2 of Mathematics and Plausible Reasoning which gives many other kinds of weak syllogisms. Things like: "A is analogous to B, B ...
Along with the distinction between causal and logical connections, when considering the conditional premise of the syllogisms (if A then B), Jaynes warns us to distinguish between those conditional statements of a purely formal character (the material conditional ) and those which assert a logical connection.
It seems to me that the weak syllogisms only "do work" when the conditional premise is true due to a logical connection between antecedent and consequent. If no such connection exists, or rather, if our mind cannot establish such a connectio...
Upon further study, I disagree with myself here. It does seem like entropy as a measurement of uncertainty in probability distributions does more or less fall out of the Cox-Polya desiderata. I guess that 'common sense' one is pretty useful!
I wonder if Jaynes' statement is really true? Here is an example that is on my mind because I'm reading the (thus far) awesome book The Making of the Atomic Bomb. Apologies if I get details wrong:
In the 1930s, there was a lot of work done on neutron bombardment of uranium. At some point, Fermi fired slow moving neutrons at uranium and got a bunch of interesting reaction products that he concluded were most plausibly transuranic elements. I believe he came to this conclusion because the models of the day discounted the hypothesis that a slow moving neut...
I think this is not so important, but it is helpful to think about nonetheless. I guess the first step is to define what is meant by 'Bayesian'. In my original comment, I took one necessary condition to be that a Bayesian gadget is one which follows from the Cox-Polya desiderata. It might be better to define it to be one which uses Bayes' Theorem. I think in either case, Maxent fails to meet the criteria.
Maxent produces the distribution on the sample space which maximizes entropy subject to any known constraints which presumably come from data. If there ...
I think Jaynes more or less defines 'Bayesian methods' to be those gadgets which fall out of the Cox-Polya desiderata (i.e. probability theory as extended logic). Actually, this can't be the whole story given the following quote on page xxiii:
"It is true that all 'Bayesian' calculations are included automatically as particular cases of our rules; but so are all 'frequentist' calculations. Nevertheless, our basic rules are broader than either of these."
In any case, Maximum entropy gives you the pre-Bayesian ensemble (I got that word from here) w...
I'm (still) in!
I live in Davis, California, USA which is about an hour from the Bay Area.
I'm enthusiastically in.
My name is Taiyo Inoue. I am 32, male, married, the father of a 1-year-old son, and a math professor. I enjoy playing the acoustic guitar (American primitive fingerpicking), playing games, and soaking up the non-poisonous bits of the internet.
I went through 12 years of math study without ever really learning that probability theory is the ultimate applied math. I played poker for a bit during the easy money boom for fun and hit on basic probability theory which the 12 year old me could have understood, but I was ignorant of the Bayesian framework for epi...
Hi.
Any comments I've made have been in the last few months. I've been lurking on this site since its inception.
I have no problem letting decomposition refer to details of mind that can be adjusted independently of others. I can imagine such things. But I do not know if such things actually exist in my mind.
I have a math background so I tend to think a bit like that. Here's a silly analogy: consider a linear transformation on a vector space. Sometimes there are invariant subspaces of the vector space called eigenspaces, but sometimes there are not. In this case, you cannot just analyze the effect of a linear transformation by examining a smaller subspace of th...
I'd like to offer what I think might be objections to this post. When I imagine myself as a non-reductionist and non-materialist reading this post (I am, in fact, neither of these things), I believe I find myself unconvinced by this thought experiment. I suppose I'm not sure convincing this hypothetical me is the goal... nonetheless here are my hypothetical objections:
When I introspect on my thought processes, I am using my mind. I might imagine that I can isolate a "specks of consciousness" just as you ask me to do, but this is a fact about
There is a sign problem when iev(A,B) is defined. You mention that you can get the mutual information of A and B by taking -log_2 of probabilistic evidence pev(A,B) = P(AB) / [P(A)P(B)], but this creates an extra negative sign:
-log_2(pev(A,B)) = -log_2[P(AB) / [P(A)P(B)]] = -[ log_2(P(AB)) - [log_2(P(A)) + log_2(P(B))] ] = -log_2(P(AB)) + log_2(P(A)) + log_2(P(B)) = inf(AB) - inf(A) - inf(B).
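A numerical sanity check of the sign, as a sketch. It assumes inf(X) = -log_2 P(X), which is what the chain of equalities above uses, and the probabilities are made-up values for two positively correlated events.

```python
import math

def inf(p):
    # Assumed definition: information content in bits, inf(X) = -log_2 P(X).
    return -math.log2(p)

# Made-up probabilities with P(AB) > P(A)P(B), so A and B are positively correlated.
p_a, p_b, p_ab = 0.5, 0.5, 0.4

pev = p_ab / (p_a * p_b)
lhs = -math.log2(pev)
rhs = inf(p_ab) - inf(p_a) - inf(p_b)
print(lhs, rhs)   # equal, and negative for positively correlated events

# Dropping the leading minus sign gives inf(A) + inf(B) - inf(AB) instead.
print(math.log2(pev), inf(p_a) + inf(p_b) - inf(p_ab))
```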
I donated 100 USD to the general fund.
I am a lurker -- always taking and never giving. This might change, but perhaps not. In any case, this opportunity is an effective way for me to give back to a community that has given me so much. Thank you.
I donated $500 to the general fund. (I thought a little about whether to give directly to the paper that describes how many lives per dollar could be saved by a friendly singularity, then decided SIAI is a better judge of where to put the money. If a good massage pillow for Eliezer and/or Marcello speeds the advent of FAI by a couple of weeks, it is money well spent :) ) I donated via my brother in California. It amounts to around 3% of my after-tax yearly income. (The Indian rupee ratio is 46 to 1, so it really pinches.)
Reasons for donating - Karmically (universal cos...
Thank you for this info. I've signed up. I think this flipped my mood from gloomy to happy.
Incidentally, this is the second study I've signed up for via the web. The first is the Good Judgment Project, which has been a fun exercise so far.