The explanation by owencb is what I was trying to address. To be explicit about when the offset is being added, I'm suggesting replacing your log1p(x) ≣ log(1 + x) transformation with log(c + x) for c = 10 or c = 100.
If the choice of log-dollars is just for presentation, it doesn't matter too much. But in a lesswrong-ish context, log-dollars also have connotations of things like the Kelly criterion, where it is taken completely seriously that there's more of a difference between $0 and $1 than between $1 and $3^^^3.
Given that at least 25% of respondents listed $0 in charity, the offset you add to the charity ($1, if I understand log1p correctly) seems like it could have a large effect on your conclusions. You may want to do some sensitivity checks by raising the offset to, say, $10 or $100 or something else where a respondent might round their giving down to $0, and see if anything changes.
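A minimal sketch of the kind of check I mean, with a made-up giving column standing in for the survey data (all names and numbers here are placeholders):

```python
import numpy as np

# Hypothetical stand-in for the survey's charity column, in dollars.
charity = np.array([0, 0, 0, 50, 100, 250, 1000, 20000], dtype=float)

# Compare log1p against log(c + x) for larger offsets c.
for c in (1, 10, 100):
    transformed = np.log(c + charity)
    # Whatever downstream statistic you care about goes here (correlations,
    # regression coefficients, ...); summary stats are just a placeholder.
    print(c, transformed.mean().round(3), transformed.std().round(3))
```

If the conclusions move noticeably as c changes, the $0-heavy tail is doing real work.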
Curtis Yarvin, who looked to Mars for tips and tricks on writing a "tiny, diamond-perfect kernel" for a programming environment.
The Rasch model does not hate truth, nor does it love truth, but the truth is made out of items which it can use for something else.
This seems like a good occasion to quote the twist reveal in Orson Scott Card's Dogwalker:
...We stood there in his empty place, his shabby empty hovel that was ten times better than anywhere we ever lived, and Doggy says to me, real quiet, he says, "What was it? What did I do wrong? I thought I was like Hunt, I thought I never made a single mistake in this job, in this one job."
And that was it, right then I knew. Not a week before, not when it would do any good. Right then I finally knew it all, knew what Hunt had done. Jesse Hunt never made mista
This seems cool, but I have a nagging suspicion that it reduces, with greater generality, to a handful of sentences if you use conditional expectation of the utility function and the Radon-Nikodym theorem?
Noun phrases that are insufficiently abstract.
echo chambers [...] where meaningless duckspeak is endlessly repeated
Imagine how intolerable NRx would be if it were to acquire one of these. Fortunately, their ideas are too extreme for 4chan, even, so I have no idea where such a forum would be hosted.
so I have no idea where such a forum would be hosted
It may strike you as anarchy, but pretty much anyone can host a forum on the internet for insignificant amounts of money or even for free.
How meaningful is the "independent" criterion given the heavy overlaps in works cited and what I imagine must be a fairly recent academic MRCA among all the researchers involved?
stupid problem
embarrassingly simple math since forever
I should have been years ahead of my peers
momentary lack of algebraic insight ("I could solve this in an instant if only I could get rid of that radical")
for which I've had the intuitions since before 11th grade when they began teaching it to us
Sorry to jump from object-level to meta-level here, but it seems pretty clear that the problem here is not just about math. Your subjective assessments of the difficulty of these topics are inconsistent with how well you report you are doing at them. A...
It's been a while since I've thought about how to learn ecology, but maybe check out Ben Bolker's Ecological Models and Data in R? It would also be a decent way to start to learn how to do statistics with R.
That is an important destination, but maybe too subtle a starting point.
Start with ecological models for inter-species interactions (predation, competition, mutualism, etc.) where there are more examples and the patterns are simpler, starker, and more intuitive. Roughly, death processes may depend on all involved populations but birth processes depend on each species separately. Then move to natural selection and evolution, intra-species interactions, where the birth processes for each genotype may depend on populations of all the different genotypes, and death processes depend on the phenotypes of all the different populations.
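As a minimal sketch of that birth/death structure (my notation, not anything from a particular textbook), a two-species interaction model looks like:

```latex
\frac{dN_1}{dt} = \underbrace{b_1(N_1)}_{\text{birth: own population only}}
                 - \underbrace{d_1(N_1, N_2)}_{\text{death: all interacting populations}},
\qquad
\frac{dN_2}{dt} = b_2(N_2) - d_2(N_1, N_2)
```

Lotka-Volterra competition fits this template, with b_i(N_i) = r_i N_i and death terms that mix both populations; the evolutionary version then lets each genotype's birth term depend on the full genotype mix as well.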
The conscientiousness/akrasia interactions are also fascinating, but even harder to measure. There's a serious missing-not-at-random censoring effect going on for people too conscientious to leave off digit ratio but too akrasic to do the measurement. I nearly fell into this bucket.
do what gwern does
Or do the complete opposite.
The impression I get of gwern is that he reads widely, thinks creatively, and experiments frequently, so he is constantly confronted with hypotheses that he has encountered or has generated. His use of statistics is generally confirmatory, in that he's using data to filter out unjustified hypotheses so he can further research or explore or theorize about the remaining ones.
Another thing you can do with data is exploratory data analysis, using statistics to pull out interesting patterns for further considerat...
No idea. Factor analysis is the standard tool to see that some instrument (fancy word for ability) is not unitary. It's worth learning about anyways, if it's not in your toolbox.
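If it helps, here's roughly what that looks like in practice: a toy simulation of my own where two latent abilities drive four test scores, and factor analysis recovers the non-unitary structure.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Made-up data: four test scores driven by two latent abilities, not one.
n = 500
ability = rng.normal(size=(n, 2))
loadings = np.array([[1.0, 0.0],
                     [0.9, 0.1],
                     [0.1, 0.9],
                     [0.0, 1.0]])
scores = ability @ loadings.T + 0.3 * rng.normal(size=(n, 4))

fa = FactorAnalysis(n_components=2).fit(scores)
print(fa.components_.round(2))  # two clearly distinct factors
```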
Some people like to layer trousers
A simple way to do this is flannel-lined jeans. The version of these made by L.L. Bean have worked well for me. They trade off a bit of extra bulkiness for substantially greater warmth and mildly improved wind protection. Random forum searches suggest that the fleece-lined ones are even warmer, but you lose the cool plaid patterning on the rolled up cuffs.
A not quite nit-picking critique of this phenomenon is that it's treating a complex cluster of abilities as a unitary one.
In some of the (non-Olympic!) distance races I've run, it's seemed to me that I just couldn't move my legs any faster than they were going. In others, I've felt great except for a side stitch that made me feel like I'd vomit if I pushed myself harder. And in still others, I couldn't pull in enough air to make my muscles do what I wanted. In the latter case, I'd definitely notice the lower oxygen levels but in the former cases, maybe I w...
Seconding a lot of calef's observations.
If the new topic you want to learn is "extended behavior networks", then maybe this is your best bet. But if you really want to learn about something like AI or ML or the design of agents that behave reasonably by the standards of some utility-like theory, then this is probably a bad choice. A quick search in Google Scholar (if you're not using this, or some equivalent, making this a step before going to the hivemind is a good idea) suggests that extended behavior networks are backwater-y. If the idea of a ...
While maybe not essential, the "anti-" aspect of the correlations induced by anthropic selection bias at least seems important. Obviously, the appropriate changes of variables can make any particular correlation go either positive or negative. But when the events all measure the same sort of thing (e.g., flooding in 2014, flooding in 2015, etc.), the selection bias seems like it would manifest as anti-correlation. Stretching an analogy beyond its breaking point, I can imagine these strange anti-correlations inducing something like anti-ferromagnetism.
To pick a frequentist algorithm is to pick a prior with a set of hypotheses, i.e. to make Bayes' Theorem computable and provide the unknowns on the r.h.s. above (as mentioned earlier, you can in theory extract the prior and set of hypotheses from an algorithm by considering which outcome your algorithm would give when it saw a certain set of data, and then inverting Bayes' Theorem to find the unknowns).
Okay, this is the last thing I'll say here until/unless you engage with the Robins and Wasserman post that IlyaShpitser and I have been suggesting you look...
You're welcome for the link, and it's more than repaid by your causal inference restatement of the Robins-Ritov problem.
Of course arguably this entire setting is one Bayesians don't worry about (but maybe they should? These settings do come up).
Yeah, I think this is the heart of the confusion. When you encounter a problem, you can turn the Bayesian crank and it will always do the Right thing, but it won't always do the right thing. What I find disconcerting (as a Bayesian drifting towards frequentism) is that it's not obvious how to assess the adequacy...
Have you seen the series of blog posts by Robins and Wasserman that starts here? In problems like the one discussed there (such as the high-dimensional ones that are commonly seen these days), Bayesian procedures, and more broadly any procedures that satisfy the likelihood principle, just don't work. The procedures that do work, according to frequentist criteria, do not arise from the likelihood so it's hard to see how they could be approximations to a Bayesian solution.
You can also see this situation in the (frequentist) classic Theory of Point Estimation...
(Theoretical) Bayesian statistics is the study of probability flows under minimal assumptions - any quantity that behaves like we want a probability to behave can be described by Bayesian statistics.
But nobody, least of all Bayesian statistical practitioners, does this. They encounter data, get familiar with it, pick/invent a model, pick/invent a prior, run (possibly approximate) inference of the model against the data, verify if inference is doing something reasonable, and jump back to an earlier step and change something if it doesn't. After however l...
Thanks for pointing out the Gelman and Shalizi paper. Just skimmed it so far, but it looks like it really captures the zeitgeist of what reasonably thoughtful statisticians think of the framework they're in the business of developing and using.
Plus, their final footnote, describing their misgivings about elevating Bayesianism beyond a tool in the hypothetico-deductive toolbox, is great:
...Ghosh and Ramamoorthi (2003, p. 112) see a similar attitude as discouraging inquiries into consistency: ‘the prior and the posterior given by Bayes theorem [sic] are imper
I would advise looking into frequentist statistics before studying Bayesian statistics. Inference done under Bayesian statistics is curiously silent about anything besides the posterior probability, including whether the model makes sense for the data, whether the knowledge gained about the model is likely to match reality, etc. Frequentist concepts like consistency, coverage probability, ancillarity, model checking, etc., don't just apply to frequentist estimation; they can be used to assess and justify Bayesian procedures.
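For instance, coverage can be checked directly by simulation; a standard toy example (all constants arbitrary) for a binomial proportion with a uniform prior:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Frequentist check of a Bayesian procedure: how often does the 95%
# equal-tailed credible interval under a Beta(1, 1) prior cover the truth?
true_p, n, reps = 0.3, 50, 5000
covered = 0
for _ in range(reps):
    k = rng.binomial(n, true_p)
    posterior = stats.beta(1 + k, 1 + n - k)
    lo, hi = posterior.ppf(0.025), posterior.ppf(0.975)
    covered += (lo <= true_p <= hi)
print(covered / reps)  # close to 0.95 here, but that has to be checked, not assumed
```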
If anything, Bayesian statistics ...
This is really good and impressive. Do you have such a list for statistics?
The example I'm thinking about is a non-random graph on the square grid where west/east neighbors are connected and north/south neighbors aren't. Its density is asymptotically right at the critical threshold and could be pushed over by adding additional west/east non-neighbor edges. The connected components are neither finite nor giant.
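A quick sanity check of that picture, in toy code of my own (nothing from the percolation literature):

```python
from collections import Counter

# On an n x n grid with only west/east edges, every component is a single row,
# so the largest component holds a 1/n fraction of the vertices: it is neither
# finite (it grows with n) nor giant (its fraction vanishes).
def largest_component_fraction(n):
    parent = list(range(n * n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for r in range(n):
        for c in range(n - 1):  # connect west/east neighbours only
            parent[find(r * n + c)] = find(r * n + c + 1)

    sizes = Counter(find(v) for v in range(n * n))
    return max(sizes.values()) / (n * n)

print([largest_component_fraction(n) for n in (10, 50, 100)])  # 0.1, 0.02, 0.01
```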
If you want a solid year-long project, find a statistical model you like and figure out how to do inference in it with variational Bayes. If this has been done, change finite parts of the model into infinite ones until you reach novelty or the model is no longer recognizable/tractable. At that point, either try a new model or instead try to make the VB inference online or parallelizable. Maybe target a NIPS-style paper and a ~30-page technical report in addition to whatever your thesis will look like.
And attend a machine learning class, if offered. There's...
But to all of us perched on the back of Cthulhu, who is forever swimming left, is it the survey that will seem fixed and unchanging from our moving point of view?
Buehler, Denis. "Incomplete understanding of complex numbers Girolamo Cardano: a case study in the acquisition of mathematical concepts." Synthese 191.17 (2014): 4231-4252.
Vélez, Ricardo, and Tomás Prieto-Rumeau. "Random assignment processes: strong law of large numbers and De Finetti theorem." TEST (2014): 1-30.
In the mental health category, I'd love to see (adult) ADHD there as well. I'm less directly interested in substance abuse disorder and learning disabilities (in the US sense) / non-autism developmental disabilities, but those would be interesting additions too.
I'd believe that; my knowledge of music history isn't that great and seeing teleology where there isn't any is an easy mistake.
I guess what I'm saying, speaking very vaguely, is that melodies existing within their own tonal contexts are as old as bone flutes, and their theory goes back at least as far as Pythagoras. And most folk music traditions cooked up their own favorite scale system, which you can just stay in and make music as long as you want to. For that matter, notes in these scale systems can be played as chords and a lot of the combinations make...
In something like the Erdös-Rényi random graph, I agree that there is an asymptotic equivalence between the existence of a giant component and paths from a randomly selected point being able to reach the "edge".
On something like an n x n grid with edges just to left/right neighbors, the "edge" is reachable from any starting point, but all the connected components occupy just a 1/n fraction of the vertices. As n gets large, this fraction goes to 0.
Since, at least as a reductio, the details of graph structure (and not just its edge fract...
The idea that melodies, or at least an approximation accurate to within a few cents, can be embedded into a harmonic context. Yet in western art music, it took centuries for this to go from technically achievable but unthinkable to experimental to routine.
I think percolation theory concerns itself with a different question: is there a path from a starting point to the "edge" of the graph, as the size of the graph is taken to infinity? It is easy to see that it is possible to hit infinity while infecting an arbitrarily small fraction of the population.
But there are crazy universality and duality results for random graphs, so there's probably some way to map an epidemic model to a percolation model without losing anything important?
This comment rubbed me the wrong way and I couldn't figure out why at first, which is why I went for a pithy response.
I think what's going on is I was reacting to the pragmatics of your exchange with Coscott. Coscott informally specified a model and then asked what we could conclude about a parameter of interest, which coin was chosen, given a sufficient statistic of all the coin toss data, the number of heads observed.
This is implicitly a statement that model checking isn't important in solving the problem, because everything that could be used for model ...
I'm quite confident in predicting that generic models are much more likely to be overfitted than to have too few degrees of freedom.
It's easy to regularize estimation in a model class that's too rich for your data. You can't "unregularize" a model class that's restrictive enough not to contain an adequate approximation to the truth of what you're modeling.
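A tiny illustration of the asymmetry (toy data, arbitrary constants): a degree-15 polynomial fit to 30 noisy points overfits badly, and a ridge penalty is all it takes to rein that in; there is no analogous knob for enriching a class that was too poor to begin with.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rich model class (degree-15 polynomial), small noisy sample.
n, degree = 30, 15
x = rng.uniform(-1, 1, n)
y = np.sin(3 * x) + 0.2 * rng.normal(size=n)
X = np.vander(x, degree + 1)

def ridge_fit(lam):
    # Closed-form ridge estimate: (X'X + lam I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

for lam in (0.0, 1e-3, 1.0):
    beta = ridge_fit(lam)
    print(lam, round(float(np.abs(beta).max()), 2))  # coefficient blow-up vs shrinkage
```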
When I know I'm to be visited by one of my parents and I see someone who looks like my mother, should my first thought be "that person looks so unlike my father that maybe it is him and I'm having a stroke"? Should I damage my eyes to the point where this phenomenon doesn't occur to spare myself the confusion?
What I was saying was sort of vague, so I'm going to formalize here.
Data is coming from some random process X(θ,ω), where θ parameterizes the process and ω captures all the randomness. Let's suppose that for any particular θ, living in the set Θ of parameters where the model is well-defined, it's easy to sample from X(θ,ω). We don't put any particular structure (in particular, cardinality assumptions) on Θ. Since we're being frequentists here, nature's parameter θ' is fixed and unknown. We only get to work with the realization of the random process that ac...
Good point. When I introduced that toy example with Cauchy factors, it was the easiest way to get factors that, informally, don't fill in their observed support. Letting the distribution of the factors drift would be a more realistic way to achieve this.
the whole underlying distribution switched and all your old estimates just went out of the window...
I like to hope (and should probably endeavor to ensure) that I don't find myself in situations like that. A system that generatively (what the joint distribution of factor X and outcome Y looks like) ev...
If you're working with composite hypotheses, replace "your statistic" with "the supremum of your statistic over the relevant set of hypotheses".
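In symbols, for a composite null Θ₀ the replacement is the worst case over the set (my notation; the generalized likelihood ratio is the familiar instance):

```latex
T^*(x) = \sup_{\theta \in \Theta_0} T_\theta(x),
\qquad\text{e.g.}\qquad
\lambda(x) = \frac{\sup_{\theta \in \Theta_0} L(\theta; x)}{\sup_{\theta \in \Theta} L(\theta; x)}
```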
This looks cool. My biggest caution would be that this effect may be tied to the specific class of data generating processes you're looking at.
Your framing seems to be that you look at the world as being filled with entities whose features under any conceivable measurements are distributed as independent multivariate normals. The predictive factor is a feature and so is the outcome. Then using extreme order statistics of the predictive factor to make inferences about the extreme order statistics of the outcome is informative but unreliable, as you illustra...
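A quick simulation of that framing (bivariate rather than fully multivariate, and all constants are my own arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Factor and outcome jointly normal with correlation rho: how often is the
# single best case on the factor also the single best case on the outcome?
rho, n, reps = 0.8, 1000, 2000
hits = 0
for _ in range(reps):
    factor = rng.normal(size=n)
    outcome = rho * factor + np.sqrt(1 - rho**2) * rng.normal(size=n)
    hits += (np.argmax(factor) == np.argmax(outcome))
print(hits / reps)  # informative but unreliable: well below 1 even at rho = 0.8
```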
The grandchild comment suggests that he does, at least to the level of a typical user (though not a researcher or developer) of these methods.
You really should have mentioned here one of your Facebook responses that maybe the data generating processes seen in social science problems don't look like (the output of generative versions of) ML algorithms. What's the point of using a ML method that scales well computationally if looking at more data doesn't bring you to the truth (consistency guarantees can go away if the truth is outside the support of your model class) or has terrible bang for the buck (even if you keep consistency, you may take an efficiency hit)?
Also, think about how well these m...
The faster cell simulation technologies advance, the weaker is the hardware they'll run on.
If hardware growth strictly followed Moore's Law and CPUs (or GPUs, etc.) were completely general-purpose, this would be true. But, if cell simulation became a dominant application for computing hardware, one could imagine instruction set extensions or even entire architecture changes designed around it. Obviously, it would also take some time for software to take advantage of hardware change.
I was just contesting your statement as a universal one. For this poll, I agree you can't really pursue the covariate strategy. However, I think you're overstating the challenge of getting more data and figuring out what to do with it.
For example, measuring BPD status is difficult. You can do it by conducting a psychological examination of your subjects (costly but accurate), you can do it by asking subjects to self-report on a four-level Likert-ish scale (cheap but inaccurate), or you could do countless other things along this tradeoff surface. On the other h...
Sure you can, in principle. When you have measured covariates, you can compare their sampled distribution to that of the population of interest. Find enough of a difference (modulo multiple comparisons, significance, researcher degrees of freedom, etc.) and you've detected bias. Ruling out systematic bias using your observations alone is much more difficult.
Even in this case, where we don't have covariates, there are some patterns in the ordinal data (the concept of ancillary statistics might be helpful in coming up with some of these) that would be extremely unlikely under unbiased sampling.
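For the with-covariates case, a minimal version of the comparison looks like this (invented counts and population proportions, just to show the shape of the check):

```python
import numpy as np
from scipy.stats import chisquare

# Observed sample counts in four age brackets vs. what unbiased sampling from
# known population proportions would predict. All numbers are made up.
sample_counts = np.array([400, 350, 150, 100])
population_props = np.array([0.25, 0.25, 0.25, 0.25])
expected = population_props * sample_counts.sum()

stat, p = chisquare(f_obs=sample_counts, f_exp=expected)
print(stat, p)  # a tiny p-value flags detectable sampling bias
```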
There's actually some really cool math developed about situations like this one. Large deviation theory describes how occurrences like the 1,000,004 red / 1,000,000 blue one become unlikely at an exponential rate and how, conditioning on them occurring, information about the manner in which they occurred can be deduced. It's a sort of trivial conclusion in this case, but if we accept a principle of maximum entropy, we can be dead certain that any of the 2,000,004 red or blue draws looks marginally like a Bernoulli with 1,000,004:1,000,000 odds. That's ju...
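For the coin/ball version, the exponential rate in question is the KL divergence between the observed frequency and the nominal one (standard large-deviations statement, written loosely):

```latex
P\!\left(\tfrac{S_n}{n} \approx q\right) \;\approx\; e^{-n\, D(q \,\|\, p)},
\qquad
D(q \,\|\, p) = q \log\tfrac{q}{p} + (1-q)\log\tfrac{1-q}{1-p}
```

and conditional on S_n/n ≈ q having happened, each individual draw looks marginally like a Bernoulli(q), which is the "dead certain" claim above.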
Seconding all of gjm's criticisms, and adding another point.
The sostenuto (middle) pedal was invented in 1844. The sustain (right) pedal has been around roughly as long as the piano itself, since piano technique is pretty much unthinkable without it.