All of othercriteria's Comments + Replies

Seconding all of gjm's criticisms, and adding another point.

The sostenuto (middle) pedal was invented in 1844. The sustain (right) pedal has been around roughly as long as the piano itself, since piano technique is pretty much unthinkable without it.

The explanation by owencb is what I was trying to address. To be explicit about when the offset is being added, I'm suggesting replacing your log1p(x) ≣ log(1 + x) transformation with log(c + x) for c=10 or c=100.

If the choice of log-dollars is just for presentation, it doesn't matter too much. But in a lesswrong-ish context, log-dollars also have connotations of things like the Kelly criterion, where it is taken completely seriously that there's more of a difference between $0 and $1 than between $1 and $3^^^3.

0gwern
Which will do what, exactly? What does this accomplish? If you think it does something, please explain more clearly, preferably with references explaining why +10 or +100 would make any difference, or even better, make use of the full data which I have provided you and the analysis code, which I also provided you, exactly so criticisms could go beyond vague speculation and produce something firmer. (If I sound annoyed, it's because I spend hours cleaning up my analyses to provide full source code, all the data, and make sure all results can be derived from the source code, to deal with this sort of one-liner objection. If I didn't care, I would just post some coefficients and a graph, and save myself a hell of a lot of time.)

Given that at least 25% of respondents listed $0 in charity, the offset you add to the charity ($1 if I understand log1p correctly) seems like it could have a large effect on your conclusions. You may want to do some sensitivity checks by raising the offset to, say, $10 or $100 or something else where a respondent might round their giving down to $0 and see if anything changes.
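
A minimal sketch in R of the kind of sensitivity check I mean, using placeholder names (a data frame survey with columns charity in dollars and ea coded 0/1); this is just the shape of the check, not gwern's actual data or analysis code:

```r
# Refit the same toy regression with several offsets k in log(charity + k)
# and compare the EA coefficient across fits. 'survey', 'charity', and 'ea'
# are placeholder names, not the real survey variables.
offsets <- c(1, 10, 100, 1000)
fits <- lapply(offsets, function(k) lm(log(charity + k) ~ ea, data = survey))

sensitivity <- data.frame(
  offset   = offsets,
  estimate = sapply(fits, function(f) coef(f)["ea"]),
  se       = sapply(fits, function(f) summary(f)$coefficients["ea", "Std. Error"])
)
print(sensitivity)
```

If the EA coefficient moves around a lot as the offset changes, the conclusions are leaning heavily on how the $0 responses were handled.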

7benkuhn
Gwern has a point that it's pretty trivial to run this robustness check yourself if you're worried. I ran it. Changing the $1 to $100 reduces the coefficient of EA from about 1.8 to 1.0 (1.3 sigma), and moving to $1000 reduces it from 1.0 to 0.5 (about two sigma). The coefficient remains highly significant in all cases, and in fact becomes more significant with the higher constant in the log.
2gwern
I don't see why adding +1 to all responses would make any difference to any of the comparisons; it shifts all datapoints equally. (And anyway, log1p(0) ~> 0. The point of using log1p is simply to avoid log(0) ~> -Inf.)

Curtis Yarvin, who looked to Mars for tips and tricks on writing a "tiny, diamond-perfect kernel" for a programming environment.

The Rasch model does not hate truth, nor does it love truth, but the truth is made out of items which it can use for something else.

This seems like a good occasion to quote the twist reveal in Orson Scott Card's Dogwalker:

We stood there in his empty place, his shabby empty hovel that was ten times better than anywhere we ever lived, and Doggy says to me, real quiet, he says, "What was it? What did I do wrong? I thought I was like Hunt, I thought I never made a single mistake in this job. in this one job."

And that was it, right then I knew. Not a week before, not when it would do any good. Right then I finally knew it all, knew what Hunt had done. Jesse Hunt never made mista

... (read more)
0palladias
That is delightful.

This seems cool, but I have a nagging suspicion that this reduces to something more general, stated in a handful of sentences, if you use the conditional expectation of the utility function and the Radon-Nikodym theorem?

Noun phrases that are insufficiently abstract.

echo chambers [...] where meaningless duckspeak is endlessly repeated

Imagine how intolerable NRx would be if it were to acquire one of these. Fortunately, their ideas are too extreme for 4chan, even, so I have no idea where such a forum would be hosted.

0bramflakes
[deleted]
4MichaelAnissimov
Of course we have one, but it's secret.
Lumifer

so I have no idea where such a forum would be hosted

It may strike you as anarchy, but pretty much anyone can host a forum on the internet for insignificant amounts of money or even for free.

How meaningful is the "independent" criterion given the heavy overlaps in works cited and what I imagine must be a fairly recent academic MRCA among all the researchers involved?

stupid problem

embarrassingly simple math since forever

I should have been years ahead of my peers

momentary lack of algebraic insight ("I could solve this in an instant if only I could get rid of that radical")

for which I've had the intuitions since before 11th grade when they began teaching it to us

Sorry to jump from object-level to meta-level, but it seems pretty clear that the problem here is not just about math. Your subjective assessment of how difficult these topics are is inconsistent with how well you report you are doing at them. A... (read more)

0Dahlen
I can see how it would sound to an outside observer now that you point it out, but in my situation at least I have trouble buying into the idea that math isn't going anywhere. The problem really is urgent; there are loads of fields I want to study that build upon math (and then upon each other), and it just isn't feasible to postpone deep, lasting learning of basic math concepts any further, until after I'm in the "right mindset" for it. There just isn't time, and my neuroplasticity won't get any better with age. It'll take me at least a decade to reach the level I desire in all these fields. Not to mention that I've long since been having trouble with motivation, or else I could have been done with this specific math in about 2011-2012. I'm not doing well at these topics (despite evaluating them as easy) because I spend less than a few hours per month on them.

It's been a while since I've thought about how to learn ecology, but maybe check out Ben Bolker's Ecological Models and Data in R? It would also be a decent way to start to learn how to do statistics with R.

That is an important destination but maybe too subtle a starting point.

Start with ecological models for inter-species interactions (predation, competition, mutualism, etc.) where there are more examples and the patterns are simpler, starker, and more intuitive. Roughly, death processes may depend on all involved populations but birth processes depend on each species separately. Then move to natural selection and evolution, intra-species interactions, where the birth processes for each genotype may depend on populations of all the different genotypes, and death processes depend on the phenotypes of all the different populations.
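
A toy R sketch of the first kind of model, with made-up parameters (not taken from any particular textbook): discrete-time two-species competition, where each species' birth term depends only on its own population and the crowding/death term depends on both.

```r
# Discrete-time two-species competition with invented parameters.
# Births are species-specific; the density-dependent (death) term involves
# both populations via the competition coefficients a12 and a21.
steps <- 300
n1 <- numeric(steps); n2 <- numeric(steps)
n1[1] <- 10; n2[1] <- 10

r1 <- 0.15; r2 <- 0.10   # per-capita birth rates
K1 <- 500;  K2 <- 400    # carrying capacities
a12 <- 0.6; a21 <- 0.7   # effect of species 2 on 1, and of 1 on 2

for (t in 1:(steps - 1)) {
  n1[t + 1] <- n1[t] + r1 * n1[t] * (1 - (n1[t] + a12 * n2[t]) / K1)
  n2[t + 1] <- n2[t] + r2 * n2[t] * (1 - (n2[t] + a21 * n1[t]) / K2)
}

matplot(cbind(n1, n2), type = "l", lty = 1,
        xlab = "time step", ylab = "population size",
        main = "Two-species competition (toy parameters)")
```

With these particular numbers the competition coefficients multiply to less than one, so both species settle toward coexistence; push a12 and a21 above one and exclusion becomes possible instead.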

0Capla
Do you have a curriculum that works through these? It can either be an already existing textbook or class or just a list of concepts.

The conscientiousness/akrasia interactions are also fascinating, but even harder to measure. There's a serious missing-not-at-random censoring effect going on for people too conscientious to leave off digit ratio but too akratic to do the measurement. I nearly fell into this bucket.

do what gwern does

Or do the complete opposite.

The impression I get of gwern is that he reads widely, thinks creatively, and experiments frequently, so he is constantly confronted with hypotheses that he has encountered or has generated. His use of statistics is generally confirmatory, in that he's using data to filter out unjustified hypotheses so he can further research or explore or theorize about the remaining ones.

Another thing you can do with data is exploratory data analysis, using statistics to pull out interesting patterns for further considerat... (read more)

2Capla
I want to do that. Tell me how. I think I already read widely (at least compared to my meat-space peers and possibly compared to the typical LW reader), but I can do better. I am frequently complimented for asking creative questions, coming up with unusual ideas and solutions (again, in comparison to non-rationalists), but if there are ways to do this better, I want to hear them. However, I want to make regular experimentation a part of my life and don't really know how. I'm interning with a psych lab, and hope to work with some behavioral economists who run field-experiments. How do I gain proficiency with experimental methods and build the habit of running simple experiments regularly? I suppose that there's a certain kind of phenomenon that to the educated mind is automatically flagged as ripe for experimentation (I'm thinking of Feynman's curiosity about the ants in his room or Harry James Potter-Evans-Verres testing to find out what the optimal way to fight is, prior to the first battle), but I don't have that intuition, yet. Suggestions?
1Lumifer
That's usually called "data mining" and is a popular activity. Unfortunately many people think that's all they need and stop before the confirmatory phase.

No idea. Factor analysis is the standard tool to see whether some instrument (fancy word for ability) is unitary. It's worth learning about anyways, if it's not in your toolbox.
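
A minimal R illustration on simulated data (no real instrument) of what factor analysis buys you here: items driven by two independent latent abilities, so a one-factor model gets rejected while a two-factor model splits the loadings into two blocks.

```r
# Simulated item scores driven by two independent latent abilities.
set.seed(1)
n <- 500
ability1 <- rnorm(n)
ability2 <- rnorm(n)
item <- function(a) 0.8 * a + rnorm(n, sd = 0.6)   # item = loading * ability + noise
items <- cbind(i1 = item(ability1), i2 = item(ability1), i3 = item(ability1),
               i4 = item(ability2), i5 = item(ability2), i6 = item(ability2))

factanal(items, factors = 1)   # the likelihood-ratio test should reject one factor
factanal(items, factors = 2)   # loadings separate into the two three-item blocks
```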

2sixes_and_sevens
It is already in my toolbox, but I'm not sure how it helps figure out if this phenomenon is present in the real world. It's still not obvious to me that, if the phenomenon does exist, it would survive when reduced to a unitary ability. I can think of a couple of mechanisms by which it may be more prevalent in a multivariable scenario.

Some people like to layer trousers

A simple way to do this is flannel-lined jeans. The version of these made by L.L. Bean have worked well for me. They trade off a bit of extra bulkiness for substantially greater warmth and mildly improved wind protection. Random forum searches suggest that the fleece-lined ones are even warmer, but you lose the cool plaid patterning on the rolled up cuffs.

2fezziwig
Anecdote: I have several of these and love them. If you live in the Frozen North, I recommend them highly.

A not quite nit-picking critique of this phenomenon is that it's treating a complex cluster of abilities as a unitary one.

In some of the (non-Olympic!) distance races I've run, it's seemed to me that I just couldn't move my legs any faster than they were going. In others, I've felt great except for a side stitch that made me feel like I'd vomit if I pushed myself harder. And in still others, I couldn't pull in enough air to make my muscles do what I wanted. In the latter case, I'd definitely notice the lower oxygen levels but in the former cases, maybe I w... (read more)

0sixes_and_sevens
Do you have any suggestions for such unitary(ish) abilities?

Seconding a lot of calef's observations.

If the new topic you want to learn is "extended behavior networks", then maybe this is your best bet. But if you really want to learn about something like AI or ML or the design of agents that behave reasonably by the standards of some utility-like theory, then this is probably a bad choice. A quick search in Google Scholar (if you're not using this, or some equivalent, making this a step before going to the hivemind is a good idea) suggests that extended behavior networks are backwater-y. If the idea of a ... (read more)

While maybe not essential, the "anti-" aspect of the correlations induced by anthropic selection bias at least seems important. Obviously, the appropriate changes of variables can make any particular correlation go either positive or negative. But when the events all measure the same sort of thing (e.g., flooding in 2014, flooding in 2015, etc.), the selection bias seems like it would manifest as anti-correlation. Stretching an analogy beyond its breaking point, I can imagine these strange anti-correlations inducing something like anti-ferromagnetism.
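
A quick R sketch, under a made-up survival rule, of how conditioning on observers existing turns independent events into anti-correlated ones:

```r
# Two independent catastrophic events; "observers" only exist if at most one
# of them happened. Conditioning on that selection induces anti-correlation.
set.seed(1)
n <- 1e6
flood2014 <- rbinom(n, 1, 0.3)
flood2015 <- rbinom(n, 1, 0.3)
survived  <- !(flood2014 == 1 & flood2015 == 1)   # invented survival rule

cor(flood2014, flood2015)                       # ~0: independent by construction
cor(flood2014[survived], flood2015[survived])   # clearly negative among survivors
```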

To pick a frequentist algorithm is to pick a prior with a set of hypotheses, i.e. to make Bayes' Theorem computable and provide the unknowns on the r.h.s. above (as mentioned earlier you can in theory extract the prior and set of hypotheses from an algorithm by considering which outcome your algorithm would give when it saw a certain set of data, and then inverting Bayes' Theorem to find the unknowns).

Okay, this is the last thing I'll say here until/unless you engage with the Robins and Wasserman post that IlyaShpitser and I have been suggesting you look... (read more)

You're welcome for the link, and it's more than repaid by your causal inference restatement of the Robins-Ritov problem.

Of course arguably this entire setting is one Bayesians don't worry about (but maybe they should? These settings do come up).

Yeah, I think this is the heart of the confusion. When you encounter a problem, you can turn the Bayesian crank and it will always do the Right thing, but it won't always do the right thing. What I find disconcerting (as a Bayesian drifting towards frequentism) is that it's not obvious how to assess the adequacy... (read more)

Have you seen the series of blog posts by Robins and Wasserman that starts here? In problems like the one discussed there (such as the high-dimensional ones that are commonly seen these days), Bayesian procedures, and more broadly any procedures that satisfy the likelihood principle, just don't work. The procedures that do work, according to frequentist criteria, do not arise from the likelihood so it's hard to see how they could be approximations to a Bayesian solution.

You can also see this situation in the (frequentist) classic Theory of Point Estimation... (read more)
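
Here's a stripped-down simulation in the spirit of that setup (my own sketch with invented parameters, not the Robins-Ritov construction itself): the treatment probabilities are known exactly, the true outcome model is messy, and the Horvitz-Thompson estimator that uses only those known probabilities stays honest by construction, while a likelihood-based plug-in leans entirely on an outcome model that may be wrong.

```r
# Known treatment probabilities, messy true outcome model (all invented).
set.seed(1)
n <- 5000
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$p_treat <- plogis(2 * d$x1 - 1)                   # KNOWN assignment probabilities
d$a       <- rbinom(n, 1, d$p_treat)
mu1       <- plogis(3 * sin(6 * d$x1) + d$x2^2 - 1) # true E[Y | A=1, X], unknown to us
d$y       <- ifelse(d$a == 1, rbinom(n, 1, mu1), NA)

truth <- mean(mu1)   # target: average outcome if everyone were treated

# Horvitz-Thompson: uses only the known assignment probabilities, no outcome model
ht <- mean(ifelse(d$a == 1, d$y / d$p_treat, 0))

# Likelihood-based plug-in with a (misspecified) linear-logistic outcome model
fit    <- glm(y ~ x1 + x2, family = binomial, data = d, subset = a == 1)
plugin <- mean(predict(fit, newdata = d, type = "response"))

round(c(truth = truth, horvitz_thompson = ht, plugin = plugin), 3)
```

The plug-in is only as good as the outcome model it commits to; the actual Robins-Ritov construction pushes this to the point where, in high dimensions, anything honoring the likelihood principle fails badly while Horvitz-Thompson keeps working.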

4IlyaShpitser
That's an interesting example, thanks for linking it. I read it carefully, and also some of Robins/Ritov CODA paper: http://www.biostat.harvard.edu/robins/coda.pdf and I think I get it. The example is phrased in the language of sampling/missing data, but for those in the audience familiar w/ Pearl, we can rephrase it as a causal inference problem. After all, causal inference is just another type of missing data problem. We have a treatment A (a drug), and an outcome Y (death). Doctors assign A to some patients, but not others, based on their baseline covariates C. Then some patients die. The resulting data is an observational study, and we want to infer from it the effect of drug on survival, which we can obtain from p(Y | do(A=yes)). We know in this case that p(Y | do(A=yes)) = sum{C} p(Y | A=yes,C) p(C) (this is just what "adjusting for confounders" means). If we then had a parametric model for E[Y | A=yes,C], we could just fit that model and average (this is "likelihood based inference.") Larry and Jamie are worried about the (admittedly adversarial) situation where maybe the relationship between Y and A and C is really complicated, and any specific parametric model we might conceivably use will be wrong, while non-parametric methods may have issues due to the curse of dimensionality in moderate samples. But of course the way we specified the problem, we know p(A | C) exactly, because doctors told us the rule by which they assign treatments. Something like the Horvitz/Thompson estimator which uses this (correct) model only, or other estimators which address issues with the H/T estimator by also using the conditional model for Y, may have better behavior in such settings. But importantly, these methods are exploiting a part of the model we technically do not need (p(A | C) does not appear in the above "adjustment for confounders" expression anywhere), because in this particular setting it happens to be specified exactly, while the parts of the models we do t
2TheMajor
As I've mentioned several times above, Bayesian statistics are not just a set of estimators to be used on problems, they are the minimal framework of probability that satisfies Cox's theorem. This means that any algorithm that isn't even approximately Bayesian will spit out something other than (an approximation of) the posterior probability. In other words, in order to even get any sort of answer that can reasonably be used for further computation there has to be a Bayesian explanation, otherwise what your algorithm is doing just doesn't have anything to do with statistics. This does not mean that the only useful algorithms are those crafted by trying to compute the likelihood ratio, nor does it mean that there is always a simple algorithm that would be classified as a 'Bayesian algorithm'. It merely means that to do probability you have to do Bayes, and then maybe some more.

(Theoretical) Bayesian statistics is the study of probability flows under minimal assumptions - any quantity that behaves like we want a probability to behave can be described by Bayesian statistics.

But nobody, least of all Bayesian statistical practitioners, does this. They encounter data, get familiar with it, pick/invent a model, pick/invent a prior, run (possibly approximate) inference of the model against the data, verify if inference is doing something reasonable, and jump back to an earlier step and change something if it doesn't. After however l... (read more)

2TheMajor
Well obviously. Same for physicists, nobody (other than some highly specialised teams working at particle accelerators) use the standard model to compute the predictions of their models. Or for computer science - most computer scientists don't write code at the binary level, or explicitly give commands to individual transistors. Or chemists - just how many of the reaction equations do you think are being checked by solving the quantum mechanics? But just because the underlying theory doesn't give as good a result-vs-time-tradeoff as some simplified model does not mean that the underlying theory can be ignored altogether (in my particular examples above I remark that the respective researchers do study the fundamentals, but then hardly ever need to apply them!)! By studying the underlying (often mathematically elegant) theory first one can later look at the messy real-world examples through the lens of this theory, and see how the tricks that are used in practice are mostly making use of but often partly disagree with the overarching theory. This is why studying theoretical Bayesian statistics is a good investment of time - after this all other parts of statistics become more accessible and intuitive, as the specific methods can be fitted into the overarching theory. Of course if you actually want to apply statistical methods to a real-world problem I think that the frequentist toolbox is one of the best options available (in terms of results vs. effort). But it becomes easier to understand these algorithms (where they make which assumptions, where they use shortcuts/substitutions to approximate for the sake of computation, exactly where, how and why they might fail etc.) if you become familiar with the minimal consistent framework for statistics, which to the best of my knowledge is Bayesian statistics.

Thanks for pointing out the Gelman and Shalizi paper. Just skimmed it so far, but it looks like it really captures the zeitgeist of what reasonably thoughtful statisticians think of the framework they're in the business of developing and using.

Plus, their final footnote, describing their misgivings about elevating Bayesianism beyond a tool in the hypothetico-deductive toolbox, is great:

Ghosh and Ramamoorthi (2003, p. 112) see a similar attitude as discouraging inquiries into consistency: ‘the prior and the posterior given by Bayes theorem [sic] are imper

... (read more)

I would advise looking into frequentist statistics before studying Bayesian statistics. Inference done under Bayesian statistics is curiously silent about anything besides the posterior probability, including whether the model makes sense for the data, whether the knowledge gained about the model is likely to match reality, etc. Frequentist concepts like consistency, coverage probability, ancillarity, model checking, etc., don't just apply to frequentist estimation; they can be used to assess and justify Bayesian procedures.

If anything, Bayesian statistics ... (read more)
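
As a concrete example of the kind of frequentist assessment I mean, here's a small R sketch (toy numbers of my choosing) that checks the coverage of a 95% equal-tailed credible interval for a binomial proportion under a Beta(1,1) prior, holding the true proportion fixed:

```r
# How often does the 95% equal-tailed Beta-Binomial credible interval cover
# a fixed true proportion? Toy check with invented numbers.
set.seed(1)
p_true <- 0.07
n_obs  <- 50
n_sims <- 10000

covered <- replicate(n_sims, {
  k <- rbinom(1, n_obs, p_true)
  # Posterior under a Beta(1, 1) prior is Beta(k + 1, n_obs - k + 1)
  lower <- qbeta(0.025, k + 1, n_obs - k + 1)
  upper <- qbeta(0.975, k + 1, n_obs - k + 1)
  lower <= p_true && p_true <= upper
})
mean(covered)   # empirical coverage, to compare against the nominal 95%
```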

1TheMajor
I'm afraid I don't understand. (Theoretical) Bayesian statistics is the study of probability flows under minimal assumptions - any quantity that behaves like we want a probability to behave can be described by Bayesian statistics. Therefore learning this general framework is useful when later looking at applications and most notably approximations. For what reasons do you suggest studying the approximation algorithms before studying the underlying framework? Also you mention 'Bayesian procedures', I would like to clarify that I wasn't referring to any particular Bayesian algorithm but to the complete study of (uncomputable) ideal Bayesian statistics.
2Lumifer
Actually, if you have the necessary math background, it will probably be useful to start by looking at why and how the frequentists and the Bayesians differ. Some good starting points, in addition to Bayes, are Fisher information and Neyman-Pearson hypothesis testing. This paper by Gelman and Shalizi could be interesting as well.

This is really good and impressive. Do you have such a list for statistics?

1Gunnar_Zarncke
My main aha-moment in statistics occurred when I encountered the Lebesgue integral. Integrals suddenly generalized a lot. Lebesgue also allows a lot more nifty but intuitive integral transformations. And of course it is needed for dealing cleanly with probability densities. Causal networks, despite needing tricky rules, follow from the other points on my list (trees and probability measures).

The example I'm thinking about is a non-random graph on the square grid where west/east neighbors are connected and north/south neighbors aren't. Its density is asymptotically right at the critical threshold and could be pushed over by adding additional west/east non-neighbor edges. The connected components are neither finite nor giant.

0Douglas_Knight
If all EW edges exist, you're really in a 1d situation. Models at criticality are interesting, but are they relevant to epidemiology? They are relevant to creating a magnet because we can control the temperature and we succeed or fail while passing through the phase transition, so detail may matter. But for epidemiology, we know which direction we want to push the parameter and we just want to push it as hard as possible.

If you want a solid year-long project, find a statistical model you like and figure out how to do inference in it with variational Bayes. If this has been done, change finite parts of the model into infinite ones until you reach novelty or the model is no longer recognizable/tractable. At that point, either try a new model or instead try to make the VB inference online or parallelizable. Maybe target a NIPS-style paper and a ~30-page technical report in addition to whatever your thesis will look like.

And attend a machine learning class, if offered. There's... (read more)

1Sherincall
I did some machine learning in previous studies, and read up on some online, so I have a basis in that. Taking Advanced Statistics, and AI (maths part) courses, and a few less relevant ones. I plan on doing it in two years, one for the courses, one for the thesis, so a yearlong project is acceptable. However, I'll also have a full time job, and a hobby or two, and a relationship. The suggestions sound great, and I'll dedicate a few days to study them carefully. Thank you very much.

But to all of us perched on the back of Cthulhu, who is forever swimming left, is it the survey that will seem fixed and unchanging from our moving point of view?

6gwern
1. https://pdf.yt/d/XYviTa7FvMEcNm_8 / https://www.dropbox.com/s/1b7dhfs7djx1k5j/2014-buehler.pdf?dl=0 2. https://pdf.yt/d/N7CH7xQxtNnQVy2f / https://www.dropbox.com/s/8mzse6c24dteqy6/2014-velez.pdf?dl=0 / http://sci-hub.org/downloads/b6c8/10.1007@s11749-014-0396-0.pdf

In the mental health category, I'd love to see (adult) ADHD there as well. I'm less directly interested in substance abuse disorder and learning disabilities (in the US sense) / non-autism developmental disabilities, but those would be interesting additions too.

I'd believe that; my knowledge of music history isn't that great and seeing teleology where there isn't any is an easy mistake.

I guess what I'm saying, speaking very vaguely, is that melodies existing within their own tonal contexts are as old as bone flutes, and their theory goes back at least as far as Pythagoras. And most folk music traditions cooked up their own favorite scale system, which you can just stay in and make music as long as you want to. For that matter, notes in these scale systems can be played as chords and a lot of the combinations make... (read more)

In something like the Erdős-Rényi random graph, I agree that there is an asymptotic equivalence between the existence of a giant component and paths from a randomly selected point being able to reach the "edge".

On something like an n x n grid with edges just to left/right neighbors, the "edge" is reachable from any starting point, but all the connected components occupy just a 1/n fraction of the vertices. As n gets large, this fraction goes to 0.

Since, at least as a reductio, the details of graph structure (and not just its edge fract... (read more)

2Douglas_Knight
The statement about percolation is true quite generally, not just for Erdős-Rényi random graphs, but also for the square grid. Above the critical threshold, the giant component is a positive proportion of the graph, and below the critical threshold, all components are finite.

The idea that melodies, or at least an approximation accurate to within a few cents, can be embedded into a harmonic context. Yet in western art music, it took centuries for this to go from technically achievable but unthinkable to experimental to routine.

5bogus
Medieval sacred music was a special case in many ways. We have some records (albeit comparatively scant ones) of secular/folk music from pre-Renaissance times, and it was a lot more tonally structured (a more meaningful term than "harmonic") than that.

I think percolation theory concerns itself with a different question: is there a path from a starting point to the "edge" of the graph, as the size of the graph is taken to infinity? It is easy to see that it is possible to hit infinity while infecting an arbitrarily small fraction of the population.

But there are crazy universality and duality results for random graphs, so there's probably some way to map an epidemic model to a percolation model without losing anything important?
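
A rough R sketch (using the igraph package, with parameters of my choosing) of bond percolation on an n-by-n grid, tracking the largest-component fraction on either side of the square-lattice critical threshold p_c = 1/2:

```r
# Bond percolation on an n x n grid: keep each edge independently with
# probability p, then measure the fraction of vertices in the largest
# connected component. Requires the igraph package.
library(igraph)
set.seed(1)

largest_component_fraction <- function(n, p) {
  g    <- make_lattice(c(n, n))            # square grid graph
  keep <- runif(ecount(g)) < p
  g    <- delete_edges(g, which(!keep))
  max(components(g)$csize) / vcount(g)
}

ps <- c(0.3, 0.45, 0.5, 0.55, 0.7)
sapply(ps, function(p) largest_component_fraction(100, p))
# Below p_c = 1/2 the largest component is a vanishing fraction of the grid;
# above it, a giant component holding a positive fraction of vertices appears.
```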

3TheMajor
The main question of percolation theory, whether there exists a path from a fixed origin to the "edge" of the graph, is equivalently a statement about the size of the largest connected cluster in a random graph. This can be intuitively seen as the statement: 'If there is no path to the edge, then the origin (and any place that you can reach from the origin, traveling along paths) must be surrounded by a non-crossable boundary'. So without such a path your origin lies in an isolated island. By the randomness of the graph this statement applies to any origin, and the speed with which the probability that a path to the edge exists decreases as the size of the graph increases is a measure (not in the technical sense) of the size of the connected component around your origin. I am under the impression that the statements '(almost) everybody gets infected' and 'the largest connected cluster of diseased people is of the size of the total population' are good substitutes for each other.

This comment rubbed me the wrong way and I couldn't figure out why at first, which is why I went for a pithy response.

I think what's going on is I was reacting to the pragmatics of your exchange with Coscott. Coscott informally specified a model and then asked what we could conclude about a parameter of interest, which coin was chosen, given a sufficient statistic of all the coin toss data, the number of heads observed.

This is implicitly a statement that model checking isn't important in solving the problem, because everything that could be used for model ... (read more)

4DanielLC
I suppose my point was that assuming a normal distribution can give you far more extreme probabilities than could ever realistically be justified. It would probably be better if I just said it like that.

I'm quite confident in predicting that generic models are much more likely to be overfitted than to have too few degrees of freedom.

It's easy to regularize estimation in a model class that's too rich for your data. You can't "unregularize" a model class that's restrictive enough not to contain an adequate approximation to the truth of what you're modeling.
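
A small simulated R example of that asymmetry (toy data and penalties of my own choosing): a too-rich polynomial basis can be ridge-regularized into a sensible fit, while the best straight line stays wrong no matter what, because the truth isn't in its model class.

```r
# Truth is a smooth curve; compare a rich-but-regularized basis with a
# too-restrictive linear model. All numbers are invented for illustration.
set.seed(1)
n <- 40
x <- sort(runif(n, -1, 1))
y <- sin(4 * x) + rnorm(n, sd = 0.3)

X <- cbind(1, poly(x, degree = 10, raw = TRUE))   # rich, overfitting-prone basis

ridge <- function(X, y, lambda) {
  solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)   # closed-form ridge
}

beta_loose <- ridge(X, y, lambda = 1e-8)   # effectively unregularized
beta_ridge <- ridge(X, y, lambda = 0.01)   # mildly regularized rich model

grid <- seq(-1, 1, length.out = 300)
Xg   <- cbind(1, poly(grid, degree = 10, raw = TRUE))
plot(x, y)
lines(grid, sin(4 * grid), lwd = 2)                          # the truth
lines(grid, drop(Xg %*% beta_loose), col = "grey", lty = 2)  # nearly unregularized
lines(grid, drop(Xg %*% beta_ridge), col = "blue", lwd = 2)  # regularized rich model
abline(lm(y ~ x), col = "red", lwd = 2)                      # best line: wrong class
```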

When I know I'm to be visited by one of my parents and I see someone who looks like my mother, should my first thought be "that person looks so unlike my father that maybe it is him and I'm having a stroke"? Should I damage my eyes to the point where this phenomenon doesn't occur to spare myself the confusion?

1RowanE
If you're being asked to estimate the probability that you're being visited by your father, then yes, you probably should be considering the possibility that you are seeing him but you're having a stroke.

What I was saying was sort of vague, so I'm going to formalize here.

Data is coming from some random process X(θ,ω), where θ parameterizes the process and ω captures all the randomness. Let's suppose that for any particular θ, living in the set Θ of parameters where the model is well-defined, it's easy to sample from X(θ,ω). We don't put any particular structure (in particular, cardinality assumptions) on Θ. Since we're being frequentists here, nature's parameter θ' is fixed and unknown. We only get to work with the realization of the random process that ac... (read more)

0jsteinhardt
Yup I agree with all of that. Nice explanation!

Good point. When I introduced that toy example with Cauchy factors, it was the easiest way to get factors that, informally, don't fill in their observed support. Letting the distribution of the factors drift would be a more realistic way to achieve this.

the whole underlying distribution switched and all your old estimates just went out of the window...

I like to hope (and should probably endeavor to ensure) that I don't find myself in situations like that. A system that generatively (what the joint distribution of factor X and outcome Y looks like) ev... (read more)

4Lumifer
It comes with certain territories. For example, any time you see the financial press talk about a six-sigma event you can be pretty sure the underlying distribution ain't what it used to be :-/

If you're working with composite hypotheses, replace "your statistic" with "the supremum of your statistic over the relevant set of hypotheses".

1jsteinhardt
If there are infinitely many hypotheses in the set then the algorithm in the grandparent doesn't terminate :).

This looks cool. My biggest caution would be that this effect may be tied to the specific class of data generating processes you're looking at.

Your framing seems to be that you look at the world as being filled with entities whose features under any conceivable measurements are distributed as independent multivariate normals. The predictive factor is a feature and so is the outcome. Then using extreme order statistics of the predictive factor to make inferences about the extreme order statistics of the outcome is informative but unreliable, as you illustra... (read more)
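
Roughly the kind of R snippet I had in mind (a reconstruction, not the exact code I ran): a heavy-tailed Cauchy factor plus standard-normal noise, checking how often the top-ranked factor score also has the top-ranked outcome, with a normal factor for contrast.

```r
# How often does the largest factor score also have the largest outcome?
# Reconstruction of the toy example; noise is standard normal in both cases.
set.seed(1)
one_trial <- function(n, rfactor = rcauchy) {
  fac <- rfactor(n)
  out <- fac + rnorm(n)
  which.max(fac) == which.max(out)
}
mean(replicate(2000, one_trial(1000)))                    # heavy-tailed (Cauchy) factor
mean(replicate(2000, one_trial(1000, rfactor = rnorm)))   # normal factor, for contrast
```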

5Thrasymachus
Thanks for doing what I should have done and actually running some data! I ran your code in R. I think what is going on in the Cauchy case is that the variance on fac is way higher than the normal noise being added (I think the SD is set to 1 by default, whilst the Cauchy is ranging over some orders of magnitude). If you plot(fac, out), you get a virtually straight line, which might explain the lack of divergence between top ranked fac and out. I don't have any analytic results to offer, but playing with R suggests in the normal case the probability of the greatest factor score picking out the greatest outcome goes down as N increases - to see this for yourself, replace rcauchy with runif or rnorm, and increase the N to 10000 or 100000. In the normal case, it is still unlikely that max(fac) picks out max(out) with random noise, but this probability seems to be sample size invariant - the rank of the maximum factor remains in the same sort of percentile as you increase the sample size. I can intuit why this is the case: in the bivariate normal case, the distribution should be elliptical, and so the limit case with N -> infinity will be steadily reducing density of observations moving out from the ellipse. So as N increases, you are more likely to 'fill in' the bulges on the ellipse at the right tail that gives you the divergence; if the N is smaller, this is less likely. (I find the uniform result more confusing - the 'N to infinity case' should be a parallelogram, so you should just be picking out the top right corner, so I'd guess the probability of picking out the max factor might be invariant to sample size... not sure.)
3Lumifer
Another issue is that real-life processes are, generally speaking, not stationary (in the statistical sense) -- outside of physics, that is. When you see an extreme event in reality it might be that the underlying process has heavier tails than you thought it does, or it might be that the whole underlying distribution switched and all your old estimates just went out of the window...

The grandchild comment suggests that he does, at least to the level of a typical user (though not a researcher or developer) of these methods.

You really should have mentioned here one of your Facebook responses that maybe the data generating processes seen in social science problems don't look like (the output of generative versions of) ML algorithms. What's the point of using a ML method that scales well computationally if looking at more data doesn't bring you to the truth (consistency guarantees can go away if the truth is outside the support of your model class) or has terrible bang for the buck (even if you keep consistency, you may take an efficiency hit)?

Also, think about how well these m... (read more)

The faster cell simulation technologies advance, the weaker is the hardware they'll run on.

If hardware growth strictly followed Moore's Law and CPUs (or GPUs, etc.) were completely general-purpose, this would be true. But, if cell simulation became a dominant application for computing hardware, one could imagine instruction set extensions or even entire architecture changes designed around it. Obviously, it would also take some time for software to take advantage of hardware change.

2private_messaging
Well, first it has to become dominant enough (for which it'd need to be common enough, for which it needs to be useful enough - used for what?), then the hardware specialization is not easy either, and on top of that specialized hardware locks the designs in (prevents easy modification and optimization). Especially if we're speaking of specializing beyond how GPUs are specialized for parallel floating point computations.

I was just contesting your statement as a universal one. For this poll, I agree you can't really pursue the covariate strategy. However, I think you're overstating the challenge of getting more data and figuring out what to do with it.

For example, measuring BPD status is difficult. You can do it by conducting a psychological examination of your subjects (costly but accurate), you can do it by asking subjects to self-report on a four-level Likert-ish scale (cheap but inaccurate), or you could do countless other things along this tradeoff surface. On the other h... (read more)

0ChristianKl
You argued against a more general statement than the one I made. But I did choose my words in a way that focused on drawing conclusions from the results and not results + comparison data.

Sure you can, in principle. When you have measured covariates, you can compare their sampled distribution to that of the population of interest. Find enough of a difference (modulo multiple comparisons, significance, researcher degrees of freedom, etc.) and you've detected bias. Ruling out systematic bias using your observations alone is much more difficult.

Even in this case, where we don't have covariates, there are some patterns in the ordinal data (the concept of ancillary statistics might be helpful in coming up with some of these) that would be extremely unlikely under unbiased sampling.
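
For instance, a minimal R sketch with made-up numbers: compare the sample's distribution over some covariate (age bands here) against known population proportions with a chi-squared goodness-of-fit test.

```r
# Toy covariate check for sampling bias. All counts and proportions invented.
observed_counts  <- c(young = 220, middle = 180, old = 100)     # from the sample
population_props <- c(young = 0.30, middle = 0.40, old = 0.30)  # known benchmark

chisq.test(observed_counts, p = population_props)
# A tiny p-value flags a sample whose covariate distribution doesn't match the
# population -- evidence of biased sampling (modulo multiple comparisons, etc.).
```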

0ChristianKl
That means that you need more data. Having a standard against which to train your model means that you need more than just the results of your measurement.

There's actually some really cool math developed about situations like this one. Large deviation theory describes how occurrences like the 1,000,004 red / 1,000,000 blue one become unlikely at an exponential rate and how, conditioning on them occurring, information about the manner in which they occurred can be deduced. It's a sort of trivial conclusion in this case, but if we accept a principle of maximum entropy, we can be dead certain that any of the 2,000,004 red or blue draws looks marginally like a Bernoulli with 1,000,004:1,000,000 odds. That's ju... (read more)
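
A small numerical sketch in R of that machinery, assuming the draws are i.i.d. with some hypothetical true red-probability p: the chance of an empirical red fraction near q decays like exp(-n * I(q)), where I(q) is the binary KL divergence from p.

```r
# Cramér/Sanov-style rate for i.i.d. red/blue draws. The true p is hypothetical.
kl <- function(q, p) q * log(q / p) + (1 - q) * log((1 - q) / (1 - p))

n <- 2000004
q <- 1000004 / n   # observed red fraction
p <- 0.45          # hypothetical true red probability

rate <- kl(q, p)
c(rate = rate, log_prob_approx = -n * rate)   # log of the exponentially small probability

# Sanity check at a smaller n: the exponential approximation agrees with the
# exact binomial tail to leading order (it drops polynomial prefactors).
n_small <- 2000
k_small <- ceiling(q * n_small)
c(exact_log  = pbinom(k_small - 1, n_small, p, lower.tail = FALSE, log.p = TRUE),
  approx_log = -n_small * kl(k_small / n_small, p))
```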
