rain8dome9

Comments

Really thorough statistical analysis of Anki (flashcard app) data

rpubs.com/rain8/1100036 It's a work in progress with only two steps finished. Not exactly an add-on, because it's in R, not Python. So far the project does many little things, like finding bugs in a user's collection, describing the growth of their collection, and text mining. The ultimate goal is to use Anki as a continuous cognitive tester and to let users learn about and optimize their memorization process. Instructions to run on your own data: GitHub

I am not sure the data in Anki could really be used as a continuous cognitive health test. It would probably require removing lots of artifacts and other influences, and then finding an outside influence that definitely relates to cognition. Lit review.

Could you describe the experiment you ran on all these models? Something like: "If there are three boxes side by side in a line, each can hold one item, the red triangle is not in the middle, and the blue circle is not in the box next to the box with the red triangle in it, where is the green circle?" ChatGPT was not able to solve logic puzzles a year ago and can do it now.
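For reference, the puzzle above is small enough to check by brute force. This is a minimal R sketch, assuming all three items are placed and each box holds exactly one; the variable names are my own. Under those assumptions only two arrangements survive the constraints, and both put the green circle in the middle box.

```r
# Each row is one arrangement: the box (1, 2 or 3) holding each item.
# Assumption: all three items are placed, one per box.
perms <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
               c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))
colnames(perms) <- c("red_triangle", "blue_circle", "green_circle")
placements <- as.data.frame(perms)

# Constraint 1: the red triangle is not in the middle box.
# Constraint 2: the blue circle is not in a box adjacent to the red triangle.
ok <- placements$red_triangle != 2 &
  abs(placements$blue_circle - placements$red_triangle) != 1

solutions <- placements[ok, ]
green_positions <- unique(solutions$green_circle)

print(solutions)        # the arrangements that satisfy both constraints
print(green_positions)  # box number(s) the green circle can occupy
```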

 That said, the dimensions of quality that the FDA concerns itself with (including physical functioning, self-reported pain, and other easily- and not-easily-measured things) is likely close enough to "improves quality of life" that it's not necessary to have a new direction.  

Athletic performance. Cognitive performance. Work performance. Also the ability to accomplish the things needed in everyday life and to have, uh, fun.

I think it's worth mentioning that there are two levels of black box models too. ML can memorize the expected value at each set of variables (with the crank at 1 rpm the wheel rotates at 2 rpm), or it can "generalize" and, for this example, tell us that the wheel rotates at 2x the speed of the crank. To some extent "ML generalization" provides good "out of distribution" predictions.
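A toy R sketch of the two levels, using the crank and wheel example. The data and the function names `memorize` and `generalize` are made up for illustration, and a linear fit stands in for whatever model would actually be used.

```r
# Hypothetical training data: the wheel turns at twice the crank speed.
train <- data.frame(crank_rpm = c(1, 2, 3, 4))
train$wheel_rpm <- 2 * train$crank_rpm

# Level 1, "memorizing": a lookup table of values seen in training.
# It returns NA for any crank speed it has not seen.
memorize <- function(x) train$wheel_rpm[match(x, train$crank_rpm)]

# Level 2, "generalizing": fit the relationship itself.
fit <- lm(wheel_rpm ~ crank_rpm, data = train)
generalize <- function(x) {
  unname(predict(fit, newdata = data.frame(crank_rpm = x)))
}

# The fitted slope is the learned crank-to-wheel ratio.
ratio <- unname(coef(fit)["crank_rpm"])

print(memorize(1))     # a value seen in training
print(memorize(10))    # outside the training data: no answer
print(generalize(10))  # outside the training data: extrapolated from the fit
print(ratio)
```

The lookup table has nothing to say about 10 rpm, while the fitted model extrapolates from the learned ratio. That only works here because the true relationship really is linear.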

There is no “Wikipedia for predictive models” that I know of. No big repository to easily share and find predictive scientific models other than the relevant domain’s scientific literature, which is not optimized for these tasks: it is not organized by the variables being predicted, it is not generally available as reusable and modular software components, it is usually not focused on predictive work, some of it is paywalled, etc.

Have you tried www.openml.org?

Prototypical example: imagine a scientific field in which the large majority of practitioners have a very poor understanding of statistics, p-hacking, etc. Then lots of work in that field will be highly memetic despite trash statistics, blatant p-hacking, etc. Sure, the most competent people in the field may recognize the problems, but the median researchers don’t, and in aggregate it’s mostly the median researchers who spread the memes.

Complicated analysis (like going far beyond p-values) is easy for anyone to see, and it is evidence of effort. Complex analysis usually co-occurs with thoroughness, so there are fewer mistakes. It also co-occurs with many concurrent tests, so there is less need for any one test to produce a positive result, and so less p-hacking. Consequently, there is a fairly simple solution to researchers with mediocre statistical skills gaining too much trust: more plots! Anyway, I find correlation graphs and multiple comparisons impressive. Also, I am usually more skilled in data analysis than in the subject of a paper, so I can more easily verify that part.

Relevant quote from Dragonfired by J. Zachary Pike. "Brokers make money by knowing key information; they make fortunes by ensuring that other brokers remain unaware or unsure of the same information until after critical trades."

In ggplot2 (R statistical language) the defaults include a subtle grid and no axis lines. They also put in the extra random space around the data.

Here is some code in case someone else using R wants to try out things discussed here:

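library(ggplot2)
# Scatter plot of mtcars with explicit axis lines and no padding around the data
ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  theme(axis.line.x = element_line(colour = "black", linewidth = 0),
        axis.line.y = element_line(colour = "black", linewidth = 1)) +
  # expand = c(0, 0) removes the extra space ggplot2 adds beyond the limits
  scale_x_continuous(expand = c(0, 0), limits = c(0, 8)) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 36))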
