All of rain8dome9's Comments + Replies

Could you describe the experiment you ran on all these models? For example: 'If there are three boxes side by side in a line, each can hold one item, the red triangle is not in the middle, and the blue circle is not in the box next to the box with the red triangle in it, where is the green circle?' ChatGPT was not able to solve logic puzzles a year ago and can do it now.
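For reference, a puzzle like the one above can be checked by brute force. This is a minimal sketch in base R, assuming "next to" means the adjacent box in the line; the column names are made up for illustration and are not from any experiment in the post.

```r
# Enumerate every way to assign the three items to boxes 1..3 (left to right).
placements <- expand.grid(red_triangle = 1:3, blue_circle = 1:3, green_circle = 1:3)

# Each box holds exactly one item, so the three positions must all differ.
distinct <- apply(placements, 1, function(p) length(unique(p)) == 3)
placements <- placements[distinct, ]

# Constraint 1: the red triangle is not in the middle box.
# Constraint 2 (assumed reading): the blue circle is not adjacent to the red triangle.
valid <- placements[
  placements$red_triangle != 2 &
    abs(placements$blue_circle - placements$red_triangle) != 1, ]

print(valid)
green_positions <- unique(valid$green_circle)
print(green_positions)
```

Under that reading, two arrangements survive and both put the green circle in the middle box, so a model's answer can be compared against the enumeration.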

3Thane Ruthenis
I don't really "run experiments" on models, in a systemic personal capacity. Other people are much better at that, and I believe I'd linked a few examples in the post. I do replicate the occasional experiment, and run some myself if there's something I'd like to check... But broadly, at this point, I don't expect any compact, self-contained puzzle to be a good measure of "are we getting AGIer yet?". My direct engagement with models mostly consists of feeding them research papers to process them faster, asking clarifying questions about math/physics, using Deep Research for varyingly targeted literature surveys, and chatting with them about whatever theoretical/philosophical problems I happen to be working on at a given moment. Those function pretty well as a measure of insight/innovativeness: of whether the AI is assembling a precise model of what's happening and what we're doing, and then runs internal queries on that model to move the interaction in the direction of greater understanding, vs. producing very sophisticated remixes of existing templates in a fundamentally sleepwalk-y manner.  It's been that second one every time so far.

That said, the dimensions of quality that the FDA concerns itself with (including physical functioning, self-reported pain, and other easily- and not-easily-measured things) are likely close enough to "improves quality of life" that it's not necessary to have a new direction.

Athletic performance. Cognitive performance. Work performance. Also the ability to accomplish the things needed in everyday life and to have, uh, fun.

I think it's worth mentioning that there are two levels of black-box models too. ML can memorize the expected value at each set of variables (at 1 rpm of the crank, the wheel rotates at 2 rpm), or it can 'generalize' and, for this example, tell us that the wheel rotates at 2x the speed of the crank. To some extent, 'ML generalization' provides good 'out of distribution' predictions.
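A toy illustration of the two levels, as a sketch only: the crank/wheel numbers are invented, the "memorizing" model is a plain lookup table, and the "generalizing" model is a linear fit through the origin standing in for whatever an ML system would actually learn.

```r
# Hypothetical training data: the wheel turns at twice the crank speed.
crank_rpm <- c(1, 2, 3, 4, 5)
wheel_rpm <- 2 * crank_rpm

# "Memorizing" model: a lookup table keyed by the crank speeds it has seen.
memorized <- setNames(wheel_rpm, crank_rpm)
predict_memorized <- function(rpm) unname(memorized[as.character(rpm)])

# "Generalizing" model: a linear fit through the origin recovers the 2x rule.
fit <- lm(wheel_rpm ~ 0 + crank_rpm)
predict_generalized <- function(rpm) {
  unname(predict(fit, newdata = data.frame(crank_rpm = rpm)))
}

# In distribution, both models agree.
print(predict_memorized(3))
print(predict_generalized(3))

# Out of distribution (10 rpm was never observed), only the fit has an answer.
print(predict_memorized(10))
print(predict_generalized(10))
```

The lookup table returns NA for an unseen crank speed, while the fitted rule extrapolates; real models sit somewhere between these two extremes.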

There is no “Wikipedia for predictive models” that I know of. No big repository to easily share and find predictive scientific models other than the relevant domain’s scientific literature, which is not optimized for these tasks: it is not organized by the variables being predicted, it is not generally available as reusable and modular software components, it is usually not focused on predictive work, some of it is paywalled, etc.

Have you tried www.openml.org?

1Michael Latowicki
Thank you for pointing me that way. No I have not! So I took a quick look. This looks a lot like HuggingFace. It's good to be reminded that these things exist and they do have some things in common with what I propose. As they stand, though, it's not it. Notice I'm talking about scientific models here. The mindset with which I approach this is one of theoretically-motivated, sparsely connected models, the kind you learn about when you take a university course in say, psychology or economics, not the kind you train with neural networks.

Prototypical example: imagine a scientific field in which the large majority of practitioners have a very poor understanding of statistics, p-hacking, etc. Then lots of work in that field will be highly memetic despite trash statistics, blatant p-hacking, etc. Sure, the most competent people in the field may recognize the problems, but the median researchers don’t, and in aggregate it’s mostly the median researchers who spread the memes.

Complicated analysis (like going far beyond p-values) is easy for anyone to see and it is evidence of effort. Comple... (read more)

Is this a paper? Has it been published anywhere?

2niplav
Not published anywhere except here and on my site.

Relevant quote from Dragonfired by J. Zachary Pike. "Brokers make money by knowing key information; they make fortunes by ensuring that other brokers remain unaware or unsure of the same information until after critical trades."

In ggplot (R statistical language) the defaults include a subtle grid and no axis lines. They also add extra padding around the data.

Here is some code in case someone else using R wants to try out things discussed here:

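library(ggplot2)

# Scatter plot of weight vs. fuel economy, coloured by cylinder count.
# linewidth replaces the deprecated size argument (needs ggplot2 >= 3.4.0).
ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  theme(axis.line.x = element_line(colour = "black", linewidth = 0),
        axis.line.y = element_line(colour = "black", linewidth = 1)) +
  # expand = c(0, 0) removes the default padding around the data.
  scale_x_continuous(expand = c(0, 0), limits = c(0, 8)) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 36))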

Might be able to use Multi-Armed Bandit-like sampling for this, even? Hm…

Effects may take time and may need to build up to detectable levels. This is why Winters increased the length of each intervention until they lasted some weeks. If the placebo causes a different self-report rating, then it's a bad placebo and should be blinded out, but if it causes a psychological improvement, then why not use it?

 

so non-X days will be more likely measured as being high in X-effect. But that'd mean that X days are more likely followed by non-X, which with rand

... (read more)
3niplav
Okay, that is quite informative! Thank you. I think I'll take a stab at your plotting suggestions with my previous self-experiments, and make some qualitative judgements section with more plots (which people asked for in the past anyway).

The more important an effect is, the stronger it usually is, so starting many of the experiments but running each for a short time might yield results much faster. It may be possible to overlap the non-blinded experiments and run many at the same time with varying periodicity, so the same interventions do not always happen on top of each other.

Your statistical method is similar to a two-sample t-test, right? Well, that does not account for several possible issues with time series and dependence between data points of one variable. Lag and training effects, for example. So be ... (read more)

2niplav
Things like this have crossed my mind, but that seems fancier than I can handle at the moment (I may consider this once I've done one or two more experiments). Might be able to use Multi-Armed Bandit-like sampling for this, even? Hm… Could you elaborate on this a bit? I am randomizing the order of intervention/control activities. Maybe you mean something like: If I've done an intervention X, then it'll be more likely to be followed by a non-X day, but the effect of X lags, so non-X days will be more likely measured as being high in X-effect. But that'd mean that X days are more likely followed by non-X, which with random order is not the case.
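One way to probe the dependence concern raised in this exchange is to look at the lag-1 autocorrelation of the daily measurements. The sketch below uses simulated data (an AR(1) series with an arbitrary coefficient of 0.7), not anyone's real experiment, and the effective-sample-size formula is a rough approximation for AR(1)-like series rather than part of the method discussed here.

```r
set.seed(1)  # fixed seed so the simulated series is reproducible

# Hypothetical daily self-ratings with day-to-day carry-over, simulated as AR(1).
n <- 200
ratings <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))

# Lag-1 autocorrelation: how strongly each day resembles the previous one.
# acf() returns lag 0 first, so the second element is lag 1.
r1 <- acf(ratings, plot = FALSE)$acf[2]

# Rough effective sample size for the mean of an AR(1)-like series.
n_eff <- n * (1 - r1) / (1 + r1)

cat("lag-1 autocorrelation:", round(r1, 2), "\n")
cat("nominal n:", n, " approximate effective n:", round(n_eff), "\n")
```

If the autocorrelation is clearly positive, the measurements carry less independent information than their count suggests, which is the situation a plain two-sample t-test does not account for.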

In all experiments, I will be using the statistical method detailed here, code for it here, unless someone points out that I'm doing my statistics wrong.

Links lead to nowhere?

2niplav
Oops, error while copying. Will fix.

Will you try running the two notebooks on your data? I am starving for feedback and attention. 

Really thorough statistical analysis of Anki (flashcard app) data

rpubs.com/rain8/1100036 It's a work in progress with only two steps finished. Not exactly an add-on, because it's in R, not Python. So far the project does many little things, like finding bugs in the user's collection, describing the growth of their collection, and text mining. The ultimate goal is to be able to use Anki as a continuous cognitive tester and to let users learn about and optimize their memorization process. Instructions to run on your own data: github

I am not sure data in anki could ... (read more)

3niplav
I have been using anki performance as a very weak proxy for cognitive performance in general in some of my experiments. I hope it works for that purpose.

I am willing to be a test subject. Evidence that I am serious is I have 119k reviews on Anki and am analyzing the data hoping it will be a psychometric test.

Thank you, that was enlightening.

2Noah Topper
Thanks for coming. :)

An analytic framework that takes multiple comparisons etc. into account and lets you see if any correlations are statistically significant.

Blinding. 

Two issues, one of which I did not think of, out of like 20.

EDIT: I suspect, including from my own experience, that many problems can be solved without resorting to advanced statistics, often by using a thorough experimental procedure instead. For example, eliminating a food type for a month, then not doing an intervention for a month, and repeating. Trying out medications sounds like it should be done safely. This safety can only be achieved by monitoring vital signs and analyzing them using advanced statistics.

Is there a way to help users collect and analyze the data without needing to be a statistics expert? 

Collection is really just a matter of finding the right devices and taking the time to use them. Analysis beyond immediate, obvious effects can become difficult: if the effect is subtle and drowned in other effects, or hard to measure; if the intervention is not something the user can easily reproduce or wants to reproduce; if the effect takes a long time to build up, or is shifted in time from the intervention; if the successful effect only happens under several c... (read more)

2mcint
Your link is broken, and while Wikipedia may be a guide to problems, generically, I'm curious about the apps, and the problems specifically relevant.
2DirectedEvolution
I agree that the highest-leverage place to start is probably the paradigm of encouraging people with obvious long-lasting chronic problems to look for immediate obvious effects by doing maximal spray n pray. Equivalents of "have nausea from cancer/chemo daily -> medical marijuana -> no more problems."
Once we get away from that, I think that systems for collection, analysis, and self-blinding become important. There are a lot of details and trivial inconveniences in any research project, and most people just aren't equipped to work them out on their own. There's a lot you can do to smooth the path.
For example, I can imagine a nootropics test kit. It would come with:
* A standardized questionnaire that you fill out for every nootropic you try
* A sample of nootropics to try along with placebos. Placebos would mimic the appearance of various drugs so it's impossible to tell which is which without deliberately unblinding yourself. The supply would be large enough to give you adequate power given the number of drugs you're trying.
* An analytic framework that takes multiple comparisons etc. into account and lets you see if any correlations are statistically significant.
* Perhaps packaging drugs in different ways so that you can order more of the things that work, but with a different appearance, to do a more focused experiment on the likeliest candidates.
There's a lot of detail to work out in designing such a kit, but it's easy for me to see that it could convert an intractable problem into a do-able puzzle for a motivated and reasonably intelligent user.
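To make the multiple-comparisons point above concrete, here is a small sketch using base R's p.adjust with the Holm correction. The p-values and drug names are invented for illustration; a real kit would compute them from the questionnaire data, and Holm is only one of several reasonable corrections.

```r
# Hypothetical raw p-values from testing five nootropics against placebo.
p_raw <- c(drug_a = 0.004, drug_b = 0.03, drug_c = 0.04,
           drug_d = 0.20, drug_e = 0.65)

# Holm's step-down correction controls the family-wise error rate.
p_holm <- p.adjust(p_raw, method = "holm")

# Compare which drugs look "significant" at 0.05 before and after correction.
significant_raw <- names(p_raw)[p_raw < 0.05]
significant_holm <- names(p_holm)[p_holm < 0.05]

print(round(p_holm, 3))
print(significant_raw)
print(significant_holm)
```

With these made-up numbers, three drugs pass the uncorrected 0.05 threshold but only one survives the correction, which is the kind of filtering such a framework would need to do automatically.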

In my case it turned out to be manufactured food and gluten. This post is very similar to the Quantified Self movement.

Also please remember that side effects and drug interactions are a thing. Anything with a real effect can hurt you. I gave a very caveated suggestion of BosPro to someone on Twitter and it caused something akin to niacin flush in them. This is the same brand that does nothing to me but makes me better at digestion and uninterested in sugar.

What if the problem or the negative consequence of some intervention is hard to detect? I know... (read more)

So this will be on Sept 21, right?

1Econometric Structuralist
I keep getting the time/dates in here wrong. Sorry about that! Yes, it's on the 21st.

Excuse me for the necro. I think saying all the synonyms is better than letter-based constraining. If the word that fits the constraint is found later than most other synonyms, the act of checking for the constraint takes longer than just listing. According to the 20 rules of formulating knowledge by Wozniak, it's better for the mind to follow a set path even if it is longer, and that is the act of making a list. It is probably good to have sets of synonyms memorized for writing. Adding a constraint makes the question longer, which is something Wozniak advises against.

2agentydragon
problem with set of synonyms is that it can be long. i have, however, started using cards like "remember at least 4 of these 6 things" - e.g. "symptoms of acute HIV infection"

Someone should make a game/simulation of these things. Let the layman learn how to navigate politics and let the sociologists plan better. This is the only way to get real answers, given the extreme political nature of the issue. OK, so there is https://en.wikipedia.org/wiki/Social_simulation but the games for laymen listed there (like The Sims) are mostly bad and certainly do not simulate office politics.

The internet is filled with BS. There are a million health tracking devices. The most reliable of these are either FDA-certified medical devices, so the company that makes them will be punished for misrepresentation, or open source and therefore extremely transparent. Might similar rules apply to charities?

Seems to test something different from 15 minute games.

The Elo on sites depends on the player base lots and lots. 

The AI on lichess sometimes makes clearly worse moves than the ones I made. There is also room for much more in-depth analysis, like drop, fork, defensiveness, etc. I switched from chess to Anki and Amphetype because, while not nearly as fun, they also taught me a skill. I will get back to chess when I find an affordable automatic board. Cognitive tracking is often discussed on reddit or the QS forums.

A few papers exist on the subject of chess as a test of cognition.

"Using within-player comparisons, we find a ... (read more)

I have something similar. Have you worked outside?

2Viliam
No, I haven't. At work this was not an option. At home, I use a desktop computer, not a notebook.
Answer by rain8dome910

I use Bitesnap and MFP to avoid most of your problems with diet tracking. Measuring exact weight of each ingredient in something I cook is still a hassle. For heart rate I recommend uECG. Many tools are being developed to track exercise in great depth such as mbientlabs' wearable accelerometry and computer vision pose estimation.

Even if the user gets good time-saving equipment, the daily time expenditure is still non-trivial. The benefits could, however, be great! The biggest current problem is that no automated analysis software exists yet. For more, see the Kialo debate:

https://www.kialo.com/everyone-should-health-track---self-quantify-49787

Answer by rain8dome910

A LessWrong deck exists now, though it seems incomplete, missing things like Inferential Distance.