
There are too many possible quantified-self experiments to run. Do hobbyist prediction platforms[1] make prioritisation easier? I test this by setting up multiple markets in order to run two experiments (the best one and a random one), mostly on the effects of nootropics on absorption in meditation.

dynomight 2022 has a cool proposal:

Oh, and by the way are you THE NSF or DARPA or THE NIH or A BILLIONAIRE WHO WANTS TO SPEND LOTS OF MONEY AND BRAG ABOUT HOW YOU ADVANCED THE STATE OF HUMAN KNOWLEDGE MORE THAN ALL THOSE OTHER LAME BILLIONAIRES WHO WOULDN’T KNOW A HIGH ROI IF IT HIT THEM IN THE FACE? Well how about this:

  1. Gather proposals for a hundred RCTs that would each be really expensive but also really awesome. (E.g. you could investigate SALT → MORTALITY or ALCOHOL → MORTALITY or UBI → HUMAN FLOURISHING.)
  2. Fund highly liquid markets to predict the outcome of each of these RCTs, conditional on them being funded.
    • If you have hangups about prison, you might want to chat with the CFTC before doing this.
  3. Randomly pick 5% of the proposed projects, fund them as written, and pay off the investors who correctly predicted what would happen.
  4. Take the other 95% of the proposed projects, give the investors their money back, and use the SWEET PREDICTIVE KNOWLEDGE to pick another 10% of the RCTs to fund for STAGGERING SCIENTIFIC PROGRESS and MAXIMAL STATUS ENHANCEMENT.

dynomight, “Prediction market does not imply causation”, 2022

Well, I'm neither a billionaire nor the NSF or DARPA, but I have run two shitty self-blinded RCTs on myself already, and I'm certainly not afraid of the CFTC. And I don't have a shortage of ideas for things I could run RCTs on, but time is scarce: I try to collect m=50 samples in each RCT, which (with buffer days off) usually means more than 2 months of data collection.

So I'll do what @saulmunn pointed out to me is a possibility: I'm going to do futarchy (on) myself by setting up a set of markets on Manifold Markets about the outcomes of some pre-specified self-blinded RCTs, waiting until their prices equilibrate, and then running two of those RCTs (the "best" one, by my standards, and a random one), using the results as resolutions and resolving the others as ambiguous.
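The selection step can be sketched as follows. This is a minimal sketch under stated assumptions: `market_scores` is a hypothetical mapping from experiment name to whatever single number summarises "best by my standards" (for example a market's predicted effect size), and the random experiment is drawn from the remaining markets so that two distinct experiments get run. Every market that isn't chosen resolves N/A.

```python
import random


def pick_experiments(market_scores: dict[str, float],
                     rng: random.Random) -> tuple[str, str]:
    """Return (best, random_other): the top-scoring experiment plus one
    uniformly random experiment from the rest (hypothetical scoring)."""
    best = max(market_scores, key=market_scores.get)
    # Sort the remaining names so a seeded RNG gives a reproducible pick.
    rest = sorted(name for name in market_scores if name != best)
    return best, rng.choice(rest)


def resolutions(market_scores: dict[str, float],
                chosen: tuple[str, ...]) -> dict[str, str]:
    """Chosen experiments are run and resolve on their results;
    all other markets resolve N/A (ambiguous)."""
    return {name: "run" if name in chosen else "N/A" for name in market_scores}


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    scores = {"theanine+caffeine": 0.4, "nicotine": 0.25, "vitamin D": 0.05}
    chosen = pick_experiments(scores, random.Random(2024))
    print(chosen, resolutions(scores, chosen))
```

Drawing the random experiment uniformly is what keeps the random arm an unbiased check on the markets: if the market favourite does well but the random pick also matches its market's prediction, that is evidence the prices were informative across the board.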

Timeline

If the markets receive enough liquidity, I'll start the first experiment early in 2024, and the second one sometime in 2024 (depending on the exact experiment), hopefully finishing both before 2025.

Markets

Some experiments can be self-blinded, especially ones that involve substances; others cannot, because they require me to engage in an activity or receive some sensory input. So I distinguish the two, and will slightly prioritise the experiments that can be blinded.

In all experiments, I will be using the statistical method detailed here, code for it here, unless someone points out that I'm doing my statistics wrong.

I will be scoring the markets based on the variables specified in the prediction market title, but I'll of course be collecting a lot of other data during that time that will also be analyzed.
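As a rough illustration of this kind of analysis (a sketch, not the linked code), here is a standard two-sample comparison using only the standard library: Cohen's d with a pooled standard deviation, Welch's t statistic, and a permutation-test p-value, which fits naturally with randomized assignment. The function names and the example ratings are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardised mean difference using the pooled sample SD."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * stdev(treatment) ** 2
                  + (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


def welch_t(treatment: list[float], control: list[float]) -> float:
    """Welch's t statistic (does not assume equal variances)."""
    se = sqrt(stdev(treatment) ** 2 / len(treatment)
              + stdev(control) ** 2 / len(control))
    return (mean(treatment) - mean(control)) / se


def permutation_p(treatment: list[float], control: list[float],
                  n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value: how often a random relabelling of the days
    gives a mean difference at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(control))
    pooled = list(treatment) + list(control)
    n = len(treatment)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n]) - mean(pooled[n:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Made-up absorption ratings, for illustration only.
    on = [5, 6, 7, 6, 5]
    off = [4, 5, 4, 5, 4]
    print(cohens_d(on, off), welch_t(on, off), permutation_p(on, off))
```

All three treat each day as an independent sample, so none of them accounts for lagged effects or slow trends such as training effects.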

Self-Blinded Experiments

In general, by meditative absorption I mean the concentration/tranquility (in Buddhist terms, samatha) during a ≥30-minute meditation session in the morning, ~45 minutes after waking up and taking the substance (less if the substance starts working immediately). I will be doing at least 15 minutes of anapanasati during that meditation session, but might start (or end) with another practice.

Past meditation data can be found here.

  1. L-Theanine + Caffeine vs. Sugar → Meditative Absorption: 50 samples in the morning after waking up, 25 intervention with 500mg l-theanine & 200mg caffeine and 25 placebo (sugar pills). Expected duration of trial: ~2½ months (one sample every day, but with possible pauses).
  2. Nicotine vs. Normal chewing gum → Meditative Absorption: 40 samples, with blocking after waking up, 20 intervention with 2mg nicotine, 20 placebo (similar-looking square chewing gum). Expected duration of trial: ~4½ months (two samples/week, to avoid getting addicted to nicotine).
  3. Modafinil vs. Sugar → Meditative Absorption: 40 samples, again with blocking directly after waking up, 20 intervention with 100mg modafinil and 20 placebo (sugar pills). Expected duration of trial: also ~4½ months with two samples per week, to prevent becoming dependent on modafinil.
  4. Vitamin D vs. Sugar → Meditative Absorption: 50 samples, taken after waking up, 25 intervention (25μg Vitamin D₃) and 25 placebo (sugar pills). Expected duration of trial: ~2½ months (taken ~every day, with possible pauses).
  5. Vitamin B12 vs. Sugar → Meditative Absorption: 50 samples, taken after waking up, 25 intervention (500μg Vitamin B12 + 200μg folate) and 25 placebo (sugar pills). Expected duration of trial: 2½ months (short interruptions included).
  6. LSD Microdosing vs. Water → Meditative Absorption: 50 samples in the morning, 25 intervention (10μg LSD), and 25 placebo (distilled water). Expected duration of trial is ~4 months (4 samples per week, with some time left as a buffer).
  7. CBD Oil vs. Similar-Tasting Oil → Meditative Absorption: 50 samples in the morning, 25 intervention (240mg CBD in oil, orally), and 25 placebo (whatever oil I can find that is closest in taste to the CBD oil). Expected duration of the trial: ~2½ months (taken ~every day, with possible pauses).
  8. L-Phenylalanine vs. Sugar → Meditative Absorption: 50 samples, taken directly after waking up, 25 intervention (750mg L-Phenylalanine), and 25 placebo (sugar pills). Duration of trial: 2½ months (one sample a day).
  9. Bupropion vs. Sugar → Happiness: 50 samples taken after waking up, 25 intervention (150mg Bupropion), and 25 placebo (sugar pills). Duration is the typical 2½ months again.
  10. THC Oil vs. Similar-Tasting Oil → Meditative Absorption: 50 samples in the morning, 25 intervention (4mg THC in oil, orally), and 25 placebo (whatever oil I can find that is closest in taste to the THC oil). Expected duration of the trial: ~2½ months (taken ~every day, with possible pauses).
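The expected durations above follow from simple arithmetic: the number of samples divided by samples per week gives the raw collection time, and pauses come on top. A small helper makes the slack explicit; the function names and the average month length of 365.25/12 days are choices made for this sketch.

```python
DAYS_PER_MONTH = 365.25 / 12  # average month length, about 30.44 days


def raw_months(samples: int, samples_per_week: float) -> float:
    """Months of data collection with no pauses at all."""
    return samples / samples_per_week * 7 / DAYS_PER_MONTH


def implied_slack(samples: int, samples_per_week: float,
                  planned_months: float) -> float:
    """Fraction of extra time the planned duration leaves for pauses."""
    return planned_months / raw_months(samples, samples_per_week) - 1


if __name__ == "__main__":
    print(f"{raw_months(50, 7):.2f}")   # daily, 50 samples
    print(f"{raw_months(40, 2):.2f}")   # twice a week, 40 samples
    print(f"{raw_months(50, 4):.2f}")   # four times a week, 50 samples
    print(f"{implied_slack(50, 7, 2.5):.0%}")
```

By this count, 50 daily samples need about 1.6 months of raw collection, so the ~2½-month estimates leave roughly 50% extra time for pauses, while 40 samples at two per week already take about 4.6 months with no pauses at all.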

Non-Blinded Experiments

Some experiments can't be blinded, but they can still be randomized. I will focus on experiments that can be blinded, but don't want to exclude the wider space of interventions.

  1. Intermittent Fasting vs. Normal Diet → Happiness: 50 samples, 25 intervention (eating only between 18:00 and midnight), 25 non-intervention (normal diet, which is usually 2 meals a day, spaced ~10 hours apart), chosen randomly via echo -e "fast\ndon't fast" | shuf | tail -1. Expected duration of the trial: ~2 months.
  2. Pomodoro Method vs. Nothing → Productivity: 50 samples, 25 intervention (I try to follow the Pomodoro method as best as I can, probably by installing a TAP of some sort), 25 non-intervention (I just try to do work as normally), chosen randomly via echo -e "pomodoro\nno pomodoro" | shuf | tail -1. Expected duration of trial: 2 months.
  3. Bright Light vs. Normal Light → Happiness: 50 samples, 25 intervention (turning on my lumenator of ~30k lumen in the morning), 25 non-intervention (turning on my normal desk lamp of ~1k lumen), selected via echo -e "lamp\nno lamp" | shuf | tail -1. Expected duration of trial: 4 months, as I often don't spend all my day at home.
  4. Meditation vs. No Meditation → Sleep duration: 50 samples, 25 intervention (2 consecutive days of ≥2h/day of meditation), 25 non-intervention (no meditation), selected via echo -e "meditation\nno meditation" | shuf | tail -1. Expected duration of trial: 5 months, as I might not always find a 2-day interval in which I'm sure I can meditate 2h/day.
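One caveat about the `shuf` one-liners: each is a fresh coin flip per day, so 50 draws come out exactly 25/25 only about 11% of the time (C(50, 25) / 2^50 ≈ 0.11). Permuted-block randomization guarantees equal arms and keeps them balanced over time; assuming that is roughly what "blocking" means in the blinded experiments above, a sketch could look like this (function and parameter names are hypothetical).

```python
import random
from typing import Optional, Sequence


def block_schedule(n_samples: int,
                   arms: Sequence[str] = ("intervention", "control"),
                   block_size: int = 2,
                   seed: Optional[int] = None) -> list[str]:
    """Permuted-block randomization: every block contains each arm equally
    often, in shuffled order, so the arms stay balanced throughout."""
    if block_size % len(arms) != 0 or n_samples % block_size != 0:
        raise ValueError("block_size must be a multiple of len(arms), "
                         "and n_samples a multiple of block_size")
    rng = random.Random(seed)
    schedule: list[str] = []
    for _ in range(n_samples // block_size):
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule


if __name__ == "__main__":
    # e.g. 50 days of "fast" / "don't fast", in blocks of 2
    plan = block_schedule(50, ("fast", "don't fast"), seed=1)
    for day, arm in enumerate(plan[:6], start=1):
        print(day, arm)
```

Larger blocks (e.g. 4 for the 40-sample trials) make the sequence harder to guess from the preceding days, at the cost of looser balance at any given point.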

Further Ideas

I have a couple more ideas on possible experiments that I could run, and will put them up as I acquire more mana.

Blindable:

  1. Semaglutide vs. Sugar → Productivity (tracking conscientiousness)

Not blindable:

  1. Binaural Beats vs. Silence → Meditative Absorption
  2. Brown Noise vs. Silence → Meditative Absorption
  3. Brown Noise vs. Music → Productivity
  4. Silence vs. Music → Productivity
  5. Time Since Last Masturbation → Productivity
  6. Starting Work Standing vs. Starting Work Sitting → Productivity

Pleas

This little exercise may need your participation! I have three pleas to you, dear reader:

  1. Please predict on the markets! If people predict on the markets, I both get more information about the value of the different experiments, and I also get mana back. It would be cool to know whether hobbyist prediction markets can be used for choosing experiments, and the worst result would be a "well, we can't really tell because liquidity on the markets was too small".
  2. Maybe send me mana to create more markets or subsidise existing ones. I'd love to subsidise my markets on Manifold a whole bunch, but don't have enough mana for that at the moment. clippy and Tetraspace have both already sent me mana, which I greatly appreciate. With more mana, I could also put up more markets, and thereby explore a larger space of possible experiments. However, the value of another market may not be that high, so this one is way less urgent.
  3. Give me ideas for more experiments to run. If you have an idea you're enthusiastic about and which you've always wanted to have tested, but you're kind of lazy about actually doing it, I might be able to jump in. Most interesting to me are experiments that are:
    1. Affordable: Expensive substances, high-end devices etc. are prohibitive (unless you want to buy the thing for me to perform the experiment).
    2. Safe: Sorry, I'm not going to take methamphetamine, even though it might make me much more productive.
    3. Measurable: The variable the intervention is supposed to affect should be measurable in at least one of the ways I currently collect data, or at least easily measurable. Cognitive performance in particular is hard to get a grip on: IQ tests can't be repeated very often, but maybe there's a game that measures cognitive performance reliably?
    4. Fast: I can't do 50 samples of an intervention where one sample takes 2 weeks to take effect. Daily is best, but for really good options I might be willing to tolerate 2 samples a week.

Other than that, I also welcome all critiques at any level of detail of this undertaking.

Further Ideas

If I could create more markets, I might be able to put up markets on different variables I measure during the day. That way, I could select interventions that dominate others across multiple dimensions.
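Selecting interventions that dominate others across multiple dimensions amounts to finding the Pareto front. A minimal sketch, assuming each intervention gets one predicted effect per measured variable and that higher is better on every variable; the intervention names and numbers below are made up.

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """a dominates b: at least as good everywhere, strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(predicted: dict[str, tuple[float, ...]]) -> list[str]:
    """Interventions that no other intervention dominates."""
    return sorted(
        name for name, effects in predicted.items()
        if not any(dominates(other, effects)
                   for other_name, other in predicted.items()
                   if other_name != name)
    )


if __name__ == "__main__":
    # Hypothetical predicted effect sizes on (absorption, happiness, productivity).
    predicted = {
        "A": (0.3, 0.1, 0.0),
        "B": (0.2, 0.0, 0.0),
        "C": (0.0, 0.4, 0.1),
    }
    print(pareto_front(predicted))  # → ['A', 'C'] (B is dominated by A)
```

Anything off the front can be dropped without trade-offs; choosing among the interventions on the front still requires weighing the variables against each other.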

If there were prediction platforms that supported them, combinatorial prediction markets or latent-variable prediction markets could be incredibly cool, but we don't live in that world (yet).

Results

To be done, hopefully by early 2025.

Acknowledgements

Many thanks to clippy (twitter) for 500 Mana and Tetraspace (twitter) for 1000 Mana — your funding of the sciences is greatly appreciated.

See Also

Appendix A: Explanations for the Experiments I Chose

Over time, I'll add explanations of why these specific experiments interest me. They're not complete yet, though.

L-Theanine + Caffeine vs. Sugar → Meditative Absorption

My l-theanine experiment gave disappointing results, but people have (rightfully) pointed out that l-theanine is best taken together with caffeine: one gets energy and relaxation at the same time.

This points at a broader possibility: Why not set up markets for all possible combinations of nootropics? But alas, this runs into problems with combinatorial explosion.
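To make the combinatorial explosion concrete: with n substances, covering every combination of up to k of them takes the sum of C(n, i) for i = 1..k markets, and covering all combinations takes 2^n - 1. A quick count, using a hypothetical n = 10 and ignoring different doses:

```python
from math import comb


def markets_needed(n_substances: int, max_combo: int) -> int:
    """Markets needed for every combination of 1..max_combo substances."""
    return sum(comb(n_substances, k) for k in range(1, max_combo + 1))


if __name__ == "__main__":
    for k in (1, 2, 3, 10):
        print(k, markets_needed(10, k))
```

Ten single-substance markets become 55 once pairs are included, 175 with triples, and 1023 for every combination.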

Nicotine vs. Normal chewing gum → Meditative Absorption

Modafinil vs. Sugar → Meditative Absorption

Vitamin D vs. Sugar → Meditative Absorption

Vitamin B12 vs. Sugar → Meditative Absorption

LSD Microdosing vs. Water → Meditative Absorption

Inspired by Gwern 2019.

CBD Oil vs. Similar-Tasting Oil → Meditative Absorption

L-Phenylalanine vs. Sugar → Meditative Absorption

Bupropion vs. Sugar → Happiness

THC Oil vs. Similar-Tasting Oil → Meditative Absorption

Intermittent Fasting vs. Normal Diet → Happiness

Pomodoro Method vs. Nothing → Productivity

The Pomodoro technique also uses the concept of rhythm, breaking up the day into twenty-five-minute segments of work and five minutes of a break. Interestingly, though, I found no academic study that tested the technique.

—Gloria Mark, “Attention Span” p. 66, 2023

It'd be cool if I were the first person to actually test this widespread technique.

Bright Light vs. Normal Light → Happiness

See all the things people have written about lumenators.

Meditation vs. No Meditation → Sleep duration

  1. ^

    I find it odd to call any platform on which people functionally give probabilities, but without staking real money, a "prediction market". Neither Metaculus nor Manifold Markets are prediction markets, but PredictIt and Kalshi are.


The more important an effect is, the stronger it usually is, so starting many of the experiments but running each for a short time might yield results much faster. It may be possible to overlap the non-blinded experiments and run many at the same time with varying periodicity, so that the same interventions do not always happen on top of each other.

Your statistical method is similar to a two-sample t-test, right? Well, that does not account for several possible issues with time series and dependence between data points of one variable, lag and training effects for example. So be sure to control all other possible independent variables, and plot the data timeline; when you do, do not connect the data points with lines!

The more important an effect is, the stronger it usually is, so starting many of the experiments but running each for a short time might yield results much faster. It may be possible to overlap the non-blinded experiments and run many at the same time with varying periodicity, so that the same interventions do not always happen on top of each other.

Things like this have crossed my mind, but that seems fancier than I can handle at the moment (I may consider this once I've done one or two more experiments). Might be able to use Multi-Armed Bandit-like sampling for this, even? Hm…

Your statistical method is similar to a two-sample t-test, right? Well, that does not account for several possible issues with time series and dependence between data points of one variable, lag and training effects for example. So be sure to control all other possible independent variables, and plot the data timeline; when you do, do not connect the data points with lines!

Could you elaborate on this a bit? I am randomizing the order of intervention/control activities.

Maybe you mean something like: If I've done an intervention X, then it'll be more likely to be followed by a non-X day, but the effect of X lags, so non-X days will be more likely measured as being high in X-effect. But that'd mean that X days are more likely followed by non-X, which with random order is not the case.

Might be able to use Multi-Armed Bandit-like sampling for this, even? Hm…

Effects may take time and may require time to build up to detectable levels. This is why Winters increased the length of each intervention until they lasted some weeks. If the placebo causes a different self-report rating, then it's a bad placebo and should be blinded out, but if it causes a psychological improvement, then why not use it?

 

so non-X days will be more likely measured as being high in X-effect. But that'd mean that X days are more likely followed by non-X, which with random order is not the case.

Yes but it will still make the effect size much less. 

Could you elaborate on this a bit

Lag and build-up are mentioned above. A training effect is when you get better at something just by doing it, so later interventions look better. At the same time, there may be drift in self-reports: a slowly growing change affects memory, making the user think there is no change. For all these reasons, plot the time series with time on the X axis and results on the Y axis, and make each point the color of intervention or placebo. Do not connect the dots with lines, but do draw a smooth loess-like line. You will be able to see some of the issues if they occur. Some more on all the issues.

Okay, that is quite informative! Thank you.

I think I'll take a stab at your plotting suggestions with my previous self-experiments, and add a section with qualitative judgements and more plots (which people asked for in the past anyway).

In all experiments, I will be using the statistical method detailed here, code for it here, unless someone points out that I'm doing my statistics wrong.

Links lead to nowhere?

Oops, error while copying. Will fix.