ciphergoth just asked what the actual value of Quantified Self/self-experimentation is. This finally tempted me into running value-of-information calculations on my own experiments. It took me all afternoon, because it turned out I didn’t actually understand how to do it, and I had a hard time figuring out the right values for specific experiments. (I may not have gotten it right, still. Feel free to check my work!) Then it turned out to be too long for a comment, and as usual the master versions will be on my website at some point. But without further ado!


The value of an experiment is the information it produces. What is the value of information? Well, we can take the economic tack and say value of information is the value of the decisions it changes. (Would you pay for a weather forecast about somewhere you are not going to? No. Or a weather forecast about your trip where you have to make that trip, come hell or high water? Only to the extent you can make preparations like bringing an umbrella.)

Wikipedia says that for a risk-neutral person, value of perfect information is “value of decision situation with perfect information” - “value of current decision situation”. (Imperfect information is just weakened perfect information: if your information was not 100% reliable but 99% reliable, well, that’s worth 99% as much.)

1 Melatonin

http://www.gwern.net/Zeo#melatonin & http://www.gwern.net/Melatonin

The decision is binary: take or don’t take. Melatonin costs ~$10 a year (if you buy in bulk during sales, as I did). Suppose I had perfect information it worked; I would not change anything, so the value is $0. Suppose I had perfect information it did not work; then I would stop using it, saving me $10 a year in perpetuity, which has a net present value (at 5% discounting) of $205. So the value of perfect information is $205, because it would save me from blowing $10 every year for the rest of my life. My melatonin experiment is not perfect, since I didn’t randomize or double-blind it, but I had a lot of data and it was well-powered, with something like a >90% chance of detecting the decent effect size I expected, so the imperfection costs only ~10%, bringing the value down to $184. From my previous research and personal use over years, I am highly confident it works - say, 80%. If it works, the information is useless to me; if it doesn’t, I save $184. What’s the expected value of obtaining the information, given these two outcomes? (80% * $0) + (20% * $184) = $36.8. At a minimum-wage opportunity cost of $7 an hour, $36.8 is worth 5.25 hours of my time. Between the screenshots, summarizing, and analysis, I’d guess I spent closer to 10–15 hours all told.
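Spelled out in Python for concreteness (a sketch using the estimates above; the 5% discount rate and $7/hour opportunity cost are as given):

```python
import math

def npv_perpetuity(annual_savings, rate=0.05):
    """Net present value of a perpetual annual saving, discounted at `rate`."""
    return annual_savings / math.log(1 + rate)

perfect = npv_perpetuity(10)        # a perfect negative result: ~$205
imperfect = perfect * 0.90          # ~90% power, so knock off 10%: ~$184
voi = 0.80 * 0 + 0.20 * imperfect   # 80% confident it works, so the info is usually worthless
hours = voi / 7                     # minimum-wage opportunity cost: ~5.3 hours
print(round(voi, 1), round(hours, 2))
```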

(The net present value formula is the annual savings divided by the natural log of 1 plus the discount rate (log 1.05 for 5%), out to eternity. Exponential discounting means that a bond that expires in 50 years is worth a surprisingly similar amount to one that continues paying out forever. For example, a 50-year bond paying $10 a year at a discount rate of 5% is worth sum $ map (\t -> 10 / (1 + 0.05)^t) [1..50] ~> 182.5, but if that same bond never expires, it’s worth 10 / log 1.05 = 204.9, or just $22.4 more! My own expected longevity is ~50 more years, but I prefer to use the simple natural-log formula rather than the more accurate summation. All the numbers here are questionable anyway.)
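The same comparison in Python (the Haskell expression above, translated; same numbers):

```python
import math

annual, rate, years = 10, 0.05, 50

# Explicit summation: a 50-year bond paying $10/year, discounted at 5%.
finite = sum(annual / (1 + rate) ** t for t in range(1, years + 1))  # ~182.6

# Perpetuity shortcut used throughout: annual / ln(1 + rate).
perpetual = annual / math.log(1 + rate)                              # ~204.9

print(round(finite, 1), round(perpetual, 1), round(perpetual - finite, 1))
```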

This worked out example demonstrates that when a substance is cheap and you are highly confident it works, a long costly experiment may not be worth it. (Of course, I would have done it anyway due to factors not included in the calculation: to try out my Zeo, learn a bit about sleep experimentation, do something cool, and have something neat to show everyone.)

2 Vitamin D

http://www.gwern.net/Zeo#vitamin-d

I ran 2 experiments on vitamin D: whether it hurt sleep when taken in the evening, and whether it helped sleep when taken in the morning.

2.1 Evening

http://www.gwern.net/Zeo#vitamin-d-at-night-hurts

On the first, I had no opinion. I actually did sometimes take vitamin D in the evening when I hadn’t gotten around to it earlier (I take it for its anti-cancer and SAD effects). There was no research background, and the anecdotal evidence was of very poor quality. Still, it was plausible, since vitamin D is involved in circadian rhythms, so I gave it 50% and decided to run an experiment. What effect would perfect information that it did negatively affect my sleep have? Well, I’d definitely switch to taking it in the morning and would never take it in the evening again, which would change maybe 20% of my future doses. And what was the negative effect? It couldn’t be that bad, or I would have noticed it already (like I noticed sulbutiamine made it hard to get to sleep). I’m not willing to change my routines very much to improve my sleep, so I would be lying if I estimated the value of eliminating any vitamin D-related disturbance at more than, say, 10 cents per night; the total value of the affected nights is then $0.10 * 0.20 * 365.25 = $7.3 a year. On the plus side, my experiment design was high quality and ran for a fair number of days, so it would surely detect any sleep disturbance from the randomized vitamin D - call it 90% quality of information. This gives ((7.3 - 0) / log 1.05) * 0.90 * 0.50 = $67.3, justifying <9.6 hours. Making the pills took perhaps an hour, recording used up some time, and the analysis took several hours to label & process all the data, play with it in R, and write it all up in a clean form for readers. Still, I don’t think it took almost 10 hours of work, so I think this experiment ran at a profit.
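As a quick check, the same calculation in Python (all figures are the estimates from the paragraph above):

```python
import math

# 10 cents a night x 20% of doses affected x nights per year
annual_value = 0.10 * 0.20 * 365.25                  # ~$7.30/year

# NPV x quality of information (90%) x prior that it matters (50%)
voi = (annual_value / math.log(1.05)) * 0.90 * 0.50  # ~$67
hours_justified = voi / 7                            # ~9.6 hours at minimum wage
print(round(voi, 1), round(hours_justified, 1))
```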

2.2 Morning

http://www.gwern.net/Zeo#vitamin-d-at-morn-helps

With the vitamin D theory partially vindicated by the previous experiment, I became fairly sure that vitamin D in the morning would benefit my sleep somehow: 70%. Benefit it how? I had no idea; the effect might be large or small. I didn’t expect it to be a second melatonin, improving my sleep and trimming it by 50 minutes, but I hoped maybe it would help me get to sleep faster or wake up less. The actual experiment turned out to show, with very high confidence, absolutely no change except in my mood upon awakening in the morning.

What is the “value of information” for this experiment? Essentially - nothing! Zero!

  1. If the experiment had shown any benefit, I obviously would have continued taking it in the morning.
  2. If the experiment had shown no effect, I would have continued taking it in the morning anyway, to avoid incurring the evening penalty discovered in the previous experiment.
  3. If the experiment had shown the unthinkable, a negative effect, it would have to be substantial to convince me to stop taking vitamin D altogether and forfeit its other health benefits; it’s not worth bothering to analyze an outcome to which I would have given a <=5% chance.

Of course, I did it anyway because it was cool and interesting! (Estimated time cost: perhaps half the evening experiment, since I manually recorded less data and had the analysis worked out from before.)

3 Adderall

http://www.gwern.net/Nootropics#adderall-blind-testing

The amphetamine mix branded “Adderall” is terribly expensive to obtain even compared to modafinil, due to its tight regulation (a lower schedule than modafinil), its popularity in college as a study drug, and reported moves by its manufacturer to exploit its privileged position as a licensed amphetamine maker to extract more consumer surplus. I paid roughly $4 a pill but could have paid up to $10. Good stimulant hygiene involves recovery periods to avoid one’s body adapting to eliminate the stimulating effects, so even if Adderall were the answer to all my woes, I would not be using it more than 2 or 3 times a week. Assuming 50 uses a year (for specific projects, let’s say, and not ordinary aimless usage), that’s a cool $200 a year. My general belief was that Adderall would be too much of a stimulant for me, as I am amphetamine-naive and Adderall has a bad reputation for letting one waste time on unimportant things. We could say my prediction was 50% that Adderall would be useful and worth investigating further. The experiment was pretty simple: blind randomized pills, 10 placebo & 10 active. I took notes on how productive I was and the next day guessed whether it was placebo or Adderall before breaking the seal and finding out. I didn’t do any formal statistics for it, much less a power calculation, so let’s be conservative and penalize the information quality heavily, down to 25%. So ((200 - 0) / log 1.05) * 0.50 * 0.25 = $512! The experiment probably used up no more than an hour or two total.
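In Python, with the same figures ($200/year at stake, 50% prior, information quality penalized to 25%):

```python
import math

annual_cost = 4 * 50                                 # ~$4/pill x ~50 uses a year
# NPV of the potential saving x prior x penalized quality of information
voi = (annual_cost / math.log(1.05)) * 0.50 * 0.25   # ~$512
print(round(voi))
```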

This example demonstrates that anything you are doing expensively is worth testing extensively.

4 Modafinil day

http://www.gwern.net/Nootropics#modalert-blind-day-trial

I tried 8 randomized days, like with Adderall, to see whether I was one of the people whom modafinil energizes during the day. (The other way to use it is to skip sleep, which is my preferred use.) I rarely use it during the day, since my initial uses did not impress me subjectively. The experiment was not my best: while it was double-blind and randomized, the measurements were subjective, and not a good measure of mental functioning like dual n-back (DNB) scores, which I could have statistically compared from day to day or against my many previous days of dual n-back scores. Between my high expectation of finding the null result, the poor experiment quality, and the minimal effect it had (eliminating an already rare use), it’s obvious without guesstimating any numbers that the value of this information was very small.

I mostly did it so I could tell people that “no, day usage isn’t particularly great for me; why don’t you run an experiment on yourself and see whether it was just a placebo effect (or whether you genuinely are sleep-deprived and it is indeed compensating)?”

5 Lithium

http://www.gwern.net/Nootropics#lithium-experiment

Low-dose lithium orotate is extremely cheap, ~$10 a year. There is some research literature on it improving mood and impulse control in regular people, but some of it is epidemiological (which implies considerable unreliability); my current belief is that there is probably some effect, but at just 10mg it may be too small to matter. I have ~40% belief that there will be a large effect, and since I’m running a long experiment, I should be able to detect a large effect with >75% probability. So the formula is: NPV of the difference between taking and not taking, times quality of information, times expectation: ((10 - 0) / log 1.05) * 0.75 * 0.40 = $61.4, which justifies a time investment of less than 9 hours. As it happens, it took less than an hour to make the pills & placebos, and taking them is a matter of seconds per week, so the analysis will be the time-consuming part. This one may actually turn a profit.
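The lithium numbers, checked in Python (same estimates as above):

```python
import math

# $10/year at stake x 75% chance of detection x 40% prior of a large effect
voi = (10 / math.log(1.05)) * 0.75 * 0.40   # ~$61
hours_justified = voi / 7                   # just under 9 hours at minimum wage
print(round(voi, 1), round(hours_justified, 1))
```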

6 Redshift

http://www.gwern.net/Zeo#redshiftf.lux

Like the modafinil day trial, this was another value-less experiment justified by its intrinsic interest. I expect the results will confirm what I believe: that red-tinting my laptop screen will result in less damage to my sleep, since it avoids suppressing melatonin with blue light. The only outcome that might change my decisions is if the use of Redshift actually worsens my sleep, but I regard this as highly unlikely. It is cheap to run as it is piggybacking on other experiments, and all the randomizing & data recording is being handled by 2 simple shell scripts.

7 Meditation

http://www.gwern.net/Zeo#meditation-1

I find meditation useful when I am screwing around and can’t focus on anything, but I don’t meditate as much as I might, because I lose half an hour each time. Hence, I am interested in the suggestion that meditation may not be as expensive as it seems, because it reduces sleep need to some degree: if for every two minutes I meditate, I need one less minute of sleep, that halves the time cost - I spend 30 minutes meditating, gain back 15 minutes of sleep, for a net time loss of 15 minutes. So if I meditate regularly but there is no substitution, I lose out on 15 minutes a day. Figure I meditate 2 days out of every 3; that’s a total lost time of (15 * 2/3 * 365.25) / 60 = 61 hours a year, or $427 at minimum wage. I find the theory somewhat plausible (60%), and my year-long experiment has roughly a 60% chance of detecting the effect size (estimated based on the sleep reduction in an Indian sample of meditators). So ((427 - 0) / log 1.05) * 0.60 * 0.60 = $3150. The experiment itself is unusually time-intensive, since it involves ~180 sessions of meditation, which if I am “overpaying” translates to 45 hours ((180 * 15) / 60) of wasted time, or $315. But even including the design and analysis, that’s less than the calculated value of information.
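The whole chain of estimates above, as a Python sketch (exact rounding differs slightly from the in-text figures):

```python
import math

# Time lost per day if meditation does not substitute for sleep: 15 minutes,
# on 2 out of every 3 days.
hours_lost_per_year = (15 * (2 / 3) * 365.25) / 60  # ~61 hours
annual_cost = hours_lost_per_year * 7               # ~$426 at minimum wage

# NPV x prior plausibility (60%) x chance of detecting the effect (60%)
voi = (annual_cost / math.log(1.05)) * 0.60 * 0.60  # ~$3,140
experiment_cost = (180 * 15 / 60) * 7               # 45 hours of sessions ~ $315
print(round(voi), round(experiment_cost))
```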

This example demonstrates that drugs aren’t the only expensive things for which you should do extensive testing.

44 comments

How much thought have you given to the value other people capture from your personal experiments? Presumably, like me, part of the reason you chose to publish your results online is because you thought other people might benefit from the info (in addition to "selfish" reasons like getting free help with analysis of your results, appearing smart and impressive, or being able to access your writings from anywhere with an Internet connection). Perhaps another variable to consider is an estimate of the (discounted) value other people are likely to receive from your self-experimentation.

gwern:

That surely is a factor, but since it militates in favor of doing any experiment, I'm not sure it's worth including. I don't have any good basis for estimating how much value other people capture. If I tried to - cooking up some Fermi estimate based on, say, site traffic & the roughly dozen people who have started melatonin based on my essay - then with just a slight massaging of the numbers, I could justify practically any experiment!

Thus, excluding it is a conservative assumption, especially if anyone else contemplating similar experiments wouldn't necessarily write it up and publicize it like I do.

But why should that be bad if you could justify any experiment? Let's say you had enough readership and enough 'active' readership that quite a few people did the same thing you did.

Then 1. You're doing a lot of good, and that sounds like a really cool blog and pursuit actually. And 2. You will need to raise your $/hour in the VoI in order to pick and choose only the very highest-returning experiments. Both interesting outcomes.

You will need to raise your $/hour in the VoI in order to pick and choose only the very highest-returning experiments. Both interesting outcomes.

I don't think that follows. Suppose I'm considering two experiments, A with an estimated return of $100 and B with an estimated return of $200; I muse that I should probably do the $200 experiment B first and only then the $100 A (if ever). I then reflect that I have 10 readers who will follow the results, so logically I ought to multiply the returns by 10: A is actually worth $1,000 and B is actually worth $2,000. I then muse that I should probably do... experiment B.

Choices between experiments aren't affected by a constant factor applied equally to all experiments: the highest marginal return remains the highest marginal return. (If experiment B was the best one to do with no audience, then it's still the best one to do with any audience.)

Where the audience would matter is if experiments interact with the audience: maybe no one cares about vitamin D but people are keenly interested in modafinil. Then the highest return could change based on how you use audience numbers.

I think you forgot to mention that you considered (for example) melatonin reliably non-harmful, because if you discovered negative side effects (and, BTW, it is hard to find negative long-term side effects...), cancelling melatonin would save much more than the price of the pills on its own.

Aren't you leaving out the value of the good effects if something you're experimenting with works?

Maybe, but my understanding is that that value is already screened off: the something must have positive expected value in the first place, or you wouldn't be using it at all.

(But I could be wrong, and I've already pinged Vaniver with a request to look things over since that's the sort of basic conceptual confusion I couldn't get myself out of.)

First off, kudos for discussing non-VoI reasons to run these experiments. Real decisions have many factors.

The eyeballed estimate of how much the experimental design reduces the value from perfect information should be replaced by a decision tree. If the experiment can't give you enough data to change your position, then it's not material.

Using the first example, where W is melatonin works and "W" is the experiment saying that melatonin works, it looks like you provided P(W)=.8, P("W"|W)=.95, and P("W"|~W)=.05. I assumed that >90% corresponded to a point estimate of 95%, and that the test was symmetric, which should get thought about more if you're doing this seriously.

In the case where you get "W", you update and P(W|"W")=99% and you continue taking melatonin. But in the case where you get "~W", you update and P(W|"~W")=17%. Given the massive RoI you calculated for melatonin, it sounds like it's worth taking even if there's only a 17% chance that it's actually effective. Rather than continuing blindly on, you'd probably continue the test until you had enough data to be sure / pin down your RoI calculation, but you should be able to map that out now before you start the experiment.
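These updates can be verified directly; a minimal Python sketch using the probabilities assumed in this comment (P(W) = .8, P("W"|W) = .95, P("W"|~W) = .05):

```python
def posterior(prior, p_pos_given_true, p_pos_given_false, result_positive=True):
    """Bayes update of P(W) on a positive or negative test result."""
    if result_positive:
        num = p_pos_given_true * prior
        den = num + p_pos_given_false * (1 - prior)
    else:
        num = (1 - p_pos_given_true) * prior
        den = num + (1 - p_pos_given_false) * (1 - prior)
    return num / den

p_w_given_pos = posterior(0.8, 0.95, 0.05, True)   # ~0.987, i.e. ~99%
p_w_given_neg = posterior(0.8, 0.95, 0.05, False)  # ~0.174, i.e. ~17%
print(round(p_w_given_pos, 3), round(p_w_given_neg, 3))
```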

There's a question of prior information here - from what you've written, it sounds like you should be more than 80% sure that melatonin worked for you. You might be interested in a different question - "melatonin still works for me" - which it might be reasonable to have an 80% prior on. If the uncertainty is about the value of taking melatonin, it seems like you could design a better experiment that narrows your uncertainty there (by looking for cognitive costs, or getting a better estimate of time saved, etc.).

A brief terminology correction: the "value of perfect information" would be $41, not $205 (i.e. it includes the 20% estimate that melatonin doesn't work). If you replace that with "value of a perfect negative result" you should be fine.

In 3, you're considering adding a new supplement, not stopping a supplement you already use. The "I don't try Adderall" case has value $0, the "Adderall fails" case is worth -$40 (assuming you only bought 10 pills, and this number should be increased by your analysis time and a weighted cost for potential permanent side effects), and the "Adderall succeeds" case is worth $X-40-4099, where $X is the discounted lifetime value of the increased productivity due to Adderall, minus any discounted long-term side effect costs. If you estimate Adderall will work with p=.5, then you should try out Adderall if you estimate that .5(X-4179)>0 -> X>4179. (Adderall working or not isn't binary, and so you might be more comfortable breaking down the various "how effective Adderall is" cases when eliciting X, by coming up with different levels it could work at, their values, and then using a weighted sum to get X. This can also give you a better target with your experiment- "this needs to show a benefit of at least Y from Adderall for it to be worth the cost, and I've designed it so it has a reasonable chance of showing that.")

One thing to notice is that the default case matters a lot. This asymmetry is because you switch decisions in different possible worlds- when you would take Adderall but stop you're in the world where Adderall doesn't work, and when you wouldn't take Adderall but do you're in the world where Adderall does work (in the perfect information case, at least). One of the ways you can visualize this is that you don't penalize tests for giving you true negative information, and you reward them for giving you true positive information. (This might be worth a post by itself, and is very Litany of Gendlin.)

The rest is similar. I definitely agree with the last line: possibly a way to drive it home is to talk about dividing by ln(1.05), which is essentially multiplying by 20.5. If you can make a one-time investment that pays off annually until you die, that's worth 20.5 times the annual return, and multiplying the value of something by 20 can often move it from not worth thinking about to worth thinking about.

Thanks for the comments.

In the case where you get "W", you update and P(W|"W")=99% and you continue taking melatonin. But in the case where you get "~W", you update and P(W|"~W")=17%. Given the massive RoI you calculated for melatonin, it sounds like it's worth taking even if there's only a 17% chance that it's actually effective.

The Bayes calculation is (0.05 * 0.8) / ((0.05 * 0.8) + (0.95 * 0.2)) = 0.1739..., right? (A second experiment would knock it down to ~0.01, apparently.)

I didn't notice that. I didn't realize I was making an assumption that on a negative experimental result, I'd immediately stop buying whatever. Now I suddenly remember the Wikipedia article talking about iterating... After I get one experimental result, I need to redo the expected-value calculation, and re-run the VoI on further experiments; sigh I guess I'd better reword the melatonin section and add a footnote to the master version explaining this!

A brief terminology correction: the "value of perfect information" would be $41, not $205 (i.e. it includes the 20% estimate that melatonin doesn't work). If you replace that with "value of a perfect negative result" you should be fine.

I'll reword that.

I'll need to think about the Adderall point.

Thanks for the comments.

You're welcome!

The Bayes calculation is ..., right?

That's how I did it.

It's also possible that P(W|"~W") is way lower than .05, and so the test could be better than that calculation makes it look. This is something you can figure out from basic stats and your experimental design, and I strongly recommend actually running the numbers. Psychology for years has been plagued with studies that are too small to actually provide valuable information, as people in general aren't good intuitive statisticians.

This is something you can figure out from basic stats and your experimental design, and I strongly recommend actually running the numbers.

As it happens, I learned how to do basic power calculations not that long ago. I didn't do an explicit calculation for the melatonin trial because I didn't randomize selection, instead doing an alternating days design and not always following that, so I thought why bother doing one in retrospect?

But if we were to wave that away, the power seems fine. I have something like 141 days of data, of which around 90-100 is usable, giving me maybe <50 pairs? If I fire up R and load in the two means and the standard deviation (which I had left over from calculating the effect size), and then play with the numbers, then to get an 85% chance I could find an effect at p=0.01:

> pwr.t.test(d=(456.4783 -  407.5312) / 131.4656,power=0.85,sig.level=0.01,type="paired",alternative="greater")

 Paired t test power calculation 

          n = 84.3067
          d = 0.3723187
  sig.level = 0.01
      power = 0.85
 alternative = greater

NOTE: n is number of *pairs* 

If I drop the p=0.01 for 0.05, it looks like I should have had a good shot at detecting the effect:

> pwr.t.test(d=(456.4783 -  407.5312) / 131.4656,power=0.85,sig.level=0.05,type="paired",alternative="greater")

 Paired t test power calculation 

          n = 53.24355

So, it's not great, but it's at least not terribly wrong?

EDIT: Just realized that I equivocated over days vs pairs in my existing power analyses; 1 was wrong, but I apparently avoided the error in another, phew.

(0.05 * 0.8) / ((0.05 * 0.8) + (0.95 * 0.2)) = 0.1739

I'm wondering why 0.05 (alpha) was used in that formula? True positive and false negative rates depend on statistical power (1-beta) and beta, and in the case of beta = 0.2, the probability that melatonin works given a negative result is 0.457, not 0.1739.

"Melatonin is working" branch (prior P(W) = 0.8) have 2 possibilities 
True positive, P("W"|W) = 1-b = 0.8
False negative, P("~W"|W) = b = 0.2

"Melatonin is not working" branch (prior P(~W) = 0.2) have 2 possibilities 
False positive, P("W"|~W) = a = 0.05
True negative, , P("~W"|~W) = 1-a = 0.95

P(W|"~W") = P("~W"|W) * P(W) / (P("~W"|W) * P(W) + P("~W"|~W) * P(~W)) = 

(0.2 * 0.8) / ((0.2 * 0.8) + (0.95 * 0.2)) = 0.457, not 0.1739 (a ~3-fold difference)
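The disagreement is purely over which error rate goes in the false-negative slot, P("~W"|W); computing both versions (holding the true-negative rate at 0.95 in each):

```python
def p_work_given_negative(prior, p_false_neg, p_true_neg):
    """P(W | "~W"): probability melatonin works despite a negative result."""
    numerator = p_false_neg * prior
    return numerator / (numerator + p_true_neg * (1 - prior))

# False-negative rate taken as alpha = 0.05 (the earlier calculation)
version_alpha = p_work_given_negative(0.8, 0.05, 0.95)  # ~0.174
# False-negative rate taken as beta = 0.2 (this comment's calculation)
version_beta = p_work_given_negative(0.8, 0.20, 0.95)   # ~0.457
print(round(version_alpha, 3), round(version_beta, 3))
```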

I'm a bit confused because I'm getting different results, but maybe I'm wrong and someone can correct me?

I'm planning to run a blind experiment with melatonin, but want to learn more stats and better understand VoI before I start.

UPDATE: Math corrected. thanks!

Put simply, VOI is the difference between your expected value with and without the information.

So with Melatonin, let's simplify to 2 possibilities:

A) Melatonin has no effect, costs $10 per year, for a value of -1 utilon

B) Saves you 15 minutes per day (+5 utilons), costs $10 per year (-1 utilon), for a net value of +4 utilons.

Now, let's say you think that A and B are equally likely. Then the expected value of not taking Melatonin is 0, and the expected value of taking it is 0.5 * -1 + 0.5 * 4 = 1.5. With only this information available, you will always take Melatonin, so your expected value is 1.5.

Then let's say you are considering a definitive experiment (so you will know with p=1 whether A or B is true).

If A is true then you will not take Melatonin, so the value of that outcome is 0 utilons.

If B is true, then you will take Melatonin, for a value of 4 utilons.

And by conservation of expected evidence, it is equally likely that the experiment will decide for A or B.*

Then the expected value of your decision with perfect info is 0.5 * 0 + 0.5 * 4 = 2 > 1.5, so the VOI is 0.5 utilons.

*Equally likely only because of how I set up the problem. Conservation of expected evidence would also be satisfied if the experiment would probably favor one side weakly, but improbably favor the other side strongly.
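The toy example above, in code (utilon values as given; "definitive" means the experiment reveals the true world with certainty):

```python
p_b = 0.5              # P(melatonin works), i.e. world B
val_a, val_b = -1, 4   # utilon value of taking melatonin in worlds A and B

# Without information: take it iff the expected value of taking beats not taking.
ev_without = max(0, p_b * val_b + (1 - p_b) * val_a)  # always take: 1.5

# With a definitive experiment: take it only in world B.
ev_with = p_b * val_b + (1 - p_b) * 0                 # 2.0

voi = ev_with - ev_without                            # 0.5 utilons
print(ev_without, ev_with, voi)
```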

So what should you conclude from this?

  • VOI is higher when the experiment shifts your beliefs a lot, lower when the expected change in belief is small. For example, praying is sufficiently unlikely to work that it's not worth my time to test it. There are other cases where my uncertainty is high, but I can't think of sufficiently good cheap experiments.

  • VOI is higher when you would gain a lot if it told you to change your plans. For example, if you would have taken Adderall without an experiment, and Adderall is expensive, then finding out it doesn't work saves you a lot of money. This is less true for melatonin.

Expected value of taking M without information is 0.5 * -1 + 0.5 * 4 = 1.5, not 1. VoI in this case is 0.5 utilons.

It's certainly true that you wouldn't be exploring things that didn't have positive expected value, but wouldn't the size of the expected value matter?

I don't think it matters unless your investments are limited. If you have presented with X positive expected value investments, and you have enough funds for X+1 investments, what do you do? Invest in all X and reap the maximum possible return. (If you are limited to only 2 investments, then you will be very interested in which 2 investments of the X investments have the greatest sized expected value.)

This is pretty much the case with supplements: I don't lack capital to invest in them (look at how cheap some of the examples are, like lithium or melatonin), I lack good candidates for investment!

Oh yes, besides ciphergoth, I was thinking in this vein in part because of some work for Luke: 'value of information' has obvious implications for anyone who is picking research topics or research funding based on consequentialist reasons.


I think it's pretty obvious why it's relevant, but to give an example: some of the topics in Bostrom's paper have very unclear values of information, which in our case can be defined as 'what can we do about this problem?' For example, he starts with the Doomsday Problem, which is very interesting and all - but suppose we had perfect information which settled the Doomsday Problem, saying that, yes, humanity will indeed end by 5000 AD or whenever with 95% probability. What is the value of this information? Well, as far as I can see, it's close to zero: the Doomsday problem doesn't specify why humanity ends, just that that class of observers runs out. If we were to get more information, it might turn out that we simply become posthumans, which is not a fate we need to guard against.

Or to take a more pointed example: asteroids have a high value of information because once we learn about them, we can send up spacecraft to do something about it. Hence, we ought to be willing to pay an oracle billions in exchange for perfect information about anything aimed at Earth.

A collapse of the vacuum (another classic existential risk) is not worth researching in the slightest bit except as it relates to particle accelerators possibly causing it, because there is absolutely nothing we can do about it.

I'm not sure I've seen this point made anywhere in the literature, but it definitely should either be mentioned in any paper on efficient philosophy or be rendered redundant/implied by the analysis.

Why do you value your time at minimum wage?

[anonymous]:

Let me attempt an umeshism: If he valued his time for more, people would wonder why he thought he was worth it.

Throughout, I try to pick conservative numbers; and it's more conservative to value my time at a lower bound like minimum wage than what I'm actually paid since working is not always a live alternative.

(Although I'll admit that in this case, it would probably be more conservative overall to value my time at a lot per hour since it makes each experiment more expensive and less likely to be worth running. I didn't think a whole lot about the minimum wage assumption. Usually I'm discussing the value of time gained from the use of melatonin or modafinil - where the time value applies to the possible benefit and hence the more one's time costs, the more likely melatonin or modafinil is worth the money...)

I recommend that most people at least try melatonin.

Thanks, some useful self-experiment design ideas there.

Gwern (and whoever), I'm interested in suggestions on how I should go about estimating the effects of GW1516 on myself. It's a research chemical with various benefits (to mice and monkeys) in terms of improving endurance and cholesterol levels. It comes in a liquid form (powder imperfectly suspended in water).

The endurance effects seem particularly hard to test, given that my endurance levels are not stable to begin with (steadily improving). For the cholesterol I guess I could just get blood tests before and after three weeks on and off the stuff. I honestly don't think a placebo is particularly necessary for that test - it would be for the endurance levels.

The value of information is high here - it isn't particularly cheap but the benefits are potentially significant.

From reading Taubes and noting various failures of cholesterol-modifying drugs, I'm pretty skeptical about anything to do with cholesterol (in the absence of long-term RCTs showing positive effects on total mortality), so I'll ignore that.

If your endurance is still improving, maybe you should just wait until it's plateaued, which it must do at some point, and then you have a clear baseline to do GW1516 on. On the other hand, an increasing baseline just means you need either more data or more sophisticated statistics (to detect deviations from the trend up or down), so if you're already measuring your endurance performance and have some GW1516, you might as well start now.

The value of information is high here - it isn't particularly cheap but the benefits are potentially significant.

Why is it significant? I didn't think you were a professional athlete or anything like that. (And if you're hoping for endurance to be useful because it correlates with other good things, you're throwing a wrench into the works by taking such a drug, which likely affects only endurance. Correlation is not causation, as the cholesterol trials apparently often demonstrate.)

From reading Taubes and noting various failures of cholesterol-modifying drugs, I'm pretty skeptical about anything to do with cholesterol (in the absence of long-term RCTs showing positive effects on total mortality), so I'll ignore that.

Is this because cholesterol may not be as relevant to health as some believed? Or because the drugs (statins) just don't seem to improve total mortality much (for either the same or a different reason).

I take at times other substances that can have a significant negative side effect on cholesterol profiles. Should I just not bother trying to influence the cholesterol back towards my normal baseline?

Why is it significant?

If the cholesterol did, in fact, matter it'd be kind of neat to change it a bunch. Same with the increased fat burning.

I didn't think you were a professional athlete or anything like that.

Just an amateur one.

And if you're hoping for endurance to be useful because it correlates with other good things, you're throwing a wrench into the works by taking such a drug, which likely affects only endurance.

I was rather surprised myself when they tested this and found that neurogenesis was also increased from the pharmacological activation of the muscles (including by GW1516) in the same way that actual exercise does.

Is this because cholesterol may not be as relevant to health as some believed? Or because the drugs (statins) just don't seem to improve total mortality much (for either the same or a different reason).

Both.

Should I just not bother trying to influence the cholesterol back towards my normal baseline?

Dunno. How seriously do you take it?

Dunno. How seriously do you take it?

Cholesterol? Slightly less seriously now.

I disagree with your comment on "good stimulant hygiene" - with long-term low-dose amphetamine use, people often seem to develop a tolerance to the side effects (insomnia, feeling wired) while remaining sensitive to the effect of increased focus. This is what makes it an effective ADD drug. A good experiment on its effects would alternate use and non-use over long, stable periods (weeks or longer).

I'd be curious to see the effects of your melatonin experiment performed with a blind placebo control. One confusing thing is that you appear to get less sleep on melatonin, but you're saying this is a good thing (reduced need for sleep). Is there some objective way to confirm that it's not simply reducing your sleep (as a stimulant would) as opposed to improving sleep (and therefore reducing the time you need to spend sleeping)? Perhaps you could try to see if there's a measurable rebound effect that would distinguish improved sleep from sleep deprivation. Is your need for sleep greater than normal after taking it for several days?

I'm a big fan of your self experiments, and have previously read most of them on your website... and have used many of your ideas to design my own. However, it seems surprisingly difficult to design a self experiment which rigorously tests only one hypothesis.

Is there some objective way to confirm that it's not simply reducing your sleep (as a stimulant would) as opposed to improving sleep (and therefore reducing the time you need to spend sleeping)?

Sleep quality is typically measured by "number of times awakened" or "amount moved" and so on - my experience with melatonin (and I believe gwern's as well, but I didn't check) is that melatonin decreases the number of times I awaken during the night.

But even if you measure that, it's just a proxy. A paralytic drug will reduce the amount I toss and turn at night, but may not improve how I feel the next day. What you would want to do is measure energy level / creativity, but that's even more difficult.

I suppose measuring "blocks of time where I spent at least 30 minutes doing focused, productive work" would be an easy thing to correlate against having taken melatonin the night before, but I think that it would be very difficult to measure an effect with such low resolution data.

This is even easier if you use pomodoro, because measuring productive time and noticing unproductive time is much easier.

Interesting... I already use that exact method, but never heard that term for it. I first read about it in the "Now Habit" by Neil Fiore, a strategic system for overcoming procrastination. It works incredibly well!

Low-resolution doesn't really matter; what matters is how variable the data is. If you have a binary variable - as crude resolution as possible - which rarely flips, then an intervention which occasionally flips it will still be noticeable.

(How much data would it take? Well, that's hard to estimate without any data at all...)
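To put a rough number on that, here's a quick simulation - a sketch with invented rates (assume a binary "bad night" flag that fires on 30% of placebo nights and 10% of treatment nights; nothing here is measured data):

```r
# Sketch: how many nights of a crude binary outcome would be needed?
# Rates are invented for illustration: 30% bad nights on placebo, 10% on treatment.
set.seed(1)
power.at <- function(n, p.placebo = 0.3, p.treatment = 0.1, reps = 1000) {
  mean(replicate(reps, {
    x <- rbinom(1, n, p.treatment)   # bad nights observed under treatment
    y <- rbinom(1, n, p.placebo)     # bad nights observed under placebo
    suppressWarnings(prop.test(c(x, y), c(n, n))$p.value) < 0.05
  }))
}
power.at(30)    # power with 30 nights in each condition
power.at(100)   # power grows with more nights
```

So even a one-bit-per-night outcome becomes detectable; the question is just how many nights you are willing to log.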

That makes sense, I guess all of the other unknown variables would serve to dither the low resolution data and it would work for the same reason that CDs sound good with only 16 bits of resolution.

I am going to begin a randomized placebo-controlled trial of melatonin use, quantified against my Zeo and work logs. I just need to find some opaque capsules.

I look forward to your results.

Thanks. The experiment starts today. I made up 14 pills and randomized them with R code, under the assumption that I couldn't possibly subconsciously track each pill and then remember their locations. In hindsight, this is probably more labor-intensive and error-prone than the method you used to randomize your Adderall...

I start with a 14-tray pill box with treatment (0.75mg melatonin + parsley) in the first 7 boxes and placebo (parsley pills only) in the last 7. I randomly reorder them 3 times, according to the lists output by my code, which writes the final pill locations to a text file.

Here's my R code for randomizing "single blind" placebo controlled self-trials:

sampleSize <- 14
initialSetup <- c(rep("treatment", sampleSize/2), rep("placebo", sampleSize/2))
reorder1 <- sample(1:sampleSize, sampleSize, replace = FALSE)
reorder2 <- sample(1:sampleSize, sampleSize, replace = FALSE)
reorder3 <- sample(1:sampleSize, sampleSize, replace = FALSE)
final <- initialSetup[reorder1][reorder2][reorder3]
write.table(final, "final.txt")
paste(reorder1, "->", 1:sampleSize)
paste(reorder2, "->", 1:sampleSize)
paste(reorder3, "->", 1:sampleSize)
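Incidentally, since sample() already returns a uniformly random permutation, composing three reorders adds nothing over one; the whole setup collapses to a single shuffle (same 7/7 split assumed):

```r
# One-step equivalent of the triple reorder: a single sample() of the label
# vector is already a uniform random shuffle, so repeated reordering is redundant.
sampleSize <- 14
final <- sample(rep(c("treatment", "placebo"), each = sampleSize / 2))
write.table(final, "final.txt")
```

That should make scaling past 14 pills much less tedious.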

Interesting procedure. I'd agree it's probably much more work than some simple physical procedure. (I'd also point out that 14 pairs won't give you much significance - my above power analysis suggested that for awakenings, anyway, you'd want more like 140 pairs. But I should be happy you're actually doing the experiment.)

I plan to do much more than 14, but it was very tedious to set up so I started with that. I need to streamline the procedure.

How has the experiment been going?

Hey gwern, as you predicted I didn't have enough data to learn anything... and I didn't have time to do it longer. I considered repeating it, but now I'm scared off melatonin until I learn more about how it works. Dr. Ray Peat theorizes that it might have some negative health effects by inhibiting oxidative metabolism:

http://www.google.com/cse?cx=005233684413389937395%3Ad5qfhqsz7oo&ie=UTF-8&q=melatonin#gsc.tab=0&gsc.q=melatonin&gsc.page=1

Also, anecdotally I don't really see a huge benefit to melatonin. Even small doses (0.75mg) seem to make me slightly groggy when I wake 7.5 hours later. I may have unusually slow melatonin metabolism, as I have the "slow caffeine metabolizer" P450 CYP1A2 variant, the same enzyme responsible for clearing melatonin.

For my melatonin experiment, number of awakenings was 2.86 vs 2.43, but the p-value was only 0.43. The problem is that the standard deviation is 2.25! (On many nights, I awaken zero or one times, but on one particularly bad night, I woke up 7 times.) I suspect more data would show a more reliable effect and maybe a greater effect size than d=0.19.

To expand: if d <= 0.19, then to detect this effect at p < 0.05 with 50% power, we would need ~75 pairs of nights, or ~150 nights of data:

library(pwr)
pwr.t.test(d = (2.86 - 2.43)/2.25, power = 0.5, sig.level = 0.05, type = "paired", alternative = "greater")

n = 75.44403
d = 0.1911111

Doable, but not trivial.
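As a back-of-the-envelope check - and a reminder of how brutally sample size scales with effect size (n grows as 1/d^2) - the normal approximation for a one-sided paired test lands very close to pwr.t.test's answer, with no packages needed:

```r
# Normal approximation for a one-sided paired test:
# n ~ ((z.alpha + z.power) / d)^2, ignoring the small t-distribution correction.
pairs.needed <- function(d, power = 0.5, alpha = 0.05)
  ceiling(((qnorm(1 - alpha) + qnorm(power)) / d)^2)
pairs.needed(0.19)   # 75, close to the pwr.t.test figure of ~75 pairs
pairs.needed(0.8)    # 5 - a large effect needs only a handful of pairs
```

So halving the effect size quadruples the nights required, which is why small, plausible effects like d = 0.19 demand months of data.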
