Hi! I have a pretty good amount of experience with playing this game - I have a google spreadsheet collecting all sorts of data wrt food, exercise, habits etc. that I've been collecting for quite a while. I've had some solid successes (I'd say improved quality-of-life by 100x, but starting from an unrepresentatively low baseline), but also can share a few difficulties I've had with this approach; I'll just note some general difficulties and then talk about how this might translate into a useful app sorta thing that one could use.
1. It's hard to know what data to collect in advance of having the theory you want to test (this applies mainly to whole classes of hypothesis - tracking what you eat is helpful for food intolerance hypotheses, but not as much for things affecting your sleep).
- I would recommend starting with a few classes of hypothesis and doing exploratory data analysis once you have data. e.g. I divide my spreadsheet into "food", "sleep", "habits tracking", "activities", and resultant variables "energy", "mood", and "misc. notes" (each estimated for each ~1/3 of the day). These might be different depending on your symptoms, but for non-specific symptoms, are a good place to start. Gut intuitions are, I find, more worth heeding in choosing hypothesis classes than specific hypotheses.
2. Multiple concurrent problems can mean that you might tend to discard hypotheses prematurely. As an example, I'm both Celiac (gluten-free) and soy-intolerant (though in a way that bears no relation to the soy-intolerance symptoms I've seen online - take this as a datapoint). Getting rid of gluten helped a little, getting rid of soy helped a little, but each individually was barely above the signal-to-noise threshold; if I were unluckier, I would have missed both. It's also worth noting that, in my experience, "intervened on X, no effect on Y after 3 weeks of perfect compliance" is only moderately strong evidence for discarding the hypothesis (rather than damningly strong, like it feels).
- Things like elimination-diets can help with this if you are willing to invest the effort. If you do it one at a time, it might be worth trying more than once, with a large time interval in between.
- If you're looking for small effects, be aware of what your inter-day variability is; at least for me, there's a bias to assume that my status on day N is a direct result of days 1-(N-1). Really there's just some fundamental amount of variability that you have to know to take as your noise threshold. You can measure this by 'living the same day' - sounds a little lame, but the value-of-information I've found to be high.
- If you have cognitive symptoms (in my case, mood fluctuations), note that there might also be some interference in estimating-in-the-moment, which can be counteracted a little bit by taking recordings once at the time, and retroactively again the next ~day.
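The "live the same day" noise-floor measurement above can be sketched in a few lines. This is a minimal illustration with invented 1-10 mood scores, not a prescribed method:

```python
# Minimal sketch of the "live the same day" noise-floor measurement.
# Scores here are invented: mood ratings over a week of deliberately
# identical routine.
import statistics

def noise_floor(scores):
    """Day-to-day standard deviation under a constant routine.
    An intervention's apparent effect should clear this bar
    before you treat it as signal rather than noise."""
    return statistics.stdev(scores)

same_day_scores = [6, 7, 5, 6, 7, 6, 5]
print(round(noise_floor(same_day_scores), 2))  # → 0.82
```

An effect smaller than this per-day spread will need many repetitions before it can be distinguished from chance.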
3. Conditioning on success: as DirectedEvolution says in another comment, the space of hypotheses of things you could do to improve becomes enormous if you allow for a large window of time-delay and conjunctions of hypotheses. A useful thinking technique I've found for looking at complicated hypotheses is to ask "would I expect to succeed if the ground-truth were in a hypothesis-class of this level of complexity, with no incremental improvements from implementing only parts of the plan?". I don't feel like I've ever had luck testing conjunctions, but timescales are trickier - a physicist's estimate would be to take whatever the variability timescale of your symptom severity is, and look in that range.
As an overall comment on the app idea: definitely a good idea and I'd love to use it! And I'd super-duper double-love to have a data-set over "mysterious chronic illnesses like mine". I think there could be a lot of value-added also in a different aspect of what you're talking about accomplishing - specifically, having a well-curated list of "things people have found success by tracking in the past" and "types of hypothesis which people have found success by testing in the past" might be more valuable than the ability to do a lot of statistics on your data (I've found that any hypothesis complex enough to need statistics fails my 'conditioning-on-success' test).
Hope there's something useful in here; just something I think about a lot, so sorry if I went on too long ahah. I expect this advice is biased towards reflecting what actually worked for me - food-eliminations rather than other interventions, named-disease-diagnoses, etc. so feel free to correct as you see fit.
There are also a number of good posts about self-experimentation on slimemoldtimemold.com, and a few more good ones at acesounderglass.com.
From one playing the same game, best wishes in making things better using the scientific method :)
My suggestion would be to start by focusing on hypotheses where your illness has a single cause acting on a short timescale - minutes, hours, or at most a day - and where the effect is reliable: do X and Y happens, almost every time. These hypotheses are easiest to rule out and do not require elaborate tracking. You may also want to focus on expanding your hypothesis space if you haven't already - food, exercise, sleep, air quality, pets, genetic and hormonal issues, and chronic infections are all worth looking at.
As you noticed, testing more complex hypotheses over long time scales makes the process of gathering evidence more costly and slow, and the results become less reliable due to the risks of confounding and the number of post-hoc tests you will be running.
I'd pay a lot of money for an app like this. I wonder if recent developments like Google's MedicalLLM could come into play here, where all your symptoms are logged and then expert knowledge / a thorough review of medical literature is applied automatically to recommend potential solutions.
Relevant: The algorithm for precision medicine, where a very dedicated father tracked down the cause of a rare chronic disease (NGLY1 deficiency) in order to save his son. He did so by writing a blog post that went viral, which helped him find other people with the same symptoms.
This article may serve as a shorter summary than the talk.
I came across GreyZone Health today, thought it might be relevant:
GreyZone Health
Hope for Difficult to Diagnose, Rare, and Complex Medical Conditions
Facing a Misdiagnosis, or Having No Diagnosis at All?
With our exceptional patient advocate service, GreyZone Health helps patients like you with difficult to diagnose, rare, and complex medical conditions. GreyZone Health finds answers and improves your quality of life. Based in Seattle, Washington, our professional patient advocates serve patients around Washington state and around the world, both virtually and in person.
If you are struggling with persistent health symptoms and/or you are having a hard time managing your complex medical situation and need patient advocacy, we are here to help!
You could also study the distribution of correlation strengths found over the range of correlations tested, where possible, seeing how it compares to what would be expected by chance.
E.g. in a "heatplot" or similar plots; cf. https://www.elsblog.org/the_empirical_legal_studi/2023/05/heatplots-for-correlation-coefficients-graphs.html
I wonder how much of the underlying "getting the data back out sucks" could be addressed by treating self-observations as metrics and using some backend designed for monitoring tech stacks? For human usage, it'd probably need an app that minimizes the effort of tracking things. However, a general tracking app with a decent api could probably be kludged onto a monitoring-friendly data storage solution with your choice of low-code or no-code tooling. Environmental variables could skip human input, and be automatically recorded from home sensors.
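To make the metrics framing concrete: each self-observation or sensor reading could reduce to one timestamped line, in the spirit of a monitoring line protocol. The line format and field names below are invented for illustration, not any real protocol:

```python
# Sketch of self-observations as monitoring-style metrics: one
# timestamped line per observation. The line format and field names
# are invented for illustration, not any real protocol.
from datetime import datetime, timezone

def metric_line(name, value, tags=None):
    """Render one observation as a metric line: name, optional tags,
    value, and a UTC timestamp."""
    ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
    tag_str = ",".join(f"{k}={v}" for k, v in (tags or {}).items())
    return f"{name}{',' + tag_str if tag_str else ''} value={value} {ts}"

# A quick phone entry and an automatic home sensor reduce to the
# same record shape, so one query layer can read both:
print(metric_line("mood", 6, {"time_of_day": "evening"}))
print(metric_line("co2_ppm", 850, {"room": "bedroom", "source": "sensor"}))
```

The payoff is on the read side: anything that speaks the backend's query language can then slice symptoms by tag, time window, or lag.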
The post I expected from the title was about starting with the list of all possible diagnoses, and identifying all data that could differentiate between them, to establish the maximum that it would be medically useful to track. If I was building for myself, I'd probably try to start with all possible interventions, and go straight to tracking only the necessary information to determine which interventions are likely to make a difference.
I suspect this whole endeavor is an advanced form of one of the reasons that journaling helps people -- encouraging us to use our pattern-matching skills in particularly helpful ways.
Thank you for publishing despite the standards worries; I'm glad you did because now I'm thinking about interesting questions that I wouldn't be if you hadn't.
(This post is not up to my usual standards but I was encouraged to publish it anyway to get feedback on the idea.)
I have been ill for the past four years with a mysterious chronic illness. One of the things that I keep thinking would be nice to have (but as far as I know, doesn't exist) is some sort of symptom-tracking app that would allow me to test various hypotheses for my various symptoms.
I already do a simple version of this just in my head, for example, I can notice that if I stand for too long at a time it gets kind of aversive and sitting down makes me feel better. But it's hard to do this in my head for anything more complicated. What if eating cheese subtly helps with my breathing problem? It is not so easy to figure this out on my own.
The things that make it hard to detect connections between "things I do" and "how I feel":
Also, if the thing is something you've already tried in the past, it's good to make use of the data you already do have instead of running every single experiment after you generate the hypothesis.
As you live through chronic illness, you keep generating new data every day. You also, occasionally, come across new hypotheses. It would be nice if you can test each new hypothesis against all the data you already have.
The closest thing I know is the app Bearable, but the only "hypothesis" it seems to have is in the form of correlations of things that happen in a single day (e.g. "there is a such and such correlation between sleep duration and mood"), and the interface was so bad I couldn't keep using it.
Currently I just track my diet, medications, and symptoms in a bulleted list in Roam. This makes it possible to check for more hypotheses than by my intuition alone, but it's too difficult to search for all the relevant days and track effects. For instance if the hypothesis is "if I eat oatmeal then seven days later I will have breathing problems", I will have to find every single instance of when I ate oatmeal, and then by hand check how I was doing seven days later. So the symptoms and the interventions being machine-specifiable is important.
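The oatmeal check is exactly the kind of query that becomes trivial once entries are machine-specifiable. A minimal sketch, with an invented date-keyed log structure and made-up scores:

```python
# Sketch: check "if I eat oatmeal, then 7 days later I have breathing
# problems" against a date-keyed log. The log structure and scores
# are invented for illustration.
from datetime import date, timedelta

log = {
    date(2023, 5, 1):  {"foods": ["oatmeal", "eggs"], "breathing": 2},
    date(2023, 5, 8):  {"foods": ["rice"],            "breathing": 7},
    date(2023, 5, 10): {"foods": ["oatmeal"],         "breathing": 3},
    date(2023, 5, 17): {"foods": ["toast"],           "breathing": 8},
}

def lagged_outcomes(log, trigger, symptom, lag_days):
    """Symptom scores lag_days after each day the trigger appears."""
    results = []
    for day, entry in log.items():
        later = log.get(day + timedelta(days=lag_days))
        if trigger in entry.get("foods", []) and later is not None:
            results.append(later[symptom])
    return results

print(lagged_outcomes(log, "oatmeal", "breathing", 7))  # → [7, 8]
```

The same function answers any "trigger X, symptom Y, lag N days" hypothesis over all past data at once, which is the by-hand search described above.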
I suspect that this sort of thing may be useful for healthy people and in domains other than health, though I think the barrier is that most people who are not seriously ill do not have the motivation to collect so much data and track how they are feeling each day.
Won't this "fit" a lot of spurious hypotheses, especially if the hypotheses are auto-generated? (Or in other words, won't p-hacking/"correlation but not causation" type things will be a problem?) I think it might, especially if the number of times an intervention was tried is too few. But I don't think this will be a big problem for this app because as long as there are only a smallish number of viable hypotheses, I can just run the experiment later to test them. ("The app says I felt way better 3 days after eating watermelons, and this happened on two occasions? Well then, I guess I'll start eating watermelons for a while.") It may help to iteratively make the hypotheses more complicated, until one reaches the point of having just the right number of plausible hypotheses to play around with.
Some hypotheses are expensive to test, so I won't have a way to check them with the existing data; e.g. right now I am testing the hypothesis "if I stop eating gluten then after 6 months various symptoms will be reduced". Prior to testing this I had eaten gluten on most days, so there is no way to figure this out with my existing data. I don't think this app will be helpful for answering this kind of long-term question, though it will help with figuring out which interventions to run long-term experiments on.
Another problem is if I start to want to test things that I haven't been tracking so far. In my time being ill, I've started and stopped tracking various things based on some intuition about how likely that thing is to actually have some effect on how I feel.
A minimum viable product for me would have: