tl;dr: I'm making a thing that uses probabilistic graphical models to assist in drawing inferences from personal data. You should check it out, and share with me your wisdom/user experience.
I had this not-completely-original idea that there should be some kind of tool for easily performing statistical inference on Quantified Self-style data.
There are a lot of QS apps out there, but for the most part they seem to be designed for (1) a single domain and/or (2) recording things primarily to combat akrasia or (more often) to sate curiosity or serve as a lifestyle accessory, rather than actively helping you discover correlations or determine causality between things-you-do and things-you-care-about. Quantified Mind stands out as a counterexample, but I can't come up with many others in that vein.
There are also commercial products and programming languages that allow one to use machine learning to perform inference on data, but they mostly seem to be either proprietary and expensive software aimed at businesses, or free but intended for scientists, engineers, etc.; nothing I've found so far is really suitable for an individual without a background in statistics/machine learning who just wants to learn what they can by smashing together their Moodscope and their FitBit.
In our era of FOSS, APIs, QS, and ML, this seems like a seriously lacking state of affairs. Hence, Familiar.
Currently, it consists of a command line interface for storing variable definitions and data in a local database without too much fuss, building a naive Bayes classifier on those variables, and, given the state of one variable, finding maximum likelihood estimates for the states of all the other variables. This is unsophisticated and not extremely user-friendly, but those things will change in the near future.
If I keep working on this for a very long time, I want to automate away as much recording as possible (including things like mood and productivity), record everything with the highest reasonable time resolution, plug into every other app out there that might provide useful data, use more complex machine learning algorithms to identify causality and generate suggestions for personal experimentation, and generally have a piece of software that knows you so well it can help you think more like an ideal Bayesian reasoner and thereby assist you in living your life (thus the name). Manfred Macx's glasses from Accelerando have something like this inside them, and I want it too.
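To make the current feature set a bit more concrete, here is a minimal Python sketch of the idea: estimate conditional distributions from recorded data, then report the most likely state of every other variable given the state of one. This is not Familiar's actual code or interface; the function names, the sample variables, and the add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def fit_conditionals(records, given, alpha=1.0):
    """Estimate P(other = state | given = value) from a list of dict records.

    `alpha` is an add-alpha (Laplace) smoothing constant, an assumption of
    this sketch rather than something taken from Familiar.
    """
    # Collect every state observed for each variable (needed for smoothing).
    states = defaultdict(set)
    for row in records:
        for name, value in row.items():
            states[name].add(value)

    # counts[given_value][other_name][other_state] = number of co-occurrences
    counts = defaultdict(lambda: defaultdict(Counter))
    for row in records:
        if given not in row:
            continue
        for name, value in row.items():
            if name != given:
                counts[row[given]][name][value] += 1

    table = {}
    for g_value in states[given]:
        table[g_value] = {}
        for name in states:
            if name == given:
                continue
            c = counts[g_value][name]
            total = sum(c.values()) + alpha * len(states[name])
            table[g_value][name] = {
                s: (c[s] + alpha) / total for s in states[name]
            }
    return table


def most_likely_states(table, given_value):
    """Pick the highest-probability state of each other variable."""
    return {
        name: max(dist, key=dist.get)
        for name, dist in table[given_value].items()
    }


# Hypothetical daily records; the variable names are made up for this example.
records = [
    {"exercise": "yes", "mood": "good", "sleep": "long"},
    {"exercise": "yes", "mood": "good", "sleep": "short"},
    {"exercise": "yes", "mood": "good", "sleep": "long"},
    {"exercise": "no", "mood": "bad", "sleep": "short"},
    {"exercise": "no", "mood": "good", "sleep": "short"},
    {"exercise": "no", "mood": "bad", "sleep": "short"},
]

table = fit_conditionals(records, given="exercise")
print(most_likely_states(table, "yes"))
print(most_likely_states(table, "no"))
```

With only six records the estimates are obviously fragile, which is part of why the smoothing term is there: it keeps a state that has never co-occurred with the given value from being assigned probability zero.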
Anyway, back to the present. You can help me by answering whichever of these questions applies to you the most:
- Would you use something like this at all, or do you think the potential for extracting useful information out of messy personal data is too low?
- If you might use something like this, but don't want to use Familiar in its current state, what do you think is the most important factor? e.g. "no GUI", "not a web app", "too manual", "doesn't connect to other stuff yet", etc.
- If you're brave enough to start using this now or even look at the source code, what mistakes am I making? There are countless ways this could be easier to use, more helpful, faster, more readable, and otherwise better, and you can tell me what those ways are.
It's possible to track 1000 different variables with your model. If you do so, however, you will get a lot of false positives.
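A rough back-of-the-envelope sketch of why that happens, assuming every pair of variables is tested for a relationship at a 0.05 significance level and that none of the relationships are real. The function names and the 0.05 level are illustrative assumptions, and the Bonferroni correction is just one standard (and conservative) remedy, not something Familiar is known to use.

```python
from math import comb


def expected_false_positives(n_variables, alpha=0.05):
    """Number of pairwise tests and expected spurious hits under the null."""
    n_pairs = comb(n_variables, 2)  # every unordered pair of variables
    return n_pairs, n_pairs * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the family-wise rate near alpha."""
    return alpha / n_tests


n_pairs, expected = expected_false_positives(1000)
print(f"{n_pairs} pairwise tests, about {expected:.0f} expected false positives")
print(f"Bonferroni per-test threshold: {bonferroni_threshold(n_pairs):.2e}")
```

With 1000 variables there are 499,500 pairs, so roughly 25,000 "significant" results would be expected from noise alone at the 0.05 level.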
I think of QS data as giving you more than your five senses. In the end you still rely partly on your own pattern-matching ability. Graphs of data just give you additional input for understanding what's going on that you can't see or hear.
I plan on addressing false positives with a combination of sanity-checking/care-checking ("no, drinking tea probably doesn't force me to sleep for exactly 6.5 hours the following night" or "so what if reading non-fiction makes me ravenous for spaghetti?"), and suggesting highest-information-content experimentation when neither of those applies (hopefully one would collect more data to test a hypothesis rather than immediately accept the program's output in most cases). In this specific case, the raw conversation and bodily state data wo...