Thanks for writing this. It's not something I'd looked at before, but I read some of the Promethease sample reports because you got me interested.
There does seem to be some weird normalisation going on when calculating magnitude. For instance, this gene gives a magnitude of 0 for (C;C) and "bad" magnitudes of 2.7 and 3.1 for (C;G) and (G;G) respectively. So if you have (C;C) and you filter at magnitude 2, you miss out on the fact that you have an advantageous genotype.
This isn't a problem if (C;C) is extremely common but actually it's no more common than (C;G) (except for people of African descent), so the act of filtering prevents you from realising that you missed a 50:50ish chance of getting a disadvantageous genotype.
So to work this out properly you probably can't filter by magnitude, and you'd have to open up every genotype's details to check what you've avoided getting hit by. You could only really work out how well you've done compared to other people where the data includes frequency, so you could see just how lucky or unlucky you got for a particular gene.
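To make that concrete, here is a rough Python sketch. The magnitudes are the ones quoted above, but the frequencies are invented placeholders (the comment only says (C;C) is about as common as (C;G)), and the function names are made up for illustration, not anything Promethease provides.

```python
# Magnitudes are the ones quoted above; the frequencies are invented
# placeholders chosen so that (C;C) is about as common as (C;G).
genotypes = {
    "(C;C)": {"magnitude": 0.0, "frequency": 0.45},
    "(C;G)": {"magnitude": 2.7, "frequency": 0.45},
    "(G;G)": {"magnitude": 3.1, "frequency": 0.10},
}


def visible_after_filter(genotypes, threshold):
    """Genotypes that would still show up after filtering by magnitude."""
    return [name for name, g in genotypes.items() if g["magnitude"] >= threshold]


def share_worse_off(genotypes, mine):
    """Fraction of people whose genotype has a higher 'bad' magnitude than mine."""
    my_magnitude = genotypes[mine]["magnitude"]
    return sum(
        g["frequency"] for g in genotypes.values() if g["magnitude"] > my_magnitude
    )


print(visible_after_filter(genotypes, 2))   # (C;C) is filtered out
print(share_worse_off(genotypes, "(C;C)"))  # share of people who did worse
```

With these made-up frequencies, a (C;C) carrier sees nothing for this gene after filtering at magnitude 2, even though a bit over half of the population drew a worse genotype.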
Not all of the genotypes have this issue - for instance this gene seems to be more sensibly normalised. If they were all done like this then I'd be much happier with the system.
Several years ago I participated in a study where my DNA was sequenced, and while I ended up not getting the sequence data [1] I did get a file of 23andme-style SNP variant calls. I loaded it into Promethease, and excluded mutations with magnitude below 2 ("looks interesting enough to be worth reading"). I saw 139 mutations marked as "bad," 41 as "good," and 26 as "not set."
Initially I interpreted this to mean that I should be more pessimistic about my health than I was before getting the report, since more of the mutations are bad (2x risk of something) than good (0.5x risk of something else). To figure out how your beliefs should change, though, you need to know how many bad vs good mutations people typically have. For example, if someone might normally have 200 bad mutations and 10 good ones then my report is good news, but if instead normal is 100 bad mutations and 70 good ones then my report is bad news.
In general, I would expect most people to have more negative mutations than positive ones, simply because most mutations with an effect are negative. Randomly changing something is much more likely to break things than make them better.
This also applies when determining total risk of something. For example, let's say I have SNPs that individually give me 3x, 1.5x, 2x, and 0.5x risk for heart disease. I could naively multiply them together, ignoring that they don't stack perfectly, [2] and conclude that I had 4.5x the risk of the general population. But most people will probably have some mutations that increase their risk of heart disease. I think the proper way to handle this is that each time you have the normal value of a variant, you count that as slightly reducing your risk, and when you consider all of these tiny improvements you get back to the average person having the average risk. Alternatively, and probably more accurately, you could just naively compute each person's risk, and then normalize.
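Here is a minimal Python sketch of the "compute naively, then normalize" idea. The four risk multipliers are the ones from the example above; the carrier frequencies are made up, and the sketch assumes each variant is simply present or absent and that variants are independent, which footnote [2] already flags as questionable.

```python
from math import prod

# (relative risk if you carry the variant, fraction of people who carry it)
# Risks come from the example above; the frequencies are invented.
snps = [(3.0, 0.10), (1.5, 0.40), (2.0, 0.25), (0.5, 0.30)]


def naive_risk(carried):
    """Multiply the risk factors of the variants a person carries."""
    return prod(risk for risk, _ in carried)


def population_mean_risk(snps):
    """Average naive risk across the population, assuming independent variants.

    For each SNP the average multiplier is freq * risk + (1 - freq) * 1.
    """
    return prod(freq * risk + (1.0 - freq) for risk, freq in snps)


naive = naive_risk(snps)               # I carry all four variants: 4.5
baseline = population_mean_risk(snps)  # what the average person gets naively
normalized = naive / baseline          # my risk relative to the average person
no_variants = naive_risk([]) / baseline  # someone with only "normal" values

print(naive, round(baseline, 2), round(normalized, 2), round(no_variants, 2))
```

With these invented frequencies the average person's naive multiplier comes out around 1.53, so the 4.5x shrinks to roughly 2.9x, and someone carrying none of the four variants lands at roughly 0.65x. That second number is the "normal values slightly reduce your risk" effect showing up in aggregate.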
Is this just a Promethease problem? Do other places that give health reports handle this better? Or do places just avoid giving consumer health information because this is both really hard to do well and highly regulated?
(It's also definitely possible I'm misinterpreting Promethease, or not thinking well about how the stats work here.)
[1] This was really frustrating. They confirmed receipt of my sample in September 2012, and in March 2015 they said they had the full 26GB sequence data available for me to transfer. Unfortunately they only ever uploaded the first few 200MB chunks, and then stopped responding to my emails in mid-April. I wrote to them a few more times, and eventually gave up about a year later.
[2] This caveat about stacking is pretty serious, though. Imagine mutations A and B both give you a 3x risk of some condition. If they act completely independently then a "stacked" 9x risk from having both A and B is reasonable. But if instead A and B act exactly the same way, breaking something that has multiple ways to be rendered fully inoperable, then having them both is no worse than having just one. I don't know which end of this is closer to how things usually work.
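As a small illustration of the two extremes in [2], in Python: the function names are invented for this sketch, and real interactions between variants presumably fall somewhere in between.

```python
from math import prod


def stacked_independent(risks):
    """Variants act independently: the risk multipliers compound."""
    return prod(risks)


def stacked_redundant(risks):
    """Variants break the same thing: only the worst one matters."""
    return max(risks)


risks_a_and_b = [3.0, 3.0]  # mutations A and B, each 3x on its own
print(stacked_independent(risks_a_and_b))  # 9.0
print(stacked_redundant(risks_a_and_b))    # 3.0
```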