Comment author: johnlawrenceaspden 18 April 2016 05:25:57PM *  1 point [-]

Thank you so much, intelligent and careful criticism like this is exactly what I started posting on Less Wrong for!

Why is the Pollock trial evidence supporting your hypothesis?

Well, it's only fairly weak evidence, but it does seem that the healthy controls reacted differently to the patient group. What it really proves is that thyroxine isn't just a nice recreational drug that everyone likes. Healthy people dislike it. But it seems to have been less bad for the patients on average. So I imagine there were some people in the patient group who reacted well.

What I'm saying is that Skinner got strong evidence for the idea, and wanted it confirmed by PCRT (and I agree, that's necessary). So they did a PCRT, but not very well because they didn't find patients carefully. And yet they seem to have supported him anyway, but everyone thinks that they refuted him, because they didn't quite understand what he was saying.

What outcome from the trial would you have considered to be evidence against it?

If none of the patients had had any sort of thyroid problem, I'd have expected it to be equally bad for everyone. If that had been the result, then I'd have had to think that 'type 2 hypothyroidism' is rare, or that 'fixed doses of thyroxine don't fix it'. For a long time that's exactly what I did think! I was assuming you might need T3 as well and you might need to adjust the ratio carefully. Skinner and Pollock together make me think that it might be fairly common, and mostly fixable with T4 alone.

Also, what part suggests that the healthy controls could distinguish the treatment from placebo? From Table 4, it seems that the reverse is true.

That shows that when they were asked which was the active preparation, they couldn't tell. They appear to have had a 'nocebo' effect, where they interpreted everything they felt as an effect of the drug. That's as expected.

What makes me think that they felt bad on thyroxine is table 2, where all the 'self-reported' psychological scores have got worse from thyroxine. In particular p=0.007 for the decline in Vitality. Since, as you point out, they really didn't know which was which, it's hard to see how they could have faked that.

At first glance, the results from that study look like straightforward evidence that this treatment is actively harmful.

Absolutely this treatment is harmful to healthy people. It should cause 'hypermetabolism', which is unpleasant. And severe hypermetabolism is awful. Very like the manic phase of manic depression. You should be careful not to give drugs to people who don't need them. That's why in the old days, if they weren't sure, they'd give you a bit and watch to see what effect it had. That was pretty much their test, except in the obvious cases.

but choosing a single dose is normal procedure.

Yes, but that does mean that anything that needs careful dose control will get rejected. In this case I think it might have made the treatment less effective, but it shouldn't have ruined it. I'm not making any criticism of the people who did this trial, I think it was a brave try and they did it well. I just don't think it's enough to refute Skinner. In fact I think it was supportive.

From what I’m reading I don’t think there is any recognized clinical diagnosis of hypothyroidism. The TSH test is the gold standard.

There was once. The paper:

STATISTICAL METHODS APPLIED TO THE DIAGNOSIS OF HYPOTHYROIDISM by W. Z. BILLEWICZ, R. S. CHAPMAN, J. CROOKS, M. E. DAY, J. GOSSAGE, SIR EDWARD WAYNE, AND J. A. YOUNG

was the last word in 'clinical diagnosis'. It was very very difficult to do, and GPs tended to refer suspected cases to experts. In doubtful cases they just tried treating it with small amounts of thyroid and checked that people improved rather than being made anxious and hyper.

The TSH test replaced that around 1970. But they never seem to have checked that clinical and biochemical diagnoses detected the same things, and after that there was the slow emergence of all sorts of nasty diseases that look very like hypothyroidism in the clinical sense but have normal TSH.

The TSH test seems to have been accepted (and then ruthlessly enforced) on the basis of theoretical arguments that weren't checked experimentally.

I do think that the TSH test detects gland failure quite well, in fact I think that if your thyroid gland gets destroyed, your TSH value will become huge. My (excellent) GP tells me that he sees people with TSH 30 with no symptoms at all (yet! Their thyroids are obviously on the way out...).

In fact the original 'normal range' for TSH was very wide indeed. And I think that's probably right too. Over the years the 'normal range' has got narrowed to the point where it's now so narrow people with abnormal TSH usually don't have any symptoms, and the noise in the test can put you outside the range. That's kind of weird. See recent AACB study where they thought the upper limit of normal should be 2.5.

There was a recent attempt to define a new clinical score (Zulewski et al), but the authors of the paper who'd constructed it refused to endorse it because the symptoms didn't correlate with TSH. That says to me that the test isn't detecting the disease it's supposed to detect.

You have the burden of proof

Absolutely accept that! And if Skinner was right, it should be dead easy to prove. Just re-run the Scottish trial using Billewicz as the entry criterion. It would be better if you could adjust the dose, but it should work quite well with a fixed dose, if you accept you're going to under-treat some people and over-treat others. Actually I'd rather use titrated doses of desiccated thyroid, since that's what they used to do, or T4/T3 combinations, but if I believe Skinner then they should all work, and it's just a question of which works best.

Could you summarize your support for this claim? Are these the only two peer-reviewed articles?

These are the only ones I can find through google scholar / pubmed. That in itself is really surprising and one of the things I can't explain! Why has such an obvious thing not been ruled out? Real doctors seem to try it all the time, find it works, and then get persecuted for trying it.

All the rest of it is anecdotal, from alternative sources, but there's a mountain of it. Just google. If people have tried this and it didn't work, they're keeping very quiet. All I've heard against is 'it helps, but it doesn't fix it entirely'. And the alternative people say exactly that themselves, and reckon that there's usually something adrenal going on as well.

I'd point primarily to Broda Barnes, John Lowe, Kenneth Blanchard, Gordon Skinner, Sarah Myhill, Barry Durrant-Peatfield, the various thyroid activist groups, Kent Holtorf, and 'Wilson's syndrome', off the top of my head, but there's plenty more where that came from. And a lot of those guys are actual medical doctors. The big exception is John Lowe, who was a chiropractor. But I've read a lot of his stuff and he was a very careful, thoughtful man.

90% of medical research findings are false

Indeed. The whole thing is a disaster. John Ioannidis said 'Evidence Based Medicine Has Been Hijacked'. But I think it's worse than that. By saying that you're going to ignore the experience of doctors, and only accept very expensive evidence that can only be provided by wealthy sources, and even then using methods so bad that they're practically guaranteed to produce false answers, you've completely cut yourselves off from the truth.

I'd go further and say 'Evidence Based Medicine Has Been A Catastrophe'. I'm not more than half-convinced this thyroid-craziness is true, but I think the fact that it's never been properly investigated is a complete scandal.

I'm not against "evidence based medicine" because it's based on evidence. I'm against "evidence based medicine" precisely because it's based on ignoring most of the evidence. -- GK Chesterton's Homeopath.

I was helping a consultant friend revise for an interview the other day, and one of the practice questions was 'describe the hierarchy of evidence'. He put 'expert opinion' bottom.

Really? Forty years of experience in treating patients is less valuable than a single anecdote published in a journal? Really?

And of course, it doesn't actually work that way. The TSH test ruling out hypothyroidism is expert opinion. Its reliability is unfounded dogma. I can't find any evidence for it as the sole measure of thyroid system function at all.

Comment author: AstraSequi 19 April 2016 07:45:34PM *  1 point [-]

If none of the patients had had any sort of thyroid problem, I'd have expected it to be equally bad for everyone.

I’m talking about conservation of expected evidence. If X is positive evidence, then ~X is negative evidence. An experiment only supports a hypothesis if it was possible for it to come out another way that refutes it. And if an experiment that could have supported the hypothesis actually didn’t, then it’s evidence against.
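A small numerical sketch of conservation of expected evidence may help here. The probabilities below are hypothetical, chosen only for illustration, and `posteriors` is a made-up helper name. The point it illustrates is that the prior is the probability-weighted average of the two possible posteriors, so if observing E would raise P(H), observing not-E must lower it.

```python
from fractions import Fraction

def posteriors(prior, p_e_given_h, p_e_given_not_h):
    """Return (P(E), P(H|E), P(H|~E)) using Bayes' rule."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    p_h_given_e = p_e_given_h * prior / p_e
    p_h_given_not_e = (1 - p_e_given_h) * prior / (1 - p_e)
    return p_e, p_h_given_e, p_h_given_not_e

# Hypothetical numbers (assumptions, not taken from any trial).
prior = Fraction(3, 10)
p_e, post_e, post_not_e = posteriors(prior, Fraction(4, 5), Fraction(2, 5))

# Average the two possible posteriors, weighted by how likely each outcome is.
expected_posterior = p_e * post_e + (1 - p_e) * post_not_e

print(post_e, post_not_e, expected_posterior)
```

With these inputs the posterior after E is above the prior, the posterior after not-E is below it, and the weighted average lands back on the prior exactly.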

What makes me think that they felt bad on thyroxine is table 2, where all the 'self-reported' psychological scores have got worse from thyroxine. In particular p=0.007 for the decline in Vitality. Since, as you point out, they really didn't know which was which, it's hard to see how they could have faked that.

Terminology then. When you said “Thyroxine is very strongly disliked by the healthy controls (they could tell it from placebo and hated it),” it suggests they could identify the active treatment.

Absolutely this treatment is harmful to healthy people.

The people in the study had symptoms. Even if you think their symptoms were mild or unrepresentative, you shouldn’t call them healthy. It’s fair to extend the conclusion to cover people without those symptoms, but I think that’s an important difference.

Yes, but that does mean that anything that needs careful dose control will get rejected.

It’s more that you need an easily followed protocol. Anything else, especially anything subjective, is unlikely to be practically feasible, and will probably not be reproducible.

The TSH test replaced that around 1970. But they never seem to have checked that clinical and biochemical diagnoses detected the same things, and after that there was the slow emergence of all sorts of nasty diseases that look very like hypothyroidism in the clinical sense but have normal TSH.

This is normal. Clinical presentations often have many causes, which makes it almost impossible to progress. Eventually we break them down based on their causal mechanisms so we can treat them individually. Each time we find a new cause, some of the cases will be left unexplained.

These are the only ones I can find through google scholar / pubmed. That in itself is really surprising and one of the things I can't explain! Why has such an obvious thing not been ruled out?

There are a lot of interesting hypotheses competing for resources, and we have to decide which ones are worth considering. I can’t say what the reason might be here, but there are a lot of possibilities. For example, it might not be possible to design a study like the one you want that could effectively answer the question.

Really? Forty years of experience in treating patients is less valuable than a single anecdote published in a journal? Really?

Yes. Expert opinion (i.e., the opinion of individual experts, not expert consensus) is the lowest level because you can find an expert to support pretty much any proposition that isn’t obviously ridiculous, and sometimes even if it is. In fact, this is true higher in the hierarchy as well, which is why we use syntheses of evidence so much. I can’t stress this enough: in biology, you can use peer-reviewed evidence to make plausible arguments for arbitrary hypotheses.

All the rest of it is anecdotal, from alternative sources, but there's a mountain of it.

The point of evidence-based medicine is that perceptions are unreliable. That includes the perceptions we call clinical experience (which once said that bloodletting was an important medical treatment). Keep in mind that doctors aren’t scientists and usually don’t even qualify as experts. EBM is unreliable too, but less so, just like science is unreliable but is still better than ancestral wisdom.

The TSH test ruling out hypothyroidism is expert opinion. Its reliability is unfounded dogma.

This sounds like you’re saying the TSH test doesn’t actually measure TSH, but I think you mean to say you disagree with the conclusions that it’s used for. But since hypothyroidism is defined as low thyroid hormone levels, some of this will be a dispute over definitions.

I can't find any evidence for it as the sole measure of thyroid system function at all.

I don’t think anyone who understands it would say it is. It measures TSH levels, and the question is what we do with that measurement. But we’re often limited by what we’re able to (easily) measure, and it might even be the only objective measurement we have.

Comment author: AstraSequi 18 April 2016 03:53:07AM 1 point [-]

Why is the Pollock trial evidence supporting your hypothesis? What outcome from the trial would you have considered to be evidence against it?

Also, what part suggests that the healthy controls could distinguish the treatment from placebo? From Table 4, it seems that the reverse is true.

At first glance, the results from that study look like straightforward evidence that this treatment is actively harmful. I’d also point out that RCTs need to be standardized across patients. I can’t say whether the inclusion criteria should have been different, but choosing a single dose is normal procedure. There are always better options, but it’s a weak argument on its own, in part because it can be applied in almost any circumstances.

Everyone who's ever tried fixing the clinical diagnosis of hypothyroidism with any kind of thyroid therapy either seems to think it works, or hasn't written about it on the internet or in the medical literature.

I admit I’m not an endocrinologist, but from what I’m reading I don’t think there is any recognized clinical diagnosis of hypothyroidism. The TSH test is the gold standard. That would suggest those who talk about it are primarily cranks and such.

That's a big claim. I'm making it in bold on Less Wrong. I expect someone to turn up some evidence against it. I would love to see that evidence.

Less Wrong might not be the best place for this, since there aren’t many biologists here. You have the burden of proof (i.e., the prior for arbitrary hypotheses is very low), so you shouldn’t be asking other people to disprove it. Could you summarize your support for this claim? Are these the only two peer-reviewed articles?

Assuming he's not just making up his data it's hard to explain his results.

There are lots of ways that data can be wrong without being made up. 90% of medical research findings are false, etc.

Comment author: AstraSequi 18 April 2016 02:00:57AM *  1 point [-]

This depends on what kind of unfalsifiability you want. There are at least four kinds.

  • unfalsifiable with current resources (Russell's teapot)
  • unfalsifiable because of moving goalposts
  • unfalsifiable because the terms are incoherent or undefined ("not even wrong")
  • unfalsifiable in principle

No empirical claim is unfalsifiable in principle (i.e. without resource limitations, moving goalposts, or logical incoherency). Claims that involve violations of physical law come the closest, but require us to assume 100% confidence in the law itself. For a non-empirical claim to be unfalsifiable, empirical consequences of the claim have to be impossible, which ultimately requires you to eliminate them by definition. I think you’re trying to find an example of the fourth meaning when most people who talk about unfalsifiability are thinking about one of the others.

Comment author: buybuydandavis 20 January 2016 12:26:24AM *  1 point [-]

Always correct your probability estimates for the possibility that you've made an incorrect assumption.

I think that's good too - Jaynes advocated adding a "something else that I didn't think of" hypothesis to your set of hypotheses, to avoid accepting something strongly when all you've done is eliminate the alternatives you've considered.
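As a rough sketch of why the catch-all matters, compare a Bayesian update over two named hypotheses with the same update when a "something_else" hypothesis is given some prior mass. All priors and likelihoods here are hypothetical, and the likelihood assigned to the catch-all is necessarily a guess; the helper names are made up for this example.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def update(priors, likelihoods):
    """Bayes update: posterior is proportional to prior times likelihood."""
    return normalize({h: priors[h] * likelihoods[h] for h in priors})

# Hypothetical likelihoods of the observed data under each hypothesis.
likelihoods = {"A": 0.001, "B": 0.1, "something_else": 0.05}

# Only the alternatives we thought of: eliminating A makes B look near-certain.
closed = update({"A": 0.5, "B": 0.5}, likelihoods)

# Same data, but with 10% prior mass reserved for an unconsidered explanation.
open_ = update({"A": 0.45, "B": 0.45, "something_else": 0.10}, likelihoods)

print(round(closed["B"], 3), round(open_["B"], 3), round(open_["something_else"], 3))
```

In the closed version B ends up above 0.99 mostly because A was eliminated; with the catch-all included, B is still favoured but noticeably less strongly.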

I don't know about this being a category error though. I think "map 1 is accurate with respect to X" is a valid proposition

"Is accurate" isn't much of a proposition in itself, as it leaves out the level of accuracy.

Probability of a proposition. Propositions are true or false. Level of accuracy of a model. Models are more or less accurate.

Comment author: AstraSequi 20 January 2016 02:31:35AM 0 points [-]

Maybe "Is accurate enough that it doesn't change our answer by an unacceptable amount"? The level of accuracy we want depends on context.

How would you measure the accuracy of a model, other than by its probability of giving accurate answers? "Accurate" depends on what margin of error you accept, or you can define it with increasing penalties for increased divergence from reality.
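The two options in that last sentence can be sketched as follows, using made-up predictions and observations. `fraction_within` treats accuracy as the share of answers inside an accepted margin of error, while `mean_squared_error` is one common (assumed, not the only) way to penalise larger divergences more heavily.

```python
def fraction_within(predictions, actuals, tolerance):
    """Share of predictions whose absolute error is within the tolerance."""
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predictions, actuals))
    return hits / len(actuals)

def mean_squared_error(predictions, actuals):
    """Average squared error: penalty grows with divergence from reality."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)

# Hypothetical data for illustration only.
actuals = [10.0, 12.0, 15.0, 20.0]
predictions = [10.5, 11.0, 15.0, 26.0]

loose = fraction_within(predictions, actuals, tolerance=1.0)
strict = fraction_within(predictions, actuals, tolerance=0.5)
mse = mean_squared_error(predictions, actuals)
print(loose, strict, mse)
```

The same model scores differently depending on the margin chosen, which is the sense in which the level of accuracy wanted depends on context.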

Comment author: PipFoweraker 19 January 2016 11:55:34PM 1 point [-]

If I'm exclusively limiting myself to animals that are raised in an organised fashion for eventual slaughter, I don't think I need too much data to assign broadly negative values to lives that are unusually brutish, nasty and short compared to either non-existence or a hypothetical natural existence.

In my consideration, simple things like the registering of a pain stimulus and the complexity of behaviour to display distress are good enough indicators.

Comment author: AstraSequi 20 January 2016 02:14:29AM *  1 point [-]

I don't think I need too much data to assign broadly negative values to lives that are unusually brutish, nasty and short compared to either non-existence or a hypothetical natural existence.

I don't think you can make that decision so easily. They're protected from predators, well-fed, and probably healthier than they would be in the wild. (About health, the main point against is that diseases spread more rapidly. But farmers have an incentive to prevent that, and they have antibiotics and access to minimal veterinary treatment.)

'no pig' > 'happy pig + surprise axe'

This leads me to conclusions I disagree with - like if a person is murdered, then their life had negative value.

Comment author: buybuydandavis 18 January 2016 04:02:06AM 2 points [-]

it changes the issue to the probability of the model.

To throw out an idea I never followed up on, I think the "probability of a model" is a category error. Most models we deal with, and particularly in the context of assigning probabilities to models, are not propositions that are true or false, but maps that are more or less accurate.

I'm not sure what the implications for model testing and generalization theory would be, but I expect there would be some, and it always just irked me to see things like P(M1).

I think 4 generalizes better as

Impossible under certain assumptions does not mean impossible.

Remembering Jaynes' "background information I" is often helpful.

Comment author: AstraSequi 19 January 2016 12:30:46PM 1 point [-]

Another way to generalize 4 is

Always correct your probability estimates for the possibility that you've made an incorrect assumption.

I don't think "changes the issue" is the best way to say this, because there is always a probability that your model won't work even if it doesn't say something is impossible.

I don't know about this being a category error though. I think "map 1 is accurate with respect to X" is a valid proposition.

Comment author: AstraSequi 19 January 2016 12:21:40PM *  1 point [-]

I would add the reverse of #3: "There is evidence for it" doesn't mean much on its own either, for the same reasons.

Comment author: AstraSequi 19 January 2016 12:00:07PM *  4 points [-]

My sympathies for your loss.

In the tradition of "making up numbers and doing Fermi estimation is better than making up answers," I would focus on the history. The frequency of past outcomes is always a good place to start (I think that's in the Sequences somewhere) since there's no need to consider causality, only frequency and genetic distance. An example:

Simplify and assume the cause is genetic (which will overestimate the probability; environmental or shared genetic-environmental has more randomness and will have occurrence closer to the population average). What is the total number of siblings for yourself and your spouse, including both of you, and how many stillbirths were there? Add your children to the number, including the one stillbirth, and weight those double because they're the generation you want to know about. Calculate the percentage, then increase it by 5-10% as a crude correction for the assumption of a genetic cause. This is my estimate before you start thinking about causality.

Other things: If V is your son from a different relationship, his genetic distance is further so I would give him normal weight instead of double, but if L has other children I would still double them since the mother's genetics are probably more important. Optionally add any of your siblings' children, but weight them by half due to greater genetic distance. Check what percentage of stillbirths are genetic vs environmental, which could be used to make a better correction than 5-10%. To avoid the multiple comparisons problem, make these choices before doing the analysis and commit not to change them.
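The weighting arithmetic described above could be sketched like this. The counts are placeholders that are not taken from the thread, `weighted_frequency` is a hypothetical helper, and the final 5-10% correction is left out because the text does not say whether it is meant as a relative or an absolute adjustment.

```python
from fractions import Fraction

def weighted_frequency(groups):
    """groups: iterable of (births, events, weight).

    Returns weighted events divided by weighted births.
    """
    weighted_events = sum(weight * events for _, events, weight in groups)
    weighted_births = sum(weight * births for births, _, weight in groups)
    return Fraction(weighted_events, weighted_births)

# Placeholder counts for illustration only.
groups = [
    (7, 0, 1),  # both parents plus their siblings, normal weight
    (3, 1, 2),  # the couple's own children, double weight
]
estimate = weighted_frequency(groups)
print(estimate, float(estimate))
```

A half-weighted group (for example, siblings' children) can be added by passing `Fraction(1, 2)` as its weight.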

Disclaimers: I am not a doctor or genetic counselor, and this is not medical advice. This is a superficial analysis written at 5am with the first few ideas I thought of, based on my unreliable intuitions about what sort of estimates might work. This sort of estimate is a lot weaker than direct evidence like the BMJ meta-analysis. I take no responsibility for any decisions that anyone makes...etc.

PS: you should probably assume the disclaimers apply to anything you read here. Also, I think another reason doctors avoid giving probabilities is that there can be legal consequences, especially if they're misinterpreted.

Comment author: IlyaShpitser 16 December 2015 04:30:49AM *  2 points [-]

Should be careful with that, might confuse people, see also:

https://en.wikipedia.org/wiki/Confounding

which gets it wrong.


A variable with no detectable correlation with the outcome might still be a confounder, of course: you might have unfaithful things going on, or dependence might be non-linear. "Unlikely" usually implies "with respect to some model" you have in mind. How do you know that model is right? What if the true model is highly unfaithful for some reason? etc. etc.
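One way to see the unfaithfulness point is a toy linear model, worked out from its implied covariances rather than simulated. The structure and coefficients below are assumptions made up for this sketch: C causes both treatment A and outcome Y, but its direct effect on Y exactly cancels its indirect effect through A.

```python
def linear_model_moments(a, b, c, var_c=1.0, var_u=1.0):
    """Moments implied by A = a*C + u and Y = b*A + c*C + e.

    C, u and e are independent; b is the true effect of A on Y.
    """
    var_a = a * a * var_c + var_u
    cov_ca = a * var_c
    cov_cy = b * cov_ca + c * var_c   # indirect path plus direct path
    cov_ay = b * var_a + c * cov_ca
    return var_a, cov_cy, cov_ay

true_effect = 1.0
# c = -a*b makes the two paths from C to Y cancel (an unfaithful model).
var_a, cov_cy, cov_ay = linear_model_moments(a=1.0, b=true_effect, c=-1.0)

# Slope from regressing Y on A alone, ignoring C.
naive_slope = cov_ay / var_a
print(cov_cy, naive_slope, true_effect)
```

Here C has zero covariance with the outcome, yet leaving it out halves the estimated effect of A, so it is still a confounder in the causal sense.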


edit: I don't mean to jump on you specifically, but it sort of is unfortunate that it somehow is a social norm to say wrong things in statistics "informally." To me, that's sort of like saying "don't worry, when I said 2+2=5, I was being informal."

Comment author: AstraSequi 18 December 2015 02:29:25AM 1 point [-]

To me, that's sort of like saying "don't worry, when I said 2+2=5, I was being informal."

Very true. This is something I'll try to change.

Comment author: IlyaShpitser 15 December 2015 07:21:25AM *  0 points [-]

Yes, M-bias is an example of a situation where a variable depends on treatment and outcome, but is not a confounder. Hence I was confused by your statement:

confounding depends on the correlations with both the independent and dependent variables

Confounding is not about that at all.
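A toy linear version of the M-bias structure shows the distinction, again computed from implied covariances. The graph used here (U1 -> A, U1 -> M, U2 -> M, U2 -> Y, A -> Y) is the usual textbook form of M-bias, and the unit coefficients and variances are assumptions chosen for this sketch.

```python
def m_bias_slopes(b):
    """Linear M-structure with unit coefficients and unit noise variances.

    U1 -> A, U1 -> M, U2 -> M, U2 -> Y, and A -> Y with true effect b.
    Returns (unadjusted slope, slope adjusted for M, cov(A, M), cov(M, Y)).
    """
    var_a = 2.0                # A = U1 + noise
    var_m = 3.0                # M = U1 + U2 + noise
    cov_am = 1.0               # A and M share U1
    cov_ay = b * var_a         # A is independent of U2
    cov_my = b * cov_am + 1.0  # through A, plus shared U2
    unadjusted = cov_ay / var_a
    # Coefficient on A when regressing Y on both A and M.
    adjusted = (cov_ay * var_m - cov_am * cov_my) / (var_a * var_m - cov_am ** 2)
    return unadjusted, adjusted, cov_am, cov_my

unadjusted, adjusted, cov_am, cov_my = m_bias_slopes(b=1.0)
print(unadjusted, adjusted, cov_am, cov_my)
```

M is correlated with both treatment and outcome, but the unadjusted slope already equals the true effect, and adjusting for M is what introduces bias.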

Comment author: AstraSequi 16 December 2015 04:25:14AM *  0 points [-]

I used "depends" informally, so I didn't mean to say that variables that depend on treatment and outcome are always confounders. I was answering the implication that a variable with no detectable correlation with the outcome is not likely to be a source of confounding. I assumed they were using a correlational definition of confounding, so I answered in that context.
