The classic example fitting the title, which I learned from a Martin Gardner article (I think he cited it from a nineteenth-century source), is: "Hypothesis: No man is 100 feet tall. Evidence: You see a man who is 99 feet tall. Technically, that evidence does fit the hypothesis, but after seeing it you would probably become much less confident in the hypothesis."
Basically, it can all be interpreted as weighing multiple competing theories (e.g. "the tallest human ever was slightly under 9 feet, human heights generally follow a certain distribution, your heart probably wouldn't even be able to circulate blood that far, etc." vs. "something very, very weird and new that breaks my understanding is happening") and considering the evidence in a Bayesian way.
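To make that Bayesian reading concrete, here is a minimal sketch with invented numbers (the hypothesis labels and all of the probabilities are illustrative, not taken from the article or the comments): a prior that heavily favors the mundane theory, and likelihoods for the observation "a 99-foot man exists" under each theory.

```python
# Minimal Bayesian sketch with made-up numbers (illustrative only).
# H1 = "heights follow the usual distribution; no man is 100 feet tall"
# H2 = "something very weird and new is going on with human heights"

prior = {"H1": 0.999, "H2": 0.001}  # H1 starts out overwhelmingly favored

# Likelihood of observing a 99-foot man under each theory (guesses):
likelihood = {
    "H1": 1e-12,  # essentially impossible under ordinary biology
    "H2": 1e-3,   # still rare, but conceivable if the weird theory is true
}

# Bayes' rule: posterior is proportional to prior * likelihood
unnormalized = {h: prior[h] * likelihood[h] for h in prior}
total = sum(unnormalized.values())
posterior = {h: p / total for h, p in unnormalized.items()}

print(posterior)  # H2 now dominates despite its tiny prior
```

With these (arbitrary) numbers the posterior on H1 drops to roughly one in a million, so "no man is 100 feet tall", which leaned on the ordinary height distribution, ends up far less credible even though the observation never contradicted it.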
From the HN comments:
If my test suite never ever goes red, then I don't feel as confident in my code as when I have a small number of red tests.
That seems like an example of this that I have definitely experienced, where A is "my code is correct", B is "my code is not correct", and the failure case is "my tests appear to be exercising the code but actually aren't."
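A hypothetical sketch of that failure case (the function, its bug, and the numbers are all invented here, not taken from the thread): the first test calls the code, so coverage looks healthy and the suite stays green, but it never checks the result, while the second test can actually go red and so would expose the bug.

```python
import unittest

def discount(price: float, percent: float) -> float:
    # Deliberate bug: should be price * (1 - percent / 100)
    return price * (1 - percent)

class TestDiscount(unittest.TestCase):
    def test_discount_runs(self):
        # Calls the code, so it *appears* to exercise it, but asserts nothing:
        # the suite stays green and the bug goes unnoticed.
        discount(100.0, 10)

    def test_discount_value(self):
        # Actually checks the behavior; this one goes red and exposes the bug.
        self.assertAlmostEqual(discount(100.0, 10), 90.0)

if __name__ == "__main__":
    unittest.main()
```

Read that way, an occasional red test is evidence that the suite is capable of going red at all, which is the confidence the commenter is describing.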
Or the newer version, "one weird trick", where the purpose of the negative-sounding adjective "weird" is to explain why, if the trick is so great, you haven't heard of it before.
When apparently positive evidence can be negative evidence
It's when my outgroup talks about the positive evidence, of course.
(just kidding)
This is a thought-provoking journal article discussing cases where:
An interesting anecdote which I haven't verified:
Via Hacker News.