You didn't discuss self-fulfilling negative correlations. A Volvo doesn't prevent accidents; it just makes them less deadly. So it may actually be the more reckless drivers who choose Volvos, so that they can keep driving recklessly (though this effect may be much smaller), or parents who choose a safe car for their reckless teen. Another example: when seatbelts were introduced, owners of cars that had them became more reckless because they believed they were safer, and actually ended up in more accidents (though the driver death rate stayed about the same, because seatbelts really do offer protection; the effect isn't purely self-fulfilling).
Maybe someone can think of better examples. I suspect these correlations are hard to sustain, though. The belief that something is safer usually rests either on scientific evidence with proper selection or on real-world evidence with distorted selection; but with a negative correlation, the distorted selection would make the safety device look like it makes people less safe. You would need a good advertising team to overcome both the scientific and the real-world evidence about your safety device. Alternatively, the device really does make you slightly safer, while the negative-correlation effect is strong enough to outweigh that benefit in real-world data. But in neither case is the correlation self-fulfilling, because the belief isn't caused by the results; it comes from actual benefit or from advertising.
But does this mean that 'safer' and 'less safe' are meaningless for someone choosing a car? I mean, if I have never driven 'for real', without the instructor and with people I like very much sitting next to me, I do want a safer car; but I have no way to know whether I am, on average, 'more reckless' or 'less reckless' than other drivers. And with all of these balancing effects, if I previously found myself leaning towards buying a Volvo, now I have to doubt, vaguely, whether I want a Volvo because I actually think it would give me more leeway to drive poor...
Correlation does not imply causation. Sometimes corr(X,Y) means X=>Y; sometimes it means Y=>X; sometimes it means W=>X, W=>Y. And sometimes it's an artifact of people's beliefs about corr(X, Y). With intelligent agents, perceived causation causes correlation.
Volvos are believed by many people to be safe. Volvo has an excellent record of being concerned with safety; they introduced 3-point seat belts, crumple zones, laminated windshields, and safety cages, among other things. But how would you evaluate the claim that Volvos are safer than other cars?
Presumably, you'd compare the accident rate for Volvos with the accident rate for similar cars driven by a similar demographic, as reflected, for instance, in insurance rates. (My google-fu did not find accident rates posted on the internet, but insurance rates don't come out especially pro-Volvo.) But suppose the results showed that Volvos had only 3/4 as many accidents as similar cars driven by similar people. Would that prove Volvos are safer?
Perceived causation causes correlation
No. Besides having a reputation for safety, Volvos also have a reputation for being overpriced and ugly, so the people who buy them are mostly people concerned about safety. Once the reputation exists, even if it's not true, a cycle begins that feeds on itself: cautious drivers buy Volvos, have fewer accidents, and produce better statistics, which lead more cautious drivers to buy Volvos.
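To see that the reputation alone can produce such a statistic, here is a minimal Python simulation. Every number in it is an invented assumption, not data: half of all drivers are cautious, cautious drivers are six times as likely to buy a Volvo (30% vs. 5%), and cautious drivers have half the yearly accident rate (3% vs. 6%) whatever they drive. The car never enters the accident calculation, yet Volvo owners come out at roughly three-quarters of everyone else's accident rate (about 0.73 in expectation with these numbers).

```python
import random


def simulate_volvo(n_drivers=100_000, seed=0):
    """Return (accident rate of Volvo owners, accident rate of other drivers).

    Hypothetical model: the car has NO effect on risk. Cautious drivers are
    simply more likely to buy a Volvo because of its reputation.
    """
    rng = random.Random(seed)
    # owns_volvo -> [accidents, drivers]
    counts = {True: [0, 0], False: [0, 0]}
    for _ in range(n_drivers):
        cautious = rng.random() < 0.5
        # Assumed purchase probabilities: the safety reputation attracts cautious buyers.
        owns_volvo = rng.random() < (0.30 if cautious else 0.05)
        # Assumed yearly accident probability depends only on the driver, not the car.
        p_accident = 0.03 if cautious else 0.06
        had_accident = rng.random() < p_accident
        counts[owns_volvo][0] += int(had_accident)
        counts[owns_volvo][1] += 1
    volvo_rate = counts[True][0] / counts[True][1]
    other_rate = counts[False][0] / counts[False][1]
    return volvo_rate, other_rate


volvo_rate, other_rate = simulate_volvo()
print(f"Volvo owners: {volvo_rate:.4f}  others: {other_rate:.4f}  "
      f"ratio: {volvo_rate / other_rate:.2f}")
```

Changing the purchase probabilities to be equal for both kinds of driver makes the ratio drift to about 1, which is the point: the gap comes entirely from who buys the car.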
Do Montessori schools or home-schooling result in better scores on standardized tests? I'd bet that they do. Again, my google-fu is not strong enough to find any actual reports on, say, average SAT-score increases for students in Montessori schools vs. public schools. But the largest observable factor determining student test scores, last I heard, is parental participation. Any new education method will show increases in student test scores if people believe it increases test scores, because only interested parents will sign up for that method. The crazier, more expensive, and more difficult the method is, the more improvement it should show, because craziness filters out less-committed parents.
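A sketch of the filtering claim, again with made-up numbers: parental involvement drives test scores, the school method adds nothing, and a family signs up only if its commitment (involvement plus some unrelated noise) exceeds the method's `difficulty`, a hypothetical stand-in for cost, inconvenience, and craziness. Raising `difficulty` raises the apparent advantage of the method even though its true effect is zero.

```python
import random
from statistics import fmean


def make_families(n=100_000, seed=1):
    """Hypothetical families as (commitment, test_score) pairs.

    The school method never appears in the score formula, so it has zero
    causal effect on test scores in this model.
    """
    rng = random.Random(seed)
    families = []
    for _ in range(n):
        involvement = rng.gauss(0, 1)               # parental participation
        commitment = involvement + rng.gauss(0, 1)  # willingness to bear the method's cost
        score = 500 + 40 * involvement + rng.gauss(0, 60)
        families.append((commitment, score))
    return families


def apparent_gain(families, difficulty):
    """Mean score of families who choose the method minus everyone else's.

    Only families whose commitment exceeds `difficulty` sign up.
    """
    enrolled = [s for c, s in families if c > difficulty]
    others = [s for c, s in families if c <= difficulty]
    return fmean(enrolled) - fmean(others)


families = make_families()
for difficulty in (0.0, 1.5, 3.0):
    print(difficulty, round(apparent_gain(families, difficulty), 1))
```

Under these assumptions the apparent gain grows from about 45 points at difficulty 0 to about 70 points at difficulty 3 (in expectation), purely because a harder method admits a more committed group of parents.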
Are vegetarian diets or yoga good for you? Does using a phone while driving increase accident rates? Yes, probably; but there is a self-fulfilling component in the data that is difficult to factor out.
Conditions under which this occurs
If you believe X helps you achieve Y, and so you use X when you are most motivated to achieve Y, and your motivation has some bearing on the outcome, you will observe a correlation between X and Y.
This won't happen if your motivation or attitude has no bearing on the outcome (beyond your choice of X). If passengers prefer one airline based on their perception of its safety, that won't make its safety record improve.
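Both conditions can be made precise with a small derivation. Assume, as a simplification, that motivation is a yes/no variable $M$ with $p = P(M=1)$, that X has no causal effect (so the expected outcome depends only on motivation, $\mu_m = \mathbb{E}[Y \mid M=m]$), and write $a = P(X=1 \mid M=1)$, $b = P(X=1 \mid M=0)$, with $0 < p < 1$ and $0 < b < a < 1$. Then

$$
\mathbb{E}[Y \mid X=x] = \mu_0 + P(M=1 \mid X=x)\,(\mu_1 - \mu_0),
$$

$$
\mathbb{E}[Y \mid X=1] - \mathbb{E}[Y \mid X=0]
  = \bigl(P(M=1 \mid X=1) - P(M=1 \mid X=0)\bigr)\,(\mu_1 - \mu_0).
$$

By Bayes' rule the odds of being motivated are

$$
\frac{P(M=1 \mid X=1)}{P(M=0 \mid X=1)} = \frac{a}{b}\cdot\frac{p}{1-p},
\qquad
\frac{P(M=1 \mid X=0)}{P(M=0 \mid X=0)} = \frac{1-a}{1-b}\cdot\frac{p}{1-p}.
$$

Since $a > b$ implies $\frac{a}{b} > 1 > \frac{1-a}{1-b}$, the first factor of the difference is positive. So the observed gap is positive exactly when motivation helps ($\mu_1 > \mu_0$), zero when it has no bearing on the outcome ($\mu_1 = \mu_0$, as in the airline case), and negative if the motivated are the ones less likely to succeed ($\mu_1 < \mu_0$).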
Note that this is different from both confidence and the placebo effect. I'm not talking about the pickup-artist (PUA) mantra that "if you believe a pickup line will work, it will work". And I'm not talking about feeling better when you take a pill that you think will help you feel better. This is a sample-selection bias. A person is more likely to choose X when achieving Y matters more to them than X's other possible benefits, and such a person is also more inclined to make many other small trade-offs in favor of Y that will not be visible in the data set.
It's also not the effect that double-blind experiments guard against; double-blinding guards against the experimenter favoring one method over another. This effect is guarded against instead by randomly assigning subjects to groups.
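The difference between the two designs can be sketched in a few lines of Python. In this hypothetical model, motivation is uniform between 0 and 1 and drives the outcome; in the self-selected design it is also each person's probability of choosing X, while in the randomized design a coin flip decides. X itself does nothing. Self-selection produces a gap of about 10/3, roughly 3.3, in expectation, while random assignment produces a gap near zero.

```python
import random
from statistics import fmean


def outcome(motivation, rng):
    # Hypothetical: the outcome depends on motivation and noise, never on X.
    return 10 * motivation + rng.gauss(0, 5)


def observed_difference(n=50_000, randomize=False, seed=2):
    """Mean outcome of people using X minus mean outcome of people not using it."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        motivation = rng.random()  # 0 = indifferent, 1 = highly motivated
        if randomize:
            uses_x = rng.random() < 0.5         # coin flip ignores motivation
        else:
            uses_x = rng.random() < motivation  # motivated people opt in
        (treated if uses_x else control).append(outcome(motivation, rng))
    return fmean(treated) - fmean(control)


self_selected = observed_difference()
randomized = observed_difference(randomize=True)
print("self-selected:", round(self_selected, 2))
print("randomized:   ", round(randomized, 2))
```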
Nor should it happen in cases where the outcome being studied is the only outcome people consider. If a Montessori school cost the same, and was just as convenient for the parents, as every other school, and all factors other than test score were equal, and Montessori schools were believed to increase test scores, then any parent who cared at all would choose the Montessori school. The filtering effect would vanish, and so would the portion of the test-score increase caused by it. Same story if one choice improves all the outcomes under consideration: Aluminum tennis racquets are better than wooden racquets in weight, sweet spot size, bounce, strength, air resistance, longevity, time between restrings, and cost. You need not suspect a self-fulfilling correlation.
The effect may be cancelled by a balancing effect when you are more motivated to achieve Y in exactly the situations where you are less likely to achieve it. In sports, if you wear your lucky undershirt only for tough games, it will appear to be unlucky, because you're more likely to lose tough games. Another balancing effect occurs if choosing X makes you so confident of attaining Y that you act less concerned about Y; an example is (IIRC) research showing that people wearing seat belts are more likely to get into accidents.
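The lucky-undershirt case, simulated with invented numbers: 30% of games are tough, you win 35% of tough games and 70% of easy ones, and the shirt is worn only for tough games. The shirt never affects a result, but its record ends up at about half the win rate of games played without it.

```python
import random


def undershirt_record(n_games=10_000, seed=3):
    """Win rate with and without the 'lucky' undershirt.

    Hypothetical model: the shirt is worn only for tough games and has
    no effect on whether you win.
    """
    rng = random.Random(seed)
    record = {True: [0, 0], False: [0, 0]}  # wore_shirt -> [wins, games]
    for _ in range(n_games):
        tough = rng.random() < 0.3
        wore_shirt = tough                   # saved for the big games
        p_win = 0.35 if tough else 0.70      # the shirt never enters this line
        won = rng.random() < p_win
        record[wore_shirt][0] += int(won)
        record[wore_shirt][1] += 1
    return {wore: wins / games for wore, (wins, games) in record.items()}


rates = undershirt_record()
print(f"with shirt: {rates[True]:.2f}  without: {rates[False]:.2f}")
```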
Application to machine learning and smart people
Back in the late 1980s, neural networks were hot; and evaluations usually indicated that they outperformed other methods of classification. In the early 1990s, genetic algorithms were hot; and evaluations usually indicated that they outperformed other methods of classification. Today, support vector machines (SVMs) are hot; and evaluations usually indicate that they outperform other methods of classification. Neural networks and genetic algorithms no longer outperform older methods. (I write this from memory, so you shouldn't take it as gospel.)
There is a publication bias: when a new technology appears, publications indicating that it performs well are interesting; once it's established, publications indicating that it performs poorly are interesting. But there's also a selection bias. People strongly motivated to make their systems work well on difficult problems are strongly motivated to try new techniques, and also to fiddle with the parameters until they work well.
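The parameter-fiddling part can be sketched as a best-of-k effect. Under the hypothetical assumptions below, the new method and the baseline have exactly the same true accuracy (80%), each is scored on a 500-item test set, and the enthusiastic author reports the best of `tries` parameter settings while running the baseline once. Treating every try as an independent noisy measurement is a simplification; real settings scored on one shared test set would have correlated scores, which would generally shrink the effect.

```python
import random
from statistics import fmean


def measured_accuracy(true_acc, n_test, rng):
    """Accuracy measured on a finite test set: binomial noise around true_acc."""
    return sum(rng.random() < true_acc for _ in range(n_test)) / n_test


def reported_gap(tries, n_papers=300, true_acc=0.80, n_test=500, seed=4):
    """Mean reported advantage of a 'new' method over a baseline when the
    author keeps the best of `tries` settings but runs the baseline once.
    Both methods share the same true accuracy (hypothetical assumption)."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_papers):
        # Simplification: each setting's score is an independent noisy draw.
        best_new = max(measured_accuracy(true_acc, n_test, rng) for _ in range(tries))
        baseline = measured_accuracy(true_acc, n_test, rng)
        gaps.append(best_new - baseline)
    return fmean(gaps)


for tries in (1, 10):
    print(tries, round(reported_gap(tries), 3))
```

With one try the average gap is near zero; with ten tries the new method picks up about 2.7 accuracy points in expectation, roughly 1.5 standard deviations of test-set noise, without being any better.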
Fads can create self-fulfilling correlations. If neural networks are hot, the smartest people tend to work on neural networks. When you compare their results to other results, it can be difficult to compare neural networks with, say, logistic regression while factoring out the effect of the smartest people vs. pretty smart people.
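If researcher skill could be measured, the confound could be removed by comparing methods only within each skill level (stratification). The sketch below assumes exactly that, which is the hard part in practice. Its invented numbers: 30% of papers come from the "smartest" group, who pick the fashionable method 80% of the time versus 30% for everyone else, and skill, never the method, determines the score. The pooled comparison favors neural networks by about 2 points in expectation; within each stratum the gap is close to zero.

```python
import random
from statistics import fmean


def simulate_papers(n=20_000, seed=5):
    """Hypothetical papers as (skill, method, score); the method adds nothing."""
    rng = random.Random(seed)
    papers = []
    for _ in range(n):
        skill = "smartest" if rng.random() < 0.3 else "pretty smart"
        # Assumption: the smartest people are drawn to the fashionable method.
        p_hot = 0.8 if skill == "smartest" else 0.3
        method = "neural net" if rng.random() < p_hot else "logistic regression"
        score = (85 if skill == "smartest" else 80) + rng.gauss(0, 3)
        papers.append((skill, method, score))
    return papers


def method_gap(papers, skill=None):
    """Mean neural-net score minus mean logistic-regression score,
    optionally restricted to one skill stratum."""
    rows = [p for p in papers if skill is None or p[0] == skill]
    nn = [s for _, m, s in rows if m == "neural net"]
    lr = [s for _, m, s in rows if m == "logistic regression"]
    return fmean(nn) - fmean(lr)


papers = simulate_papers()
print("pooled:", round(method_gap(papers), 2))
for level in ("smartest", "pretty smart"):
    print(level + ":", round(method_gap(papers, level), 2))
```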
(The attention of smart people is a proxy for effectiveness, which often misleads other smart people - e.g., the popularity of communism among academics in America in the 1930s. But that's yet another separate issue.)