If you get a well-labelled dataset, I think this is pretty thoroughly within the scope of current machine learning technology, but that means spending perhaps hundreds of hours rating papers' degree of postmodernism on, say, a 0-100 scale. If you're trying to single out the postmodernism you're convinced is total BS, then that's more complex. Doable, but you need to make the case to me for why it would be worthwhile, and what exactly your aim would be.
Thanks Ryan, that's helpful. Yes, I'm not sure one would be able to do something that has the right combination of accuracy, interestingness and low-cost at present.
If you had a million labelled postmodern and non-postmodern papers, you could decently identify them.
You could categorise most papers with fewer labels using citation graphs.
You could recommend papers with a recommender system based on ratings, much as Amazon recommends books.
There are hundreds of ways to apply machine learning to academic articles; it's a matter of deciding what you want the machine learning to do.
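The citation-graph idea above can be sketched with a toy label-propagation pass: unlabelled papers adopt the majority label of their neighbours in the citation graph, so a handful of hand labels can spread to most of the corpus. The paper IDs, labels, and edges below are all hypothetical, and a real system would use a proper graph-based semi-supervised method rather than this minimal sketch.

```python
# Toy label propagation over a hypothetical citation graph: unlabelled
# papers take the majority label of their already-labelled neighbours.
def propagate(edges, labels, rounds=5):
    # Build undirected neighbour lists from citation edges.
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, []).append(b)
        nbrs.setdefault(b, []).append(a)
    labels = dict(labels)  # don't mutate the caller's seed labels
    for _ in range(rounds):
        for node, ns in nbrs.items():
            if node in labels:
                continue
            votes = [labels[n] for n in ns if n in labels]
            if votes:
                # Majority vote among labelled neighbours.
                labels[node] = max(set(votes), key=votes.count)
    return labels

# Two labelled seed papers; the labels spread along citation links.
edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
labels = propagate(edges, {"p1": "postmodern", "p5": "empirical"})
```

With only two seed labels, all five papers end up labelled, which is the sense in which citation structure lets you get away with far fewer hand labels than pure text classification.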
Sure, I guess my question was whether you'd think that it'd be possible to do this in a way that would resonate with readers. Would they find the estimates of quality, or level of postmodernism, intuitively plausible?
My hunch was that the classification would primarily be based on patterns of word use, but you're right that it would probably also be fruitful to look at patterns of citations.
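To make the word-use idea concrete, here is a toy scoring sketch. The word list and weights are entirely made up for illustration; in a real system a classifier would learn such weights from the hundreds of hours of labelled papers discussed above.

```python
# Toy word-use scoring: sum hand-picked (hypothetical) word weights over
# a text's word counts. A trained classifier would learn these weights.
from collections import Counter

WEIGHTS = {
    "discourse": 2.0, "hegemonic": 3.0, "deconstruct": 2.5,   # "postmodern" cues
    "regression": -2.0, "randomised": -2.5, "effect": -1.0,   # "empirical" cues
}

def postmodernism_score(text):
    """Higher scores indicate more postmodern-sounding word use."""
    counts = Counter(text.lower().split())
    return sum(weight * counts[word] for word, weight in WEIGHTS.items())

pomo = postmodernism_score("hegemonic discourse and discourse again")
stats = postmodernism_score("regression with randomised effect estimates")
```

The point is just that frequencies of characteristic vocabulary already carry a lot of signal, which is why a word-use-based classifier is plausible given enough labels.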
Asking scientists to keep their paper titles hedge-drift-resistant means (1) asking each individual scientist to do something that will reduce the visibility of their work relative to others', for the sake of a global benefit -- a class of policy that for obvious reasons doesn't have a great track record -- and (2) asking them to give their papers titles that are boring and wordy.
I agree that the world might be a better place if scientists consistently did this. But it doesn't seem very likely to happen.
(Also, here's what might happen if they almost consistently did this: the better, more conscientious scientists all write carefully hedged articles with carefully hedged titles, and journalists ignore all of them because they all sound like "Correlational analysis of OCEAN traits weakly suggest slight association between conscientiousness and Y-chromosome haplogroup O3". A few less careful scientists write lower-quality papers that, among other things, have titles like "The Chinese work harder: correlational analysis of OCEAN traits and genotype", and those are the ones that the journalists pick up. These are also the ones without the careful hedging in the actual analysis, without serious attempts to correct for multiple correlations, etc. So we end up with worse stuff in the press.)
Good points. I agree that what you write within parentheses is a potential problem. Indeed, it is a problem for many kinds of far-reaching norms of altruistic behaviour with which compliance is hard to observe: they might handicap conscientious people relative to less conscientious people to such an extent that the norms do more harm than good.
I also agree that individualistic solutions to collective problems have a chequered record. The point of 1)-3) was rather to indicate how you potentially could reduce hedge drift, given that you want to do that. To get scientists and others to want to reduce hedge drift is probably a harder problem.
In conversation, Ben Levinstein suggested that it is partly the editors' role to frame articles in a way such that hedge drift doesn't occur. There is something to that, though it is of course also true that editors often have incentives to encourage hedge drift as well.
Related: Scott Adams' Law of Slow Moving Disasters
"whenever humanity can see a slow-moving disaster coming, we find a way to avoid it. Let’s run through some examples:
Thomas Malthus famously predicted that the world would run out of food as the population grew. Instead, humans improved their farming technology.
When I was a kid, it was generally assumed that the world would be destroyed by a global nuclear war. The world has been close to nuclear disaster a few times, but so far we’ve avoided all-out nuclear war.
The world was supposed to run out of oil by now, but instead we keep finding new ways to extract it from the ground. The United States has unexpectedly become a net provider of energy.
The debt problem in the United States was supposed to destroy the economy. Instead, the deficit is shrinking, the stock market is surging, and the price of gold is plummeting."
Thanks. My claim is somewhat different, though. Adams says that "whenever humanity can see a slow-moving disaster coming, we find a way to avoid it". This is an all-things-considered claim. My claim is rather that sleepwalk bias is a pro-tanto consideration indicating that we're too pessimistic about future disasters (perhaps especially slow-moving ones). I'm not claiming that we never sleepwalk into a disaster. Indeed, there might be stronger countervailing considerations, which if true would mean that all things considered we are too optimistic about existential risk.
There are also some examples of anti-sleepwalk bias:
1. World War I. The crisis unfolded over more than a month. Surely the diplomats will work something out, right? Nope.
2. Germany's invasion of the Soviet Union in World War II. Surely some of Hitler's generals will speak up and persuade Hitler away from this crazy plan when Germany has not even finished the first part of the war against Britain. Surely Germany would not willingly put itself into another two-front war even after many generals had explicitly decided that Germany must never get involved in another two-front war ever again. Right? Nope.
3. The sinking of the Titanic. Surely, with over two and a half hours to react to the iceberg impact before the ship finished sinking, SURELY there would be enough time to get all of the lifeboats safely and calmly loaded up to near max capacity, right? NOPE. And going even further back to the decision to not put enough lifeboats on in the first place...SURELY the White Star Line must have a good reason for this. SURELY this means that the ship really is unsinkable, right? NOPE.
4. The 2008 financial crisis. SURELY the monetary authorities have solved the problem of preventing recessions and smoothing out the business cycle. So SURELY I as a private trader can afford to be as reckless as I want and not have to worry about systemic risk, etc.
It is not quite clear to me whether you are here just talking about instances of sleepwalking, or whether you are also talking about a predictive error indicating anti-sleepwalk bias: i.e. that observers wrongly predicted that the relevant actors would act, yet those actors sleepwalked into a disaster.
Also, my claim is not that sleepwalking never occurs, but that people on average seem to think that it happens more often than it actually does.
Great post. Another issue is why B doesn't believe Y in spite of believing X and in spite of A believing that X implies Y. Some mechanisms:
a) B rejects that X implies Y, for reasons that are good or bad, or somewhere in between. (Last case: reasonable disagreement.)
b) B hasn't even considered whether X implies Y. (Is not logically omniscient.)
c) Y only follows from X given some additional premises Z, which B either rejects (for reasons that are good or bad or somewhere in between) or hasn't entertained. (What Tyrrell McAllister wrote.)
d) B is confused over the meaning of X, and hence is confused over what X implies. (The dialect case.)
Your problem is called a clustering problem. First of all, you need to answer how you measure your error (information loss, as you call it). Typical error norms used are l1 (sum of individual errors), l2 (sum of squares of errors, penalizes larger errors more) and l-infinity (maximum error).
Once you select a norm, there always exists a partition that minimizes your error, and to find it there are a bunch of heuristic algorithms, e.g. k-means clustering. Luckily, since your data is one-dimensional and you have very few categories, you can just brute-force it (for 4 categories you need to correctly place 3 boundaries, and naively trying all possible positions takes only O(n^3) time).
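The brute-force approach above can be sketched in a few lines: sort the data, try every placement of the 3 boundaries, and keep the partition with the smallest l2 error (sum of squared deviations from each group's mean). The sample data is made up for illustration.

```python
# Brute-force 1-D clustering into k groups: try every placement of the
# k-1 boundaries over the sorted data and keep the minimum-l2 partition.
from itertools import combinations

def l2_error(group):
    """Sum of squared deviations from the group's mean."""
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group)

def best_partition(data, k=4):
    data = sorted(data)
    n = len(data)
    best_err, best_groups = float("inf"), None
    # Choose k-1 cut indices: O(n^(k-1)) placements, i.e. n^3 for k=4.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        groups = [data[bounds[i]:bounds[i + 1]] for i in range(k)]
        err = sum(l2_error(g) for g in groups)
        if err < best_err:
            best_err, best_groups = err, groups
    return best_groups

groups = best_partition([1, 2, 2, 10, 11, 20, 21, 30], k=4)
# groups -> [[1, 2, 2], [10, 11], [20, 21], [30]]
```

Because the data is one-dimensional, an optimal partition always consists of contiguous runs of the sorted values, which is what makes this exhaustive boundary search valid.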
Hope this helps.
Thanks a lot! Yes, super-useful.
As Bastian Stern has pointed out to me, people often mix up pro tanto considerations with all-things-considered judgements, usually by interpreting what is merely intended to be a pro tanto consideration as an all-things-considered judgement. Is there a name for this fallacy? It seems both dangerous and common, so it should have a name.