Every point you made, (0)-(5), is correct!
(0) There are some social scientists, especially in political science, who are focused on applying machine learning and text mining methods to political texts. This is a big movement and it's under the heading "text as data". Most publications use fairly simple methods, basically calibrated regressions, but a lot of thought went into choosing those and some of the people publishing are mathematically sophisticated.
Example: http://www.justingrimmer.org/
Another prominent example is social network analysis, where people from the CS and physics worlds work on the social side, and some social scientists use the methodology too.
Example: http://cs.stanford.edu/people/jure/
At the Santa Fe Institute, people from all kinds of disciplines do all kinds of stuff, but an overall theme is methods drawn from math and physics applied to the social sciences. This includes networks, statistical physics, and game theory.
Not exactly social science, but Jennifer Dunne applies network analysis to food webs: http://www.santafe.edu/about/people/profile/Jennifer%20A.%20Dunne
I am certain that cutting edge mathematics and ML are applied in pockets of econometrics too. Finance is often in economics departments and ML has thoroughly invaded that, but I admit that's a stretch.
(1) Social science academics have only recently gained access to large datasets. Especially in survey-based fields like sociology and experimental psychology, small-data-oriented methods are definitely the focus. Large datasets include medical datasets, to the extent that they have access; various massive text repositories including academic paper databases and online datasets; and a very few surveys that have the size and depth to support fancier analyses.
This applies less to probit and more to clustering, Bayes nets, decision trees, etc.
(2) The culture is definitely conservative. I've talked to many people interested in the more advanced methods, and they have to fight harder to get published, but the tide is turning.
(3) Absolutely. It's very hard to figure out what coefficients represent when data is ambiguous and many factors are highly correlated (as they are in social science) and when the model is very possibly misspecified. Clusterings with a "high score" from most methods can be completely spurious, and it takes advanced statistical knowledge to identify this. ML is good for prediction and classification, but this is very rarely the goal of social scientists (though one can imagine how it could be). SVMs and decision trees do a poor job of extracting causal relationships with any certainty.
(4) Again, the culture is conservative and many don't have this training. A good number know their way around R though, and newer ones often come in with quite a bit of stats/CS knowledge. The amount of statistical knowledge in the social sciences is growing fast.
(5) Yes; this is especially true of something like neural networks.
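To make point (3) a bit more concrete, here is a minimal sketch of what highly correlated factors do to regression coefficients. Everything in it is an assumption for illustration: the data is simulated, the noise levels are arbitrary, and the helper names `fit_two_features` and `simulate` are made up. Two nearly identical features both truly have coefficient 1, and the same model is refit on 30 fresh samples.

```python
import random
from statistics import stdev


def fit_two_features(x1, x2, y):
    """OLS for y = a + b1*x1 + b2*x2, solved via the centered 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simulate(rng, n=200):
    """Draw one hypothetical sample in which x2 is almost a copy of x1."""
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [a + rng.gauss(0, 0.05) for a in x1]  # highly correlated with x1
    y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]  # true b1 = b2 = 1
    return fit_two_features(x1, x2, y)


rng = random.Random(0)
fits = [simulate(rng) for _ in range(30)]

# Spread of one coefficient across samples versus spread of their sum.
spread_b1 = stdev([b1 for b1, _ in fits])
spread_sum = stdev([b1 + b2 for b1, b2 in fits])

print(f"spread of b1 alone: {spread_b1:.2f}")
print(f"spread of b1 + b2:  {spread_sum:.2f}")
```

In a setup like this, the individual coefficient swings widely from sample to sample while the combined effect stays comparatively stable, so reading meaning into either coefficient on its own is risky.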
I asked this question on Facebook here, and got some interesting answers, but I thought it would be interesting to ask LessWrong and get a larger range of opinions. I've modified the list of options somewhat.
What explains why some classification, prediction, and regression methods are common in academic social science, while others are common in machine learning and data science?
For instance, I've encountered probit models in some academic social science, but not in machine learning.
Similarly, I've encountered support vector machines, artificial neural networks, and random forests in machine learning, but not in academic social science.
The main algorithms that I believe are common to academic social science and machine learning are the most standard regression algorithms: linear regression and logistic regression.
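For context on how close probit and logistic regression are, here is a small sketch comparing their link functions. The only difference between the two models is the curve that maps a linear score to a probability: the standard normal CDF for probit, the logistic function for logit. The rescaling constant 1.702 is a commonly quoted value that I am assuming here, and the function names are mine.

```python
import math


def logistic_cdf(z):
    """Inverse logit link: maps a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Inverse probit link: standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Assumed rescaling constant; logistic_cdf(SCALE * z) is meant to track normal_cdf(z).
SCALE = 1.702

# Compare the two curves on a grid of scores from -6 to 6.
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)

print(f"largest gap between the two curves: {max_gap:.4f}")
```

On this grid the two curves never differ by as much as 0.01 in predicted probability, which suggests that the choice between them is driven more by convention and computational convenience than by fit.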
Possibilities that come to mind:
(0) My observation is wrong and/or the whole question is misguided.
(1) The focus in machine learning is on algorithms that can perform well on large data sets. Thus, for instance, probit models may be academically useful but don't scale up as well as logistic regression.
(2) Academic social scientists take time to catch up with new machine learning approaches. Of the methods mentioned above, random forests and support vector machines were introduced as recently as 1995. Neural networks are older, but their practical implementation is about as recent. Moreover, practical implementations of these algorithms in the standard statistical software and packages that academics rely on are even more recent (this relates to point (4)).
(3) Academic social scientists are focused on publishing papers, where the goal is generally to determine whether a hypothesis is true. Therefore, they rely on approaches that have clear rules for hypothesis testing and for establishing statistical significance (see also this post of mine). Many of the new machine learning approaches don't have clearly defined statistical approaches for significance testing. Also, the strength of machine learning approaches lies more in exploration than in testing already formulated hypotheses (this relates to point (5)).
(4) Some of the new methods are complicated to code, and academic social scientists don't know enough mathematics, computer science, or statistics to cope with the methods (this may change if they're taught more about these methods in graduate school, but the relative newness of the methods is a factor here, relating to (2)).
(5) It's hard to interpret the results of fancy machine learning tools in a manner that yields social scientific insight. The results of a linear or logistic regression can be interpreted somewhat intuitively: the parameters (coefficients) associated with individual features describe the extent to which those features affect the output variable. Modulo issues of feature scaling, larger coefficients mean those features play a bigger role in determining the output. Pairwise and listwise R^2 values provide additional insight on how much signal and noise there is in individual features. But if you're looking at a neural network, it's quite hard to infer human-understandable rules from that. (The opposite direction is not too hard: it is possible to convert human-understandable rules to a decision tree and then to use a neural network to approximate that, and add appropriate fuzziness. But the neural networks we obtain as a result of machine learning optimization may be quite different from those that we can interpret as humans). To my knowledge, there haven't been attempts to reinterpret neural network results in human-understandable terms, though Sebastian Kwiatkowski's comment on my Facebook post points to an example where the results of naive Bayes and SVM classifiers for hotel reviews could be translated into human-understandable terms (namely, reviews that mentioned physical aspects of the hotel, such as "small bedroom", were more likely to be truthful than reviews that talked about the reasons for the visit or the company that sponsored the visit). But Kwiatkowski's comment also pointed to other instances where the machine's algorithms weren't human-interpretable.
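As a small illustration of the "modulo issues of feature scaling" caveat in (5), here is a sketch with made-up toy data (the variable names and numbers are hypothetical). The same feature is expressed in dollars and in thousands of dollars, and a one-feature least-squares line is fit each way.

```python
from statistics import mean, pstdev


def ols_slope(x, y):
    """Slope of the least-squares line of y on a single feature x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardize(values):
    """Rescale to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical toy data: the same income feature in two different units.
income_dollars = [20000, 35000, 50000, 65000, 80000, 95000]
income_thousands = [d / 1000 for d in income_dollars]
score = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

# Raw coefficients depend on the unit of measurement.
slope_dollars = ols_slope(income_dollars, score)
slope_thousands = ols_slope(income_thousands, score)

# Standardized coefficients do not.
slope_std_dollars = ols_slope(standardize(income_dollars), standardize(score))
slope_std_thousands = ols_slope(standardize(income_thousands), standardize(score))

print(f"slope per dollar:           {slope_dollars:.6f}")
print(f"slope per thousand dollars: {slope_thousands:.6f}")
print(f"standardized slope:         {slope_std_dollars:.4f}")
```

The raw coefficient changes by a factor of 1000 with the unit even though the fitted relationship is identical, while the standardized coefficient is the same either way. That is why comparing coefficient sizes across features only makes sense after putting the features on a common scale.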
What's your personal view on my main question, and on any related issues?