I present an algorithm I designed to predict which position a person would report for an issue on TakeOnIt, through Bayesian updates on the evidence of other people's positions on that issue. Additionally, I will point out some potential areas of improvement, in the hopes of inspiring others here to expand on this method.
For those not familiar with TakeOnIt, the basic idea is that there are issues, represented by yes/no questions, on which people can take the positions Agree (A), Mostly Agree (MA), Neutral (N), Mostly Disagree (MD), or Disagree (D). (There are two types of people tracked by TakeOnIt: users who register their own opinions, and Experts/Influencers whose opinions are derived from public quotations.)
The goal is to predict what issue a person S would take on a position, based on the positions registered by other people on that question. To do this, we will use Bayes' Theorem to update the probability that person S takes the position X on issue I, given that person T has taken position Y on issue I:
Really, we will be updating on several people Tj taking positions Ty on I:
To compute this, let us first figure out the prior probability P(S takes X on I). I use for this a generalization of Laplace's Law of Succession (representing my theory that a person will take each position with a particular frequency, and that there is no reason, before seeing their actual position, to suppose that one position in particular is more frequent than the others), that the odds that S takes the position A : MA : N : MD : D on I is given by:
1 + count of issues S has taken position A on : 1 + count of issues S has taken position MA on : 1 + count of issues S has taken position N on : 1 + count of issues S has taken position MD on : 1 + count of issues S has taken position D on
Thus, the probability
Likewise the probability
This leaves one term in Bayes' Theorem to figure out: P(Tj takes Yj on I | S takes X on I)
For this, I will again use the Generalized Laplace's Law of Succession, looking at issues on which both S and Tj have taken positions:
We now know how to compute, from the records of people's positions on issues, all the terms that Bayes' Theorem requires to compute the posterior probability that person S will take position X on issue I.
So, how well does this work? At this time, I have coded up a SQL script to be run against TakeOnIt's database, that predicts a user's positions based on the positions of Expert's/Influencers. (TakeOnIt stores positions for these types of users differently, which is why the first version doesn't just update on the positions of all people.) I have run this script to make predictions for myself, seeing as I am in a privileged position to judge the accuracy of those predictions. Looking at the predictions it made for me with greater than 80% confidence: Of the three predictions made with more than 90% confidence, all were correct, and of the 11 made with between 80% and 90% confidence, 10 were correct, and 1 was incorrect. From this limited data, it seems the algorithm is underconfident. I have registered my opinion on 40 issues.
In case you think my positions might be influenced by the positions, I have also looked at its retrodictions for positions I have already registered. It assigned 18% probability to my Neutral position on the issue Are successful entrepreneurs big risk takers?. My remaining 39 positions it predicted with confidence ranging from 60% to (due to round off errors on one issue) 100%.
Some areas for improvement:
This algorithm does not make any use of the structure of the possible positions. For example, Disagree is more like Mostly Disagree than Agree. And there is also symmetry such that that Agree relates to Mostly Agree in the same way that Disagree relates to Mostly Disagree. If you changed a question by adding negation, so that all the answers flipped, this algorithm would not necessarily give the flipped probability distribution. Of course, it is also possible that a person's position will not reflect the structure, so we should not completely impose it on the algorithm. But it could be an improvement to measure how well a person follows this structure (and how well people in general follow the structure), and adjust the results accordingly.
The algorithm has a violation of Conservation of Expected Evidence. When it is computing the probability that a person S will take position X on issue I, it has an expectation that person U will take position Z on issue I, which would alter its prediction for person S. But trying to extend the algorithm to recursively make predictions for U to use in its predictions for S would lead to infinite recursion.
I've been considering the question: why has using Collaborative Filtering to predict the "trivial" opinions about movies been prioritized over predicting the "important" opinions about political/social/economic/etc. issues?
On reflection, I don't actually think it's because people care more about the former than the latter. Would you rather have a prediction regarding the opinion as to whether the movie Titanic was good, or a prediction regarding the opinion as to whether there's a housing bubble?
I think the answer is that opinions about products are naturally schematized, and hence easy to collate. Products are already tracked everywhere in databases, so it's pretty easy to extend that model to add opinions about those products. In contrast, opinions about issues, although often even more passionate than opinions about products, are not as naturally schematizable, hence they're harder to collate. Even in terms of representing the identity of an issue, it's not like we have the equivalent of an ISBN number for each issue. So opinions about issues are not adequately schematized and hence we can't collate those opinions into the nice big datasets we'd want to make predictions. Obviously websites like TakeOnIt are trying to change that. Each question ID is analogous to an ISBN number for that issue, if you will.
Yes, I agree with you that opinions of products can help sell products, so predicting opinions about products has the incentive of an immediately obvious monetization strategy. But if there's money to be made depending on correctly predicting the answer to a question, then the potential for monetization of the prediction of those opinions is also there.