[Link] Algorithm aversion
It has long been known that algorithms outperform human experts across a range of domains (here's a LW post on this by lukeprog). Why, then, do people continue to mistrust algorithms, despite their superiority, and cling to human advice instead? A recent paper by Dietvorst, Simmons and Massey suggests it is due to a cognitive bias which they call algorithm aversion: we judge less-than-perfect algorithms more harshly than less-than-perfect humans. They argue that since this aversion leads to poorer decisions, it is very costly, and that we must therefore find ways of combating it.
Abstract:
Research shows that evidence-based algorithms more accurately predict the future than do human forecasters. Yet when forecasters are deciding whether to use a human forecaster or a statistical algorithm, they often choose the human forecaster. This phenomenon, which we call algorithm aversion, is costly, and it is important to understand its causes. We show that people are especially averse to algorithmic forecasters after seeing them perform, even when they see them outperform a human forecaster. This is because people more quickly lose confidence in algorithmic than human forecasters after seeing them make the same mistake. In 5 studies, participants either saw an algorithm make forecasts, a human make forecasts, both, or neither. They then decided whether to tie their incentives to the future predictions of the algorithm or the human. Participants who saw the algorithm perform were less confident in it, and less likely to choose it over an inferior human forecaster. This was true even among those who saw the algorithm outperform the human.
General discussion:
The results of five studies show that seeing algorithms err makes people less confident in them and less likely to choose them over an inferior human forecaster. This effect was evident in two distinct domains of judgment, including one in which the human forecasters produced nearly twice as much error as the algorithm. It arose regardless of whether the participant was choosing between the algorithm and her own forecasts or between the algorithm and the forecasts of a different participant. And it even arose among the (vast majority of) participants who saw the algorithm outperform the human forecaster.
The aversion to algorithms is costly, not only for the participants in our studies who lost money when they chose not to tie their bonuses to the algorithm, but for society at large. Many decisions require a forecast, and algorithms are almost always better forecasters than humans (Dawes, 1979; Grove et al., 2000; Meehl, 1954). The ubiquity of computers and the growth of the “Big Data” movement (Davenport & Harris, 2007) have encouraged the growth of algorithms, but many remain resistant to using them. Our studies show that this resistance at least partially arises from greater intolerance for error from algorithms than from humans. People are more likely to abandon an algorithm than a human judge for making the same mistake. This is enormously problematic, as it is a barrier to adopting superior approaches to a wide range of important tasks. It means, for example, that people will more likely forgive an admissions committee than an admissions algorithm for making an error, even when, on average, the algorithm makes fewer such errors. In short, whenever prediction errors are likely—as they are in virtually all forecasting tasks—people will be biased against algorithms.
More optimistically, our findings do suggest that people will be much more willing to use algorithms when they do not see algorithms err, as will be the case when errors are unseen, the algorithm is unseen (as it often is for patients in doctors’ offices), or when predictions are nearly perfect. The 2012 U.S. presidential election season saw people embracing a perfectly performing algorithm. Nate Silver’s New York Times blog, FiveThirtyEight: Nate Silver’s Political Calculus, presented an algorithm for forecasting that election. Though the site had its critics before the votes were in—one Washington Post writer criticized Silver for “doing little more than weighting and aggregating state polls and combining them with various historical assumptions to project a future outcome with exaggerated, attention-grabbing exactitude” (Gerson, 2012, para. 2)—those critics were soon silenced: Silver’s model correctly predicted the presidential election results in all 50 states. Live on MSNBC, Rachel Maddow proclaimed, “You know who won the election tonight? Nate Silver” (Noveck, 2012, para. 21), and headlines like “Nate Silver Gets a Big Boost From the Election” (Isidore, 2012) and “How Nate Silver Won the 2012 Presidential Election” (Clark, 2012) followed. Many journalists and popular bloggers declared Silver’s success a great boost for Big Data and statistical prediction (Honan, 2012; McDermott, 2012; Taylor, 2012; Tiku, 2012).
However, we worry that this is not such a generalizable victory. People may rally around an algorithm touted as perfect, but we doubt that this enthusiasm will generalize to algorithms that are shown to be less perfect, as they inevitably will be much of the time.
Announcing LessWrong Digest
I've been making rounds on social media with the following message.
Great content on LessWrong isn't posted as frequently as it used to be, so fewer people read the site regularly. This makes sense. However, I read it at least once every two days out of personal interest, so I'm starting a LessWrong/Rationality Digest: a weekly newsletter summarizing all posts or comments that exceed 20 upvotes within the week. It's also a good way for those new to LessWrong to learn cool things without having to slog through online cultural baggage. It will never be sent more than once a week. If you're curious, here is a sample of what the Digest will be like.
https://docs.google.com/document/d/1e2mHi7W0H2toWPNooSq7QNjEhx_xa0LcLw_NZRfkPPk/edit
Also, major blog posts or articles from related websites, such as Slate Star Codex and Overcoming Bias, or publications from MIRI, may occasionally be included. If you want to be on the list, send an email to:
lesswrongdigest *at* gmail *dot* com
Users of LessWrong itself have noticed this 'decline' in the frequency of quality posts. It's not necessarily a bad thing, as much of the community has migrated to other places, such as Slate Star Codex, or even into meatspace with various organizations, meetups, and the like. In a sense, the rationalist community has outgrown LessWrong as its central nexus. Anyway, I thought you might also be interested in a LessWrong Digest. If you or your friends:
- find articles in 'Main' too infrequent, and Discussion too full of announcements, open threads, and housekeeping posts, to bother checking LessWrong regularly, or
- are busy with other priorities and trying to limit how much you're distracted by LessWrong and other media,
then the LessWrong Digest might work for you, or as a suggestion for your friends. I've fielded suggestions that I turn this into a blog, Tumblr, or another format suitable for an RSS feed. Almost everyone is happy with the email format right now, but if a few people express an interest in a blog or RSS format, I can make that happen too.
Request: Sequences book reading group
The book version of the Sequences is supposed to be published in the next month or two, if I understand correctly. I would really enjoy an online reading group to go through the book together.
Reasons for a reading group:
- It would give some of us the motivation to actually go through the Sequences finally.
- I have frequently had thoughts or questions about articles in the Sequences, but I refrained from commenting because I assumed the point would be covered in a later article, or because I was too intimidated to ask a stupid question. A reading group would hopefully assume that many of the readers are new to the Sequences, so asking a question or making a comment without knowing the later articles would not appear stupid.
- It may even bring back a bit of the blog-style excitement of the "old" LW ("I wonder what exciting new thoughts are going to be posted today?") that many have complained has been missing since the major contributors stopped posting.
The Truth About Mathematical Ability
There's widespread confusion about the nature of mathematical ability, for a variety of reasons:
- Most people don't know what math is.
- Most people don't know enough statistics to analyze the question properly.
- Most mathematicians are not very metacognitive.
- Very few people have more than a casual interest in the subject.
If the nature of mathematical ability were exclusively an object of intellectual interest, this would be relatively inconsequential. For example, many people are confused about Einstein’s theory of relativity, but this doesn’t have much of an impact on their lives. But in practice, people’s misconceptions about the nature of mathematical ability seriously interfere with their own ability to learn and do math, something that hurts them both professionally and emotionally.
I have a long-standing interest in the subject, and I’ve found myself in the unusual position of being an expert. My experiences include:
- Completing a PhD in pure math at the University of Illinois.
- Four years of teaching math at the high school and college levels (precalculus, calculus, multivariable calculus, and linear algebra).
- Personal encounters with some of the best mathematicians in the world, and a study of great mathematicians’ biographies.
- A long history of working with mathematically gifted children: as a counselor at MathPath for three summers, through one-on-one tutoring, and as an instructor at Art of Problem Solving.
- Studying the literature on IQ and papers from the Study of Exceptional Talent as a part of my work for Cognito Mentoring.
- Training as a full-stack web developer at App Academy.
- Doing a large scale data science project where I applied statistics and machine learning to make new discoveries in social psychology.
I’ve thought about writing about the nature of mathematical ability for a long time, but there was a missing element: I myself had never done genuinely original and high quality mathematical research. After completing much of my data science project, I realized that this had changed. The experience sharpened my understanding of the issues.
This is the first of a sequence of posts where I try to clarify the situation. My main point in this post is:
There are several different dimensions to mathematical ability. Common measures rarely assess all of these dimensions, and can paint a very incomplete picture of what somebody is capable of.
Don't estimate your creative intelligence by your critical intelligence
When I criticize, I'm a genius. I can go through a book of highly referenced scientific articles and find errors in each of them. Boy, I feel smart. How are these famous people so dumb?
But when I write, I suddenly become stupid. I sometimes spend half a day writing something and then realize at the end, or worse, after posting, that what it says simplifies to something trivial, or that I've made several unsupported assumptions, or claimed things I didn't really know were true. Or I post something, then have to go back every ten minutes to fix some point that I realize is not quite right, sometimes to the point where the whole thing falls apart.
If someone writes an article or expresses an idea that you find mistakes in, that doesn't make you smarter than that person. If you create an equally-ambitious article or idea that no one else finds mistakes in, then you can start congratulating yourself.
Purchasing research effectively open thread
Many of the biggest historical success stories in philanthropy have come in the form of funding for academic research. This suggests that the topic of how to purchase such research well should be of interest to effective altruists. Less Wrong survey results indicate that a nontrivial fraction of LW has firsthand experience with the academic research environment. Inspired by the recent Elon Musk donation announcement, this is a thread for discussion of effectively using money to enable important, useful research. Feel free to brainstorm your own questions and ideas before reading what's written in the thread.
The Unique Games Conjecture and FAI: A Troubling Obstacle
I am not a computer scientist and do not know much about complexity theory. However, it's a field that interests me, so I occasionally browse articles on the subject. I was brought to https://www.simonsfoundation.org/mathematics-and-physical-science/approximately-hard-the-unique-games-conjecture/ by a link on Scott Aaronson's blog, and read the article to reacquaint myself with the Unique Games Conjecture (UGC), which I had partially forgotten about. If you are not familiar with the UGC, that article will explain it better than I can.
One phrase in the article stuck out to me: "there is some number of colors k for which it is NP-hard (that is, effectively impossible) to distinguish between networks in which it is possible to satisfy at least 99% of the constraints and networks in which it is possible to satisfy at most 1% of the constraints". I think this sentence is concerning for those interested in the possibility of creating FAI.
It is impossible to perfectly satisfy human values, as matter and energy are limited, and so will be the capabilities of even an enormously powerful AI. Thus, in trying to maximize human happiness, we are dealing with a problem that's essentially isomorphic to the UGC's coloring problem. Additionally, our values themselves are ill-formed. Human values are numerous, ambiguous, even contradictory. Given the complexities of human value systems, I think it's safe to say we're dealing with a particularly nasty variation of the problem, worse than what computer scientists studying it have dealt with.
Not all specific instances of complex optimization problems are subject to the UGC, and thus NP-hard, of course, so this does not in itself mean that building an FAI is impossible. Also, even if maximizing human values is NP-hard (or maximizing the probability of maximizing human values, and so on up the regress), we can still assess a machine's code and actions heuristically. However, even the best heuristics are limited, as the UGC itself demonstrates. At bottom, all heuristics must rely on inflexible assumptions of some sort.
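To make the setup concrete, here is a minimal sketch in Python (my own illustration, not from the Simons article): a unique-games instance over k colors, where each edge constraint is a permutation, so a node's color uniquely determines the color its neighbor must take. The tiny instance and all names are assumptions chosen for illustration; brute-force search works here only because the network is trivially small.

```python
from itertools import product

K = 3  # number of colors

# Each constraint (u, v, pi) is satisfied iff coloring[v] == pi[coloring[u]];
# because pi is a permutation, u's color uniquely determines v's.
edges = [
    (0, 1, (1, 2, 0)),
    (1, 2, (2, 0, 1)),
    (2, 0, (0, 2, 1)),
    (0, 2, (1, 0, 2)),
]
N_NODES = 3

def satisfied_fraction(coloring):
    """Fraction of the permutation constraints this coloring satisfies."""
    hits = sum(coloring[v] == pi[coloring[u]] for u, v, pi in edges)
    return hits / len(edges)

# Exhaustive search over all K**N_NODES colorings -- fine at toy size.
# The UGC concerns how hard it is to even *approximate* this maximum
# as the network grows.
best = max(product(range(K), repeat=N_NODES), key=satisfied_fraction)
print("best coloring:", best, "->", satisfied_fraction(best))
```

The UGC says (roughly) that for a large enough number of colors, no efficient algorithm can reliably tell nearly-fully-satisfiable instances apart from nearly-unsatisfiable ones, which is exactly why the exhaustive approach above does not scale into a practical heuristic.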
An Introduction to Control Theory
Behavior: The Control of Perception by William Powers applies control theory to psychology to develop a model of human intelligence that seems relevant to two of LW's primary interests: effective living for humans and value-preserving designs for artificial intelligence. It's been discussed on LW previously here, here, and here, as well as mentioned in Yvain's roundup of 5 years (and a week) of LW. I've found previous discussions unpersuasive for two reasons: first, they typically only have a short introduction to control theory and the mechanics of control systems, making it not quite obvious what specific modeling techniques they have in mind, and second, they often fail to communicate the differences between this model and competing models of intelligence. Even if you're not interested in its application to psychology, control theory is a widely applicable mathematical toolkit whose basics are simple and well worth knowing.
Because of the length of the material, I'll split it into three posts. In this post, I'll give an introduction to control theory that's hopefully broadly accessible. The next post will explain the model Powers introduces in his book. In the last post, I'll provide commentary on the model and what I see as its implications, both for LW and for AI.
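As a taste of the subject, here is a minimal sketch (my own, not taken from Powers's book or the posts to follow) of the basic building block: a proportional controller, which acts on a perceived quantity in proportion to its distance from a reference value. The gain and disturbance numbers are arbitrary assumptions.

```python
# A proportional controller: act in proportion to the error between a
# reference value and the current perception. All numbers are illustrative.
reference = 20.0    # the value the system "wants" to perceive
perception = 5.0    # what the system currently perceives
gain = 0.5          # how strongly the system reacts to error
disturbance = -1.0  # a constant push from the environment

for step in range(20):
    error = reference - perception      # the discrepancy driving behavior
    action = gain * error               # output proportional to the error
    perception += action + disturbance  # the world updates the perception
    print(f"step {step:2d}: perception = {perception:6.2f}")
```

Running this, the perception climbs toward the reference but settles at 18 rather than 20: with a constant disturbance, a purely proportional controller leaves a steady-state error, which is one reason practical controllers add integral or derivative terms.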
Exams and Overfitting
When I hear something like "What's going to be on the exam?", part of me gets indignant. WHAT?!?! You're defeating the whole point of the exam! You're committing the Deadly Sin of Overfitting!
Let me step back and explain my view of exams.
When I take a class, my goal is to learn the material. Exams are a way to answer the question, "How well did I learn the material?"[1] But exams are only a few hours long, so it's infeasible to have questions on all of the material. To deal with this time constraint, an exam takes a random sample of the material and gives me a "statistical" rather than "perfect" answer to the question, "How well did I learn the material?"
If I know in advance what topics will be covered on the exam, and if I then prepare for the exam by learning only those topics, then I am screwing up this whole process. By doing very well on the exam, I get the information, "Congratulations! You learned the material covered on the exam very well." But who knows how well I learned the material covered in class as a whole? This is a textbook case of overfitting.
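Here's a toy simulation of that sampling logic (my own illustration with arbitrary numbers, not from the original post): an exam that randomly samples topics gives an honest, if noisy, estimate of overall mastery, while a student who learns only the leaked exam topics aces the exam without learning much of anything.

```python
import random

random.seed(0)
topics = list(range(100))          # everything the course covers
exam = random.sample(topics, 10)   # the exam randomly samples 10 topics

# An honest student masters roughly 60% of all topics, chosen at random.
honest = {t for t in topics if random.random() < 0.6}

# An "overfit" student learns exactly the leaked exam topics and no more.
overfit = set(exam)

def exam_score(known):
    return sum(t in known for t in exam) / len(exam)

def true_mastery(known):
    return len(known) / len(topics)

for name, known in [("honest", honest), ("overfit", overfit)]:
    print(f"{name:8s} exam score: {exam_score(known):4.0%}   "
          f"true mastery: {true_mastery(known):4.0%}")
```

The honest student's exam score is an unbiased (if noisy) estimate of true mastery; the overfit student's perfect score says nothing at all about the other 90 topics.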
To be clear, I don't necessarily lose respect for someone who asks, "What's going to be on the exam?" I understand that different people have different priorities[2], and that's fine by me. But if you're taking a class because you truly want to learn the material, in spite of any sacrifices you might have to make to do so[3], then I'd like to encourage you not to "study for the test". I'd like to encourage you not to overfit.
[1] When I say "learned", I mean in the "Feynman" sense, not in the "teacher's password" sense. I believe that a necessary (but not sufficient) condition for an exam to check for this kind of learning is to have problems that I've never seen before.
[2] Someone might care much more about getting into medical school than, say, mastering classical mechanics. I respect that choice, and I acknowledge that someone might be in a system where getting a good grade in physics is required for getting into medical school, even though mastering classical mechanics isn't required for becoming a good doctor.
[3] There were a few terms when I felt like I did a really good job of learning the material (conveniently, I also got really good grades during those terms). But in those terms, one (or both) of the following would happen:
- I would take a huge hit in social status, because I was taking barely more than the minimum courseload. At my university, there was a lot of social pressure to always take the maximum courseload (or petition to exceed the maximum courseload), and still participate in lots of extracurricular activities.
- My girlfriend at the time would break up with me because of all the time I was spending on my coursework (and not with her).
Open thread, Jan. 12 - Jan. 18, 2015
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.