A Year of Spaced Repetition Software in the Classroom
Last year, I asked LW for some advice about spaced repetition software (SRS) that might be useful to me as a high school teacher. With said advice came a request to write a follow-up after I had accumulated some experience using SRS in the classroom. This is my report.
Please note that this was not a scientific experiment to determine whether SRS "works." Prior studies are already pretty convincing on this point and I couldn't think of a practical way to run a control group or "blind" myself. What follows is more of an informal debriefing for how I used SRS during the 2014-15 school year, my insights for others who might want to try it, and how the experience is changing how I teach.
Summary
SRS can raise student achievement even with students who won't use the software on their own, and even with frequent disruptions to the study schedule. Gains are most apparent with the already high-performing students, but are also meaningful for the lowest students. Deliberate efforts are needed to get student buy-in, and getting the most out of SRS may require changes in course design.
The software
After looking into various programs, including the game-like Memrise, and even writing my own simple SRS, I ultimately went with Anki for its multi-platform availability, cloud sync, and ease of use. I also wanted a program that could act as an impromptu catch-all bin for the 2,000+ cards I would be producing on the fly throughout the year. (Memrise, in contrast, really needs clearly defined units packaged in advance.)
The students
I teach 9th and 10th grade English at an above-average suburban American public high school in a below-average state. Mine are the lower "required level" students at a school with high enrollment in honors and Advanced Placement classes. Generally speaking, this means my students are mostly not self-motivated, are only very weakly motivated by grades, and will not do anything school-related outside of class no matter how much it would be in their interest to do so. There are, of course, plenty of exceptions, and my students span an extremely wide range of ability and apathy levels.
The procedure
First, what I did not do. I did not make Anki decks, assign them to my students to study independently, and then quiz them on the content. With honors classes I taught in previous years I think that might have worked, but I know my current students too well. Only about 10% of them would have done it, and the rest would have blamed me for their failing grades—with some justification, in my opinion.
Instead, we did Anki together, as a class, nearly every day.
As initial setup, I created a separate Anki profile for each class period. With a third-party add-on for Anki called Zoom, I enlarged the display font sizes to be clearly legible on the interactive whiteboard at the front of my room.
Nightly, I wrote up cards to reinforce new material and integrated them into the deck in time for the next day's classes. This averaged about 7 new cards per lesson period. These cards came in many varieties, but the three main types were:
- concepts and terms, often with reversed companion cards, sometimes supplemented with "what is this an example of" scenario cards.
- vocabulary, 3 cards per word: word/def, reverse, and fill-in-the-blank example sentence
- grammar, usually in the form of "What change(s), if any, does this sentence need?" Alternative cards had different permutations of the sentence.
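To illustrate the vocabulary pattern (one word, three cards), here is a minimal Python sketch. The `make_vocab_cards` helper, the `____` blank, and the sample word are hypothetical. The sketch produces plain front/back pairs and does not use any Anki API.

```python
from typing import List, Tuple

Card = Tuple[str, str]  # (front, back)

def make_vocab_cards(word: str, definition: str, sentence: str) -> List[Card]:
    """Build three cards for one vocabulary word (hypothetical helper)."""
    if word not in sentence:
        raise ValueError("example sentence must contain the word")
    blanked = sentence.replace(word, "____", 1)  # blank out the first occurrence
    return [
        (word, definition),  # word -> definition
        (definition, word),  # reverse: definition -> word
        (blanked, word),     # fill-in-the-blank example sentence
    ]

cards = make_vocab_cards(
    "laconic",
    "using very few words",
    "Her laconic reply ended the argument.",
)
for front, back in cards:
    print(f"{front}\t{back}")  # one tab-separated line per card
```

Anki's text importer accepts tab-separated lines like these. How the fields map onto cards depends on the note type you import into.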
Weekly, I uploaded the deck to the cloud for self-motivated students wishing to study on their own.
Daily, I led each class in an Anki review of new and due cards for an average of 8 minutes per study day, usually as our first activity, at a rate of about 3.5 cards per minute. As each card appeared on the interactive whiteboard, I would read it out loud while students willing to share the answer raised their hands. Depending on the card, I might offer additional time to think before calling on someone to answer. Depending on their answer, and my impressions of the class as a whole, I might elaborate or offer some reminders, mnemonics, etc. I would then quickly poll the class on how they felt about the card by having them show a color by way of a small piece of card-stock divided into green, red, yellow, and white quadrants. Based on my own judgment (informed only partly by the poll), I would choose and press a response button in Anki, determining when we should see that card again.

[Data shown is from one of my five classes. We didn't start using Anki until a couple weeks into the school year.]
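For readers who haven't used Anki: each button press sets the card's next interval. The sketch below uses simplified SM-2-style rules, in which an ease factor multiplies the previous interval. The multipliers (2.5 starting ease, 1.2 for Hard, 1.3 bonus for Easy, 1.3 ease floor) are illustrative assumptions. Anki's real scheduler has more cases, such as learning steps, fuzz, and overdue handling.

```python
from dataclasses import dataclass

@dataclass
class CardState:
    interval_days: int = 1  # current gap between reviews
    ease: float = 2.5       # growth factor for the interval (assumed default)

MIN_EASE = 1.3  # assumed floor so intervals never stop growing entirely

def review(state: CardState, button: str) -> CardState:
    """Return the card's next state after the class's collective answer.

    Simplified SM-2-style update; not Anki's exact algorithm.
    Python's round() rounds halves to even (round(2.5) == 2).
    """
    if button == "again":  # forgotten: see it tomorrow, slow future growth
        return CardState(1, max(MIN_EASE, state.ease - 0.20))
    if button == "hard":   # shaky: grow the interval only a little
        return CardState(max(1, round(state.interval_days * 1.2)),
                         max(MIN_EASE, state.ease - 0.15))
    if button == "good":   # normal recall: multiply interval by the ease
        return CardState(max(1, round(state.interval_days * state.ease)), state.ease)
    if button == "easy":   # effortless: bigger jump and faster growth from now on
        return CardState(max(1, round(state.interval_days * state.ease * 1.3)),
                         state.ease + 0.15)
    raise ValueError(f"unknown button: {button}")

state = CardState()
for _ in range(3):
    state = review(state, "good")
    print(state.interval_days)
state = review(state, "again")
print(state.interval_days, state.ease)
```

With these numbers, three Good presses in a row grow the gap roughly geometrically (2, 5, then 12 days). Again drops it back to one day and lowers the ease.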
Opportunity costs
8 minutes is a significant portion of a 55 minute class period, especially for a teacher like me who fills every one of those minutes. Something had to give. For me, I entirely cut some varieties of written vocab reinforcement, and reduced the time we spent playing the team-based vocab/term review game I wrote for our interactive whiteboards some years ago. To a lesser extent, I also cut back on some oral reading comprehension spot-checks that accompany my whole-class reading sessions. On balance, I think Anki was a much better way to spend the time, but it's complicated. Keep reading.
Whole-class SRS not ideal
Every student is different, and would get the most out of having a personal Anki profile determine when they should see each card. Also, most individuals could study many more cards per minute on their own than we averaged doing it together. (To be fair, a small handful of my students did use the software independently, judging from AnkiWeb download stats.)
Getting student buy-in
Before we started using SRS I tried to sell my students on it with a heartfelt, over-prepared 20 minute presentation on how it works and the superpowers to be gained from it. It might have been a waste of time. It might have changed someone's life. Hard to say.
As for the daily class review, I induced engagement partly through participation points that were part of the final semester grade, and which students knew I tracked closely. Raising a hand could earn a kind of bonus currency, but was never required—unlike looking up front and showing colors during polls, which I insisted on. When I thought students were just reflexively holding up the same color and zoning out, I would sometimes spot check them on the last card we did and penalize them if warranted.
But because I know my students are not strongly motivated by grades, I think the most important influence was my attitude. I made it a point to really turn up the charm during review and play the part of the engaging game show host. Positive feedback. Coaxing out the lurkers. Keeping that energy up. Being ready to kill and joke about bad cards. Reminding classes how awesome they did on tests and assignments because they knew their Anki stuff.
(This is a good time to point out that the average review time per class period stabilized at about 8 minutes because I tried to end reviews before student engagement tapered off too much, which typically started happening at around the 6-7 minute mark. Occasional short end-of-class reviews mostly account for the difference.)
I also got my students more on the Anki bandwagon by showing them how this was directly linked to reduced note-taking requirements. If I could trust that they would remember something through Anki alone, why waste time waiting for them to write it down? They were unlikely to study from those notes anyway. And if they aren't looking down at their paper, they'll be paying more attention to me. I'd better come up with more cool things to tell them!
Making memories
Everything I had read about spaced repetition suggested it was a great reinforcement tool but not a good way to introduce new material. With that in mind, I tried hard to find or create memorable images, examples, mnemonics, and anecdotes that my Anki cards could become hooks for, and to get those cards into circulation as soon as possible. I even gave this method a mantra: "vivid memory, card ready".
When a student during review raised their hand, gave me a pained look, and said, "like that time when...." or "I can see that picture of..." as they struggled to remember, I knew I had done well. (And I would always wait a moment, because they would usually get it.)
Baby cards need immediate love
Unfortunately, if the card wasn't introduced quickly enough—within a day or two of the lesson—the entire memory often vanished and had to be recreated, killing the momentum of our review. This happened far too often—not because I didn't write the card soon enough (I stayed really on top of that), but because it didn't always come up for study soon enough. There were a few reasons for this:
- We often had too many due cards to get through in one session, and by default Anki puts new cards behind due ones.
- By default, Anki only introduces 20 new cards in one session (I soon uncapped this).
- Some cards were in categories that I gave lower priority to.
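The toy model below shows why new cards could sit unseen for days. It models one session's queue under the behaviour described above: due cards come before new ones, and new cards are capped per session. The cap of 20 comes from the text. The session capacity of 28 cards is simply the post's averages (8 minutes at about 3.5 cards per minute). The function and the example counts are assumptions for illustration, not Anki's code.

```python
from typing import List

def build_session(due: List[str], new: List[str], capacity: int, new_cap: int = 20) -> List[str]:
    """Toy queue: due cards first, then up to `new_cap` new cards, cut at `capacity`."""
    queue = due + new[:new_cap]
    return queue[:capacity]

capacity = int(8 * 3.5)                # ~8 minutes at ~3.5 cards/minute -> 28 cards
due = [f"due{i}" for i in range(35)]   # a backlogged day (hypothetical count)
new = [f"new{i}" for i in range(7)]    # yesterday's ~7 fresh cards
session = build_session(due, new, capacity)
print(sum(card.startswith("new") for card in session))  # new cards actually reached
```

On a backlogged day, the session fills up with due cards before any new card is reached. That is how a "baby card" can miss the day or two when its memory is still fresh.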
Two obvious cures for this problem:
- Make fewer cards. (I did get more selective as the year went on.)
- Have all cards prepped ahead of time and introduce new ones at the end of the class period they go with. (For practical reasons, not the least of which was the fact that I didn't always know what cards I was making until after the lesson, I did not do this. I might be able to next year.)
Days off suck
SRS is meant to be used every day. When you take weekends off, you get a backlog of due cards. Not only do my students take every weekend and major holiday off (slackers), they have a few 1-2 week vacations built into the calendar. Coming back from a week's vacation means a 9-day backlog (due to the weekends bookending it). There's no good workaround for students who won't study on their own. The best I could do was run longer or multiple Anki sessions on return days to try to catch up with the backlog. It wasn't enough. The "caught up" condition was not normal for most classes at most points during the year, but rather something to aspire to and occasionally applaud ourselves for reaching. Some cards spent weeks or months on the bottom of the stack. Memories died. Baby cards emerged stillborn. Learning was lost.
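The nine-day figure is easy to check. The catch-up problem can be sketched with a simple flow model, in which everything past the day count is an assumption: a constant number of cards falls due each day, and review capacity per day is fixed. The date, the 25 cards/day inflow, and the 35 cards/day capacity are hypothetical. Real Anki due counts are lumpier.

```python
from datetime import date, timedelta

def days_without_review(last_review: date, next_review: date) -> int:
    """Calendar days strictly between two review days."""
    return (next_review - last_review).days - 1

# Last class before a week-long break is a Friday; classes resume 10 days later.
friday = date(2015, 3, 27)             # hypothetical date (a Friday)
monday = friday + timedelta(days=10)   # the Monday after the break
gap = days_without_review(friday, monday)  # Sat, Sun, Mon-Fri, Sat, Sun
print(gap)

def days_to_catch_up(backlog: int, inflow_per_day: int, capacity_per_day: int) -> int:
    """Review days needed to clear a backlog while new cards keep falling due.

    Returns -1 if capacity never exceeds inflow (the backlog never shrinks).
    """
    surplus = capacity_per_day - inflow_per_day
    if surplus <= 0:
        return -1
    return -(-backlog // surplus)  # ceiling division

# Hypothetical: 25 cards fall due per day; extended sessions get through 35.
print(days_to_catch_up(9 * 25, 25, 35))
```

Under these assumptions, even sessions 40% longer than the daily inflow need over three weeks of school days to clear one vacation's backlog.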
Needless to say, the last weeks of the school year also had a certain silliness to them. When the class will never see the card again, it doesn't matter whether I push the button that says 11 days or the one that says 8 months. (So I reduced polling and accelerated our cards/minute rate.)
Never before SRS did I fully appreciate the loss of learning that must happen every summer break.
Triage
I kept each course's master deck divided into a few large subdecks. This was initially for organizational reasons, but I eventually started using it as a prioritizing tool. This happened after a curse-worthy discovery: if you tell Anki to review a deck made from subdecks, due cards from subdecks higher up in the stack are shown before cards from decks listed below, no matter how overdue they might be. From that point, on days when we were backlogged (most days) I would specifically review the concept/terminology subdeck for the current semester before any other subdecks, as these were my highest priority.
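The toy model below contrasts the ordering reported above with a "most overdue first" ordering. The card names, subdeck ranks, and overdue counts are hypothetical. This models the observed behaviour only: it is not how Anki is implemented, and it may not match every Anki version.

```python
from typing import NamedTuple

class DueCard(NamedTuple):
    name: str
    subdeck_rank: int  # position of the card's subdeck in the deck list (0 = top)
    days_overdue: int

cards = [
    DueCard("term: irony", 0, 1),
    DueCard("vocab: laconic", 1, 30),
    DueCard("term: foil", 0, 3),
    DueCard("grammar: their/there", 2, 45),
]

# Reported behaviour: subdeck position wins, however overdue a card is.
# sorted() is stable, so cards within one subdeck keep their original order.
by_subdeck = sorted(cards, key=lambda c: c.subdeck_rank)
# Alternative: most overdue first, ignoring subdecks.
by_overdue = sorted(cards, key=lambda c: -c.days_overdue)

print([c.name for c in by_subdeck])
print([c.name for c in by_overdue])
```

When the minutes run out partway through a review, only the front of the list gets seen. This is why subdeck order can double as a priority setting.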
On a couple of occasions, I also used Anki's study deck tools to create temporary decks of especially high-priority cards.
Seizing those moments
Veteran teachers start acquiring a sense of when it might be a good time to go off book and teach something that isn't in the unit, and maybe not even in the curriculum. Maybe it's teaching exactly the right word to describe a vivid situation you're reading about, or maybe it's advice on what to do in a certain type of emergency that nearly happened. As the year progressed, I found myself humoring my instincts more often because of a new confidence that I can turn an impressionable moment into a strong memory and lock it down with a new Anki card. I don't even care if it will ever be on a test. This insight has me questioning a great deal of what I thought I knew about organizing a curriculum. And I like it.
A lifeline for low performers
An accidental discovery came from having written some cards that were, it was immediately obvious to me, much too easy. I was embarrassed to even be reading them out loud. Then I saw which hands were coming up.
In any class you'll get some small number of extremely low performers who never seem to be doing anything that we're doing, and, when confronted, deny that they have any ability whatsoever. Some of the hands I was seeing were attached to these students. And you better believe I called on them.
It turns out that easy cards are really important because they can give wins to students who desperately need them. Knowing a 6th grade level card in a 10th grade class is no great achievement, of course, but the action takes what had been negative morale and nudges it upward. And it can trend. I can build on it. A few of these students started making Anki the thing they did in class, even if they ignored everything else. I can confidently name one student I'm sure passed my class only because of Anki. Don't get me wrong: he just barely passed. Most cards remained over his head. Anki was no miracle cure here, but it gave him and me something to work with that we didn't have when he failed my class the year before.
A springboard for high achievers
It's not even fair. The lowest students got something important out of Anki, but the highest achievers drank it up and used it for rocket fuel. When people ask who's widening the achievement gap, I guess I get to raise my hand now.
I refuse to feel bad for this. Smart kids are badly underserved in American public schools thanks to policies that encourage staff to focus on that slice of students near (but not at) the bottom—the ones who might just barely be able to pass the state test, given enough attention.
Where my bright students might have been used to high Bs and low As on tests, they were now breaking my scales. You could see it in the multiple choice, but it was most obvious in their writing: they were skillfully working in terminology at an unprecedented rate, and making way more attempts to use new vocabulary—attempts that were, for the most part, successful.
Given the seemingly objective nature of Anki it might seem counterintuitive that the benefits would be more obvious in writing than in multiple choice, but it actually makes sense when I consider that even without SRS these students probably would have known the terms and the vocab well enough to get multiple choice questions right, but might have lacked the confidence to use them on their own initiative. Anki gave them that extra confidence.
A wash for the apathetic middle?
I'm confident that about a third of my students got very little out of our Anki review. They were either really good at faking involvement while they zoned out, or didn't even try to pretend and just took the hit to their participation grade day after day, no matter what I did or who I contacted.
These weren't even necessarily failing students—just the apathetic middle that's smart enough to remember some fraction of what they hear and regurgitate some fraction of that at the appropriate times. Review of any kind holds no interest for them. It's a rerun. They don't really know the material, but they tell themselves that they do, and they don't care if they're wrong.
On the one hand, these students are no worse off with Anki than they would have been with the activities it replaced, and nobody cries when average kids get average grades. On the other hand, I'm not ok with this... but so far I don't like any of my ideas for what to do about it.
Putting up numbers: a case study
For unplanned reasons, I taught a unit at the start of a quarter that I didn't formally test them on until the end of said quarter. Historically, this would have been a disaster. In this case, it worked out well. For five weeks, Anki was the only ongoing exposure they were getting to that unit, but it proved to be enough. Because I had given the same test as a pre-test early in the unit, I have some numbers to back it up. The test was all multiple choice, with two sections: the first was on general terminology and concepts related to the unit. The second was a much harder reading comprehension section.
As expected, scores did not go up much on the reading comprehension section. Overall reading levels are very difficult to boost in the short term and I would not expect any one unit or quarter to make a significant difference. The average score there rose by 4 percentage points, from 48 to 52%.
Scores in the terminology and concept section were more encouraging. For material we had not covered until after the pre-test, the average score rose by 22 percentage points, from 53 to 75%. No surprise there either, though; it's hard to say how much credit we should give to SRS for that.
But there were also a number of questions about material we had already covered before the pretest. Being the earliest material, I might have expected some degradation in performance on the second test. Instead, the already strong average score in that section rose by an additional 3 percentage points, from 82 to 85%. (These numbers are less reliable because of the smaller number of questions, but they tell me Anki at least "locked in" the older knowledge, and may have strengthened it.)
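The percentage-point changes above can be checked directly from the reported averages. The section labels are shorthand. The scores are the ones given in the text, and the relative figure is added here only for comparison.

```python
# (pre-test average %, post-test average %) as reported above
sections = {
    "reading comprehension": (48, 52),
    "terms/concepts taught after the pre-test": (53, 75),
    "terms/concepts taught before the pre-test": (82, 85),
}

for name, (pre, post) in sections.items():
    points = post - pre            # absolute change, in percentage points
    relative = 100 * points / pre  # change relative to the starting average
    print(f"{name}: +{points} points ({relative:.1f}% relative)")
```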
Some other time, I might try reserving a section of content that I teach before the pre-test but don't make any Anki cards for. This would give me a way to compare Anki to an alternative review exercise.
What about formal standardized tests?
I don't know yet. The scores aren't back. I'll probably be shown some "value added" analysis numbers at some point that tell me whether my students beat expectations, but I don't know how much that will tell me. My students were consistently beating expectations before Anki, and the state gave an entirely different test this year because of legislative changes. I'll go back and revise this paragraph if I learn anything useful.
Those discussions...
If I'm trying to acquire a new skill, one of the first things I try to do is listen to skilled practitioners of that skill talk about it to each other. What are the terms-of-art? How do they use them? What does this tell me about how they see their craft? Their shorthand is a treasure trove of crystallized concepts; once I can use it the same way they do, I find I'm working at a level of abstraction much closer to theirs.
Similarly, I was hoping Anki could help make my students more fluent in the subject-specific lexicon that helps you score well in analytical essays. After introducing a new term and making the Anki card for it, I made extra efforts to use it conversationally. I used to shy away from that because so many students would have forgotten it immediately and tuned me out for not making any sense. Not this year. Once we'd seen the card, I used the term freely, with only the occasional reminder of what it meant. I started using multiple terms in the same sentence. I started talking about writing and analysis the way my fellow experts do, and so invited them into that world.
Even though I was already seeing written evidence that some of my high performers had assimilated the lexicon, the high quality discussions of these same students caught me off guard. You see, I usually dread whole-class discussions with non-honors classes because good comments are so rare that I end up dejectedly spouting all the insights I had hoped they could find. But by the end of the year, my students had stepped up.
I think what happened here was, as with the writing, as much a boost in confidence as a boost in fluency. Whatever it was, they got into some good discussions where they used the terminology and built on it to say smarter stuff.
Don't get me wrong. Most of my students never got to that point. But on average even small groups without smart kids had a noticeably higher level of discourse than I am used to hearing when I break up the class for smaller discussions.
Limitations
SRS is inherently weak when it comes to the abstract and complex. No card I've devised enables a student to develop a distinctive authorial voice, or write essay openings that reveal just enough to make the reader curious. Yes, you can make cards about strategies for this sort of thing, but these were consistently my worst cards—the overly difficult "leeches" that I eventually suspended from my decks.
A less obvious limitation of SRS is that students with a very strong grasp of a concept often fail to apply that knowledge in more authentic situations. For instance, they may know perfectly well the difference between "there", "their", and "they're", but never pause to think carefully about whether they're using the right one in a sentence. I am very open to suggestions about how I might train my students' autonomous "System 1" brains to have "interrupts" for that sort of thing... or even just a reflex to go back and check after finishing a draft.
Moving forward
I absolutely intend to continue using SRS in the classroom. Here's what I intend to do differently this coming school year:
- Reduce the number of cards by about 20%, to maybe 850-950 for the year in a given course, mostly by reducing the number of variations on some overexposed concepts.
- Be more willing to add extra Anki study sessions to stay better caught-up with the deck, even if this means my lesson content doesn't line up with class periods as neatly.
- Be more willing to press the red button on cards we need to re-learn. I think I was too hesitant here because we were rarely caught up as it was.
- Rework underperforming cards to be simpler and more fun.
- Use more simple cloze deletion cards. I only had a few of these, but they worked better than I expected for structured idea sets like, "characteristics of a tragic hero".
- Take a less linear and more opportunistic approach to introducing terms and concepts.
- Allow for more impromptu discussions where we bring up older concepts in relevant situations and build on them.
- Shape more of my lessons around the "vivid memory, card ready" philosophy.
- Continue to reduce needless student note-taking.
- Keep a close eye on 10th grade students who had me for 9th grade last year. I wonder how much they retained over the summer, and I can't wait to see what a second year of SRS will do for them.
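A cloze note for a structured idea set, of the kind mentioned in the list above, might be generated like this. The sketch relies on Anki's cloze syntax, where `{{c1::...}}`, `{{c2::...}}` and so on each become a separate card. The helper name and the list items are hypothetical examples, not the author's actual cards.

```python
from typing import List

def cloze_note(title: str, items: List[str]) -> str:
    """Return cloze-deletion text with one numbered deletion per item."""
    lines = [f"{title}:"]
    for i, item in enumerate(items, start=1):
        lines.append(f"- {{{{c{i}::{item}}}}}")  # doubled braces emit literal {{ and }}
    return "\n".join(lines)

note = cloze_note(
    "Characteristics of a tragic hero",
    ["a tragic flaw (hamartia)", "a reversal of fortune", "a moment of recognition"],
)
print(note)
```

Because each item gets its own number, the note yields one card per item, each blanking a different line while the rest of the set stays visible as context.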
Suggestions and comments very welcome!
A self-experiment in training "noticing confusion"
I previously discussed the potential relevance of therapeutic and instructional models of metacognitive training to LW-style rationality skills. As an attempted concrete realization of what this connection could look like, I ran a self-experiment in which I counted instances of noticing confusion. Below I elaborate on the motivation and design of the experiment, then discuss some quantitative results and qualitative reflections.
Intuitive cooperation
This is an exposition of some of the main ideas in the paper Robust Cooperation. My goal is to make the ideas and proofs seem natural and intuitive - instead of some mysterious thing where we invoke Löb's theorem at the right place and the agents magically cooperate. Also I hope it is accessible to people without a math or CS background. Be warned, it is pretty cheesy ok.
In a small quirky town, far away from other cities or towns, the most exciting event is a game called (for historical reasons) The Prisoner's Dilemma. Everyone comes out to watch the big tournament at the end of Summer, and you (Alice) are especially excited because this year it will be your first time playing in the tournament! So you've been thinking of ways to make sure that you can do well.
The way the game works is this: Each player can choose to cooperate or defect with the other player. If you both cooperate, then you get two points each. If one of you defects, then that player will get three points, and the other player won't get any points. But if you both defect, then you each get only one point. You have to make your decisions separately, without communicating with each other - however, everyone is required to register the algorithm they will be using before the tournament, and you can look at the other player's algorithm if you want to. You also are allowed to use some outside help in your algorithm.

Now if you were a newcomer, you might think that no matter what the other player does, you can always do better by defecting. So the best strategy must be to always defect! Of course, you know better: if everyone tried that strategy, then they would end up defecting against each other, which is a shame since they would both be better off if they had just cooperated.
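The payoff rules and the newcomer's reasoning can be checked mechanically. The point values come from the rules above. The dictionary layout is just one way to write them down.

```python
from itertools import product

# PAYOFF[(my_move, their_move)] = my points, per the tournament rules
PAYOFF = {
    ("C", "C"): 2,
    ("C", "D"): 0,
    ("D", "C"): 3,
    ("D", "D"): 1,
}

# Defecting scores more for me whatever the other player does...
defect_dominates = all(PAYOFF[("D", theirs)] > PAYOFF[("C", theirs)] for theirs in "CD")
print("defecting always scores more:", defect_dominates)

# ...yet both players together do best by cooperating and worst by both defecting.
for mine, theirs in product("CD", repeat=2):
    total = PAYOFF[(mine, theirs)] + PAYOFF[(theirs, mine)]
    print(f"{mine}{theirs}: combined {total} points")
```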
But how can you do better? You have to be able to describe your algorithm in order to play. You have a few ideas, and you'll be playing some practice rounds with your friend Bob soon, so you can try them out before the actual tournament.
Your first plan:
I'll cooperate with Bob if I can tell from his algorithm that he'll cooperate with me. Otherwise I'll defect.
For your first try, you'll just run Bob's algorithm and see if he cooperates. But there's a problem - if Bob tries the same strategy, he'll have to run your algorithm, which will run his algorithm again, and so on into an infinite loop!
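The infinite loop is easy to reproduce. In this sketch (the function names are hypothetical), each player's algorithm is a Python function that simulates its opponent by calling it. When both players use the "just run the other algorithm" plan, the calls never bottom out, and Python eventually gives up with a `RecursionError`.

```python
def naive_simulator(opponent):
    """Cooperate iff simulating the opponent (playing against me) shows cooperation."""
    return "C" if opponent(naive_simulator) == "C" else "D"

def cooperate_rock(opponent):
    return "C"

print(naive_simulator(cooperate_rock))  # fine: the rock never simulates anyone

try:
    naive_simulator(naive_simulator)  # Alice runs Bob, who runs Alice, who runs Bob...
except RecursionError:
    print("infinite regress")
```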
So you'll have to be a bit more clever than that... luckily you know a guy, Shady, who is good at these kinds of problems.
You call up Shady, and while you are waiting for him to come over, you remember some advice your dad Löb gave you.
(Löb's theorem) "If someone says you can trust them on X, well then they'll just tell you X."
If (someone tells you If [I tell you] X, then X is true)
Then (someone tells you X is true)
(See The Cartoon Guide to Löb's Theorem[pdf] for a nice proof of this)
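For readers who want the formal statement behind the dad's advice: in the usual setting, $\square X$ reads "X is provable in Peano arithmetic" (the story's "someone tells you X"). Löb's theorem then says

$$
\text{if } \vdash \square X \to X, \text{ then } \vdash X,
$$

and its internalized form is

$$
\vdash \square(\square X \to X) \to \square X.
$$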
Here's an example:
Sketchy watch salesman: Hey, if I tell you these watches are genuine then they are genuine!
You: Ok... so are these watches genuine?
Sketchy watch salesman: Of course!
It's a good thing to remember when you might have to trust someone. If someone you already trust tells you you can trust them on something, then you know that something must be true.
On the other hand, if someone says you can always trust them, well that's pretty suspicious... If they say you can trust them on everything, that means that they will never tell you a lie - which is logically equivalent to them saying that if they were to tell you a lie, then that lie must be true. So by Löb's theorem, they will lie to you. (Gödel's second incompleteness theorem)
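The step from Löb's theorem to Gödel's second incompleteness theorem is a one-line instance. Take $X$ to be a contradiction $\bot$, so that $\neg\square\bot$ ("I will never tell you a lie") is the system's claim of its own consistency:

$$
\neg\square\bot \;\equiv\; (\square\bot \to \bot),
\qquad
\text{so if } \vdash \neg\square\bot \text{ then } \vdash \square\bot \to \bot, \text{ and by Löb's theorem } \vdash \bot.
$$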
Despite his name, you actually trust Shady quite a bit. He's never told you or anyone else anything that didn't end up being true. And he's careful not to make any suspiciously strong claims about his honesty.
So your new plan is to ask Shady if Bob will cooperate with you. If so, then you will cooperate. Otherwise, defect. (FairBot)
It's game time! You look at Bob's algorithm, and it turns out he picked the exact same algorithm! He's going to ask Shady if you will cooperate with him. Well, the first step is to ask Shady, "will Bob cooperate with me?"
Shady looks at Bob's algorithm and sees that if Shady says you cooperate, then Bob cooperates. He looks at your algorithm and sees that if Shady says Bob cooperates, then you cooperate. Combining these, he sees that if he says you both cooperate, then both of you will cooperate. So he tells you that you will both cooperate (your dad was right!)
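Let A stand for "Alice cooperates with Bob" and B stand for "Bob cooperates with Alice", and write $\square X$ for "Shady tells you X".
From looking at the algorithms, $\square B \to A$ and $\square A \to B$.
So combining these, $\square(A \wedge B) \to (A \wedge B)$.
Then by Löb's theorem, $A \wedge B$.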
Since that means that Bob will cooperate, you decide to actually cooperate.
Bob goes through an analogous thought process, and also decides to cooperate. So you cooperate with each other on the prisoner's dilemma! Yay!
That night, you go home and remark, "it's really lucky we both ended up using Shady to help us, otherwise that wouldn't have worked..."
Your dad interjects, "Actually, it doesn't matter - as long as they were both smart enough to count, it would work. The 'tells you' in that argument doesn't just mean 'I tell you X', it's stronger than that - it actually means 'Anyone who knows basic arithmetic will tell you X'. So as long as they both know a little arithmetic, it will still work - even if one of them is pro-axiom-of-choice, and the other is pro-axiom-of-life. The cooperation is robust." That's really cool!
But there's another issue you think of. Sometimes, just to be tricky, the tournament organizers will set up a game where you have to play against a rock. Yes, literally just a rock holding the cooperate button down. If you played against a rock with your current algorithm, well, you'd start by asking Shady if the rock will cooperate with you. Shady is like, "well yeah, duh." So then you cooperate too. But you could have gotten three points by defecting! You're missing out on a totally free point!
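FairBot can't be run by literally searching for proofs in a small example. However, there is a known trick for agents whose choices depend only on what can be proved: evaluate them level by level on a linear Kripke model of provability logic. At level n, "Shady says X" is true exactly when X held at every level below n. For agents of this kind the values settle after a few levels, and the settled value is the real outcome. The sketch below is a minimal version of that idea for FairBot. The helper names are hypothetical, and this is not code from the Robust Cooperation paper.

```python
from typing import Callable, List, Tuple

# Opponent rule: given level n and Alice's choices at levels 0..n,
# does the opponent cooperate with Alice at level n?
Opponent = Callable[[int, List[bool]], bool]

def shady_says(history: List[bool], n: int) -> bool:
    """'Shady says X' at level n: X was true at every level below n."""
    return all(history[:n])

def fairbot_match(opponent: Opponent, levels: int = 10) -> Tuple[bool, bool]:
    """Return (Alice cooperates, opponent cooperates) once the levels settle."""
    alice: List[bool] = []  # does Alice (FairBot) cooperate at level n?
    other: List[bool] = []  # does the opponent cooperate with Alice at level n?
    for n in range(levels):
        alice.append(shady_says(other, n))  # FairBot: cooperate iff Shady says they will
        other.append(opponent(n, alice))
    return alice[-1], other[-1]

FAIRBOT: Opponent = lambda n, alice: shady_says(alice, n)  # Bob uses the same plan
COOPERATE_ROCK: Opponent = lambda n, alice: True
DEFECT_ROCK: Opponent = lambda n, alice: False

print(fairbot_match(FAIRBOT))         # (True, True): mutual cooperation
print(fairbot_match(COOPERATE_ROCK))  # (True, True): Alice cooperates with the rock
print(fairbot_match(DEFECT_ROCK))     # (False, False): Alice defects against DefectRock
```

The middle line is the problem from the story. Against a rock on cooperate, FairBot leaves the extra point on the table.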
You think that it would be a good idea to make sure the other player isn't a complete idiot before you cooperate with them. How can you check? Well, let's see if they would cooperate with a rock placed on the defect button (affectionately known as 'DefectRock'). If they know better than that, and they will cooperate with you, then you will cooperate with them.
The next morning, you excitedly tell Shady about your new plan. "It will be like before, except this time, I also ask you if the other player will cooperate with DefectRock! If they are dumb enough to do that, then I'll just defect. That way, I can still cooperate with other people who use algorithms like this one, or the one from before, but I can also defect and get that extra point when there's just a rock on cooperate."
Shady gets an awkward look on his face. "Sorry, but I can't do that... or at least it wouldn't work out the way you're thinking. Let's say you're playing against Bob, who is still using the old algorithm. You want to know if Bob will cooperate with DefectRock, so I have to check and see if I'll tell Bob that DefectRock will cooperate with him. I would have to say I would never tell Bob that DefectRock will cooperate with him. But by Löb's theorem, that means I would tell you this obvious lie! So that isn't gonna work."
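Notation: $X(Y) = C$ if X cooperates with Y in the prisoner's dilemma (or $= D$ if not), and $\square S$ means "Shady tells you S".
You ask Shady, does $B(DR) = D$?
Bob's algorithm: $B(DR) = C$ only if $\square(DR(B) = C)$.
So to say $B(DR) = D$, we would need $\square(\neg\square(DR(B) = C))$.
This is equivalent to $\square(\square(DR(B) = C) \to DR(B) = C)$, since $DR(B) = C$ is an obvious lie.
By Löb's theorem, $\square(DR(B) = C)$, which is a lie.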
<Extra credit: does the fact that Shady is the one explaining this mean you can't trust him?>
<Extra extra credit: find and fix the minor technical error in the above argument.>
Shady sees the dismayed look on your face and adds, "...but, I know a guy who can vouch for me, and I think maybe that could make your new algorithm work."
So Shady calls his friend T over, and you work out the new details. You ask Shady if Bob will cooperate with you, and you ask T if Bob will cooperate with DefectRock. So T looks at Bob's algorithm, which asks Shady if DefectRock will cooperate with him. Shady, of course, says no. So T sees that Bob will defect against DefectRock, and lets you know. Like before, Shady tells you Bob will cooperate with you, and thus you decide to cooperate! And like before, Bob decides to cooperate with you, so you both cooperate! Awesome! (PrudentBot)
If Bob is using your new algorithm, you can see that the same argument goes through mostly unchanged, and that you will still cooperate! And against a rock on cooperate, T will tell you that it will cooperate with DefectRock, so you can defect and get that extra point! This is really great!!
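PrudentBot can be checked with a level-by-level evaluation on a linear Kripke model of provability logic. At level n, "Shady says X" holds when X was true at every level below n. The one new ingredient is T. In the Robust Cooperation paper, T is provability in Peano arithmetic plus the statement that Peano arithmetic is consistent. On this model, consistency fails only at level 0, so "T says X" at level n holds when X was true at every level from 1 up to n-1. The agent functions and helpers are hypothetical. This is a sketch of the technique, not the paper's own code.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]             # (x, y): "x cooperates with y"
Statement = Callable[[int], bool]  # a statement's truth value at each level

def fairbot(me, opp, shady, t, coop):
    return shady(coop(opp, me))  # cooperate iff Shady says opp cooperates with me

def prudentbot(me, opp, shady, t, coop):
    opp_defects_vs_rock: Statement = lambda m: not coop(opp, "DefectRock")(m)
    return shady(coop(opp, me)) and t(opp_defects_vs_rock)

def cooperate_rock(me, opp, shady, t, coop):
    return True

def defect_rock(me, opp, shady, t, coop):
    return False

AGENTS = {
    "FairBot": fairbot,
    "PrudentBot": prudentbot,
    "CooperateRock": cooperate_rock,
    "DefectRock": defect_rock,
}

def tournament(levels: int = 10) -> Dict[Pair, bool]:
    """Settled answer to 'does x cooperate with y?' for every ordered pair."""
    history: List[Dict[Pair, bool]] = []

    def coop(x: str, y: str) -> Statement:
        return lambda m: history[m][(x, y)]

    for n in range(levels):
        def shady(s: Statement, n: int = n) -> bool:  # provable: true at all levels < n
            return all(s(m) for m in range(n))

        def t(s: Statement, n: int = n) -> bool:  # PA + Con(PA): true at levels 1..n-1
            return all(s(m) for m in range(1, n))

        history.append({(x, y): agent(x, y, shady, t, coop)
                        for x, agent in AGENTS.items() for y in AGENTS})
    return history[-1]

results = tournament()
for x, y in [("PrudentBot", "FairBot"), ("FairBot", "PrudentBot"),
             ("PrudentBot", "PrudentBot"), ("PrudentBot", "CooperateRock"),
             ("PrudentBot", "DefectRock")]:
    print(f"{x} vs {y}: {'cooperate' if results[(x, y)] else 'defect'}")
```

The fourth printed line is the improvement over FairBot. Against a rock on cooperate, PrudentBot takes the extra point, while still cooperating with FairBot and with itself.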
(ok now it's time for the really cheesy ending)
It's finally time for the tournament. You have a really good feeling about your algorithm, and you do really well! Your dad is in the audience cheering for you, with a really proud look on his face. You tell your friend Bob about your new algorithm so that he can also get that extra point sometimes, and you end up tying for first place with him!
A few weeks later, Bob asks you out, and you two start dating. Being able to cooperate with each other robustly is a good start to a healthy relationship, and you live happily ever after!
The End.
Three questions about source code uncertainty
In decision theory, we often talk about programs that know their own source code. I'm very confused about how that theory applies to people, or even to computer programs that don't happen to know their own source code. I've managed to distill my confusion into three short questions:
1) Am I uncertain about my own source code?
2) If yes, what kind of uncertainty is that? Logical, indexical, or something else?
3) What is the mathematically correct way for me to handle such uncertainty?
Don't try to answer them all at once! I'll be glad to see even a 10% answer to one question.
[moderator action] Eugine_Nier is now banned for mass downvote harassment
As previously discussed, on June 6th I received a message from jackk, a Trike Admin. He reported that the user Jiro had asked Trike to carry out an investigation into the retributive downvoting that Jiro had been subjected to. The investigation revealed that the user Eugine_Nier had downvoted over half of Jiro's comments, amounting to hundreds of downvotes.
I asked for the community's guidance on dealing with the issue, and while the matter was being discussed, I also reviewed previous discussions about mass downvoting and looked for other people who mentioned being the victims of it. I asked Jack to compile reports on several other users who mentioned having been mass-downvoted, and it turned out that Eugine was also overwhelmingly the biggest downvoter of users David_Gerard, daenarys, falenas108, ialdabaoth, shminux, and Tenoke. As this discussion was going on, it turned out that user Ander had also been targeted by Eugine.
I sent two messages to Eugine, requesting an explanation. I received a response today. Eugine admitted his guilt, expressing the opinion that LW's karma system was failing to carry out its purpose of keeping out weak material and that he was engaged in a "weeding" of users who he did not think displayed sufficient rationality.
Needless to say, it is not the place of individual users to unilaterally decide that someone else should be "weeded" out of the community. The Less Wrong content deletion policy contains this clause:
Harassment of individual users.
If we determine that you're e.g. following a particular user around and leaving insulting comments to them, we reserve the right to delete those comments. (This has happened extremely rarely.)
Although the wording does not explicitly mention downvoting, harassment by downvoting is still harassment. Several users have indicated that they have experienced considerable emotional anguish from the harassment, and have in some cases been discouraged from using Less Wrong at all. This is not a desirable state of affairs, to say the least.
I was originally given my moderator powers on a rather ad-hoc basis, with someone awarding mod privileges to the ten users with the highest karma at the time. The original purpose for that appointment was just to delete spam. Nonetheless, since retributive downvoting has been a clear problem for the community, I asked the community for guidance on dealing with the issue. The rough consensus of the responses seemed to authorize me to deal with the problem as I deemed appropriate.
The fact that Eugine remained quiet about his guilt until directly confronted with the evidence, despite several public discussions of the issue, is indicative of him realizing that he was breaking prevailing social norms. Eugine's actions have worsened the atmosphere of this site, and that atmosphere will remain troubled for as long as he is allowed to remain here.
Therefore, I now announce that Eugine_Nier is permanently banned from posting on LessWrong. This decision is final and will not be changed in response to possible follow-up objections.
Unfortunately, it looks like while a ban prevents posting, it does not actually block a user from casting votes. I have asked jackk to look into the matter and find a way to actually stop the downvoting. Jack indicated earlier on that it would be technically straightforward to apply a negative karma modifier to Eugine's account, and wiping out Eugine's karma balance would prevent him from casting future downvotes. Whatever the easiest solution is, it will be applied as soon as possible.
EDIT 24 July 2014: Banned users are now prohibited from voting.
Separating the roles of theory and direct empirical evidence in belief formation: the examples of minimum wage and anthropogenic global warming
I recently asked two questions on Quora with similar question structures, and the similarities and differences between the responses were interesting.
Question #1: Anthropogenic global warming, the greenhouse effect, and the historical weather record
I asked the question here. Question statement:
If you believe in Anthropogenic Global Warming (AGW), to what extent is your belief informed by the theory of the greenhouse effect, and to what extent is it informed by the historical temperature record?
In response to some comments, I added the following question details:
Due to length limitations, the main question is a bit simplistically framed. But what I'm really asking for is the relative importance of theoretical mechanisms and direct empirical evidence. Theoretical mechanisms are of course also empirically validated, but the empirical validation could occur in different settings.
For instance, the greenhouse effect is a mechanism, and one may get estimates of the strength of the greenhouse effect based on an understanding of the underlying physics or by doing laboratory experiments or simulations.
Direct empirical evidence is evidence that is as close to the situation we are trying to predict as possible. In this case, it would involve looking at the historical records of temperature and carbon dioxide concentrations, and perhaps some other confounding variables whose role needs to be controlled for (such as solar activity).
Saying that your belief is largely grounded in direct empirical evidence is basically saying that just looking at the time series of temperature, carbon dioxide concentrations and the other variables can allow one to say with fairly high confidence (starting from very weak priors) that increased carbon dioxide concentrations, due to human activity, are responsible for temperature increases. In other words, if you ran a regression and tried to do the usual tricks to infer causality, carbon dioxide would come out as the culprit.
Saying that your belief is largely grounded in theory is basically saying that the science of the greenhouse effect is sufficiently convincing that the historical temperature and weather record isn't an important factor in influencing your belief: if it had come out differently, you'd probably just have thought the data was noisy or wrong and wouldn't update away from believing in the AGW thesis.
I also posted to Facebook here asking my friends about the pushback to my use of the term "belief" in my question.
Question #2: Effect of increase in the minimum wage on unemployment
I asked the question here. Question statement:
If you believe that raising the minimum wage is likely to increase unemployment, to what extent is your belief informed by the theory of supply and demand and to what extent is it informed by direct empirical evidence?
I added the following question details:
By "direct empirical evidence" I am referring to empirical evidence that directly pertains to the relation between minimum wage raises and employment level changes, not empirical evidence that supports the theory of supply and demand in general (because transferring that to the minimum wage context would require one to believe the transferability of the theory).
Also, when I say "believe that raising the minimum wage is likely to increase unemployment" I am talking about minimum wage increases of the sort often considered in legislative measures, and by "likely" I just mean that it's something that should always be seriously considered whenever a proposal to raise the minimum wage is made. The belief would be consistent with believing that in some cases minimum wage raises have no employment effects.
I also posted the question to Facebook here.
Similarities between the questions
The questions are structurally similar, and belong to a general question type of considerable interest to the LessWrong audience. The common features of the questions:
- In both cases, there is a theory (the greenhouse effect for Question #1, and supply and demand for Question #2) that is foundational to the domain and is supported through a wide range of lines of evidence.
- In both cases, the quantitative specifics of the extent to which the theory applies in the particular context are not clear. There are prima facie plausible arguments that other factors may cancel out the effect and there are arguments for many different effect sizes.
- In both cases, people who study the broad subject (climate scientists for Question #1, economists for Question #2) are more favorably disposed to the belief than people who do not study the broad subject.
- In both cases, a significant part of the strength of belief of subject matter experts seems to be their belief in the theory. The data, while consistent with the theory, does not seem to paint a strong picture in isolation. For the minimum wage, consider the Card and Krueger study. Bryan Caplan discusses how Bayesian reasoning with strong theoretical priors can lead one to continue believing that minimum wage increases cause unemployment to rise, without addressing Card and Krueger at the object level. For the case of anthropogenic global warming, consider the draft by Kesten C. Green (addressing whether a warming-based forecast has higher forecast accuracy than a no-change forecast) or the paper AGW doesn't cointegrate by Beenstock, Reingewertz, and Paldor (addressing whether, looking at the data alone, we can get good evidence that carbon dioxide concentration increases are linked with temperature increases).
- In both cases, outsiders to the domain, who nonetheless have expertise in other areas that one might expect gives them insight into the question, are often more skeptical of the belief. A number of weather forecasters, physicists, and forecasting experts are skeptical of long-range climate forecasting or confident assertions about anthropogenic global warming. A number of sociologists, lawyers, and politicians are often disparaging of the belief that minimum wage increases cause unemployment levels to rise. The criticism in both cases is similar: namely, that a basically correct theory is being overstretched or incorrectly applied to a situation that is too complex.
- In both cases, the debate is somewhat politically charged, largely because one's beliefs here affect one's views of proposed legislation (climate change mitigation legislation and minimum wage increase legislation). The anthropogenic global warming belief is more commonly associated with environmentalists, social democrats, and progressives, and (in the United States) with Democrats, whereas opposition to it is more common among conservatives and libertarians. The minimum wage belief is more commonly associated with free market views and (in the United States) with conservatives and Republicans, and opposition to it is more common among progressives and social democrats.
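The point about strong theoretical priors can be made concrete with a toy conjugate update. This is a minimal sketch, not Caplan's actual model: it assumes the quantity of interest is a single effect size (say, an employment elasticity with respect to the minimum wage), that both the theory-based prior and the direct evidence are normal, and all the numbers are hypothetical. The function name `normal_update` is illustrative.

```python
def normal_update(prior_mean, prior_sd, data_mean, data_sd):
    """Normal-normal conjugate update: the posterior mean is the
    precision-weighted average of the prior mean and the data mean."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / data_sd ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, post_prec ** -0.5


# Hypothetical numbers: theory says the effect is about -0.2; a noisy study
# finds roughly no effect.
THEORY_MEAN = -0.2
DATA_MEAN, DATA_SD = 0.0, 0.2

strong = normal_update(THEORY_MEAN, 0.05, DATA_MEAN, DATA_SD)  # confident theory
weak = normal_update(THEORY_MEAN, 0.5, DATA_MEAN, DATA_SD)     # loose theory
print(strong, weak)
```

With the tight theoretical prior, the posterior mean stays at about -0.19 despite the null result; with the loose prior it moves to about -0.03. How much a single ambiguous study should move you depends almost entirely on how much weight the theory already carries.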
Looking for help
I'm interested in thoughts from the people here on these questions:
- Thoughts on the specifics of Question #1 and Question #2.
- Other possible questions in the same reference class (where a belief arises from a mix of theory and data, and the theory plays a fairly big role in driving the belief, while the data on its own is very ambiguous).
- Other similarities between Question #1 and Question #2.
- Ways that Question #1 and Question #2 are disanalogous.
- General thoughts on how this relates to Bayesian reasoning and other modes of belief formation based on a combination of theory and data.
Raven paradox settled to my satisfaction
The raven paradox, originated by Carl Gustav Hempel, is an apparent absurdity of inductive reasoning. Consider the hypothesis:
H1: All ravens are black.
Inductively, one might expect that seeing many black ravens and no non-black ones is evidence for this hypothesis. As you see more black ravens, you may even find it more and more likely.
Logically, a statement is equivalent to its contrapositive (where you negate both things and flip the order). Thus if "if it is a raven, it is black" is true, so is:
H1': If it is not black, it is not a raven.
Take a moment to double-check this.
Inductively, just like with H1, one would expect that seeing many non-black non-ravens is evidence for this hypothesis. As you see more and more examples, you may even find it more and more likely. Thus a yellow banana is evidence for the hypothesis "all ravens are black."
Since this is silly, there is an apparent problem with induction.
Resolution
Consider the following two possible states of the world:

Suppose that these are your two hypotheses, and you observe a yellow banana (drawing from some fixed distribution over things). Q: What does this tell you about one hypothesis versus another? A: It tells you bananas-all about the number of black ravens.
One might contrast this with a hypothesis where there is one less banana, and one more yellow raven, by some sort of spontaneous generation.

Observations of both black ravens and yellow bananas cause us to prefer 1 over 3, now!
The moral of the story is that the amount of evidence that an observation provides is not just about whether it is consistent with the "active" hypothesis - it is about the difference in likelihood between when the hypothesis is true versus when it's false.
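The moral can be checked with a few lines of arithmetic. This is a minimal sketch with hypothetical worlds and counts of my own choosing (not the figures from the original illustration): observations are drawn uniformly from the objects in a world, and the strength of evidence is the likelihood ratio between two worlds. The names `likelihood` and `likelihood_ratio` are illustrative.

```python
from fractions import Fraction


def likelihood(world, obs):
    """P(obs | world) when one object is drawn uniformly from the world."""
    return Fraction(world.get(obs, 0), sum(world.values()))


def likelihood_ratio(w1, w2, obs):
    """How strongly obs favors world w1 over world w2 (1 means no evidence)."""
    return likelihood(w1, obs) / likelihood(w2, obs)


# Hypothetical worlds.
all_black = {"black raven": 100, "yellow banana": 100}
one_white = {"black raven": 99, "white raven": 1, "yellow banana": 100}
banana_to_raven = {"black raven": 99, "yellow raven": 1, "yellow banana": 99}

print(likelihood_ratio(all_black, one_white, "yellow banana"))        # no evidence
print(likelihood_ratio(all_black, banana_to_raven, "yellow banana"))  # favors all_black
```

When the non-black raven replaces a black raven, a yellow banana has the same probability in both worlds, so the ratio is exactly 1. When the extra raven comes at the expense of a banana, bananas become slightly rarer in the alternative world, and a yellow banana favors "all ravens are black" by a factor of 199/198.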
This is a pretty straightforward moral - it's a widely known pillar of statistical reasoning. But its absence in the raven paradox takes a bit of effort to see. This is because we're using an implicit model of the problem (driven by some combination of outside knowledge and framing effects) where nonblack ravens replace black ravens, but don't replace bananas. The logical statements H1 and H1' are not alone enough to tell how you should update upon seeing new evidence. Or to put it another way, the version of induction that drives the raven paradox is in fact wrong, but probability theory implies a bigger version.
(Technical note: In the hypotheses above, the exact number of yellow bananas does not have to be the same for observing a yellow banana to provide no evidence - what has to be the same is the measure of yellow bananas in the probability distribution we're drawing from. Talking about "99 ravens" is more understandable, but what differentiates our hypotheses are really the likelihoods of observing different events [there's our moral again]. This becomes particularly important when extending the argument to infinite numbers of ravens - infinities or no infinities, when you make an observation you're still drawing from some distribution.)
Fair Division of Black-Hole Negentropy: an Introduction to Cooperative Game Theory
Non-cooperative game theory, as exemplified by the Prisoner’s Dilemma and often referred to simply as "game theory", is well known in this community. But cooperative game theory seems to be much less well known. Personally, I had barely heard of it until a few weeks ago. Here’s my attempt to give a taste of what cooperative game theory is about, so you can decide whether it might be worth your while to learn more about it.
The example I’ll use is the fair division of black-hole negentropy. It seems likely that for an advanced civilization, the main constraining resource in the universe is negentropy. Every useful activity increases entropy, and since entropy of the universe as a whole never decreases, the excess entropy produced by civilization has to be dumped somewhere. A black hole is the only physical system we know whose entropy grows quadratically with its mass, which makes it ideal as an entropy dump. (See http://weidai.com/black-holes.txt where I go into a bit more detail about this idea.)
Let’s say there is a civilization consisting of a number of individuals, each the owner of some matter with mass $m_i$. They know that their civilization can’t produce more than $(\sum_i m_i)^2$ bits of total entropy over its entire history, and the only way to reach that maximum is for every individual to cooperate and eventually contribute his or her matter into a common black hole. A natural question arises: what is a fair division of the $(\sum_i m_i)^2$ bits of negentropy among the individual matter owners?
Fortunately, cooperative game theory provides a solution, known as the Shapley Value. There are other proposed solutions, but the Shapley Value is well accepted due to its desirable properties such as “symmetry” and “additivity”. Instead of going into the theory, I’ll just show you how it works. The idea is, we take a sequence of players, and consider the marginal contribution of each player to the total value as he or she joins the coalition in that sequence. Each player is given an allocation equal to his or her average marginal contribution over all possible sequences.
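The averaging-over-sequences procedure just described can be written out directly. This is a minimal sketch, assuming the coalition value is $v(S) = (\sum_{i \in S} m_i)^2$ as above; the function names are illustrative. It enumerates all $n!$ orderings, so it is only practical for a handful of players.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def negentropy_game(masses):
    """Coalition value: square of the total mass the coalition contributes."""
    return lambda coalition: sum(masses[i] for i in coalition) ** 2


def shapley_values(n, value):
    """Average each player's marginal contribution over all join orders."""
    totals = [Fraction(0)] * n
    for order in permutations(range(n)):
        coalition = []
        before = value(coalition)
        for player in order:
            coalition.append(player)
            after = value(coalition)
            totals[player] += after - before  # marginal contribution
            before = after
    return [t / factorial(n) for t in totals]


print(shapley_values(3, negentropy_game([1, 2, 3])))  # shares of 6**2 = 36 bits
```

For this particular game the result has a simple closed form. Expanding $(\sum_i m_i)^2$ into $m_i^2$ terms (each owned by one player) and $2 m_i m_k$ cross terms (split evenly by symmetry) gives each player $m_i \sum_k m_k$, i.e. a share of the negentropy proportional to the mass contributed. With masses 1, 2, 3, that is 6, 12, and 18 bits out of 36.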
Dissolving the Thread of Personal Identity
(Background: I got interested in anthropics about a week ago. It has tormented my waking thoughts ever since in a cycle of “be confused, develop idea, work it out a bit, realize that it fails, repeat” and it is seriously driving me berserk by this point. While drawing a bunch of “thread of personal continuity” diagrams to try to flesh out my next idea, I suspected that it was a fairly nonsensical idea, came up with a thought experiment that showed it was definitely a nonsensical idea, realized I was trying to answer the question “Is there any meaningful sense in which I can expect to wake up as myself tomorrow, rather than Britney Spears?”, kept thinking anyways for about an hour, and eventually came up with this possible reduction of personal identity over time. It differs somewhat from Kaj Sotala’s. And I still have no idea what the hell to do about anthropics, but I figured I should write up this intermediate result. It takes the form of a mental dialogue with myself, because that’s what happened.)
Doubt: Hang on, this whole notion of “thread of personal continuity” looks sort of fishy. Self, can you try to clarify what it is?
Self: Let’s see… I have a causal link to my past and future self, and this causal link is the thread of personal identity!
Current Me: Please notice Past Self’s use of the cached thought from “Timeless Identity” even though it doesn’t fit.
Doubt: Causal links can’t possibly be the thread of personal continuity. Your state at time t+1 is not just caused by your state at time t; lots of events in your surroundings also cause the t+1 state. A whole hell of a lot of stuff has a causal link to you. That can’t possibly be it. And when you die, alive you has a causal link to dead you.
Doubt: And another thing, personal continuity isn’t just an on-off thing. There’s a gradient to it.
Self: What do you mean?
Doubt: Let’s say you get frozen by cryonics, and then revived a century later.
Self: Sure.
Doubt: Let’s say you know that you will be revived with exactly the same set of memories, preferences, thought patterns, etc, that you have currently. As you are beginning the process, what is your subjective credence that you will wake up a century later?
Self: Fairly close to 1.
Doubt: Now, let’s say they could recover all the information from your brain except your extreme love for chocolate, so when your brain is restored, they patch in a generic average inclination for chocolate. What is your subjective credence that you will wake up a century later?
Self: Fairly close to 1.
Doubt: Let’s say that all your inclinations and thought patterns and other stuff will be restored fully, but they can’t bring back memories. You will wake up with total amnesia. What is your… you get the idea.
Self: Oh crap. I… I really don’t know. 0.6??? But then again, this is the situation that several real-life people have found themselves in… Huh.
Doubt: For this one, inclinations and thought patterns and many of your memories are unrecoverable, so when your brain is restored, you only have a third of your memories, a strong belief that you are the same person that was cryopreserved, and a completely different set of… everything else except for the memories and the belief in personal continuity. P(I wake up a century later)?
Self: Quite low. ~0.1.
Self: But I see your point. For that whole personal identity/waking up as yourself thing, it isn’t a binary trait, it’s a sliding scale of belief that I’ll keep on existing which depends on the magnitude of the difference between myself and the being that wakes up. If upload!me were fed through a lossy compression algorithm and then reconstructed, my degree of belief in continuing to exist would depend on how lossy it was.
Doubt: Now you realize that the “thread of subjective experience” doesn’t actually exist. There are just observer-moments. What would it even mean for something to have a “thread of subjective experience”?
Self: (Taps into intuition) What about that big rock over there? Forget “subjective”, that rock has a “thread of existence”. That rock will still be the same rock if it is moved 3 feet to the left, that rock will still be the same rock if a piece of it is chipped off, that rock will still be the same rock if it gets covered in moss, but that rock will cease to be a rock if a nuke goes off, turning it into rock vapor! I don’t know what the hell the “thread of existence” is, but I know it has to work like that rock!!
Doubt: So you’re saying that personal identity over time works like the Ship of Theseus?
Self: Exactly! We’ve got a fuzzy category, like “this ship” or “this rock” or “me”, and there’s stuff that we know falls in the category, stuff that we know doesn’t fall in the category, and stuff for which we aren’t sure whether it falls in the category! And the thing changes over time, and as long as it stays within certain bounds, we will still lump it into the same category.
Doubt: Huh. So this “thread of existence” comes from the human tendency to assign things into fuzzy categories. So when a person goes to sleep at night, they know that in the morning, somebody extremely similar to themselves will be waking up, and that somebody falls into the fuzzy cluster that the person falling asleep labels “I”. As somebody continues through life, they know that two minutes from now, there will be a person that is similar enough to fall into the “I” cluster.
Doubt: But there’s still a problem. 30yearfuture!me will probably be different enough from present!me to fall outside the “I” category. If I went to sleep, and I knew that 30yearfuture!me woke up, I’d consider that to be tantamount to death. The two of us would share only a fraction of our memories, and he would probably have a different set of preferences, values, and thought patterns. How does this whole thing work when versions of yourself further out than a few years from your present self don’t fall in the “I” cluster in thingspace?
Self: That’s not too hard. The “I” cluster shifts over time as well. If you compare me at time t and me at time t+1, they would both fall within the “I” cluster at time t, but the “I” cluster of time t+1 is different enough to accommodate “me” at time t+2. It’s like this rock.
Doubt: Not the rock again.
Self: Quiet. If you had this rock, and 100yearfuture!thisrock side by side, they would probably not be recognizable as the same rock, but there is a continuous series of intermediates leading from one to the other, each of which would be recognizable as the same rock as its immediate ancestors and descendants.
Self: If there is a continuous series of intermediates that doesn’t happen too fast, leading from me to something very nonhuman, I will anticipate eventually experiencing what the nonhuman thing does, while if there is a discontinuous jump, I won’t anticipate experiencing anything at all.
Doubt: Huh.
Self: So that’s where the feeling of the “thread of personal identity” comes from. We have a fuzzy category labeled “I”, anticipate experiencing the sorts of things that probable future beings who fall in that category will experience, and in everyday life, there aren’t fast jumps to spots outside of the “I” category, so it feels like you’ve stayed in the same category the whole time.
Doubt: You’ll have to unpack “anticipate experiencing the sorts of things that probable future beings who fall in that category will experience”. Why?
Self: Flippant answer: If we didn’t work that way, evolution would have killed us a long time ago. Actual answer: Me at time t+1 experiences the same sorts of things as me at time t anticipated, so when me at time t+1 anticipates that me at time t+2 will experience something, it will probably happen. Looking backwards, anticipations of past selves frequently match up with the experiences of slightly-less-past selves, so looking forwards, the anticipations of my current self are likely to match up with the experiences of the future being who falls in the “I” category.
Doubt: Makes sense.
Self: You’ll notice that this also defuses the anthropic trilemma (for humans, at least). There is a 1 in a billion chance of the quantum random number generator generating the winning lottery ticket. If it does, a trillion copies of you are made, so you at the time right after the generator returns the winning number have a trillion expected near-future beings who fall within the “I” category, and the 1 in a billion probability is split up a trillion ways among all of them. P(loser) is about 1, and P(specific winner clone) is 1 in a sextillion. All the specific winner clones are then merged, and since a trillion different hypotheses, each with a 1 in a sextillion probability, all predict the same series of observed future events from the time right after the merge onwards, P(series of experiences following from winning the quantum lottery) is 1 in a billion.
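The bookkeeping in that answer is easy to get wrong with numbers this large, so here it is with exact fractions. This is a minimal sketch under the dialogue's own assumption that probability is shared equally among the copies that fall in the “I” category and summed back together when they merge; `split` and `merge` are illustrative names.

```python
from fractions import Fraction

P_WIN = Fraction(1, 10**9)  # 1 in a billion chance of the winning ticket
N_COPIES = 10**12           # a trillion copies made after a win


def split(p, n):
    """Probability for one specific copy when p is shared equally by n copies."""
    return p / n


def merge(p_each, n):
    """Total probability when n equally likely copies share one future."""
    return p_each * n


p_specific_clone = split(P_WIN, N_COPIES)         # 1 in 10**21
p_after_merge = merge(p_specific_clone, N_COPIES)  # back to 1 in 10**9
print(p_specific_clone, p_after_merge)
```

Splitting 1 in $10^9$ a trillion ways gives 1 in $10^{21}$ (a sextillion on the short scale), and summing a trillion such shares returns exactly 1 in a billion, so the merge does not change the odds of experiencing a win.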
Doubt: Doesn’t this imply that anthropic probabilities depend on how big a boundary the mind draws around stuff it considers “I”?
Self: Yes. Let’s say we make 2 copies of a mind, and a third “copy” produced by running the mind through a lossy compression algorithm, and uncompressing it. A blue screen will be shown to one of the perfect mind copies (which may try to destroy it). A mind that considered the crappy copy to fall in the “I” category would predict a 1/3 chance of seeing the blue screen, while a mind that only considers near-perfect copies of itself as “I” would predict a 1/2 chance of seeing the blue screen, because the mind with the broad definition of “I” seriously considers the possibility of waking up as the crappy copy, while the mind with the narrow definition of “I” doesn’t.
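The 1/3 versus 1/2 disagreement follows from a single rule applied with two different reference classes. This minimal sketch assumes a mind spreads its anticipation uniformly over the copies it counts as “I”; the copy labels and the predicates `broad` and `narrow` are hypothetical stand-ins for the two definitions of “I” in the dialogue.

```python
from fractions import Fraction


def p_see_blue(copies, counts_as_me, shown_blue):
    """Chance of seeing the blue screen, with anticipation spread uniformly
    over the copies this mind counts as 'I'."""
    mine = [c for c in copies if counts_as_me(c)]
    return Fraction(sum(1 for c in mine if c == shown_blue), len(mine))


COPIES = ["perfect_1", "perfect_2", "lossy"]


def broad(copy):
    return True  # even the lossy copy counts as "I"


def narrow(copy):
    return copy.startswith("perfect")  # only near-perfect copies count


print(p_see_blue(COPIES, broad, "perfect_1"), p_see_blue(COPIES, narrow, "perfect_1"))
```

Same copies, same screen, same rule; only the predicate for “I” changes, and the predicted probability moves from 1/3 to 1/2.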
Doubt: This seems to render probability useless.
Self: It means that probabilities of the form (I will observe X) are mind-dependent. Different minds given the same data will disagree on the probability of that statement, because they have different reference classes for the word “I”. Probabilities of the form (reality works like X)… to be honest, I don’t know. Anthropics is still extremely aggravating. I haven’t figured out the human version of anthropics (using the personal continuity notion) yet, I especially haven’t figured out how it’s going to work if you have an AI which doesn’t assign versions of itself to a fuzzy category labeled “I”, and I’m distrustful of how UDT seems like it’s optimizing over the entire Tegmark 4 multiverse when there’s a chance that our reality is the only one there is, in which case it seems like you’d need probabilities of the form (reality works like X) and some way to update far away from the Boltzmann Brain hypothesis. This above section may be confused or flat-out wrong.
Can noise have power?
One of the most interesting debates on Less Wrong that seems like it should be definitively resolvable is the one between Eliezer Yudkowsky, Scott Aaronson, and others on The Weighted Majority Algorithm. I'll reprint the debate here in case anyone wants to comment further on it.
In that post, Eliezer argues that "noise hath no power" (read the post for details). Scott disagreed. He replied:
...Randomness provably never helps in average-case complexity (i.e., where you fix the probability distribution over inputs) -- since given any ensemble of strategies, by convexity there must be at least one deterministic strategy in the ensemble that does at least as well as the average.
On the other hand, if you care about the worst-case running time, then there are settings (such as query complexity) where randomness provably does help. For example, suppose you're given n bits, you're promised that either n/3 or 2n/3 of the bits are 1's, and your task is to decide which. Any deterministic strategy to solve this problem clearly requires looking at 2n/3 + 1 of the bits. On the other hand, a randomized sampling strategy only has to look at O(1) bits to succeed with high probability.
Whether randomness ever helps in worst-case polynomial-time computation is the P versus BPP question, which is in the same league as P versus NP. It's conjectured that P=BPP (i.e., randomness never saves more than a polynomial). This is known to be true if really good pseudorandom generators exist, and such PRG's can be constructed if certain problems that seem to require exponentially large circuits, really do require them (see this paper by Impagliazzo and Wigderson). But we don't seem close to proving P=BPP unconditionally.
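Scott's query-complexity example can be sketched in a few lines. This is a minimal illustration, assuming uniform sampling of positions with replacement and a majority threshold of 1/2 (halfway between 1/3 and 2/3); `guess_fraction` and the sample count are illustrative choices, not from the thread. By Hoeffding's inequality, a sample fraction must stray by at least 1/6 from the true fraction to cross the threshold, which happens with probability at most $e^{-k/18}$ for $k$ samples, independent of $n$.

```python
import random


def guess_fraction(bits, samples=301, rng=random):
    """Decide whether `bits` has n/3 or 2n/3 ones by sampling `samples`
    uniformly random positions. The number of queries does not grow with n."""
    ones = sum(bits[rng.randrange(len(bits))] for _ in range(samples))
    return "2n/3" if 2 * ones > samples else "n/3"


n = 3000
few_ones = [1] * (n // 3) + [0] * (2 * n // 3)
many_ones = [1] * (2 * n // 3) + [0] * (n // 3)
print(guess_fraction(few_ones, rng=random.Random(0)),
      guess_fraction(many_ones, rng=random.Random(1)))
```

With 301 samples the error bound is about $e^{-16.7}$, roughly 6 in 100 million, whether $n$ is three thousand or three billion. A deterministic strategy facing an adversary gets no such guarantee: the adversary can keep both answers consistent until $2n/3$ bits have been read.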
Eliezer replied:
Scott, I don't dispute what you say. I just suggest that the confusing term "in the worst case" be replaced by the more accurate phrase "supposing that the environment is an adversarial superintelligence who can perfectly read all of your mind except bits designated 'random'".
Scott replied:
I often tell people that theoretical computer science is basically mathematicized paranoia, and that this is the reason why Israelis so dominate the field. You're absolutely right: we do typically assume the environment is an adversarial superintelligence. But that's not because we literally think it is one, it's because we don't presume to know which distribution over inputs the environment is going to throw at us. (That is, we lack the self-confidence to impose any particular prior on the inputs.) We do often assume that, if we generate random bits ourselves, then the environment isn't going to magically take those bits into account when deciding which input to throw at us. (Indeed, if we like, we can easily generate the random bits after seeing the input -- not that it should make a difference.)
Average-case analysis is also well-established and used a great deal. But in those cases where you can solve a problem without having to assume a particular distribution over inputs, why complicate things unnecessarily by making such an assumption? Who needs the risk?
And later added:
...Note that I also enthusiastically belong to a "derandomize things" crowd! The difference is, I think derandomizing is hard work (sometimes possible and sometimes not), since I'm unwilling to treat the randomness of the problems the world throws at me on the same footing as randomness I generate myself in the course of solving those problems. (For those watching at home tonight, I hope the differences are now reasonably clear...)
Eliezer replied:
I certainly don't say "it's not hard work", and the environmental probability distribution should not look like the probability distribution you have over your random numbers - it should contain correlations and structure. But once you know what your probability distribution is, then you should do your work relative to that, rather than assuming "worst case". Optimizing for the worst case in environments that aren't actually adversarial, makes even less sense than assuming the environment is as random and unstructured as thermal noise.
I would defend the following sort of statement: While often it's not worth the computing power to take advantage of all the believed-in regularity of your probability distribution over the environment, any environment that you can't get away with treating as effectively random, probably has enough structure to be worth exploiting instead of randomizing.
(This isn't based on career experience, it's how I would state my expectation given my prior theory.)
Scott replied:
> "once you know what your probability distribution is..."
I'd merely stress that that's an enormous "once." When you're writing a program (which, yes, I used to do), normally you have only the foggiest idea of what a typical input is going to be, yet you want the program to work anyway. This is not just a hypothetical worry, or something limited to cryptography: people have actually run into strange problems using pseudorandom generators for Monte Carlo simulations and hashing (see here for example, or Knuth vol 2).
Even so, intuition suggests it should be possible to design PRG's that defeat anything the world is likely to throw at them. I share that intuition; it's the basis for the (yet-unproved) P=BPP conjecture.
"Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin." --von Neumann
And that's where the debate drops off, at least between Eliezer and Scott, at least on that thread.