From this 2001 article:

Eric Horvitz... feels bad about [Microsoft Office's Clippy]... many people regard the paperclip as annoyingly over-enthusiastic, since it appears without warning and gets in the way.

To be fair, that is not Dr Horvitz's fault. Originally, he programmed the paperclip to use Bayesian decision-making techniques both to determine when to pop up, and to decide what advice to offer...

The paperclip's problem is that the algorithm... that determined when it should appear was deemed too cautious. To make the feature more prominent, a cruder non-Bayesian algorithm was substituted in the final product, so the paperclip would pop up more often.

Ever since, Dr Horvitz has wondered whether he should have fought harder to keep the original algorithm.

I, at least, found this amusing.


I wonder why they didn't just use the same algorithm, but just make him less cautious. For example, instead of commenting when he's 90% sure he knows what he's talking about, make him do it when he's 75% sure.
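A quick sketch of what "make him less cautious" would look like (my illustration, not Microsoft's actual code; the function and numbers are hypothetical):

```python
# Hypothetical sketch: the assistant pops up only when its posterior
# confidence that the user wants help clears a threshold. Lowering the
# threshold keeps the same Bayesian model, just tuned to be less cautious.
def should_offer_help(posterior_confidence: float, threshold: float = 0.9) -> bool:
    """Offer help only when confidence meets the threshold."""
    return posterior_confidence >= threshold

# At 80% confidence: silent under the cautious setting,
# but pops up under the relaxed one.
assert should_offer_help(0.8, threshold=0.9) is False
assert should_offer_help(0.8, threshold=0.75) is True
```

That is, the tuning knob already existed inside the original design; swapping in a cruder non-Bayesian trigger wasn't the only way to make Clippy more prominent.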

Decision makers rarely know any math, IME.

My main problem with Clippy was not how often it would appear, but that it used a modal dialog, meaning that you cannot just ignore it; you have to explicitly dismiss it before continuing to work. It's the difference between your cat rubbing against your leg and it jumping on your keyboard. One is cute, the other is irritating when it happens repeatedly.

It was designed to invoke attributions of agency. It had more than its share of literalist computer misunderstandings, but unlike the box, whose malfunctions were as unintended as the weather, Clippy was obnoxious.

Umm... English, please?

Clippy has googly eyes, so when it doesn't understand what I want and blocks me from continuing my task, I think it's an earnest obstructionist idiot. My computer does not have googly eyes, so I don't get angry at it, because I don't feel it is a person. Also, when it doesn't understand what I want, it's not designed to block me from trying things.

I'm not convinced. When web sites throw up interstitial or pop-over ads that block me from doing what I'm trying to do, I think I respond to them as if an agent is deliberately interfering with my plans; even if those interfering ads do not have facial features or expressions.

But that may be because I recognize that some programmer or web designer has chosen to put those obstacles in my way; I curse the human agent responsible, not the pop-up itself. To distinguish the two cases, we'd need to find out whether people without extensive programming or systems experience do feel more irritated at modal pop-ups with facial features, than at modal pop-ups that do not have facial features.

I think this is more of a reflection of computer literacy.

When my computer does something wrong, I assume it's my fault. When my parents' computers do something wrong, they complain at them. Pretty uselessly.

I guess Clippy just makes people who don't normally over-anthropomorphize computers over-anthropomorphize enough to get annoyed.

Well, you often are not doing anything at all wrong when Clippy gets in your face. It's annoying because it reliably fails, until I disable it.

True. Not only does Clippy not do what you want it to, it gets in the way of your doing things until you deal with it, and shows up at bad times.

Amusing, but I am embarrassed that this is so highly voted (which I am attributing to this being written by luke).

Why shouldn't it be highly voted? When you're talking to a random outsider and want to demonstrate the usefulness of Bayesian techniques, using the example of Clippy is a funny and interesting way to make your point.

As such, this is a valuable contribution for anyone who might, at some point, want to convert someone to Bayesian techniques.

Given that it takes very little time to read, this means that its value-to-time ratio is very good. As it is a discussion post, rather than a main post, this is sufficient justification to upvote it.*

*(with a main post I'd also expect a significant amount of content)

Personally, I think it's a little disturbing that the post's karma fell from ~35 to ~25 since you posted this. I would have thought LWers generally put more thought into their votes than that.

(A consistent effect. Comments about karma are powerful. People seem rather malleable.)


This is a really good point, so I upvoted it!

I withdrew my vote after jsalvatier made his comment. It made me think more about it, and the fact that I have a general problem with voting too often for things that are funny instead of things that genuinely help the signal to noise ratio. I also saw the extremely high total vote as worrisome. If the vote had at the time been +10 or +15 or so I might not have felt as much of a need to withdraw my vote.

I suspect that similar thought processes occurred with other people.

Yes, this is what I assumed had happened and was commenting on. Maybe I just pay too much attention to karma because I'm green as grass, but I don't think I've ever cast a vote without thinking through what I find valuable about the post and how that compares to its current total. The fact that apparently a lot of people cast what I would call impulse votes is making me reevaluate exactly what it is that 'karma' is measuring.

Edit: Oh, I just realized - the anti-kibitzer hides karma scores as well as usernames. Probably there's a large subset of voters who don't and can't take relative totals into account until someone comments on it.

Of course, if someone considers that a good reason to reverse their vote I don't know why they would be using the anti-kibitzer in the first place.

I still don't think anyone here should feel good about paying attention to current total while deciding whether to upvote or downvote. Share evidence, not conclusions. The net karma a comment ends up at should be the result of aggregating our valuations, not a result of, say, whether those who thought it should be at +100 voted before or after those who thought it should be at +2.

Edit: it's clear to me now that I don't have a good solution to my perceived problem.

It seems to me that your suggested policy would result in comment-placement effects being even stronger than they are now. What score should a comment end up with if 50 people consider voting on it and they all think it should have a score of +2?

I communicated poorly. I don't think "should have a score of +2" should enter into the decision to upvote, downvote, or not vote. Instead, I'd rather have voting algorithms which, when implemented individually, have results which can be meaningfully summed. For example, suppose everyone upvotes exactly when they think a comment is in the top 5% of comments in "everyone should read this" ordering and downvotes for the bottom 5%. Then the sum reflects (the number of people who read the comment) × (the fraction who thought it was in the top 5% minus the fraction who thought it was in the bottom 5%). That's something I can understand.
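The proposed rule can be sketched as a toy model (my construction, purely illustrative of the comment's math, not anything the site implements):

```python
# Toy model of the proposed rule: each reader upvotes iff they rank the
# comment in their personal top 5%, downvotes iff in their bottom 5%,
# and otherwise abstains. The net score is then interpretable as
# readers x (top-5% fraction - bottom-5% fraction).
def net_score(reader_judgments):
    """reader_judgments: percentile rank (0-100) each reader assigns."""
    up = sum(1 for p in reader_judgments if p >= 95)   # top 5%
    down = sum(1 for p in reader_judgments if p <= 5)  # bottom 5%
    return up - down

judgments = [97, 96, 50, 40, 3, 99]
# Three readers rank it top-5%, one ranks it bottom-5%: net +2.
assert net_score(judgments) == 2
```

The key property is that each vote is cast independently of the running total, so the sum has a fixed meaning regardless of voting order.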

If I think a comment should end up with a score of +2, too bad, I have no direct way of controlling that. The resulting score is a reflection of the community's votes, not something I try to game by altering my voting decision based on whether the score gets closer to +2.

I mean, do people downvote comments that they would have otherwise not voted on if they think the comment has too many upvotes? If not, why do they decline to upvote when they otherwise would have upvoted? The two look the same from everyone else's perspective, right?

I'm not saying that your proposed algorithm is wrong - not exactly, anyway. I am pointing out something that I think is a flaw.

Putting the same point a different way:

Consider two comments. One is posted early, and is seen by 50 people. It's slightly good - good enough that each of those people would, by your algorithm, upvote it, but no better than that. The other is posted late, and is only seen by 10 people, but it's very, very good. According to your algorithm, the first one would get a score of +50 and the second one would get a score of +10. By the methods currently in use, the first one will get a low score - probably +1 or +2 - and the second one will still get +10.

The first comment got many more points than the second, by your algorithm, because its author was able to quickly put together something good enough to be upvoteable, and because they were at the right place at the right time to post it early in the conversation, which implies either luck or lots of time spent lurking on LW. I don't think these are things we want to incentivise - at least not more than we want to incentivise putting time into crafting well-thought-out comments.

Also:

... do people downvote comments that they would have otherwise not voted on if they think the comment has too many upvotes?

I do this. Not very often, but it happens.

You're right. Reviewing my feelings on this I discovered that my main "ugh, that's terrible" feeling comes from the observation that a correlated set of people form a control system that wipes out the contributions of others not in a similar or larger implicit alliance. That doesn't imply the solution is to vote independently of the total, though, as there are negative side effects like the one you describe.

I mean, do people downvote comments that they would have otherwise not voted on if they think the comment has too many upvotes? If not, why do they decline to upvote when they otherwise would have upvoted?

I often (although not always) will upvote a comment simply if it deserves it. I only very rarely downvote or don't vote a comment if I think it is too high but should be positive. Declining to upvote a too-high comment is something I do much more frequently than downvoting a too-high comment. This is a passive rather than active decision. In general, declining to upvote creates less negative emotional feeling in me than actively downvoting something which is too high.

I do sometimes upvote comments that have been downvoted if I think they've simply been downvoted way too much. That seems for me at least to be the most common form of corrective voting.

I have no idea how representative my behavior is of the general LWian.

If I think a comment should end up with a score of +2, too bad, I have no direct way of controlling that. The resulting score is a reflection of the community's votes, not something I try to game by altering my voting decision based on whether the score gets closer to +2.

Ok, but that's your self-handicapping, and I want no part of it myself.

My decision to vote shall be determined by whatever vote I predict has the best consequences.

Surely by whatever vote is recommended by the decision procedure you predict has the best consequences. ;)

Surely by whatever vote is recommended by the decision procedure you predict has the best consequences. ;)

No, I meant what I said.

I don't think "should have a score of +2" should enter into the decision to upvote, downvote, or not vote.

Why not? No, really: what's wrong with that?

Instead, I'd rather have voting algorithms which, when implemented individually, have results which can be meaningfully summed.

The current voting algorithms can be meaningfully summed; they're just complicated, opaque, and nonstandardized. I don't understand why you think "everyone should use my voting algorithm" is a useful thing to say.

If I think a comment should end up with a score of +2, too bad, I have no direct way of controlling that.

In what situation would you not, given that it is possible to alter your voting decision based on whether the score gets closer to +2? Do you intend to prevent that somehow?

do people downvote comments that they would have otherwise not voted on if they think the comment has too many upvotes?

At least two people do. Why do you ask? (Seriously, I can't figure out why this is phrased as a rhetorical question.)

Edit: Okay, here's the thing: I think it would be more useful if karma was the average of our valuations; i.e. if you could, say, input '+10' or '-3' as shorthand for 'upvote if below this number, downvote if above' rather than simply 'upvote' and 'downvote'. What do you imagine the problem with this system would be?
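One interesting property of that scheme (my sketch; the mechanism is hypothetical and the equilibrium claim is a rough back-of-the-envelope argument, not a proof): if each voter pushes the score toward their declared target, the score should settle near the median of the targets, since that's where the voters pulling up and the voters pulling down balance out.

```python
# Hypothetical mechanism: each voter declares a target score, counted as
# "upvote while the current score is below my target, downvote while above."
# At equilibrium, upward and downward pressure balance at (roughly) the
# median of the declared targets.
def settle(targets):
    """Return the median of the declared targets."""
    s = sorted(targets)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

# Five voters declare targets; the score settles at the median, 2.
assert settle([10, -3, 2, 2, 4]) == 2
```

So one consequence is that a single voter declaring '+1000' has no more pull than one declaring '+11': the scheme aggregates rankings, not magnitudes.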

Edit: Okay, here's the thing: I think it would be more useful if karma was the average of our valuations; i.e. if you could, say, input '+10' or '-3' as shorthand for 'upvote if below this number, downvote if above' rather than simply 'upvote' and 'downvote'. What do you imagine the problem with this system would be?

Not exactly a problem, but a lot of my votes would either be +1000 or -1000.

I think that karma is a useful feedback but only at a very approximate level. If a post is heavily upvoted or heavily downvoted it is likely to be higher quality. But this is extremely approximate. The posts I've had most upvoted are rarely what I would consider my highest quality remarks. For example, this comment was relevant but I don't see any reason why it is at +24 other than some sort of bandwagon effect.

Pff, that's nothing. Two of my highest-karma comments (try not to laugh at the totals; I'm green as grass, remember) are utterly derivative, by virtue of being simple restatements of another person's point in a slightly funnier way. Namely this and this.

It's embarrassing, frankly.

Ok. But the real thing is the discrepancy between them. While that comment I made is at +24, this comment is at +2 where it uses a nearly identical level of sources and analysis about a somewhat similar set of demographic issues.

It isn't just that some funny comments get voted up a lot. It is that there's very little general pattern to how far one comment gets voted up compared to another, even when they are very similar comments.


Comments get more upvotes, independent of quality, if they:

  • Are in a high-traffic thread
  • Are made while the thread is still new
  • Get an early complimentary reply
  • Make a point many people agree with and care about (especially if the first to make that point)
  • Become the highest-karma comment early on (bandwagon + people may only read/vote on the first few comments, so being the top comment is valuable)
  • Are closer to top-level (people don't read deep into threads unless particularly interested)

I think these effects, in aggregate, are probably much stronger determinants of comment karma than actual quality. Top-level posts, to main or discussion, suffer from fewer of these effects, so their karma is a little more reliable. But I hope no one is taking their comment karma too much to heart.

I think that karma is a useful feedback but only at a very approximate level. ...

... there's very little general pattern to how far one comment gets voted up compared to another even when they are very similar comments.

If that's true, then... what's the point of karma scores?

How about this: keep track of total votes behind the scenes, but only report whether the karma is [- -] for k < -5, [-] for -4 ≤ k < 0, [+] for 0 < k ≤ 10, and [+ +] for k > 10.
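That bucketing could be expressed as a small lookup (my sketch; the exact thresholds are reconstructed from the comment above, so treat them as illustrative):

```python
# Coarsen an exact karma total k into a display bucket, so readers see
# only rough quality bands rather than a precise running total.
def bucket(k: int) -> str:
    if k < -5:
        return "[- -]"
    elif k < 0:
        return "[-]"
    elif k <= 10:
        return "[+]"
    else:
        return "[+ +]"

assert bucket(-7) == "[- -]"
assert bucket(-2) == "[-]"
assert bucket(4) == "[+]"
assert bucket(25) == "[+ +]"
```

Hiding the exact total would also blunt the bandwagon and "corrective voting" effects discussed upthread, since voters couldn't anchor on the current number.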

I don't think the attribution is right. I am always surprised by what does and doesn't get upvoted, which means I'm poorly calibrated. Something useful I post to discussion after spending 20 hours on it gets 5 upvotes, and then something useless like this discussion post that took me 60 seconds to post gets 25+ upvotes. :)

My first assumption is that almost everything you post is seen as (at least somewhat) valuable (for almost every post #upvotes > #downvotes), so the net karma you get is mostly based on throughput. More readers, more votes. More votes, more karma.

Second, useful posts do not only take time to write, they take time to read as well. And my guess is that most of us don't like to vote on thoughtful articles before we have read them. So for funny posts we can quickly make the judgement on how to vote, but for longer posts it takes time.

Decision fatigue may also play a role (after studying something complex the extra decision of whether to vote on it feels like work so we skip it). People may also print more valuable texts, or save them for later, making it easy to forget to vote.

The effect is much more evident on other karma based sites. Snarky one-liners and obvious puns are karma magnets. LessWrong uses the same system and is visited by the same species and therefore suffers from the same problems, just to a lesser extent.

Decision fatigue may also play a role (after studying something complex the extra decision of whether to vote on it feels like work so we skip it).

This. Also after reading a more complex thing, it seems common that I'll forget to think about voting at all, since I'm distracted by thinking about the implications or who I might want to share it with or what other people have to say about it. Sometimes I remember to go back and vote, but I think most of the time I just don't, whereas with funny things the impulse to focus on the author and give them a reward in response seems to be automatic.

Also, sometimes an apparently well-researched article turns out to be based on only a superficial understanding of the topic (e.g. only having skimmed the abstracts) and misrepresents the cited material, and this is sometimes revealed on "cross-examination" in the comments.

I guess that's a little better. (also that sounds like poor accuracy rather than poor calibration, but that's probably just semantics).

On the other hand I suspect you are well calibrated with what gives you respect and reputation. You could say that your poor calibration with respect to karma is karma's problem! :)

Something useful I post to discussion after spending 20 hours on it

20 hours on a discussion post? That would be a mistake right there!

The 20 hours isn't for LW karma, obviously. It's stuff like announcing IntelligenceExplosion.com.

At least one version of the "Clippy" avatar also appeared to have a facial expression that looked like it was being sarcastic at you. I bet if they had made the "cat" avatar the default it wouldn't have been so hated.


Clippy-like tech might get another chance:

http://9to5mac.com/2011/10/03/co-founder-of-siri-assistant-is-a-world-changing-event-interview/

If Apple releases it, it's highly likely to be usable. And it does use graphical models, as far as I know - I think they might be using Domingos' Markov Logic.