Alexandros comments on Is Google Paperclipping the Web? The Perils of Optimization by Proxy in Social Systems - Less Wrong

37 Post author: Alexandros 10 May 2010 01:25PM


Comments (104)


Comment author: Chronos 15 May 2010 10:33:08PM *  10 points

I think it's interesting to note that this is precisely why Google is so insistent on defending its retention of user activity logs. The logs contain proxies under the control of the end user, rather than the content producer, and thus allow a clean estimate of (the end user's opinion of) search result quality. This lets Google spot manipulation after the fact, and thus experiment with new algorithm tweaks that would have counterfactually improved the quality of results.

(Disclaimer: I currently work at Google, but not on search or anything like it, and this is a pretty straightforward interpretation starting from Google's public statements about logging and data retention.)

Comment author: Alexandros 16 May 2010 07:44:50PM *  1 point

Thanks for this. I'm constantly amazed at the relevant information that has been turning up here.

I agree that if anything is to be improved, information from other stakeholder groups with different incentives (such as end users) must be integrated. Given how heavily end users outnumber manipulators, this is a pretty good source of data, especially for high-traffic keywords.

However, what would stop spammers who focus on some low-traffic keyword from feeding innocent-looking user logs into the system? I guess the fundamental question is: besides raw quantity, how could anyone trust that the user logs are coming from real end users?
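To make the quantity point concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately naive system that scores a page by its share of logged clicks for a keyword; that scoring rule, the function name `fake_clicks_needed`, and all the numbers are hypothetical illustrations, not a description of how Google actually uses its logs.

```python
from fractions import Fraction
from math import ceil


def fake_clicks_needed(genuine_clicks: int,
                       genuine_share: Fraction,
                       goal_share: Fraction) -> int:
    """Fake clicks a manipulator must inject so that their page's share of
    all logged clicks reaches goal_share.

    Hypothetical model: the page honestly earns genuine_share of
    genuine_clicks, and every injected click goes to that page.
    Solving (g*s + f) / (g + f) >= goal for f gives
    f >= g * (goal - s) / (1 - goal).
    """
    if goal_share <= genuine_share:
        return 0  # the page already meets the goal without manipulation
    if goal_share >= 1:
        raise ValueError("goal_share must be below 1")
    needed = genuine_clicks * (goal_share - genuine_share) / (1 - goal_share)
    return ceil(needed)


# Illustrative numbers only: a page that honestly earns 1% of clicks
# and wants to appear to earn 50% of them.
low_traffic = fake_clicks_needed(200, Fraction(1, 100), Fraction(1, 2))
high_traffic = fake_clicks_needed(1_000_000, Fraction(1, 100), Fraction(1, 2))
print(low_traffic, high_traffic)
```

Under these assumptions the cost of manipulation grows linearly with genuine traffic, which is why the low-traffic keywords look like the weak spot: a couple of hundred fake sessions would suffice there, versus nearly a million for the busy keyword.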

(I understand that it may not be possible for you to get into a discussion about this; if so, no worries.)

Comment author: Chronos 19 May 2010 11:58:56PM 1 point

I'm afraid I can't say much beyond what I've already said, except that Google places a fairly high value on detecting fraudulent activity.

I'd be surprised if I discovered that no bad guys have ever tried to simulate the search behavior of unique users. But (a) assuming those bad guys are a problem, I strongly suspect that the folks worried about search result quality are already on to them; and (b) I suspect bad guys who try such techniques give up in favor of the low-hanging fruit of more traditional bad-guy SEO techniques.