In the past few months, the LessWrong team has been making use of the latest AI tools (given that they unfortunately exist[1]) for art, music, and deciding what we should all be reading.
Our experiments with the last of these, i.e. the algorithm that chooses which posts to show on the frontpage, have produced results sufficiently good that, at least for now, we're making Enriched the default for logged-in users[2]. If you're logged in and you've never switched tabs before, you'll now be on the Enriched tab. (If you don't have an account, making one takes 10 seconds.)
To recap, here are the currently available tabs (subject to change):
- Latest: 100% posts from the Latest algorithm (using karma and post age to sort[3])
- Enriched (new default): 50% posts from the Latest algorithm, 50% posts from the recommendations engine
- Recommended: 100% posts from the recommendations engine, choosing posts specifically for you based on your history
- Subscribed: a feed of posts and comments from users you have explicitly followed
- Bookmarks: this tab appears if you have bookmarked any posts
Note that posts which are the result of the recommendation engine have a sparkle icon after the title (on desktop, space permitting):
Posts from the last 48 hours have their age bolded:
Why make Enriched the default?
To quote from my earlier post about frontpage recommendation experiments:
A core value of LessWrong is to be timeless and not news-driven. However, the central algorithm by which attention allocation happens on the site is the Hacker News algorithm[2], which basically only shows you things that were posted recently, and creates a strong incentive for discussion to always be centered around the latest content.
This seems very sad to me. When a new user shows up on LessWrong, it seems extremely unlikely that the most important posts for them to read were all written within the last week or two.
I do really like the simplicity and predictability of the Hacker News algorithm. More karma means more visibility, older means less visibility. Very simple. When I vote, I basically know the full effect this has on what is shown to other users or to myself.
But I think the cost of that simplicity has become too high, especially as older content makes up a larger and larger fraction of the best content on the site, and people have been becoming ever more specialized in the research and articles they publish on the site.
We found that a hybrid posts list of 50% Latest and 50% Recommended lets us get the benefits of each algorithm[4].
- The Latest component of the list allows people to stay up to date with the most recent content, provides predictable visibility for new posts, and is approximately universal in that everyone sees those posts, which makes them a bit more common-knowledge-y.
- The Recommended component of the list allows us to present content that's predicted to be most interesting/valuable to a user from across thousands of posts from the last 10+ years, not being limited to just recent stuff.
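To make the 50/50 mix concrete, here is a minimal sketch of interleaving two ranked lists. It is an illustration, not our production code: the function name `enriched_feed`, the strict alternation, and the rule that a post appearing in both lists is shown once (under whichever source reached it first) are all assumptions made for the example.

```python
from itertools import zip_longest


def enriched_feed(latest, recommended, limit=10):
    """Alternate posts from two ranked lists, skipping duplicates.

    Returns a list of (post_id, from_recommendations) pairs; the boolean
    is what a UI could use to decide whether to show the sparkle icon.
    """
    seen = set()
    feed = []
    # zip_longest pads the shorter list with None so the longer one is not cut off.
    for latest_id, recommended_id in zip_longest(latest, recommended):
        for post_id, from_recommendations in (
            (latest_id, False),
            (recommended_id, True),
        ):
            if post_id is None or post_id in seen:
                continue
            seen.add(post_id)
            feed.append((post_id, from_recommendations))
            if len(feed) == limit:
                return feed
    return feed


feed = enriched_feed(["a", "b", "c"], ["x", "b", "y"], limit=4)
```

Note that deduplication means the split is only approximately 50/50: in the example, "b" appears in both lists, so the Latest list ends up contributing three of the four slots.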
Shifting the age of posts
When we first implemented recommendations, they were very recency-biased. My guess is that's because the data we were feeding the engine was of people reading and voting on recent posts, so it knew those were the ones we liked. In a manner less elegant than I would have preferred, we constrained the algorithm to mostly serve content at least 30 or 365 days old. You can see the evolution of the recommendation engine, on the age dimension, here:
I give more detailed thoughts about what we found in the course of developing our recommendation algorithm in this comment below.
Feedback, please
Although we're making Enriched the general default, this feature direction is still experimental and could turn out to be a bad idea, likely due to more subtle effects that were hard to detect from initial analytics data and brief user interviews.
Any feedback on how you do/don't like what you're getting recommended would be great, and even more so if you can tell us what you'd like to be seeing.
I think the results of the current algorithm are decent; I also imagine that a lot more is possible in terms of detecting what a given user would most want and benefit from seeing.
As always, happy reading!
- ^
Well, if current tools were to exist and we'd stop here or soon, that'd be great – these are useful tools – what's unfortunate is these tools seem to be the product of a generator that isn't gonna stop here.
- ^
We plan to roll this out to logged-out users too, but doing so requires additional technical work.
- ^
Since the dawn of LessWrong 2.0, posts on the frontpage have been sorted according to the HackerNews algorithm:
Each post is assigned a score that's a function of how much karma it has and how old it is, with posts hyperbolically discounted over time. In the last few years, we've enabled customization by allowing users to manually boost or penalize the karma of posts in this algorithm based on tag. The site has default tag modifiers to boost Rationality and World Modeling content (introduced when it seemed like AI content was going to eat everything).
- ^
We initially tried variations on the recommendations engine to get it to also provide the "latest" half of the posts list, but with our current set-up, that seemed to work much worse than just interleaving with Latest posts.
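As a rough illustration of the karma-and-age scoring described in the footnotes above, here is a minimal sketch. The exponent of 1.8 and the two-hour offset are taken from the commonly cited Hacker News ranking formula, not from LessWrong's actual settings, and treating tag modifiers as additive karma adjustments is likewise an assumption; `frontpage_score` is a hypothetical name.

```python
def frontpage_score(karma, age_hours, tag_boosts=(), gravity=1.8, offset_hours=2.0):
    """Score a post from its karma and age, discounting older posts.

    tag_boosts: per-tag karma adjustments (positive to boost, negative to
    penalize); assumed additive here for simplicity.
    gravity, offset_hours: assumed constants controlling how fast posts decay.
    """
    adjusted_karma = karma + sum(tag_boosts)
    return adjusted_karma / (age_hours + offset_hours) ** gravity


# Same karma, different ages: the newer post ranks higher.
newer = frontpage_score(karma=50, age_hours=10)
older = frontpage_score(karma=50, age_hours=20)

# A tag boost raises the score of an otherwise identical post.
boosted = frontpage_score(karma=50, age_hours=10, tag_boosts=[25])
```

The appeal of a rule like this is the predictability mentioned earlier: a vote changes `karma`, and the effect on ranking follows directly from the formula.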
I wonder if there's a way to give the black-box recommender a different objective function. CTR is bad for the obvious clickbait reasons, but signals for user interaction are still valuable if you can find the right signal to use.
I would propose that returning to the site some time in the future is a better signal of quality than CTR, assuming the future is far enough away. You could try a week, a month, and a quarter.
This is maybe a good time to use reinforcement learning, since the signal is far away from the decision you need to make. When someone interacts with an article, reward the things they interacted with n weeks ago. Combined with karma, I bet that would be a better signal than CTR.
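A minimal sketch of what that delayed signal could look like, computed from an interaction log. This only derives the reward, it is not a reinforcement-learning system, and the event format, the function name `delayed_return_rewards`, and the 28-day delay with a 7-day window are all assumptions chosen for illustration.

```python
from collections import defaultdict


def delayed_return_rewards(events, delay_days=28, window_days=7):
    """Credit posts that a user interacted with roughly `delay_days` before
    a later interaction by the same user.

    events: iterable of (user_id, post_id, day) tuples, where day is an
    integer day number. Returns {post_id: reward_count}.
    """
    by_user = defaultdict(list)
    for user_id, post_id, day in events:
        by_user[user_id].append((day, post_id))

    rewards = defaultdict(int)
    for history in by_user.values():
        for later_day, _ in history:
            for earlier_day, earlier_post in history:
                gap = later_day - earlier_day
                # Reward the earlier post if the user came back within the window.
                if delay_days <= gap < delay_days + window_days:
                    rewards[earlier_post] += 1
    return dict(rewards)


events = [
    ("u1", "p1", 0),
    ("u1", "p2", 30),  # u1 returns 30 days after reading p1
    ("u2", "p1", 0),
    ("u2", "p3", 10),  # u2 returns too soon to count
]
rewards = delayed_return_rewards(events)
```

Running it with a week, a month, and a quarter as `delay_days` would give the three variants to compare, and the resulting counts could be combined with karma as suggested.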