Ruby

LessWrong Team

Sequences

LW Team Updates & Announcements
Novum Organum

Comments

Ruby92

As noted in an update on LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!"), yesterday we started an A/B test in which some users are automatically switched over to the Enriched [with recommendations] Latest Posts feed.

The first ~18 hours' worth of data does seem to show a real uptick in clickthrough-rate, though some of that could be novelty.

(Examining members of the test (n=921) and control (n~=3000) groups over the last month, the test group seemed to have a slightly (~7%) lower baseline clickthrough-rate. I haven't investigated this.)
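
A minimal sketch of one way to account for that baseline gap: compare each group's clickthrough-rate during the test against its own pre-test rate, then take the ratio of the two changes. The function names and all counts below are made up for illustration and are not our actual data or analysis code.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Clickthrough rate: clicks per impression (0.0 if nothing was shown)."""
    return clicks / impressions if impressions else 0.0


def baseline_adjusted_uplift(test_pre, test_post, control_pre, control_post):
    """Relative CTR change in the test group, divided by the relative
    change in the control group, minus 1.

    Each argument is a (clicks, impressions) pair. Assumes both
    pre-period rates are non-zero.
    """
    test_ratio = ctr(*test_post) / ctr(*test_pre)
    control_ratio = ctr(*control_post) / ctr(*control_pre)
    return test_ratio / control_ratio - 1.0


# Hypothetical counts: the test group starts ~7% below control.
uplift = baseline_adjusted_uplift(
    test_pre=(930, 10_000),
    test_post=(1_116, 10_000),
    control_pre=(1_000, 10_000),
    control_post=(1_000, 10_000),
)
```

With these invented numbers the test group's rate rises by 20% relative to its own baseline while control is flat, so the adjusted uplift is about 0.2 even though the raw post-period gap between groups is smaller.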

However, the specific posts that people are clicking on don't feel, on the whole, like the ones I was most hoping the recommendations algorithm would suggest (and get clicked on). It feels kinda like there's a selection towards clickbaity or must-read news (not completely, just not as much as I like).

If I look over items recommended by Shoggoth that are older (50% are from last month, 50% older than that), they feel better but seem to get fewer clicks.

A to-do item is to look at voting behavior relative to clicking behavior. Having clicked on these items, do people upvote them as much as others? 

I also want to experiment with just applying a recency penalty if it seems that older content suggested by the algorithm is more "wholesome", though I'd like to get some data from the current config before changing it.
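
To make the recency-penalty idea concrete, here is a rough sketch of re-ranking candidates after the recommender has scored them. The penalty shape (a multiplicative discount that decays with post age), the parameter values, and the function names are all assumptions for illustration; this is not how the current system is configured.

```python
def apply_recency_penalty(score, age_days, max_penalty=0.5, half_life_days=30.0):
    """Down-weight recent posts.

    The penalty equals max_penalty for a brand-new post and halves
    every half_life_days, so old posts are left almost untouched.
    """
    penalty = max_penalty * 0.5 ** (age_days / half_life_days)
    return score * (1.0 - penalty)


def rerank(candidates, **kwargs):
    """candidates: list of (post_id, score, age_days).

    Returns post ids ordered by penalized score, best first.
    """
    scored = [(apply_recency_penalty(s, a, **kwargs), pid) for pid, s, a in candidates]
    return [pid for _, pid in sorted(scored, reverse=True)]


# Hypothetical candidates: a fresh post with a higher raw score
# and a year-old post with a lower one.
candidates = [("new_post", 1.0, 0.0), ("old_post", 0.8, 365.0)]
order = rerank(candidates)
```

In this toy case the year-old post overtakes the fresh one once the penalty is applied. The strength and half-life would need tuning against real click and vote data.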

Ruby40

We've had the choice of tabs up for a month now and the results so far are encouraging, or at least not discouraging. There are many users who are very pleased with the Recommendations, liking among other things that it brings to attention posts that otherwise get lost if you only see what's new. Clickthrough-rates are higher for people using the Enriched/Recommendations tab, although this is most certainly a selection effect on the kind of user who changes tab at all. Switching some people over automatically is motivated by wanting to get a better signal here before doing something like changing the global default.

The current recommendations still need more work though. People are much less likely to click on recommendations of posts that they've already clicked on, but it's proving tricky to eliminate such recommendations entirely. Also, the algorithm overwhelmingly recommends posts from the last year when we'd like to see it surfacing stuff from further back too. Still, Latest is overwhelmingly stuff from the last week, so it's still an improvement over the counterfactual.

--

From when we started the project, we've settled on the "hybrid" list being likely optimal as the default list people look at. Many people want to "keep up with the latest" even if they're also interested in good posts from all time, so any recommended list of posts that's the default has to have a heavy latest component. We first tried making two calls to the Recommendations API, one with heavy recency bias, but it was hard to get it consistent, so we switched to just splitting the list between the usual Latest algorithm and the new recommendations algorithm.
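
A sketch of what splitting the list could look like, assuming a simple alternation between the two sources. The function name, the alternating pattern, and the read-post filter are illustrative assumptions; the actual split and filtering on the site may differ.

```python
from itertools import zip_longest


def hybrid_feed(latest, recommended, read_ids=frozenset(), limit=10):
    """Alternate Latest and recommended posts.

    Recommendations the user has already read are dropped, and a post
    appearing in both sources is shown only once.
    """
    feed, seen = [], set()
    recs = [p for p in recommended if p not in read_ids]
    for latest_post, rec_post in zip_longest(latest, recs):
        for post in (latest_post, rec_post):
            if post is not None and post not in seen:
                seen.add(post)
                feed.append(post)
    return feed[:limit]


# Hypothetical post ids: "x" has been read, "b" appears in both lists.
feed = hybrid_feed(
    latest=["a", "b", "c"],
    recommended=["x", "b", "y", "z"],
    read_ids={"x"},
    limit=5,
)
# feed == ["a", "b", "y", "c", "z"]
```

Keeping the Latest half untouched means everyone still sees the same new posts in the same order, and only the recommended slots vary per user.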

This has the advantage that it preserves some of the "common knowledge" aspect of the current algorithm where you know which posts other people are seeing too, and an author knows that if they get upvoted, their post will be visible automatically and transparently to many people. As discussed elsethread on this post, we want to have a pure-recommendations tab as well and have been waiting on a bit of coding to make that happen.

--

People often fear goodharting on the wrong metric (like clicks) for recommendation algorithms. I think we do need to keep an eye on that, and I want to build more analytics tools for detecting drift here, and do more talking to people. I think as we fix up more basic issues like excluding read content and getting it to even recommend posts from older than a year ago[1], we'll put more attention on whether the trend is good.

  1. ^

    One guess I have is the algorithm is stuck for dumb "structural" reasons, in that it's been given recent data which is overwhelmingly of people reading recent content, so when it queries "what's good?" recent content comes out on top even without explicitly training that into the system.

Ruby30

The plan is to have that live. The only reason we didn't deploy it on Thursday was that we have to do a small bit of extra work to extend caching (to achieve acceptable performance) to the pure-recommender view. We'll probably have it up soon.

Ruby132

The title is strong with this one. I like it.

Ruby40

Over the years the idea of a closed forum for more sensitive discussion has been raised, but never seemed to quite make sense. Significant issues included:
- It seems really hard or impossible to make it secure from nation state attacks
- It seems that members would likely leak stuff (even if it's via their own devices not being adequately secure or what)

I'm thinking you can get some degree of inconvenience (and therefore delay), but it's hard to have large shared infrastructure that's that secure from attack.

Ruby31

I'd be interested in a comparison with the Latest tab.

Ruby80

Typo? Do you mean "click on Recommended"? I think the answer is no, in order to have recommendations for individuals (and everyone), they have browsing data.

1) LessWrong itself doesn't aim for a super high degree of infosec. I don't believe our data is sensitive enough to warrant large security overhead.
2) I trust Recombee with our data about as much as we trust ourselves to not have a security breach. Maybe actually I could imagine LessWrong being of more interest to someone or some group and getting attacked.

It might help to understand what your specific privacy concerns are.

Ruby20

Hard to answer without knowing your background. I might try online courses or ask ChatGPT here for advice.

Ruby50

Curated. It's a funny thing how fiction can sharpen our predictions, at least fiction that's aiming to be at least plausible in some world model. Perhaps it's the exercise of playing our models forwards in detail rather than isolated abstracted predictions. This is a good example. Even if it seems implausible, noting why is interesting. Curating, and I hope to see more of these built on differing assumptions and reaching different places. Cheers.
