Update 2020-06-07: I have done more in-depth investigations into TikTok, whose results can be found at tiktokodds.com.

Summary

The communities I am part of rely heavily on longform text for communication. The world seems to be moving away from text (and especially longform text).

One clear example is TikTok, which is the most downloaded social media application. It centers around sharing short videos. I’m interested in whether important concepts can be communicated via this medium.

As part of that, I researched more about the recommendation algorithm. This has led me to some success (e.g. a coronavirus video with 400,000+ views). Because I found it very hard to get useful information about TikTok when I was doing this research, and because I want the sort of person who would read this post to get wider visibility for their ideas, I am writing this summary.

Background

Most TikTok videos are viewed through the “for you page”, roughly analogous to the Twitter feed. The TikTok algorithm recommends videos for you to view on this page. Note that, unlike with Twitter or Instagram, a large fraction of the content comes from creators that the user does not follow.

The TikTok recommendation algorithm is proprietary and mostly confidential. We know only a few things through information that employees have leaked to the press.

The TikTok recommendation algorithm consists of two components: an automated component and a manual component.

Automated

When a user creates a video, TikTok analyzes the video to identify the target audience for the video. They claim to use a combination of discrete fields (namely the hashtags used by the author and the sound chosen), natural language processing (presumably analyzing any text which is overlaid in the video), and computer vision. For example, they might analyze your video and find that it contains images of a cat, has text like “meow”, and uses the hashtag “#catsoftiktok”. They will use this information to identify an audience of people who like cat videos.

They then create a small sample audience of cat-video-lovers who get shown this video. If this audience likes the video, it will be shown to a larger audience. If that audience likes it, it will be shown to a still larger audience, etc.

Whether the audience likes the video is some function of how they engage with it: do they watch it the whole way through, like it, comment, share, etc. A common heuristic is that a video needs at least 10% of the viewers to like it in order to advance to the next stage.
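As a rough illustration, here is a minimal sketch of that staged-promotion process. The audience sizes, the tenfold growth per stage, and the flat 10% like threshold are all illustrative assumptions on my part, not known TikTok parameters.

```python
import random

def simulate_promotion(like_rate, stage_sizes=(10, 100, 1_000, 10_000, 100_000),
                       threshold=0.10, seed=None):
    """Toy model of the staged promotion described above, for a single video.

    like_rate: chance that a random viewer in the sample audience likes the video.
    stage_sizes: assumed audience size at each stage (illustrative, not TikTok's real numbers).
    threshold: fraction of likes needed to advance to the next stage (the ~10% heuristic).
    Returns the total number of views accumulated before the video stops advancing.
    """
    rng = random.Random(seed)
    total_views = 0
    for size in stage_sizes:
        total_views += size
        likes = sum(rng.random() < like_rate for _ in range(size))
        if likes / size < threshold:  # the audience didn't like it enough; stop promoting
            break
    return total_views

# Compare a video that ~12% of viewers like with one that only ~5% like.
print(simulate_promotion(like_rate=0.12, seed=0))
print(simulate_promotion(like_rate=0.05, seed=0))
```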

Manual

Reportedly, moderators manually review videos, once after the video has ~100 views, again after it has ~8,000 views, and a third time after it has ~20,000 views. At each stage, the moderator can set different levels of visibility, which determine how widely the video will be shown. These levels of visibility are not visible to the author.

Early leaks of the moderation guidelines showed that they included limiting the visibility of videos containing disabled and possibly also LGBT+ actors, as well as politically sensitive content like protests.

The Intercept recently obtained a more complete leak of TikTok’s moderation guidelines. The guidelines mostly attempt to limit sexually “explicit” content (“explicit” in scare quotes because the guidelines include things like banning people for wearing bikinis when not swimming) and politically “controversial” content (notably content that makes China look bad).

Moderation Goals

Interestingly, moderators are also instructed to prohibit “ugly” people and environments (slums etc.). The motivation here seems to be that new users will bounce if they are presented with videos of ugly people/places.

On the other side of things, TikTok apparently hired contractors to steal “nice-looking” videos from Instagram and repost them on TikTok. The #BeachGirl hashtag was specifically mentioned as a source contractors should use.

My guess is that there are two separate moderation goals at play: one is politically motivated (e.g. limiting videos about Tiananmen Square), and the second is targeted towards increasing engagement (e.g. displaying videos from more attractive users). TikTok’s official party line seems to be that these guidelines are “locally” produced (implying that the political censorship only happens in China).

Implications

Statistical Modeling

It seems likely that the number of views any video can expect to receive should be modeled with 4 separate models: one applicable when the video has fewer than 100 views, one for the 100-8,000 regime, one for 8,000-20,000, and one for 20,000+ (corresponding to the thresholds at which manual review takes place).

For simplicity, I will just assume there is a single model. The process of iteratively showing videos to larger audiences implies a distribution like: 1/2 chance of being shown to 10 people, 1/4 chance of 100, 1/8 chance of 1,000, etc. More generally, this implies some distribution like

$$P(\text{views} = b^k) = p^k$$

for some parameters $p$ and $b$ (in the example above, $p = 1/2$ and $b = 10$).

Noting that we can re-parameterize this as

$$P(\text{views} = n) = p^{\log_b n} = n^{\log_b p},$$

we can see that this implies a power law distribution. Indeed, Chen et al. 2019 found that views on the most popular videos were Zipf-distributed.
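As a sanity check on that derivation, here is a small simulation using the illustrative values p = 1/2 and b = 10 from above. The observed frequencies line up with the closed-form p^k, and plotting frequency against views on log-log axes gives a straight line, as a power law should.

```python
import math
import random
from collections import Counter

# Sample view counts from the model above: a video gets b^k views with probability p^k.
p, b, n_videos = 0.5, 10, 100_000
rng = random.Random(0)

def sample_views():
    # Note: this simple sampler reproduces P(views = b^k) = p^k only for p = 1/2.
    k = 1
    while rng.random() < p:  # advance to the next, b-times-larger audience
        k += 1
    return b ** k

counts = Counter(sample_views() for _ in range(n_videos))
print(f"{'views':>12}  {'observed':>9}  {'model p^k':>9}")
for views in sorted(counts):
    k = round(math.log(views, b))
    print(f"{views:>12}  {counts[views] / n_videos:>9.4f}  {p ** k:>9.4f}")
# Plotting frequency against views on log-log axes gives a straight line
# with slope log_b(p), i.e. a power law.
```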

Empirical Evidence

Looking at my own videos, a simple power law distribution of the form above fits reasonably well.
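As a sketch of how one might check such a fit on their own videos, here is a simple rank-frequency regression: sort view counts in descending order and regress log(views) on log(rank). The view counts below are made-up placeholders, not my actual numbers.

```python
import math

# Placeholder per-video view counts (made up for illustration; substitute your own).
views = [183_000, 12_400, 9_800, 3_100, 2_200, 950, 720, 410, 260, 150]

# Zipf-style check: sort descending and regress log(views) on log(rank).
views = sorted(views, reverse=True)
xs = [math.log(rank) for rank in range(1, len(views) + 1)]
ys = [math.log(v) for v in views]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

print(f"views ≈ {math.exp(intercept):,.0f} * rank^({slope:.2f})")
# A roughly straight line on log-log axes (with a clearly negative slope)
# is what a power law / Zipf fit predicts.
```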

Optimizations

Manual versus automated algorithm optimizations

My intuition is that most content is not remarkably changed by manual moderation. (As one intuition pump: TikTok wants to promote videos which are popular among their target audience, and the empirical engagement data from the sample audiences is probably a better predictor of that popularity than the opinion of some random moderator.) This implies that it’s more important to focus on automated algorithm optimizations.

Brainstorm

This section is a brainstorm of ways one can optimize their videos for wider distribution. They all seem like reasonable hypotheses to me based on the above information, but I have no real evidence to support them.

Also, it’s worth pointing out that TikTok has fascist moderation policies and optimizing for fascist moderation is maybe a bad idea.

  1. Optimizing things the recommendation algorithm can easily calculate. Machine learning engineers routinely overuse data fields which are easy to measure, even if those data fields are not the most important. Therefore, it’s reasonable to assume that TikTok’s algorithm also prioritizes these.
    1. Camera resolution. It’s probably better to use the rear-facing camera on your phone, and to use a phone with a better camera. The algorithm might even penalize you for having too high a resolution (e.g. from using a professional camera), since TikTok wants to maintain an “amateur” aesthetic.
    2. Use trending sounds. TikTok can tell which music is more or less popular, and recommend videos based on that. Using sponsored sounds is even better than using ones which are trending organically.
    3. Use trending hashtags. Same idea, although note that the sample audience will also be chosen based on your hashtag. (And it’s crucial to get engagement from the sample audiences.) So if you make a video about video games, using hashtags related to makeup is probably a bad idea, even if those makeup hashtags are trending, because your sample audience will be people who want to watch makeup videos (and therefore won’t engage with your videogame content).
    4. Use trending keywords in text. Even though TikTok claims they do “natural language processing”, I’m skeptical it’s very advanced. Similar to search engine optimization, you probably want to explicitly list out key terms in either your caption or in the video itself.
    5. Use good lighting. TikTok doesn’t want videos that are too dark or oversaturated, and I suspect they can at least partially identify this algorithmically.
  2. Making your content more easily understandable for the algorithm.
    1. Use contrasting backgrounds. Computer vision is still in its infancy, and algorithms would probably struggle to recognize an object in front of a complex background. Using single-color backdrops with bright lighting lets the algorithm more easily identify the content of your video. (Of course, the content of your video also has to be the sort of thing TikTok wants to recommend; otherwise, making it easier to identify doesn’t help.)
  3. (Pretend to) Be the person TikTok wants you to be.
    1. Be physically attractive. See above information about how moderators filter out “ugly” people.
    2. Don’t be bully-able. TikTok suppresses videos by creators whom moderators think could be bullied. Be cautious about sharing problems in your life or controversial opinions. Political opinions seem especially dangerous.
    3. Purchase, use, and mention sponsored products. TikTok makes money through sponsors paying to promote videos containing their products. It seems quite likely that e.g. mentioning a sponsored hairspray will increase your views relative to mentioning a non-sponsored hairspray.
    4. Live the lifestyle Gen Z wants to live. TikTok wants to highlight videos that showcase the lifestyle its target audience (primarily 13-16-year-olds) wants to live, in order to attract and retain these users. This includes being successful in (or at least participating in) desired careers like being a social media influencer or Twitch streamer. General wealth, beauty, and success are probably also good.
      1. Lu and Lu 2019: “We found that 21 [of 28] interviewees use Douyin for following the perceived stylish and up-to-date lifestyle, because they consider using Douyin as “a fashionable lifestyle”. Further, 16 interviewees reported that they used Douyin to be able to talk with people around them with interesting and trending topics, because peer students or workers often talk about content on Douyin.”
    5. Have upper-class and clean backdrops. Moderation guidelines include hiding videos that have broken pipes or cracks in the wall.
  4. Timing. As with all social media platforms that prioritize recent content, it’s important to publish content when your audience is most active. (TikTok Pro accounts display a histogram about when your followers are active.)
  5. Focus on quantity. Power law distributions have the counterintuitive property that their expected value is significantly higher than their median. The expected number of views on your next video is probably higher than you intuitively expect (see the sketch below).
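To illustrate the last point numerically, here is a quick check using the staged toy model from earlier (again with illustrative parameters, and capped at seven stages so the expected value stays finite): the median video does badly, while the mean is dominated by the rare videos that go viral.

```python
import random
import statistics

# Staged toy model from earlier: 10^k views with probability (1/2)^k,
# capped at 7 stages here so that the expected value is finite.
rng = random.Random(0)

def sample_views(p=0.5, b=10, max_stages=7):
    k = 1
    while k < max_stages and rng.random() < p:
        k += 1
    return b ** k

samples = [sample_views() for _ in range(100_000)]
print("median views:", statistics.median(samples))       # typically 10 or 100
print("mean views:  ", round(statistics.mean(samples)))  # on the order of 10^5
# The mean is pulled far above the median by the rare videos that reach
# the largest audiences; "focus on quantity" exploits exactly this.
```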

Broader idea: steal ideas from China/Douyin

Douyin is a sister product to TikTok. Several trends on Douyin do not seem popular on TikTok yet. For example:

Interestingly, several young users reported that they like watching and sharing videos in which the creator is sharing good-looking profile images for them to adopt for their social accounts, fancy nicknames for online social accounts, and creative animated mobile phone wallpapers for individuals or couples (Figure 1). This shows a trend that the content of some short videos can be easily “materialized” and adopted in other social channels, and the videos penetrate into users’ real life. As noted by P28, a male student who was in a relationship:
“My girlfriend really likes to adopt the profile images recommended by a content provider as her WeChat account. She says that those profile images are beautiful and can fit with her personality well. She also likes to adopt animated mobile phone wallpapers shared on Douyin. Sometimes those wallpapers are for a couple, which are very creative in that two phones can create a holistic story when put together, and she always urges me to adopt those wallpapers together with her. It is a way to show to her friends that we are in a sweet relationship. ”
Lu and Lu 2019

Academic literature presumably lags trends by a significant amount, so if you have direct access to Chinese or other markets where you can copy trends more rapidly, that’s probably better.

Click Farms/Vote Brigading

TikTok’s recommendation process makes coordinated voting hard. Because the decision about promoting a video is based on how a randomly selected audience interacts with it, having your friends all like your video doesn’t do much.

One possibility is that you could ensure that your friends are disproportionately represented in the randomly selected audience. For example, you could have some hashtag that only you and your friends use, and TikTok might therefore automatically choose your friends to be in the sample audience.

This is just a hypothesis, and I have low confidence it will work.

I would like to thank JP Addison for discussions that led to this work, particularly the observation that EA is heavily tied to longform text.

Comments
“natural language processing [NLP] (presumably analyzing any text which is overlaid in the video)”

Audio-to-text programs do exist, so it seems possible they could be doing NLP on the output of such a program, though even YouTube’s audio-to-text program can be really bad.

Good point