Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: More on Twitter and Algorithms, published by Zvi on April 19, 2023 on LessWrong.
Previously: The Changing Face of Twitter
Right after I came out with a bunch of speculations about Twitter and its algorithm, we got a whole bunch of concrete info detailing exactly how much of Twitter’s algorithms work.
Thus, it makes sense to follow up and see what we have learned about Twitter since then. We no longer have to speculate about what might get rewarded. We can check.
We Have the Algorithm
We have better data now. Twitter ‘open sourced’ its algorithm – the quote marks are because we are missing some of the details necessary to recreate the whole algorithm. There is still a lot of useful information. You can find the announcement here and the GitHub depot here. Brandon Gorrell describes the algorithm at Pirate Wires.
Here are the parts of the announcement I found most important.
The foundation of Twitter’s recommendations is a set of core models and features that extract latent information from Tweet, user, and engagement data. These models aim to answer important questions about the Twitter network, such as, “What is the probability you will interact with another user in the future?” or, “What are the communities on Twitter and what are trending Tweets within them?” Answering these questions accurately enables Twitter to deliver more relevant recommendations.
The recommendation pipeline is made up of three main stages that consume these features:
Fetch the best Tweets from different recommendation sources in a process called candidate sourcing.
Rank each Tweet using a machine learning model.
Apply heuristics and filters, such as filtering out Tweets from users you’ve blocked, NSFW content, and Tweets you’ve already seen.
Today, the For You timeline consists of 50% In-Network Tweets and 50% Out-of-Network Tweets on average, though this may vary from user to user.
The most important component in ranking In-Network Tweets is Real Graph. Real Graph is a model which predicts the likelihood of engagement between two users. The higher the Real Graph score between you and the author of the Tweet, the more of their tweets we’ll include.
This matches my experience with For You. Your follows are very much not created equal. The accounts you often interact with will get shown reliably, and even shown when replying to other accounts. Accounts that you don’t interact with, you might as well not be following.
Thus, if there is an account you want to follow within For You, you’ll want to like a high percentage of their tweets, and if you don’t want that for someone, you’ll want to avoid interactions.
What about out-of-network?
We traverse the graph of engagements and follows to answer the following questions:
What Tweets did the people I follow recently engage with?
Who likes similar Tweets to me, and what else have they recently liked?
.we developed GraphJet, a graph processing engine that maintains a real-time interaction graph between users and Tweets, to execute these traversals. While such heuristics for searching the Twitter engagement and follow network have proven useful (these currently serve about 15% of Home Timeline Tweets), embedding space approaches have become the larger source of Out-of-Network Tweets.
So that’s super interesting on both fronts. The algorithm is explicitly looking to pattern match on what you liked. My lack of likes perhaps forced the algorithm to, in my case, fall back on more in-network Tweets. So one should be very careful with likes, and only use them when you want to see more similar things.
More than that, who you follow now is doing two distinct tasks. It provides in-network tweets, but only for those accounts you interact with. It also essentially authorizes those you follow to upvote content for you by interacting with that content.
That implies a...