Tech Leader Pro

Tech Leader Pro podcast 2023 week 27, the Twitter rate limit fiasco



Last week Twitter gave their millions of users a lesson on the impacts of rate limiting a service. In this episode, I will discuss why that was an incredibly ham-fisted way of tackling web scraping.

Notes:

  • Early on Saturday afternoon, European time, I started to get error messages on the Twitter mobile website that read: "Sorry, you are rate limited. Please wait a few moments, then try again." Ref: https://twitter.com/TechLeaderPro/status/1675109388221132839
  • My immediate thought was that my account was being restricted somehow, perhaps after being reported by another user.
  • I did not assume it was a global issue.
  • Then I logged in on my laptop and tried the main website, which was displaying similar warnings. When I opened the web console in my browser, I could see that the Twitter API was returning "429 Too Many Requests" to the Twitter web client. Ref: https://twitter.com/TechLeaderPro/status/1675110903262461955
  • Put simply, it became clear that Twitter was throttling its API. Several hours later, Elon Musk confirmed that with the following tweet:
  • "To address extreme levels of data scraping & system manipulation, we’ve applied the following temporary limits:
    • Verified accounts are limited to reading 6000 posts/day
    • Unverified accounts to 600 posts/day
    • New unverified accounts to 300/day"
    • Ref: https://twitter.com/elonmusk/status/1675187969420828672
    • After that, it became clear this was a global issue, and that it was a deliberate policy rather than a bug!
    • If we take that at face value, and ignore the rumours of cloud bills not being paid on time, this is still an incredibly ham-fisted way of tackling web scraping.
    • If you have a public website, it will be scraped, usually for legitimate reasons such as Googlebot crawling your pages to add them to its search index.
    • Beyond that, if a scraper is behaving in an aggressive manner, it can be blocked via Web Application Firewalls (WAFs) or other types of application gateways.
    • Typically, that block will be via IP address or user agent string, or a combination of both.
    • Following that approach, there is zero reason to rate limit legitimate users, especially verified users, who are paying customers!
    • For legitimate large-scale scraping use cases, Twitter provides a rate-limited API with various free and paid tiers. Ref: https://developer.twitter.com/en/docs/twitter-api
    • Putting rate limits in place there makes perfect sense, and all scrapers should be encouraged to go to the API rather than scraping the site. It is a good revenue generator for Twitter.
    • All companies with public APIs do this; it is common practice in the industry.
    • At no point, however, should regular users be impacted, and the fallout of this embarrassing incident has once again left a lot of users worrying about the stability of Twitter, not only as a platform but also as a firm.
    • At one point, #RIPTwitter and #Bluesky were trending on Twitter, and at the time of writing, the tweet above from Elon has received 550m views.
    • This is serious reputational harm not only for Twitter, but also for Elon Musk if the service ultimately fails for technical or commercial reasons.
    • I still can't see the end-game here.
    • What I am working on this week:
      • Tech Leadership podcast series: episode 23 will be on "Turning up at meetings", and being present in general.
      • greppr.org: now at 1.8m web pages indexed.
      • Media I am enjoying this week:
        • Doom 2016 replay
        • The Sound of Waves by Yukio Mishima
        • Longitude by Dava Sobel
        • Notes and subscription links are here: https://techleader.pro/a/598-Tech-Leader-Pro-podcast-2023-week-27,-the-Twitter-rate-limit-fiasco
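
To make the argument in the notes above concrete, here is a minimal sketch of the kind of gateway logic I have in mind: block known-bad scrapers by IP address or user agent string, and throttle only clients that exceed a generous request budget, answering them with "429 Too Many Requests". Everything named here is an assumption for illustration: the `TokenBucket` and `Gateway` classes, the blocklist entries, the example IP addresses (taken from the reserved documentation ranges), and the limits. None of it reflects how Twitter or any particular WAF product actually works.

```python
import time


class TokenBucket:
    """Token bucket: allows bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity   # start with a full bucket
        self.updated = None      # time of the last refill, set on first use

    def allow(self, now: float) -> bool:
        if self.updated is None:
            self.updated = now
        elapsed = max(0.0, now - self.updated)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical blocklists; a real gateway would load these from configuration.
BLOCKED_IPS = {"203.0.113.7"}
BLOCKED_USER_AGENT_SUBSTRINGS = ("badbot",)


class Gateway:
    """Decides on an HTTP status code per request: 200, 403 or 429."""

    def __init__(self, capacity: float = 100, refill_per_sec: float = 10.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.buckets = {}  # one bucket per (ip, user_agent) pair

    def handle(self, ip: str, user_agent: str, now: float = None) -> int:
        now = time.monotonic() if now is None else now

        # Step 1: hard-block known aggressive scrapers by IP or user agent.
        agent = user_agent.lower()
        if ip in BLOCKED_IPS or any(s in agent for s in BLOCKED_USER_AGENT_SUBSTRINGS):
            return 403

        # Step 2: throttle per client, so one noisy client cannot affect others.
        key = (ip, user_agent)
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.capacity, self.refill_per_sec)
        return 200 if self.buckets[key].allow(now) else 429


if __name__ == "__main__":
    gateway = Gateway(capacity=3, refill_per_sec=1.0)
    for _ in range(4):
        print(gateway.handle("198.51.100.2", "Mozilla/5.0", now=0.0))
    print(gateway.handle("198.51.100.9", "BadBot/1.0", now=0.0))
```

The point of the design is that each client gets its own budget, so throttling one aggressive scraper never touches a regular reader on a different connection. Keying on IP address plus user agent is crude (many users can share one IP address), which is why real deployments usually combine it with other signals.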


Tech Leader Pro, by John Collins