Overview of FOX’s Resilient, Low Latency Streaming Video Architecture for Super Bowl LIV
Mayur Srinivasan
8 min read · Feb 5
On February 2nd, 2020, FOX delivered the most live-streamed Super Bowl in history, with an average minute audience of 3.4 million viewers. Behind the scenes, FOX’s video engineering team designed an innovative, highly redundant video streaming workflow to support this record-breaking audience with a flawless experience.
We built this cloud-based streaming workflow in-house using a collection of vendor services across transmission, encode, storage, origin shielding, delivery, and playback. Our focus was on building resiliency into every component of the video workflow.
This article will cover the following topics:
Architectural overview of FOX’s resilient, low latency video streaming workflow
Monitoring tools that were built/leveraged for the workflow
Testing strategies
Rehearsals leading up to game day
Game day recap
1. Architectural Overview
The following diagram illustrates the signal flow architecture that can be broken up into:
Transmission
Encode
Origin
Origin Shield
Delivery (Multi CDN)
Playback
Transmission
The primary broadcast signal originated from Hard Rock Stadium in Miami and was then sent to FOX’s Master Control in Los Angeles. Master Control inserted commercials, ratings watermarks, and closed captioning. From LA, we used a managed fiber network to deliver the finished signal over four diverse fiber paths to multiple cloud regions.
Encode
For encoding resiliency, we used redundant encoder pipelines deployed in multiple regions. To allow seamless failover in case of transmission issues on one of the paths, we ensured the signals were sync-locked. Each incoming signal was an RTP-FEC feed carrying 720p60 AVC video at 20 Mbps. To achieve low latency, we chose 2-second segments with a live window of 15 segments (30 seconds total). On game day, end users experienced latency of roughly 8–12 seconds behind the feed from Master Control!
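The latency budget implied by these numbers can be sketched with some simple arithmetic. This is an illustrative model, not FOX’s actual player logic; the segment-offset, buffer, and pipeline-overhead values below are assumptions chosen to show how a figure in the 8–12 second range falls out of 2-second segments.

```python
# Illustrative latency-budget sketch for the segment sizing described above.
SEGMENT_DURATION_S = 2      # 2-second segments for low latency
LIVE_WINDOW_SEGMENTS = 15   # live window of 15 segments

live_window_s = SEGMENT_DURATION_S * LIVE_WINDOW_SEGMENTS  # 30 s total

def glass_to_glass_latency(segments_behind_edge, buffer_s, pipeline_overhead_s):
    """Rough end-user latency: how far behind the live edge the player starts,
    plus its playback buffer, plus encode/packaging/delivery overhead."""
    return segments_behind_edge * SEGMENT_DURATION_S + buffer_s + pipeline_overhead_s

print(live_window_s)                    # 30
print(glass_to_glass_latency(3, 2, 2))  # 10 -- within the observed 8-12 s range
```

Starting playback three segments back from the live edge with a couple of seconds of buffer and overhead lands squarely in the observed range; shrinking segments further would cut latency but increase request overhead.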
Origin
We used two redundant origins, one in each region. This gave us geo-redundancy in case an entire cloud region suffered an outage.
Origin Shield
To properly implement our Multi-CDN strategy and protect against congestion, we deployed an industry-leading origin shield product to ensure that origin servers weren’t getting hammered with requests and to optimize caching. We also had the necessary knobs and logic to fail over in either of the following scenarios:
If a particular feed in a region was down, failover to the backup feed within the same region
If a particular region was down (either dual feed failure/encoder failure/origin issue), failover to the backup origin in the alternate region
Additionally, if the primary origin shield itself were to go down, we had the option of switching over to a backup shield from a different partner. This backup shield implemented the same redundancy logic as the primary.
The ability to switch between primary and backup origin shields was handled through DNS.
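The two failover rules above can be expressed as a small decision function. This is a hypothetical sketch of the routing logic, not FOX’s implementation; the region names and origin URLs are made up for illustration.

```python
# Hypothetical sketch of the origin-shield failover rules described above.
# Region names and endpoints are illustrative, not real identifiers.
ORIGINS = {
    "region-a": {"primary": "https://origin-a1.example/live",
                 "backup":  "https://origin-a2.example/live"},
    "region-b": {"primary": "https://origin-b1.example/live",
                 "backup":  "https://origin-b2.example/live"},
}

def pick_origin(region, alternate, feed_healthy, region_healthy):
    # Rule 1: feed down but region up -> backup feed in the same region.
    if region_healthy and not feed_healthy:
        return ORIGINS[region]["backup"]
    # Rule 2: whole region down (dual feed / encoder / origin failure)
    # -> fail over to the origin in the alternate region.
    if not region_healthy:
        return ORIGINS[alternate]["primary"]
    # Healthy path: stay on the region's primary feed.
    return ORIGINS[region]["primary"]

print(pick_origin("region-a", "region-b", feed_healthy=False, region_healthy=True))
# -> https://origin-a2.example/live
```

The shield-to-shield switch sits above this logic: since both shields expose the same rules, repointing a DNS record (as the article notes) swaps the whole layer without touching the origin selection itself.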
Delivery (Multi CDN)
We used a pool of five CDNs for the event. We built a backend service for CDN decisioning that ran at each session start, based on the following metrics:
Latency: We used test objects embedded within FOX properties to capture real time user metrics for latency. The size of the test objects was representative of our typical video segment size, in order to emulate latency measurements for video segment downloads. These test objects were fronted by each of the CDNs using configurations similar to actual segment delivery.
Rebuffering ratio: We utilized player rebuffering % metrics as part of the decisioning process.
Number of errors: We utilized player error metrics as part of the decisioning process.
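One way to combine the three metrics above into a per-session choice is a weighted score where lower is better. This is a minimal sketch under assumed weights and units; the article does not disclose the actual decisioning formula, and the CDN names and sample metrics below are invented.

```python
# Hypothetical per-session CDN decisioning sketch combining the three
# metrics above. Weights, scaling, and sample data are assumptions.
def score(latency_ms, rebuffer_pct, error_rate, weights=(0.5, 0.3, 0.2)):
    """Lower is better for every input, so a lower score wins.
    Inputs are normalized to comparable magnitudes before weighting."""
    w_lat, w_rebuf, w_err = weights
    return (w_lat * latency_ms / 100        # test-object latency
            + w_rebuf * rebuffer_pct        # player rebuffering %
            + w_err * error_rate * 100)     # player error rate

def pick_cdn(metrics):
    """metrics: {cdn_name: (latency_ms, rebuffer_pct, error_rate)}"""
    return min(metrics, key=lambda cdn: score(*metrics[cdn]))

session_metrics = {
    "cdn-1": (120, 0.4, 0.001),
    "cdn-2": (95,  0.9, 0.002),
    "cdn-3": (150, 0.2, 0.000),
}
print(pick_cdn(session_metrics))  # cdn-1
```

Running the decision at session start (rather than per segment) keeps each viewer pinned to one CDN, which preserves cache locality while still letting the pool rebalance as new sessions arrive.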
We had protocols established wherein, if a particular CDN approached its reserved capacity, we would rebal...