VP9 Encoding: Journey So Far
Qian Chang
Feb 8 · 7 min read
As we rolled out VP9 encoding, we saw clear improvements in video quality and bitrate compared to H264. However, we also ran into challenges, which we discuss below.
Exploration
We use libvpx to encode our videos. Over time, we found that some content could be encoded at a lower bitrate with similar perceptual quality by adding some entropy-preserving filters. With these changes, we got remarkable bitrate savings on some of our most popular shows.
We also found that our VP9 encodes did not perform well on some content with high-motion or dark scenes, so we decided to pause VP9 encoding for the content types that exhibited these issues. We then found that, for some specific titles, the manifest bandwidth of the 240p layer was higher than that of the 360p layer. Because of these issues, we paused VP9 encoding and dug deeper to investigate them. Finally, we came up with solutions that improved our VP9 encoding output.
Encoding
In this part, we cover two points that are not frequently discussed on tech forums: the 2pass rate-control method and multi-thread encoding speed.
Rate Control method
Similar to x264, libvpx offers several rate-control methods for VP9: 1pass ABR, 2pass ABR, Constant Quality, and Constrained Quality.
CRF with bitrate caps is frequently used in x264 encoding. In VP9's Constrained Quality mode (called CRF mode in the rest of this post), the encoder tries to reach a constant perceptual quality while keeping the average bitrate below the bitrate limit.
This is different from x264 CRF rate control. In x264 we can use the VBV buffer size and VBV maxrate to control the bandwidth (maxrate) value of each layer in the DASH manifest. In VP9 CRF mode there is no way to control this directly. The maxrate value is very important in adaptive bitrate selection: the higher a layer's maxrate, the fewer clients will pick that layer.
Another rarely noticed point is that Constrained Quality mode can also be used with 2pass encoding. Because 1pass CRF is widely used in x264, we did not try 2pass CRF in VP9 at first. However, in VP9, 2pass CRF performs much better than 1pass CRF and can improve the quality of some complex scenes. We will discuss the details later.
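To make the role of the per-layer bandwidth concrete, here is a rough sketch that estimates a layer's peak bandwidth from its segment sizes and flags ladders where a lower resolution ends up with a higher peak than the next one up. Using the largest per-segment bitrate as a proxy for the manifest bandwidth is an assumption for illustration; packagers may compute the value differently. The function names and sample numbers are hypothetical.

```python
def peak_kbps(segment_bytes, segment_seconds):
    """Highest per-segment bitrate in kbit/s (a rough proxy for peak bandwidth)."""
    return max(size * 8 / 1000 / segment_seconds for size in segment_bytes)


def inverted_layers(peaks_by_height):
    """Return (lower, higher) height pairs where the lower layer has the larger peak."""
    heights = sorted(peaks_by_height)
    return [
        (low, high)
        for low, high in zip(heights, heights[1:])
        if peaks_by_height[low] > peaks_by_height[high]
    ]


if __name__ == "__main__":
    # Hypothetical segment sizes (bytes) for one layer with 4-second segments.
    print(peak_kbps([100_000, 250_000, 150_000], 4.0))
    # Hypothetical per-layer peaks (kbit/s), keyed by frame height.
    print(inverted_layers({240: 520.0, 360: 480.0, 480: 900.0}))
```

A check like this can be run over a packaged ladder before publishing to catch layers that would confuse adaptive bitrate selection.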
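As a minimal sketch, a 2pass Constrained Quality encode could be driven as follows. It assumes the ffmpeg command-line front end with its libvpx-vp9 encoder, which this post does not confirm as the tool in use. The CRF value, target bitrate, file names, and the helper name are hypothetical placeholders, not the settings used in production.

```python
import shlex


def vp9_two_pass_cq(src, dst, crf=32, target_kbps=1800, stats_prefix="vp9_pass"):
    """Build ffmpeg argument lists for pass 1 and pass 2 of a constrained-quality encode."""
    common = [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libvpx-vp9",
        "-crf", str(crf),              # quality target
        "-b:v", f"{target_kbps}k",     # non-zero bitrate makes this constrained quality
        "-passlogfile", stats_prefix,  # first-pass stats shared by both passes
        "-an",                         # video only; audio is handled separately here
    ]
    first = common + ["-pass", "1", "-f", "null", "/dev/null"]
    second = common + ["-pass", "2", dst]
    return first, second


if __name__ == "__main__":
    for cmd in vp9_two_pass_cq("input.mp4", "output.webm"):
        # Print only; run each with subprocess.run(cmd, check=True) if ffmpeg is installed.
        print(shlex.join(cmd))
```

The first pass only gathers statistics and discards its output; the second pass reads those statistics to distribute bits across the whole title.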
Multi-thread encoding speed
For VOD encoding, we tend to use a slow speed setting to get better quality and smaller files. In x264/x265, we can use 10 or more threads to speed up the encoding of 1080p videos. However, we cannot utilize that many threads in libvpx, and 1080p encoding is much slower than x264 with a slow preset.
After some investigation, we learned that the maximum number of threads libvpx can utilize is tied to the number of tiles, and the maximum number of tiles is determined by the resolution. This table shows the max tiles for each resolution.
For 1080p content, the video width is 1920 and the maximum number of tiles is only 4, so 1080p encoding time became a bottleneck for our VOD service. Fortunately, the -row-mt option was introduced in libvpx v1.7, which made multi-threaded encoding faster than in older versions. But for video content that must be released quickly, libvpx still could not meet our requirements, and we needed GOP-level parallelization to improve it.
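The relationship between frame width and tile count can be sketched in a few lines. This assumes the VP9 rule that a tile column must be at least 4 superblocks (256 pixels) wide and that the number of tile columns is a power of two. The function name is hypothetical, and the upper bound on tile width is ignored because it does not matter at these resolutions.

```python
MIN_TILE_WIDTH_SB64 = 4  # assumed minimum tile width: 4 superblocks of 64 px = 256 px


def max_tile_columns(width):
    """Largest power-of-two tile column count allowed for a given frame width."""
    sb64_cols = (width + 63) // 64  # frame width in 64-pixel superblocks, rounded up
    log2_cols = 0
    # Keep doubling the column count while each tile stays at least 4 superblocks wide.
    while (sb64_cols >> (log2_cols + 1)) >= MIN_TILE_WIDTH_SB64:
        log2_cols += 1
    return 1 << log2_cols


if __name__ == "__main__":
    for width in (640, 1280, 1920, 3840):
        print(width, max_tile_columns(width))
    # max_tile_columns(1920) -> 4
```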
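One possible shape for GOP-level parallelization is to split a title into chunks whose boundaries fall on GOP starts, encode each chunk in its own encoder process, and concatenate the results. The sketch below covers only the chunk planning. It assumes a fixed GOP length with closed GOPs, and the function name and numbers are hypothetical; it is not a description of the pipeline actually used.

```python
def plan_gop_chunks(total_frames, gop_size, gops_per_chunk):
    """Split [0, total_frames) into (start, end) frame ranges aligned to GOP starts."""
    if total_frames < 0 or gop_size <= 0 or gops_per_chunk <= 0:
        raise ValueError("total_frames must be >= 0; gop_size and gops_per_chunk must be > 0")
    chunk_len = gop_size * gops_per_chunk
    return [
        (start, min(start + chunk_len, total_frames))  # end is exclusive
        for start in range(0, total_frames, chunk_len)
    ]


if __name__ == "__main__":
    # Hypothetical title: 1000 frames, 48-frame GOPs, 5 GOPs per chunk.
    for start, end in plan_gop_chunks(1000, 48, 5):
        print(f"encode frames [{start}, {end}) in a separate worker")
```

Because every chunk starts on a keyframe, the chunks can be encoded independently and the total wall-clock time is bounded by the slowest chunk rather than by the whole title.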
Packaging
Bento4 or Shaka packager?
Bento4 is very popular for HLS/DASH packaging of H264/H265 content. For VP9, we have one more choice: Shaka Packager. According to the developers, Bento4 focuses on formats based on the ISO Base Media File Format standard, and WebM was considered very different. In addition, some VP9 + AAC streams generated by Bento4 did not play well in our Chrome browsers. In contrast, Shaka Packager covers all of our use cases, so we decided to use it for VP9 packaging.
Shaka Packager can output fMP4 DASH streams with VP9 + AAC codecs and WebM DASH streams with VP9 + Opus codecs. It also supports AV1 + AAC and AV1 + Opus well.
Shaka Packager ...