Adaptive Streaming in the Field


Choosing Your Streams
Probably the most interesting question I asked was how each organization decided on the number of streams it used and how it configured them. In essence, I asked, “Was there any science behind your configurations or did you choose them experientially?” The answers varied widely, and any company planning to implement adaptive streaming should consider them.

For example, MTV matches video resolution to the display resolution on the webpage, which delivers the best quality at the most efficient file size and with the least playback overhead. That is, when video is displayed in a 640x360 window, a 640x360 stream encodes more efficiently, looks better, and plays more smoothly than a smaller or larger stream because no upward or downward scaling is required. When formulating his stream sizes, Goldstein inventoried the popular display sizes used on the MTV web properties and either conformed his streams to those sizes or convinced the web team to support the more prevalent sizes.

In addition, once the maximum screen quality is reached for a particular window size, MTV won’t switch to a higher resolution stream unless the viewer opts for a larger display size, such as full-screen viewing. For example, the 768x432 stream is the primary stream that matches the display window on many MTV web properties, and it’s encoded at 1700Kbps. There are two higher quality streams available, as you can see in Table 7, but MTV won’t switch upward unless and until the viewer clicks into a larger display size. The rationale is that viewers wouldn’t notice the difference between the 768x432 stream and a higher quality stream when viewing inside the 768x432 window, so this schema saves MTV bandwidth costs that wouldn’t translate to increased viewer satisfaction.
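The capping rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MTV's actual player logic: the `Stream` type, the `pick_stream` function, and every ladder entry except the 768x432 stream at 1700Kbps are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    width: int
    height: int
    kbps: int


# Hypothetical ladder; only the 768x432 @ 1700Kbps entry comes from the article.
LADDER = [
    Stream(384, 216, 400),
    Stream(512, 288, 750),
    Stream(768, 432, 1700),
    Stream(960, 540, 2200),
    Stream(1280, 720, 3500),
]


def pick_stream(ladder, window_w, window_h, bandwidth_kbps):
    """Pick the highest-bitrate stream that fits both the window and the bandwidth."""
    # Never exceed the display window: the extra pixels would only be scaled away.
    fits_window = [s for s in ladder if s.width <= window_w and s.height <= window_h]
    if not fits_window:
        # The window is smaller than every stream, so fall back to the smallest one.
        return min(ladder, key=lambda s: s.kbps)
    affordable = [s for s in fits_window if s.kbps <= bandwidth_kbps]
    # If no candidate is sustainable, take the lowest-bitrate stream that fits.
    pool = affordable or [min(fits_window, key=lambda s: s.kbps)]
    return max(pool, key=lambda s: s.kbps)


# Plenty of bandwidth, but the 768x432 window caps the choice.
print(pick_stream(LADDER, 768, 432, 10_000))
# Same bandwidth after the viewer switches to a 1920x1080 full-screen display.
print(pick_stream(LADDER, 1920, 1080, 10_000))
```

With ample bandwidth, the first call stays on the 768x432 stream, and only the larger window in the second call unlocks the higher rungs.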

To maintain consistent stream quality, Goldstein uses the bits/pixel*frame metric shown in Table 7, which is a PowerPoint slide from a recent presentation Goldstein hosted on adaptive streaming. The bits/pixel*frame calculation measures the amount of data applied to each pixel in the video stream, and it’s a great metric to use when comparing the data rates of streams with disparate frame sizes. 
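As a rough sketch, the metric is the data rate divided by the number of pixels delivered per second. The function names below are illustrative, the 30 frames per second figure is an assumption (the frame rate of MTV's streams isn't stated here), and 1Kbps is treated as 1,000 bits per second.

```python
def bits_per_pixel(kbps, width, height, fps):
    """Average number of bits spent on each pixel of each frame."""
    return (kbps * 1000) / (width * height * fps)


def kbps_for_target(target_bpp, width, height, fps):
    """Invert the metric: the data rate needed to hit a target bits/pixel value."""
    return target_bpp * width * height * fps / 1000


# 768x432 at 1700Kbps comes from the article; 30 fps is an assumed frame rate.
print(round(bits_per_pixel(1700, 768, 432, 30), 3))  # 0.171

# Data rate a 640x360 stream would need to match that value (same assumed fps).
print(round(kbps_for_target(0.171, 640, 360, 30)))
```

Because the calculation normalizes for frame size and frame rate, the same target value can be carried across streams with different resolutions.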

Like all streaming metrics, the appropriate values depend on the content of the video: a talking-head newscast might look great at 0.08 bits per pixel, while a soccer game would look awful at that rate. MTV’s numbers reflect two broadly applicable realities, however: you can achieve sufficient quality at well under 0.2 bits/pixel, and values should go down as frame sizes increase, reflecting that codecs work more efficiently at larger frame sizes.

Microsoft and Mod-16 
Microsoft also had very well-formed views on the subject of stream configuration, including whether to prioritize “mod-16” resolutions, or resolutions that are divisible by 16. More on that in a moment.

When I asked Zambelli if there was science behind his stream configurations, he responded, “Yes. The resolutions and bitrates are plotted against a power curve that approximates the relationship between bitrate, resolution, and quantization. In other words, we try to keep the quantizer parameter roughly consistent for all bitrates in order to ensure consistent compression quality. The science behind it is captured in my bitrate calculator tool” at http://alexzambelli.com/WMV/MBRCalc.html (Figure 1).


Figure 1. Alex Zambelli's bitrate calculator

As shown in Figure 1, you insert a number of values into the calculator, including resolution, frame rate, aspect ratio, minimum and maximum bitrates, and the number of levels to generate. Then, you press the magic Go! button, and the calculator produces the recommended bitrates and resolutions for the adaptive streaming group. It’s a great starting point for any computer-targeted adaptive streaming effort, though I would defer to the Apple Technical Note when producing for adaptive iOS consumption. 
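To make the idea of a power curve concrete, here is a minimal sketch of a ladder generator. It is not Zambelli's calculator: the `build_ladder` function, the geometric spacing of bitrates, and the exponent of 0.75 are all assumptions chosen for illustration, since the formula the tool actually uses isn't given here.

```python
def build_ladder(max_w, max_h, min_kbps, max_kbps, levels, exponent=0.75, mod=16):
    """Return (width, height, kbps) tuples ordered from highest to lowest bitrate.

    Assumes bitrate is proportional to pixel_count ** exponent, so every rung
    keeps roughly the same compression quality.
    """
    if levels < 2:
        raise ValueError("need at least two levels")
    # Assumed: bitrates are spaced geometrically between the maximum and minimum.
    step = (min_kbps / max_kbps) ** (1 / (levels - 1))
    ladder = []
    for i in range(levels):
        kbps = max_kbps * step ** i
        # Invert the power curve to get the pixel-count ratio, then take the
        # square root so width and height shrink by the same factor.
        linear_scale = (kbps / max_kbps) ** (1 / (2 * exponent))
        # Snap each dimension to the nearest multiple of `mod` (16 by default).
        w = max(mod, round(max_w * linear_scale / mod) * mod)
        h = max(mod, round(max_h * linear_scale / mod) * mod)
        ladder.append((w, h, round(kbps)))
    return ladder


for rung in build_ladder(1280, 720, 300, 3000, levels=5):
    print(rung)
```

Snapping to multiples of 16 can nudge a rung slightly off the source aspect ratio, which is the trade-off discussed in the paragraphs that follow.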

On the top right of Figure 1, you can see a check box for Force mod-16, which ensures that the width and height of each stream are divisible by 16. Why? Because most codecs encode in 16x16 blocks, and if the height and width aren’t divisible by 16, the codec pads the frame out to whole blocks anyway, adding more pixels to the file and making it harder to compress.

For example, a 16x16 video file would require one 16x16 block to encode, while an 18x18 video file would require four: the original block, one extra block each on the right and bottom to encode the extra pixels, and one on the bottom right to square out the video stream. Note that all of the extra pixels are automatically cropped during display, so you never see them.
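The block arithmetic is easy to check. The sketch below assumes a codec that always works in whole 16x16 blocks, and the helper names are illustrative.

```python
import math

BLOCK = 16  # block size in pixels, per the 16x16 blocks described above


def macroblocks(width, height, block=BLOCK):
    """Number of whole blocks needed to cover a frame of the given size."""
    cols = math.ceil(width / block)
    rows = math.ceil(height / block)
    return cols * rows


def padded_size(width, height, block=BLOCK):
    """Frame size the encoder actually processes once padded to whole blocks."""
    return (math.ceil(width / block) * block, math.ceil(height / block) * block)


print(macroblocks(16, 16))  # 1
print(macroblocks(18, 18))  # 4
print(padded_size(18, 18))  # (32, 32)
```

In this worst case, two extra pixels in each direction quadruple the number of blocks the encoder has to process.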

Obviously, this is a worst-case scenario, and there are two schools of thought on mod-16. One treats non-mod-16 streams like “ring around the collar,” evidencing a total lack of sophistication on the part of the compressionist. However, there are some very relevant arguments against the importance of mod-16. First, as shown in the Frame Size column of Table 7, not all mod-16 resolutions are a perfect 16:9 aspect ratio, forcing the compressionist to either crop pixels or adopt a non-16:9 aspect ratio, which distorts the video, however slightly. Second, the extra pixels are always at the edges, so the encoder can apply less data to these blocks without a noticeable loss in quality.

Ozer Adaptive Table 7

In addition, the importance of mod-16 resolutions decreases as the resolution increases because 16x16 blocks end up comprising less of the total picture. For example, according to Zambelli, 320x176 versus 320x180 yields a 9% efficiency advantage, but 1920x1072 versus 1920x1080 yields only a 1.5% improvement. Finally, 640x360 is probably the most widely used stream size in existence today, which obviously wouldn’t be the case if the lack of mod-16 compliance significantly degraded quality. When theory clashes with reality, go with reality. 
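One way to see why the penalty shrinks is to compute how many padding pixels a frame picks up relative to its visible area. The sketch below measures only that padding fraction, assuming 16x16 blocks. It is not the same thing as the encoding-efficiency figures Zambelli quotes, which come from actual encodes, but it follows the same downward trend.

```python
def padding_overhead(width, height, block=16):
    """Fraction of extra pixels added when a frame is padded to whole blocks."""
    padded_w = -(-width // block) * block   # round up to a multiple of `block`
    padded_h = -(-height // block) * block
    return (padded_w * padded_h) / (width * height) - 1


for w, h in [(320, 176), (320, 180), (640, 360), (1920, 1072), (1920, 1080)]:
    print(f"{w}x{h}: {padding_overhead(w, h):.1%} padding")
```

The mod-16 sizes report no padding at all, while the padding on 320x180 is several times larger, proportionally, than on 1920x1080.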

Here’s what Zambelli had to say regarding his Olympic and Sunday Night Football encodes: “Most resolutions are mod-16, but in some cases, we had to settle for mod-8 or mod-4 in order to try to match a video resolution to a particular video player window size. For example, 720x404 was the Sunday Night Football player video window size, so we matched it with one of the encoded resolutions in order to ensure it played optimally without requiring any scaling.”
