H.264 Lessons from iTunes
[Editor's note: This article first appeared in EventDV magazine—www.eventdv.net.]
I recently gave a presentation on producing H.264 video at Streaming Media West. While preparing, I noticed that although H.264 is the hot topic in periodicals and forums, it’s still not widely used for streaming, where Windows Media and VP6-based Flash still predominate. In one market segment, though, H.264 was nearly ubiquitous, so I decided to spend some time learning how H.264 was used, and misused, there. That market is video podcasts distributed via iTunes. iTunes is all about iPod and iPhone devices, which play only H.264 and MPEG-4 video, so finding H.264 video in iTunes is easy.
Having found the Mecca of H.264 usage, I decided to download 50 podcasts, try to load them on my iPod Nano, and see what happened. Six refused to load at all, and three had what I’ll call "potentially sub-optimal" displays. After analyzing all the podcasts in Semaphore, Inlet HD’s excellent streaming media analysis tool, I found that many others used suboptimal encoding parameters. While producing podcasts is probably a tiny part of what we do, it’s still a useful skill, so I thought I would detail my findings.
First, some background. MPEG-4 is the overarching standard that includes two video codecs: the MPEG-4 codec itself (MPEG-4 Part 2) and a more advanced codec, H.264, also known as AVC. When used in an MPEG-4 "wrapper," H.264 files typically have a .mp4 or .m4v extension; .mp4 is the official designation, and .m4v is the extension Apple created for its devices.
You can also "wrap" H.264 video in a QuickTime file with a .mov extension, or encode it for Flash with an .flv or .f4v extension. Soon, you’ll be able to encode H.264 to Windows Media, presumably with a .wmv extension.
H.264 has multiple "profiles" that define which encoding tools a stream may use, and therefore which players can decode it. For example, the Baseline profile is typically used for devices like iPods or cell phones that have limited playback horsepower. Accordingly, the Baseline profile omits many of H.264’s more advanced encoding techniques, which produce higher-quality streams but also make those streams harder to decode. Then there are the Main and High profiles, typically used for computer-based playback, which use those techniques to produce tighter, higher-quality streams that are harder to decode.
Obviously, when producing for devices rather than general-purpose playback, job No. 1 is to use the appropriate profile. Interestingly, of the six videos that wouldn’t play on my iPod Nano, five used the Main profile, which is verboten. The sixth used the Sorenson Video 3 codec, of all things, which also won’t play.
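If you want to run the same triage on your own files, the sketch below flags the two failure modes described above: a non-Baseline H.264 profile and a codec the iPod can’t play at all. It swaps in FFmpeg’s ffprobe for Semaphore (an assumption; the article doesn’t use ffprobe) and parses the JSON that `ffprobe -v error -show_streams -of json yourfile.m4v` prints. The function name `triage` and the PLAYABLE set are hypothetical, and MPEG-4 Part 2 profiles aren’t checked.

```python
import json

# Codec names (as ffprobe reports them) that the article says iPods can play.
# Hypothetical allow-list for illustration; MPEG-4 Part 2 profiles are not checked.
PLAYABLE = {"h264", "mpeg4"}

def triage(ffprobe_json: str) -> list[str]:
    """Flag likely iPod load failures in `ffprobe -show_streams -of json` output."""
    info = json.loads(ffprobe_json)
    video = [s for s in info.get("streams", []) if s.get("codec_type") == "video"]
    if not video:
        return ["no video stream"]
    problems = []
    for stream in video:
        codec = stream.get("codec_name", "unknown")
        profile = stream.get("profile", "")
        if codec not in PLAYABLE:
            # Sorenson Video 3, for example, shows up as "svq3".
            problems.append(f"codec {codec} is neither H.264 nor MPEG-4")
        elif codec == "h264" and "Baseline" not in profile:
            # Baseline streams report "Baseline" or "Constrained Baseline".
            problems.append(f"H.264 {profile or 'unknown'} profile, not Baseline")
    return problems
```

Fed the JSON for one of the five Main-profile podcasts, `triage` would return `["H.264 Main profile, not Baseline"]`; an empty list means nothing obvious was found, not that the file is guaranteed to load.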
So when producing for podcasts, always use the Baseline profile of the H.264 codec. Before encoding, however, go to Apple.com, print the video playback specs for the latest iPod, and make sure that you’re within the resolution and data rate requirements. Unfortunately, this is more complicated than it sounds, because the first video-capable iPods could play Baseline-profile H.264 video only at resolutions up to 320x240, while current iPods and iPhones can play Baseline H.264 video at up to 640x480.
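Checking a planned encode against those specs is simple enough to script. This is a minimal sketch: the resolution limits come from the paragraph above, but the data-rate ceilings (768 kbps and 1.5 Mbps) are my recollection of Apple’s published specs for those models and should be treated as assumptions to verify on Apple.com. `VideoSpec` and `violations` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VideoSpec:
    name: str
    max_width: int
    max_height: int
    max_kbps: int

# ASSUMED Baseline H.264 limits; confirm current numbers on Apple.com before use.
FIRST_VIDEO_IPODS = VideoSpec("first video iPods", 320, 240, 768)
CURRENT_IPODS = VideoSpec("current iPods and iPhones", 640, 480, 1500)

def violations(spec: VideoSpec, width: int, height: int, kbps: int) -> list[str]:
    """List the ways a planned encode exceeds a device spec (empty list = fits)."""
    problems = []
    if width > spec.max_width or height > spec.max_height:
        problems.append(f"{width}x{height} exceeds {spec.max_width}x{spec.max_height}")
    if kbps > spec.max_kbps:
        problems.append(f"{kbps} kbps exceeds {spec.max_kbps} kbps")
    return problems
```

A 640x360 encode at 1,120 kbps passes `CURRENT_IPODS` but fails `FIRST_VIDEO_IPODS` on both counts, which is exactly the compatibility trade-off behind the resolution decision.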
So your next major decision is target resolution. Of the 44 videos that loaded on my iPod Nano, 25 used 320x240, obviously the safer route since that video should play on all video iPods. The other 19 (along with five of the six that failed to play) used 640x360 or larger, which created potential incompatibilities with older iPods. Why go larger than 320x240 when the screen resolution of most iPods is 320x240? First, many iPods have composite output ports that let you play the video on a TV set or other analog device. Though display on most iPods themselves is limited to 320x240, 640x480 video will look better than 320x240 video on a TV set. More importantly, the iPhone and iPod Touch have 480x320 screens, and six of the 19 producers who used more than 320x240 produced at 16:9, which looked better on the iPhone and iPod Touch than 4:3 video.
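When sorting a sample like this into 4:3 and 16:9 groups, matching each resolution to the nearest named aspect ratio is more forgiving than reducing width:height exactly, since encoders often use sizes like 640x352 that reduce to 20:11. This is a minimal sketch assuming square pixels; `nearest_shape` and the SHAPES table are hypothetical.

```python
# Frame shapes discussed in the article; square pixels are assumed.
SHAPES = {"4:3": 4 / 3, "3:2": 3 / 2, "16:9": 16 / 9}

def nearest_shape(width: int, height: int) -> str:
    """Bucket a resolution by its closest named aspect ratio."""
    ratio = width / height
    return min(SHAPES, key=lambda name: abs(SHAPES[name] - ratio))
```

With this, 320x240 and 640x480 land in "4:3", 640x360 and 640x352 in "16:9", and the iPhone’s 480x320 screen in "3:2".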
This leads me to the three podcasts with "potentially sub-optimal displays." Briefly, when you display 16:9 video on a 4:3 iPod, by default the device shows the middle section of the video and cuts off the left and right edges. Several producers of 16:9 video, including Photoshop User TV, included screencam videos with content near the edges that wasn’t visible on a 4:3 display. As a result, when the announcer said "click this menu item," the menu item may have been offscreen on 4:3 displays.
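To see how much of a screencam survives the default center cut, compute which source columns remain visible once the frame is cropped to 4:3. This sketch assumes, as described above, that the device fills the screen height and crops the left and right edges equally, with square pixels; `center_cut` and `is_visible` are hypothetical helpers.

```python
from fractions import Fraction

def center_cut(width: int, height: int,
               screen_aspect: Fraction = Fraction(4, 3)) -> tuple[int, int]:
    """Return the [left, right) source columns still visible when a wide frame
    is center-cropped to fill a screen of screen_aspect."""
    visible = round(height * screen_aspect)
    if visible >= width:
        return 0, width  # frame is no wider than the screen: nothing is cropped
    left = (width - visible) // 2
    return left, left + visible

def is_visible(x: int, width: int, height: int) -> bool:
    """True if source column x survives the default 4:3 center cut."""
    left, right = center_cut(width, height)
    return left <= x < right
```

For a 640x360 screencam, `center_cut` returns (80, 560): the outer 80 columns on each side, a quarter of the width in all, disappear, so a menu item at x=600 is offscreen on a 4:3 iPod.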
Note that your viewers can change this default in the Movie menu to display the entire 16:9 frame, with letterbox bars above and below the video. Since most iPods have 4:3 displays, if you’re going to produce 16:9 video, either make sure that all critical content is visible in the 4:3 center cut, or advise your viewers to change the default so the entire frame displays within letterboxes.
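The cost of the letterbox option is easy to quantify: scale the whole frame to fit the screen and measure the leftover bars. This is a minimal sketch assuming a 320x240 screen and square pixels; `letterbox` is a hypothetical helper.

```python
def letterbox(src_w: int, src_h: int,
              screen_w: int = 320, screen_h: int = 240) -> tuple[int, int, int, int]:
    """Fit the whole frame on the screen without cropping.

    Returns (scaled_w, scaled_h, side_bar, top_bar): the scaled picture size
    and the bar thickness in screen pixels at each side and at top/bottom.
    """
    scale = min(screen_w / src_w, screen_h / src_h)
    w, h = round(src_w * scale), round(src_h * scale)
    return w, h, (screen_w - w) // 2, (screen_h - h) // 2
```

A 640x360 frame becomes a 320x180 picture with 30-pixel bars above and below, so the viewer sees everything but gives up a quarter of the screen’s lines, while 640x480 video fills the screen exactly.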
The other mistakes I found were technical, like exceeding the recommended data rate and inserting keyframes too frequently, which can degrade quality and add a pulsing effect. So it’s worth noting that the iPod preset in Apple Compressor uses a data rate of 1.12 Mbps for 640x480 video and inserts a keyframe every 150 to 300 frames depending upon content, or roughly one every 5 to 10 seconds at 30 frames per second.
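If you encode outside Compressor, the same targets translate into encoder settings. The sketch below swaps in FFmpeg with libx264 (not something the article uses) and only builds the argument list; pass it to `subprocess.run(..., check=True)` to encode. The 1,120 kbps rate and 300-frame maximum come from the Compressor preset above; the audio settings, the `-maxrate`/`-bufsize` cap, and the helper names are assumptions. Note that `-g` only caps keyframe spacing, and libx264’s scene-change detection may still insert keyframes earlier, loosely like the preset’s "depending upon content" behavior.

```python
def keyframe_seconds(interval_frames: int, fps: float = 30.0) -> float:
    """Convert a keyframe interval in frames into seconds."""
    return interval_frames / fps

def ipod_encode_args(src: str, dst: str, width: int = 640, height: int = 480,
                     kbps: int = 1120, fps: int = 30,
                     max_gop_frames: int = 300) -> list[str]:
    """Build an ffmpeg/libx264 command line loosely modeled on the Compressor preset."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-profile:v", "baseline",             # the profile iPods require
        "-pix_fmt", "yuv420p",                # Baseline allows only 4:2:0 chroma
        "-s", f"{width}x{height}", "-r", str(fps),
        "-b:v", f"{kbps}k",                   # 1.12 Mbps target from the preset
        "-maxrate", f"{kbps}k", "-bufsize", f"{2 * kbps}k",  # assumed rate cap
        "-g", str(max_gop_frames),            # upper bound on keyframe spacing
        "-c:a", "aac", "-b:a", "128k",        # assumed audio settings
        dst,
    ]
```

At 30 fps, `keyframe_seconds(150)` is 5.0 and `keyframe_seconds(300)` is 10.0, which is where the 5 to 10 second figure comes from; at 29.97 fps the numbers shift only slightly.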
Jan Ozer (jan at doceo.com) is a streaming media consultant, a frequent contributor to industry magazines and websites on streaming-related topics, and the author of Critical Skills for Streaming Producers, a series of mixed-media tutorials on DVD. The series is available in a general version and in versions specific to Final Cut Pro and Adobe Creative Suite 3.