Buyers' Guide: Hardware Transcoders
Although software transcoding is acceptable for most VOD streams and even low-volume live programming, most high-volume live applications need hardware for efficient transcoding, both to save you money and to save the planet. This buyers’ guide will cover:
- What hardware transcoders are
- What you need to bring to the table to identify the best hardware transcoder
- Factors to consider when choosing one
- Choosing a hardware transcoder for cloud workflows
- Choosing a hardware transcoder for on-prem workflows
As with all buyers’ guides, lists are intended to be representative, not exhaustive. If you have a hardware transcoding device you think should be mentioned, leave a comment on the web version of this article, or let me know at jan.ozer[at]streaminglearningcenter.com.
For the record, all throughput, quality, and cost calculations are for H.264 only.
What Hardware Transcoders Are
For the purposes of this article, hardware transcoders include devices that enable high-volume transcoding, such as those powered by graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs). I don’t include hardware transcoding available in a CPU, like Intel’s Quick Sync, because this approach is challenging to scale. As you’ll see, you can pack 10 or more GPU- or ASIC-powered devices in a server, but that’s challenging (and much less affordable) with CPUs.
What You Need to Bring to the Table
I’ll be computing costs throughout this article using the following assumptions:
- A 100-channel FAST service
- Running 24/7/365
- Using the encoding ladder in Table 1
While the numbers will vary with your specific requirements, you should be able to adjust the analysis to fit most transcoding configurations.
There’s nothing special about the ladder in Table 1, but I needed some numbers to work with. In the table, you see the column “% of 1080p” and the line “1080p equivalents.” This represents each rung as a percentage of the pixel count of a single 1080p stream. I’ll add these percentages to calculate a total workload of 1.87 1080p streams for this encoding ladder, which I’ll use to estimate throughput in a later section. Also note the total bitrate in kilobits per second, which I’ll use to compute bandwidth costs later.
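If you want to reproduce the “1080p equivalents” math for your own ladder, the short Python sketch below shows the calculation. The rungs and bitrates in the script are placeholders rather than the actual Table 1 values, so substitute your own.

```python
# Sketch of the "1080p equivalents" calculation. The rungs below are
# placeholders rather than the actual Table 1 ladder; substitute your own
# resolutions and bitrates.
REF_PIXELS = 1920 * 1080  # pixel count of a single 1080p stream

ladder = [
    # (width, height, bitrate_kbps) -- hypothetical values
    (1920, 1080, 4500),
    (1280, 720, 2500),
    (960, 540, 1200),
    (640, 360, 700),
]

workload = sum(w * h / REF_PIXELS for w, h, _ in ladder)
total_kbps = sum(kbps for *_, kbps in ladder)

print(f"Workload per channel: {workload:.2f} 1080p equivalents")
print(f"Total ladder bitrate: {total_kbps} Kbps")
```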

If you want to follow along, you should know the number of input streams and the number and configuration of output streams. You’ll also need your cost per gigabyte for transferred bandwidth for future calculations. I’m working off a Google Sheet that contains all of the calculations shown in this article. It’s not exhaustive, but it should be a useful starting place for anyone who is making these calculations. You won’t be able to modify the Google Sheet, but you can download it to an Excel file that you can use as you wish.
Factors to Consider When Choosing a Transcoder
Let’s review the factors to consider when choosing a hardware transcoder. Obviously, the device must support your current output formats and codecs, as well as any codecs you might add over the next 3–4 years. For most services, this includes H.264, HEVC, and AV1. None of the current transcoders output VVC, and I didn’t consider LCEVC, although some of the transcoders mentioned likely support it with the necessary software.
Next, consider how you’re going to control transcoding. Virtually all transcoding devices will support FFmpeg and offer a lower-level API, with GStreamer as another popular option. FFmpeg and GStreamer are straightforward, but if you’re controlling the hardware via the API, you may find significant differences in complexity and ease of support. Assess this before making a buying decision. If you’re using an application like Norsk or Wowza Streaming Engine, be sure that it supports the transcoder you’re considering.
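As a simple illustration of FFmpeg-based control, the sketch below launches a live transcode from Python using NVIDIA’s NVENC encoder, which ships in standard FFmpeg builds. The input URL, output target, and bitrates are placeholders, and other vendors expose their own FFmpeg encoder names and options, so treat this as a pattern rather than a recipe.

```python
# Minimal sketch: controlling a hardware transcode through FFmpeg.
# h264_nvenc is the NVIDIA hardware H.264 encoder in stock FFmpeg builds;
# other vendors use their own encoder names and options. Input and output
# are placeholders.
import subprocess

cmd = [
    "ffmpeg",
    "-i", "srt://0.0.0.0:9000?mode=listener",  # placeholder live input (needs libsrt)
    "-c:v", "h264_nvenc",                      # hardware H.264 encode on the GPU
    "-preset", "p6",                           # slower, higher-quality NVENC preset
    "-b:v", "4500k",                           # placeholder top-rung bitrate
    "-c:a", "aac", "-b:a", "128k",
    "-f", "mpegts", "srt://cdn.example.com:9001",  # placeholder output target
]
subprocess.run(cmd, check=True)
```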
Next up are throughput and quality, which go hand in hand. As with software transcoders, most hardware transcoders have presets that balance quality and throughput. With software encoding for VOD, you typically care about encoding speed in frames per second, but with hardware transcoding for live, throughput is all about the number of real-time streams the transcoder can output. That’s because some hardware transcoders are limited in the number of simultaneous streams they can encode or decode; just because a transcoder can output 1080p30 at 1,200 frames per second doesn’t mean it can produce 40 simultaneous 1080p30 outputs. In short, frames per second measures VOD encoding speed, but for live streams, only the number of real-time outputs matters.
As I discuss in the article Choosing the Best Preset for Live Transcoding, once you get above 200–300 viewers per channel, considering hardware and bandwidth costs, it makes the most economic sense to use the highest-quality preset to deliver maximum quality at the lowest bitrate. Although your transcoding costs will be the highest using these presets, your reduced bandwidth costs at even moderate viewer levels should more than make up the difference. We’ll look at a calculation of this in a moment.
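Here’s a rough sketch of that trade-off. Every number in it is a placeholder, so plug in your own transcoding premium, bitrate savings, viewer counts, and CDN rate.

```python
# Back-of-the-envelope comparison of a higher-quality preset's extra
# transcoding cost against its bandwidth savings. All values are
# placeholders.
extra_transcode_cost = 0.10      # added transcoding cost per channel-hour ($)
avg_delivered_mbps = 4.0         # average rung bitrate pulled per viewer
bitrate_savings = 0.15           # e.g., the slower preset saves 15% bitrate
viewers_per_channel = 250
cdn_rate_per_gb = 0.04           # $/GB delivered

gb_per_viewer_hour = avg_delivered_mbps / 8 * 3600 / 1000   # ~1.8 GB
bandwidth_saved = (gb_per_viewer_hour * bitrate_savings *
                   viewers_per_channel * cdn_rate_per_gb)

print(f"Extra transcoding cost per channel-hour: ${extra_transcode_cost:.2f}")
print(f"Bandwidth saved per channel-hour:        ${bandwidth_saved:.2f}")
```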
Measuring Throughput
Throughput specs are most useful when differentiated by the encoding preset, like those shown at go2sm.com/ma35d for the AMD MA35D. The density numbers in the MA35D specs are worth exploring. Single density covers streams encoded with H.264, HEVC, and mid-quality AV1; double density adds high-quality AV1 on top of those.
This is because the MA35D deploys two ASICs: one capable of H.264, HEVC, and mid-quality AV1, and the other dedicated to high-quality AV1. If your application involves roughly equal numbers of H.264, HEVC, or mid-quality AV1 streams and high-quality AV1 streams, you can double capacity at no extra cost. Specifically, the MA35D can transcode 32 1080p30 streams of H.264, HEVC, or mid-quality AV1 while simultaneously handling 32 additional 1080p30 streams of high-quality AV1 on the same hardware.
In general, if a vendor doesn’t identify the preset used for its published throughput numbers, you should assume that the company used a high-throughput/low-quality preset that you likely won’t want to use in production. That means you’ll have to test yourself.
NETINT presents throughput data on its spec sheets and some product reviews published on its site. Another valuable resource for performance and quality results is Derrick Freeman’s excellent review of Quadra for Streaming Media.
NVIDIA provides some performance data here, but it has too many encoding-capable GPUs to fully document their performance (see the decoding and encoding support matrices here). Intel documents the features, but not the performance, of its transcoding-capable GPUs here and expects its integrators to document the performance of their respective apps on Intel GPU technologies.
Wowza has a resource that details the throughput of various AWS instances, including AWS EC2 G4, C5, and VT1, which I’ll refer to in the next section.
In all cases, note that throughput is shown for generic configurations, which may or may not match your own. If you’re inputting interlaced feeds, check if the transcoder can de-interlace in hardware; otherwise, you’ll have to de-interlace using the host CPU, which will cut throughput. Ditto for input formats the board might not natively support, like MPEG-2- or AV1-encoded contribution streams.
Assessing Quality
Next up is quality. Most vendors provide basic quality-related information, like this from AMD: “The MA35D card nominally produces video quality that is closely correlated to x264 medium, x265 medium and x265 slow presets, concerning its accelerated AVC, HEVC and AV1 encoders.” Obviously, this is too vague to help you pick a quality winner among available cards.
There are some published quality comparisons, but you probably won’t find a study that covers all of the devices you’re considering in your anticipated configuration. One useful study is the Moscow State University (MSU) hardware comparison, which benchmarked AMD’s Radeon RX 6800 XT, Intel’s Arc A380 GPU, and NVIDIA’s RTX 4070TI, along with several other hardware devices that aren’t as commercially available. Unfortunately, the report doesn’t include the AMD MA35D.
Table 2 shows the bitrate each device required to match the quality of the x265 codec using the very fast preset, as measured by VMAF. As an example, the Intel Arc A380 produced H.265 at 80.8% of the reference bitrate, compared to 99.3% for the NVIDIA card. This means the NVIDIA stream’s bitrate would have to be about 23% higher to deliver the same quality as Intel’s. The situation reversed with AV1, where NVIDIA was about 7.5% more efficient than Intel.

Table 3 shows the 5-year bandwidth cost for the 100-channel FAST service example in this article, based on a $0.04/GB CloudFront charge for that volume level. Using the Intel transcoder for HEVC production could reduce these charges by 23%, saving approximately $170,000. Similarly, producing AV1 with NVIDIA could reduce bandwidth costs by 7.5%, saving around $55,300. Both represent substantial savings worth considering in your purchase decision.
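To make the arithmetic concrete, here’s a sketch of how the Table 2 relative-bitrate figures and the Table 3 savings estimates fit together. The 5-year CDN bill in the script is a placeholder chosen only for illustration; substitute your own delivery volumes and rates.

```python
# Sketch of the Table 2/Table 3 arithmetic. The relative bitrates come from
# the Table 2 example quoted above; the 5-year CDN bill is a placeholder.
intel_rel_bitrate = 0.808    # Intel Arc A380, HEVC, vs. x265 very fast
nvidia_rel_bitrate = 0.993   # NVIDIA RTX 4070 Ti, HEVC, vs. x265 very fast

# NVIDIA needs roughly 23% more bitrate to match Intel's quality
print(f"NVIDIA/Intel bitrate ratio: {nvidia_rel_bitrate / intel_rel_bitrate:.2f}")

# Bandwidth charges scale linearly with delivered bitrate, so a given
# percentage reduction in bitrate cuts the CDN bill by the same percentage.
five_year_cdn_bill = 740_000.0   # placeholder 5-year delivery spend ($)
hevc_savings_pct = 0.23          # HEVC savings applied in Table 3
av1_savings_pct = 0.075          # AV1 savings applied in Table 3

print(f"HEVC savings: ${five_year_cdn_bill * hevc_savings_pct:,.0f}")
print(f"AV1 savings:  ${five_year_cdn_bill * av1_savings_pct:,.0f}")
```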

Note that the MSU report includes results for the NETINT Quadra. However, citing a desire for privacy, NETINT declined to cooperate with MSU and asked that its results be pulled from the study. For this reason, I didn’t present them in Table 2.
Also, note that MSU tested game-oriented GPUs, as opposed to data centre cards like the NVIDIA T4 that you’re more likely to deploy in a cloud platform. I would guess there are a few qualitative differences between the encoding delivered by game and data centre GPUs, but that’s just a guess.
Interestingly, a Tom’s Hardware review tested multiple gaming-oriented cards from AMD, Intel, and NVIDIA. It found NVIDIA to be first in performance and quality, with Intel in the middle and AMD a distant third. If you’re evaluating GPU transcoders—and you should—NVIDIA is the best candidate.
Before wrapping up this quality section, note that if your application involves scaling to lower resolutions, you should measure quality at those resolutions. Each hardware device uses its own internal scaling algorithm, optimised for speed rather than quality. There are likely substantive quality differences between the hardware alternatives, but you’ll need to test using your encoding ladder to quantify them.
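One way to run that test is FFmpeg’s libvmaf filter, which scores a lower rung against the 1080p source after scaling it back up to display resolution. The sketch below assumes an FFmpeg build compiled with --enable-libvmaf, and the file names are placeholders.

```python
# Sketch: scoring a hardware-scaled 720p rung against the 1080p source with
# FFmpeg's libvmaf filter (requires a build with --enable-libvmaf).
# File names are placeholders; the rung is upscaled back to 1080p so the
# comparison happens at display resolution.
import subprocess

cmd = [
    "ffmpeg",
    "-i", "rung_720p.mp4",      # distorted: the transcoder's scaled/encoded output
    "-i", "source_1080p.mp4",   # reference: the original 1080p mezzanine
    "-lavfi",
    "[0:v]scale=1920:1080:flags=bicubic[dis];"
    "[dis][1:v]libvmaf=log_path=vmaf_720p.json:log_fmt=json",
    "-f", "null", "-",
]
subprocess.run(cmd, check=True)
```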
I’ve covered basic qualifying variables (codec and application support) and the likely need to measure quality and throughput yourself. Once you have this data, here’s how you would apply it to identify the best transcoder for cloud and on-prem use.
Transcoding in the Cloud
Let’s explore hardware transcoders available in the cloud, starting with transcoding-specific hardware instances. Amazon EC2 VT1 instances configured with up to eight AMD Alveo U30 media accelerator cards have been up and running since 2021. AMD’s MA35D is available on Microsoft Azure as NMads MA35D Virtual Machines in preview mode. No performance specs or pricing are provided, but performance should be similar to AMD’s published specs for the card.
NETINT Quadra cards are available in a beta program on the Akamai Cloud as of this writing. The beta is a free program for “approved customers who have identified workloads that will benefit from NETINT T1U VPU Accelerated plans.” No pricing or performance data is provided, although information on NETINT’s web site at go2sm.com/t1t2t4 should offer some guidance.
The next option is generic GPU instances. All cloud platforms offer a variety of NVIDIA instances for GPU-based hardware transcoding, but there are too many options to list. Instead, I’ll focus on the NVIDIA T4-powered g4dn instances benchmarked in the aforementioned Wowza performance data and the VT1 instance also tested by Wowza. Specifically, these benchmarks report the number of simultaneous 1080p30 streams each instance can deliver.
Again, for this comparison, we assume a 100-channel FAST service, where each channel uses an encoding ladder consisting of multiple renditions (e.g., 720p, 540p, 360p). The ladder increases the 1080p30 workload by a factor of 1.87, as calculated in Table 1. Table 4 analyzes the hardware options, assuming a 3-year commitment to AWS for pricing.

Wowza tested two configurations of the AMD U30 instances: the vt1.6xlarge with two U30 cards and the vt1.3xlarge with one. Since performance and pricing both scaled linearly, the per-stream cost was identical: although the vt1.6xlarge (96 streams) costs twice as much as the vt1.3xlarge (48 streams), it delivers twice the throughput.
With the NVIDIA T4 instances, the pricing and performance aren’t linear. In my comparison, the g4dn.xlarge delivered 60 streams for $0.21/hour. The larger g4dn.16xlarge costs 8.3x more but only delivers 10% greater throughput. The g4dn.12xlarge costs 7.4x more than the g4dn.xlarge but delivers only 3.5x the throughput.
The g4dn.xlarge emerges as the most economical cloud option, costing $36,792 over 3 years. Again, this is with a 3-year commitment to Amazon; dropping this commitment to 1 year would increase costs by roughly 50%.
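For readers following along in the Google Sheet, the cloud math reduces to a few lines. The streams-per-instance and hourly-rate figures below are placeholders rather than the Table 4 values, so swap in the Wowza benchmark numbers and your negotiated AWS pricing.

```python
# Sketch of the cloud cost calculation behind Table 4. Streams per instance
# and the hourly rate are placeholders; use the benchmark figures and
# pricing for the instance type you're evaluating.
import math

CHANNELS = 100
LADDER_FACTOR = 1.87                 # 1080p equivalents per channel (Table 1)
YEARS = 3

required_streams = math.ceil(CHANNELS * LADDER_FACTOR)   # 187

streams_per_instance = 48            # placeholder: 1080p30 outputs per instance
hourly_rate = 0.50                   # placeholder: $/hour with a 3-year commitment

instances = math.ceil(required_streams / streams_per_instance)
cost = instances * hourly_rate * 24 * 365 * YEARS
print(f"{instances} instances needed; {YEARS}-year cost: ${cost:,.0f}")
```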
Note that if there are substantial quality differences between the alternatives, you should compute the associated bandwidth costs as shown around Table 3 and factor them into the equation.
While the decision between cloud and on-prem solutions often involves a mix of practical considerations and entrenched preferences, cost comparison remains a critical factor. With that in mind, let’s turn our focus to the on-prem side and examine how the numbers stack up.
On-Prem Installations
If you’re buying your gear for on-prem or co-location, you have to compute the CapEx and OpEx components. CapEx will include the transcoders and the server(s) to house them, while OpEx will include co-location or other space-allocation costs plus power.
Table 5 contains cost and performance data for three hardware transcoders. Throughput numbers for AMD are from here, with cost and power figures from the AMD website. For the 187 required 1080p30 equivalents, we need six cards. Supermicro quoted a turnkey price of $18,500 for the 2RU AS-2015HS-TNR server with six MA35D cards installed.

The NETINT data is for the Quadra Video Server, which is sold fully configured (and yes, it’s also a Supermicro server). If you’re considering a NETINT system and don’t need all 10 cards, it’s worth checking whether you can buy a unit with fewer. Note that NETINT sells a $19,000 system with a lower-powered CPU, which might work for many applications, as well as a more expensive Ampere-based system if you need de-interlacing or transcription.
T4 throughput numbers are from here and from my tests here. At 16 1080p30 streams per card, you’ll need 12 cards to deliver the 187 required streams. You can verify the power draw at go2sm.com/t4. The T4 costs around $750, and Supermicro quoted a price of $4,500 for the 4RU GPU SuperWorkstation 7049GP-TRT, which can house up to six GPUs. This means two 4RU servers, which doubles the server CapEx and the associated OpEx.
Table 6 shows how the three systems compare. Co-location costs are from go2sm.com/4uco but will vary widely. In all cases, I assumed power would be charged separately at $0.15 per kilowatt hour.

As you can see, even with two systems needed, the T4 is the cheapest CapEx-wise, but housing and powering two 4RU servers for 5 years more than makes up the difference. Otherwise, from a cost perspective, the MA35D and Quadra are very close, although at $1,500 a card, the Quadra would gain a significant advantage if NETINT allowed you to buy a system with only six cards. Either way, it’s probably not enough of a financial difference to drive the decision; it will come down to quality or other implementation details.
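The on-prem math follows the same pattern: CapEx up front plus co-location and power over the service life. The sketch below uses the 5-year term and $0.15/kWh rate from the text, and the example system prices echo the quotes mentioned above; the co-location rates and wattages are placeholders to show the structure of the calculation.

```python
# Sketch of the 5-year on-prem TCO calculation behind Table 6. Co-location
# rates and wattages are placeholders; the 5-year term and $0.15/kWh power
# rate follow the assumptions in the text.
YEARS = 5
KWH_RATE = 0.15   # $/kWh

def five_year_tco(system_cost, servers, colo_per_server_month, watts_per_server):
    capex = system_cost * servers
    colo = colo_per_server_month * servers * 12 * YEARS
    power_kwh = watts_per_server / 1000 * 24 * 365 * YEARS * servers
    return capex + colo + power_kwh * KWH_RATE

# Example: one 2RU system vs. two 4RU systems (wattage and colo are guesses)
print(f"One 2RU system:  ${five_year_tco(18_500, 1, 150, 800):,.0f}")
print(f"Two 4RU systems: ${five_year_tco(9_000, 2, 250, 700):,.0f}")
```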
As with the cloud alternatives, if there are substantial quality differences between these systems, compute the associated bandwidth costs as shown around Table 3 and factor them into the equation.
Comparing cloud and on-prem expenses, buying your own gear is about 22% less costly than operating in the cloud, without considering the time value of money. That assumes a 3-year AWS EC2 commitment, which seems fair given that you’re making a lifetime commitment with the hardware purchase.
Of course, it’s not all about comparing cloud to on-prem operation; it’s about choosing the best option for either operating mode. Hopefully, the analyses presented in this article and the associated Google Sheet will help you do just that.
Author’s Note: I would like to thank Ben Lee from Supermicro for supplying all of the system and related information and pricing. Also, you should know that I worked at NETINT between August 2022 and March 2024 and that I produced a training course for AMD’s MA35D in 2024. I have no continuing contractual agreements with either company.