Accessibility and Localisation: How AI Can Create More Accessible Content for Larger Audiences

With key streaming services such as Disney+, Amazon, and Netflix trying to drive down production costs across the board, premium content providers have spent considerable time looking at how they can develop or license content which isn’t produced in English but can offer global appeal.

This is a rather obvious step, as films and shows in several non-English languages account for a significant portion of content consumption on Netflix. For example, in the second half of 2023, Korean accounted for 9% of total viewing; Spanish, 7%; and Japanese, 5%. Global demand for non-English content grew to 40% in 2023, nearly double the 23% recorded in 2018, according to research from Rise Studios published in February 2024.

Standout non-English-language breakout hits in English-language markets span languages well beyond Korean, Japanese, and Spanish, including Germany’s Dear Child, with 53 million views; Poland’s Forgotten Love, with 43 million views; and India’s The Railway Men, with 11 million views.

In fact, some of Netflix’s biggest hits are in a language other than English. The obvious example is of course Squid Game, a Korean production which is still the most viewed TV series on the service, having racked up more than 2.2 billion hours of viewing since its launch.

Amazon Prime has had similar success with Culpa Mia, a Spanish-language film, and Medellin, a 2023 French film that immediately jumped to the top 10 list of non-English-language films. Culpa Mia became the number-one movie worldwide and featured in the top 10 most-watched movies in more than 190 countries.

While Disney+ doesn’t feature a lot of non-English-language content, it still has a significant reach globally, with around two-thirds of its subscribers coming from outside North America and the UK. This means it has a large requirement to translate content for roughly 100 million subscribers in more than 150 territories.

Achieving Global Reach Through AI Translation and Titling

With the amount of content being produced and the possibility of any title moving from a limited-market release to a global breakout hit, effective localisation has always been essential for streaming services seeking the widest possible reach.

Historically, making content accessible in terms of multiple-language translations—including sign language—has proven expensive to do well. But with the growth of AI services, it has become apparent that this technology can extend the reach of content while enabling services to support a wider range of languages at a lower cost.

During the last 2 years, AI services have made significant leaps forward in delivering highly accurate and contextually relevant subtitles not only for on-demand but for live content as well. Many of the available tools can also modify the subtitle delivery based on the device being used, including changing the colour, size, and positioning of the subtitles, to create a better user experience.
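As a rough illustration of how that device-aware delivery might work, the Python sketch below generates WebVTT subtitle files with different cue sizing and positioning per device profile. The profiles and styling values are illustrative assumptions, not any particular vendor’s implementation.

```python
# Minimal sketch: generating device-tuned WebVTT subtitle cues.
# The device profiles and styling values below are illustrative
# assumptions, not any specific vendor's implementation.

DEVICE_PROFILES = {
    # Larger text and safer margins for TVs viewed at a distance;
    # tighter, smaller cues for mobile screens held close.
    "tv":     {"size": "70%", "line": "85%", "font_scale": "130%"},
    "mobile": {"size": "90%", "line": "90%", "font_scale": "100%"},
}

def make_vtt(cues, device="tv"):
    """cues: list of (start, end, text) with 'HH:MM:SS.mmm' timestamps."""
    profile = DEVICE_PROFILES[device]
    lines = [
        "WEBVTT",
        "",
        "STYLE",
        "::cue { color: white; background: rgba(0,0,0,0.6); "
        f"font-size: {profile['font_scale']}; }}",
        "",
    ]
    for start, end, text in cues:
        lines.append(f"{start} --> {end} "
                     f"line:{profile['line']} size:{profile['size']} align:center")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)

if __name__ == "__main__":
    demo = [("00:00:01.000", "00:00:03.500", "Welcome back."),
            ("00:00:04.000", "00:00:06.000", "Let's begin.")]
    print(make_vtt(demo, device="mobile"))
```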

Several Big Tech companies have shown prototypes or released new versions of their existing platforms demonstrating new developments in this space during the last 12–18 months. For example, Meta showcased its Seamless M4T Universal Speech Translator, a prototype for real-time translation and voice cloning across multiple languages. In dynamic live conversations, the AI seamlessly translated each speaker’s words into other languages while simultaneously replicating voice style and achieving lip-syncing.

Meta’s Seamless M4T Universal Speech Translator provides dynamic, real-time text and speech translation and even translates code-switching speech that incorporates multiple languages into the same conversation.

In February 2023, Google introduced a new feature to Translate called Local Context. As its name suggests, it considers the local context of phrases and expressions, producing translations that adapt to regional dialects, slang, and cultural references.

In a similar vein, Microsoft Translator rolled out its Custom Speech Models platform, allowing users to train AI models with their own data, encompassing domain-specific terminology and industry jargon.
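Full Custom Speech training runs through Microsoft’s own tooling rather than a few lines of code, but as a lighter-weight illustration of steering recognition toward domain terminology, the sketch below uses the Azure Speech SDK’s phrase-list feature to bias transcription at request time. The key, region, and example phrases are placeholders.

```python
# A minimal sketch, assuming the Azure Speech SDK for Python is installed
# (pip install azure-cognitiveservices-speech). Key, region, and the
# example phrases are placeholders.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                        audio_config=audio_config)

# Bias recognition toward domain-specific terms a generic model
# might otherwise mis-transcribe (e.g. product names, jargon).
phrase_list = speechsdk.PhraseListGrammar.from_recognizer(recognizer)
for term in ("SeamlessM4T", "LEXI", "set-top box"):
    phrase_list.addPhrase(term)

result = recognizer.recognize_once()
print(result.text)
```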

Microsoft Translator is a key component of the new paid Copilot+ service and hardware. The hardware, in the form of Surface tablets and laptops, includes a Copilot+ Live Captions feature that creates live English captions on the fly from any content with an audio track, supporting a total of 40 source languages at launch.

These major developments demonstrate the general direction of travel in the AI subtitling industry, with a range of suppliers implementing similar context models to improve accuracy around local idioms and content-specific jargon. However, with these large tech firms developing their own processes and models, which will likely be integrated directly into their existing products, this trend may threaten specialist companies in the captioning, subtitling, and AI markets, such as Ai-Media, Nuvo, and others.

ai-media lexi 3.0

Live AI-powered subtitling with Ai-Media LEXI 3.0

Of course, how widely these solutions are adopted—and the ability of smaller providers to compete—will be largely determined by the accuracy and quality of subtitles. Without these, content services become something of a damp squib. This does suggest that a new area of innovation is underway, as these kinds of in-built offerings from large tech companies may dramatically reduce the need for services to directly translate their video content, as the viewing platform will do that on the fly—at least in the realm of PCs and mobile devices.

The question for the content platforms, in a world where device-created subtitles are of high quality, is whether creating custom subtitles and captions with greater context still makes economic sense. In the near term, we may see the content platforms increasingly rely on device-based subtitles as they become more prevalent while still producing platform-based subtitles to support dumber devices such as TVs and set-top boxes.

Real-Time Subtitling and Translation

The heightened speed and performance of these models means that real-time subtitles can be created with almost zero lag for live content. This is a key development for both Amazon and Netflix, as they are now creating and licensing more live content than ever before. During the last 12 months, we’ve seen a significant increase in the number of services that offer real-time subtitling and captioning, such as Microsoft Azure Speech Services’ live captioning. These services are beginning to deliver real-time subtitles for live content with a very high degree of accuracy, and so we are reaching the point where it doesn’t matter if content is live or on demand: everything can have multi-language subtitles.
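As a minimal sketch of what a live captioning-plus-translation loop can look like, the example below uses the Azure Speech SDK’s translation recognizer to stream microphone audio and print English transcripts alongside Spanish, Korean, and Japanese translations as they are finalised. The credentials and target languages are placeholders, and a production captioning pipeline would be considerably more involved.

```python
# A minimal sketch of live speech-to-text with translation using the
# Azure Speech SDK for Python. Credentials and target languages are
# placeholders; a production pipeline would add timing, cue
# segmentation, and error handling.
import azure.cognitiveservices.speech as speechsdk

translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription="YOUR_KEY", region="YOUR_REGION")
translation_config.speech_recognition_language = "en-US"
for lang in ("es", "ko", "ja"):
    translation_config.add_target_language(lang)

audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config, audio_config=audio_config)

def on_recognized(evt):
    # Each finalised result carries the source transcript plus one
    # translation per requested target language.
    print("en:", evt.result.text)
    for lang, text in evt.result.translations.items():
        print(f"{lang}:", text)

recognizer.recognized.connect(on_recognized)
recognizer.start_continuous_recognition()
input("Listening for live captions... press Enter to stop.\n")
recognizer.stop_continuous_recognition()
```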

We should also remember that subtitling makes all content more accessible to those with hearing loss. This is potentially a huge part of the audience; in the UK alone, more than 11 million people have moderate-to-severe hearing loss. This functionality also benefits those who use captions for other reasons, such as viewers who don’t want sound on in the office or those watching on the bus who can’t find or don’t want to use their headphones.

Other Ways AI Will Revolutionise Multi-Language Content Distribution

Subtitling is potentially only the first step in how AI will revolutionise the distribution of content in multiple languages.

Streaming platforms have also been using AI to expand their audio description tools since 2022. Netflix’s All the Light We Cannot See limited series demonstrated some of the most detailed audio description of any Netflix content. The in-house AI can analyse on-screen video elements, including the locations, the actors present, and their actions, and create the audio description with an understanding of how these various components interact on screen.

AI-powered audio descriptions from the Netflix All the Light We Cannot See trailer
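Netflix has not published how its in-house system works, but one small, hypothetical piece of such a pipeline is deciding where generated descriptions can fit. The sketch below finds dialogue-free gaps in a subtitle track that are long enough to carry a description; the describe_scene() helper is a stand-in for a scene-understanding model and is purely an assumption.

```python
# Hypothetical sketch of one step in an automated audio-description
# pipeline: finding dialogue-free gaps in a subtitle track where a
# generated description could be placed. describe_scene() stands in
# for a scene-understanding model; none of this reflects Netflix's
# actual system.

MIN_GAP_SECONDS = 2.5  # only narrate gaps long enough for a short description

def find_description_slots(dialogue_cues, programme_end):
    """dialogue_cues: sorted list of (start, end) dialogue times in seconds."""
    slots, cursor = [], 0.0
    for start, end in dialogue_cues:
        if start - cursor >= MIN_GAP_SECONDS:
            slots.append((cursor, start))
        cursor = max(cursor, end)
    if programme_end - cursor >= MIN_GAP_SECONDS:
        slots.append((cursor, programme_end))
    return slots

def describe_scene(timestamp):
    # Placeholder for a model that would analyse locations, actors,
    # and actions on screen around this timestamp.
    return f"[description of on-screen action at {timestamp:.1f}s]"

if __name__ == "__main__":
    cues = [(1.0, 4.0), (9.0, 12.0), (13.0, 20.0)]
    for start, end in find_description_slots(cues, programme_end=30.0):
        print(f"{start:5.1f}s-{end:5.1f}s  {describe_scene(start)}")
```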

Beyond these captioning, subtitling, and description services, the next big innovation will be in dubbing and automated lip syncing for content in new languages.

The first step will be the continued development of live audio translation. Some services are already being used extensively in non-video environments. One such example is Spotify’s Voice Translation, which translates podcasts into additional languages and renders the translated audio in the voice of the original host. Voice Translation is one of several significant developments in this area within the last year.

Lip-sync technology has also made great leaps in the last 12 months, with the enhanced ability to produce a convincing lip sync for new language audio dubs as well as simulating the original actor’s voice in different languages in a similar manner to the Spotify service. These synthetic voices are increasingly capable of reflecting nuances of both emotion and idiom across a range of languages.

There are still significant improvements to be made when it comes to accurately replicating an actor’s voice and style in multiple languages, as the AI technology tends to slightly normalise voices by removing strong regional accents. There is also the issue of emotional resonance: AI cannot always deliver an “authentic” performance, which is why the tech has not yet seen wider adoption.

Finally, regional and cultural factors must be considered to ensure that the content and performance resonate with the audience. As we have seen with a lot of the other AI services, it will probably be simply a matter of time before many of these problems are overcome.

Legal questions persist regarding the appropriation of existing actors’ voices for multi-language dubbing, including using these voices as sources to train AI models. Resolving these rights issues is key to the continued development of the tech, as can be seen with Scarlett Johansson’s claim that OpenAI synthesised a voice similar to hers as part of a new release of ChatGPT after she declined its request to use her voice as a model. OpenAI denied that it was meant to be a version of her voice but did remove the option to use it.

To ensure AI technology’s continuing development and application, a legal framework and payment standards will need to be created to secure the rights to use actors’ voices in multiple languages as well as to train AI models. For example, if an AI voice simulation automatically translates an actor’s lines into a new language and creates something culturally insensitive, how is the actor protected from damage to their reputation?

Licensing artists’ vocal and visual likenesses for use in AI-generated content may prove particularly challenging for content localisation efforts, as regulatory frameworks may differ across national borders and rights agreements may not extend internationally.

AI, Accessibility, and Sign Language

The advances in multi-language dubbing and subtitling have been impressive so far in increasing the accessibility of content both to audiences in different countries and to the hearing-impaired. Now, AI may offer a real opportunity to massively expand the accessibility of content through sign language interpretation.

Sign language interpretation has historically been difficult to add to a large amount of content due to its reliance on human interpreters who are willing to appear on camera within content. Added to this is the fact that sign languages differ from country to country, beyond the more widely used American and British sign languages. These differences have created a bottleneck effect in which the lack of interpreters and their associated cost have limited the amount of content that can be interpreted.

Several companies, including Signapse AI, Kara Technologies, and SIMAX, have begun to offer virtual avatars which can be fed either audio or text and will interpret in a number of sign languages. These avatars range from expressive animated characters to photorealistic interpreters.

Kara Technologies’ AI-generated virtual sign language interpreters

Although many of these solutions continue to show real progress, by and large, they are not yet ready for widespread use and have a limited number of use cases in which they can be effective. The key areas for improvement are the overall vocabulary the avatars can interpret and the fact that an interpreter uses not only their hands but also whole-body movement and facial expression to carry the meanings of words. Sign language interpretation is a highly complex task, which, at its core, requires a very human understanding to produce accurate and easily understood interpretations.

However, having seen the rapid increase in accuracy of AI subtitling in the last year, it is clear that AI sign language interpretation is a promising area of development. But it will require much more work, resources, and training to make it a viable option for the future. It is definitely a space worth keeping a close eye on, given that an estimated 1% of the world’s population uses sign language.

Virtual Speakers and Actors

A final area of AI usage in content localisation, at an even earlier stage than sign language interpretation models, is the world of the virtual presenter/actor.

3D capture of actors and the rendering of virtual doubles for VFX work have been with us for a reasonably long time. However, several virtual speaker services have launched recently. These allow you to capture a person via a webcam and then render them delivering a speech to camera, using either audio they have recorded or text read by an AI simulation of their voice, opening up a whole new world of talent replacement for content.

When it comes to localisation, rather than simply dubbing an existing actor into a new language with lip sync, you could, in theory, replace one actor’s entire performance with an actor who is more popular/more recognisable in that region. Imagine, for example, Chris Pratt appearing in Guardians of the Galaxy 3 in English-language markets, but in India, Star-Lord being played by a photorealistic avatar of Prabhas.

The potential for this kind of technology as it becomes more accurate and cost-effective is tremendous, since it creates the possibility of customising a platform’s content not only to a region but even to a specific group of users. For example, imagine fans of Tom Cruise watching him as Iron Man.

The widespread deployment of this technology is still a fair distance away, but the idea of an actor being digitised once and then used many times in different productions has massive implications. It was at the core of the recent SAG-AFTRA strike. Again, the determining factor will be not only the technology reaching maturity, but also the development of robust agreements between actors and content creators to ensure there is a fair deal for all.

The rapid evolution of AI has opened multiple opportunities for content owners and creators to cost-effectively make their content accessible across more territories and to more audiences than ever before. This accessibility will only continue to expand.

The key elements for its growth will come down to how quickly the technology develops, the willingness of the platforms to support different requirements for different audiences, and whether they can begin to develop legal frameworks which support the usage of the tech without disenfranchising talent.

The potential for cost-effective, highly customisable accessible content for every user’s language and region is huge. However, there are legal and ethical challenges to overcome to embed these benefits into the platforms long term. In an ever-evolving technological landscape, the real issue may be the ability of creators, legislators, platform owners, and talent to keep up with the speed of change.
