An Object Lesson in Personalized Streaming Video Experiences
One of the most powerful arguments for delivering object-based, as opposed to linear, media is the potential for content to adapt to the environment in which it is shown. Adapting to context has been standard practice on the web for years; now, as broadcast and broadband converge, broadcasters and other video publishers are cautiously applying the same standard internet languages to create and deliver new forms of interactive and personalised experiences.
“The internet works by chopping things up, sending them over a network, and reassembling them based on audience preference or device context,” explains Jon Page, R&D head of operations at the BBC. “Object-based broadcasting (OBB) is the idea of making media work like the internet.”
Live broadcast content already comprises separate clean feeds of video, audio, and graphics before they are “baked in” to the MPEG/H.264/H.265 signal on transmission. OBB simply extracts those raw elements and delivers all the relevant assets separately, along with instructions about how to render and publish them in the context of the viewer’s physical surroundings, device capability, and personal needs.
The nearest parallel to what an object-based approach might mean for broadcasting can be found in video games. “In a video game, all the assets are object-based, and the decision about which assets to render for the viewer’s action or device occurs some 16 milliseconds before it appears,” says BBC research engineer Matthew Shotton. “The real-time nature of gaming at the point of consumption expresses what we are trying to achieve with OBB.”
MIT devotes a study group to object-based media, and its head and principal research scientist, V. Michael Bove, agrees that video games are inherently object-based representations. “Provided the rendering capacity of the receiving device is known, this is proof that object-based media can be transmitted,” he says. The catch is that this only works if the video is originated as objects in the first place.
The BBC’s R&D division is the acknowledged leader in OBB. Rather than keep its work a secret in the lab, the corporation is keen for others to explore and expand on its research.
“We want to build a community of practice, and the more people who engage in the research, the faster we can get some interesting experiences to be delivered,” says BBC research scientist Phil Stenton. “We are now engaged with web standards bodies to deliver OBB at scale.”
Back to Basics: What Is an Object?
In the BBC’s schema, an object is “some kind of media bound with some kind of metadata.” An object can be a frame of video, a line from a script, or a piece of spoken dialogue. It can also be an infographic, a sound, a camera angle, or a look-up table used in grading (which can be changed to reflect the content or to aid viewers with visual impairments). When built around story arcs, a “theme” can itself be conceived of as an object. Each object is automatically assigned an identifier and a time stamp as soon as it is captured or created.
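As an illustration only, such an object might be sketched in code like this; every field name is invented for the example, since the article gives no formal schema:

```javascript
// A hypothetical sketch of "media bound with metadata"; all names are
// illustrative, not a published BBC schema.
const weatherGraphic = {
  id: "c1a7f3e5-d41d-4cd9-98f0-2b209d6e1f90",   // assigned automatically at creation
  created: "2017-09-13T14:02:11Z",              // time stamp at capture/creation
  media: {
    type: "image/svg+xml",
    uri: "https://example.org/graphics/rain-band.svg"
  },
  metadata: {
    kind: "infographic",
    theme: "weather",
    region: "south-west",                        // hook for personalisation
    highContrastVariant: "https://example.org/graphics/rain-band-hc.svg"
  }
};
```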
Since making its first public demonstration of OBB during the 2014 Commonwealth Games, the BBC has conducted numerous spinoff experiments. These range from online video instructions showing kids how to build a 3D cardboard chicken, to work with BBC News Labs demonstrating how journalists can use “linked data” to build stories. It has also created customised weather forecasts, a radio documentary constructed to fit the listener’s available time, and, most recently, a cooking programme, CAKE, the first project produced and delivered entirely using an object-based approach.
All these explorations are a means to an end. “They illustrate how we build an object-based experience and help us understand if it is technically feasible for distribution and delivery for ‘in the moment’ contextual rendering,” says Stenton. “The next step is to extract common tools and make them open for others to use.”
In particular, the BBC is wrestling with which objects are domain-specific and which can be used across applications, how those common objects can be related to one another, and what standards are needed to make OBB scalable.
Most websites can accommodate and adapt to the wide variety of devices used to view them, varying their layouts, font sizes, and levels of UI complexity accordingly. The BBC also expects a sizeable portion of both craft and consumer applications of the future to be based on HTML, CSS, and JavaScript. However, the tremendous flexibility afforded by that web technology is also a disadvantage.
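For context, the kind of client-side adaptation the web already handles routinely can be expressed in a few lines of standard JavaScript; this is a generic sketch, not BBC code:

```javascript
// Generic device-context adaptation using the standard matchMedia API.
const smallScreen = window.matchMedia("(max-width: 600px)");

function adaptLayout(mq) {
  // Swap layout density and type size to suit the device context.
  document.body.classList.toggle("compact-ui", mq.matches);
  document.body.style.fontSize = mq.matches ? "18px" : "16px";
}

adaptLayout(smallScreen);                      // adapt on load
smallScreen.addEventListener("change", adaptLayout); // re-adapt if context changes
```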
“Repeatability and consistency of approach among production teams is extremely difficult to maintain,” says BBC research engineer Max Leonard, “especially when combined with the sheer volume of possible avenues one can take when creating new object-based media compositions.”
Object-Based Compositions
The BBC’s OBB experiments have relied on HTML/CSS/JS but have taken different approaches to accessing, describing, and combining the media, making the content from one experience fundamentally incompatible with another.
“The only way we can practice an object-based approach to broadcasting in a sustainable and scalable way at the same level of quality expected of us in our linear programming is to create some sort of standard mechanism to describe these object-based compositions, including the sequences of media and the rendering pipelines that end up processing these sequences on the client devices,” says Leonard. “The crux of the problem, as with any standard, is finding the sweet spot between being well-defined enough to be useful, but free enough to allow for creative innovation.”
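To make the idea concrete, a purely hypothetical composition description might read as follows; this is not UMCP or any published BBC format, just an illustration of a media sequence plus a rendering pipeline expressed as data:

```javascript
// A hypothetical composition description: a sequence of media objects plus
// the rendering pipeline that combines them, all as plain data.
const composition = {
  // Which objects play, and when, on the composition's timeline (seconds).
  sequence: [
    { objectId: "urn:example:video:interview-wide",  start: 0,  duration: 12 },
    { objectId: "urn:example:video:interview-close", start: 11, duration: 9 }
  ],
  // How the client should wire up its rendering graph to combine them.
  pipeline: [
    { id: "xfade", type: "crossfade",
      inputs: ["urn:example:video:interview-wide", "urn:example:video:interview-close"],
      params: { from: 11, to: 12 } },
    { id: "out", type: "destination", inputs: ["xfade"] }
  ]
};

// Because the description is plain data rather than baked-in pixels, every
// client receives the same document and renders it as well as it is able.
```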
BBC R&D has a number of building blocks for this language. These include the Optic Framework (Object-based Production Tools In the Cloud), which, to the end user, appears as web apps in a browser, while the video processing and data are kept server-side in the BBC’s Cosmos cloud.
The Optic Framework aims to deliver reusable data models to represent production metadata, so that different production tools can share the same underlying data models while presenting different views and interfaces on them, based on the current needs of the end user.
Optic uses the JT-NM (Joint Task Force on Networked Media) data model as its core, and each individual component within it uses NMOS (Networked Media Open Specifications) standards to allow for the development of tools within an open and interoperable framework.
Inspired by the Web Audio API, the BBC has built an experimental HTML5/WebGL media processing and sequencing library, VideoContext, for creating interactive and responsive videos on the web. It uses a graph-based rendering pipeline, with video sources, effects, and processing represented as software objects that can be connected, disconnected, created, and removed in real time during playback.
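A minimal usage sketch, along the lines of the open source project’s published examples, might look like the following; method names are taken from the project’s documentation and may differ between versions, and the clip URLs are invented:

```javascript
import VideoContext from "videocontext"; // assuming the npm package name from the project's README

const canvas = document.getElementById("canvas");
const ctx = new VideoContext(canvas);

// Source nodes: two clips that will be cross-faded.
const clipA = ctx.video("clip-a.mp4");   // hypothetical asset URLs
const clipB = ctx.video("clip-b.mp4");
clipA.start(0);
clipA.stop(8);
clipB.start(6);
clipB.stop(14);

// A transition is just another node in the processing graph.
const crossfade = ctx.transition(VideoContext.DEFINITIONS.CROSSFADE);
clipA.connect(crossfade);
clipB.connect(crossfade);
crossfade.connect(ctx.destination);

// Animate the transition's "mix" value between seconds 6 and 8 of the timeline.
crossfade.transition(6, 8, 0.0, 1.0, "mix");

ctx.play();
```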
The core of the video processing in VideoContext is implemented as WebGL shaders written in GLSL. A range of common effects, such as cross-fade, chroma keying, scale, flip, and crop, is built into the library. “There’s a straightforward JSON [JavaScript Object Notation] representation for effects that can be used to add your own custom ones,” explains Shotton. “It also provides a simple mechanism for mapping GLSL uniforms onto JavaScript object properties so they can be manipulated in real time in your JavaScript code.”
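A custom effect would then be described roughly as below; the shader code, property names, and the tint effect itself are illustrative inventions rather than anything shipped with the library:

```javascript
// Sketch of a custom effect definition: GLSL shaders paired with named
// uniforms that are exposed as JavaScript properties on the resulting node.
const tintDefinition = {
  title: "Tint",
  description: "Multiplies the frame by an RGB tint colour.",
  vertexShader: `
    attribute vec2 a_position;
    attribute vec2 a_texCoord;
    varying vec2 v_texCoord;
    void main() {
      gl_Position = vec4(a_position, 0.0, 1.0);
      v_texCoord = a_texCoord;
    }`,
  fragmentShader: `
    precision mediump float;
    uniform sampler2D u_image;
    uniform vec3 tint;
    varying vec2 v_texCoord;
    void main() {
      vec4 colour = texture2D(u_image, v_texCoord);
      gl_FragColor = vec4(colour.rgb * tint, colour.a);
    }`,
  properties: {
    // Each "uniform" property becomes a live JavaScript property on the node.
    tint: { type: "uniform", value: [1.0, 0.8, 0.8] }
  },
  inputs: ["u_image"]
};

const tint = ctx.effect(tintDefinition);   // ctx and clipA from the previous sketch
clipA.connect(tint);
tint.connect(ctx.destination);
tint.tint = [0.8, 1.0, 0.8];               // tweak the uniform in real time during playback
```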
The library, available as open source, works on newer builds of Chrome and Firefox on the desktop and, with some issues, on Safari. “Due to several factors, the library isn’t fully functional on any mobile platform,” says Shotton. “This is in part due to the requirement for a human interaction to happen with a video element before it can be controlled programmatically.” The BBC is using the library internally to develop a streamable description for media composition with the working title of UMCP (Universal Media Composition Protocol).
For UMCP, the BBC has taken a cue from Operational Transformation (OT), the technique behind Google Docs and Etherpad that lets multiple users work concurrently on a single document. “With a bit of domain-specific adaptation, this can be put to work in the arena of media production,” explains Leonard.
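As a rough illustration of the underlying technique (not the BBC’s adaptation), a toy Operational Transformation for text insertions shows how two concurrent edits can be reconciled so that every participant converges on the same result:

```javascript
// Minimal OT sketch for plain text; each operation inserts a string at a
// position in a shared document.

// transform(a, b) rewrites operation `a` so it can be applied after
// concurrent operation `b` has already been applied.
function transform(a, b) {
  if (b.position <= a.position) {
    // b inserted earlier in the document, so a's insertion point shifts right
    return { position: a.position + b.text.length, text: a.text };
  }
  return a; // b inserted after a's position; a is unaffected
}

function apply(doc, op) {
  return doc.slice(0, op.position) + op.text + doc.slice(op.position);
}

// Two editors start from the same document and edit concurrently.
const base = "cut to camera 2";
const opA = { position: 0, text: "fade up, " };   // editor A
const opB = { position: 15, text: ", hold 3s" };  // editor B

// Each side applies its own op, then the transformed remote op;
// both converge on the same document.
const siteA = apply(apply(base, opA), transform(opB, opA));
const siteB = apply(apply(base, opB), transform(opA, opB));
console.log(siteA === siteB); // true
```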
The kernel of the idea is that exactly the same session description metadata is sent to every device, regardless of its capabilities; each device can then render the experience in a way that suits it, either live, as the director makes the cuts, or at an arbitrary time later on.
“It is the NMOS content model which allows us to easily refer to media by a single identifier, irrespective of its actual resolution, bitrate, or encoding scheme,” explains Leonard.
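The content-model idea can be sketched as follows; the field names follow the spirit of the AMWA IS-04 specifications but are simplified, so treat this as an illustration rather than the actual NMOS schema:

```javascript
// One Source identity; each technical rendition is a Flow pointing back to it.
const source = {
  id: "1e8f95c0-7c4d-4b3a-9f2e-0a6d2c9b1f11",   // hypothetical UUID
  label: "Studio A camera 1",
  format: "urn:x-nmos:format:video"
};

// Two encodings of the same content, both resolvable from the one source id
// (listed in ascending resolution).
const flows = [
  { id: "6b2d9a44-1c0e-4f8b-a3d2-5e7f90c1b234", source_id: source.id,
    media_type: "video/H264", frame_width: 1920, frame_height: 1080 },
  { id: "f03c71de-8a25-4c6b-b1e9-2d4a6f8c0e57", source_id: source.id,
    media_type: "video/H265", frame_width: 3840, frame_height: 2160 }
];

// A composition only ever references source.id; at render time the client
// resolves that single identity to whichever flow suits the device.
function pickFlow(flows, maxWidth) {
  const usable = flows.filter(f => f.frame_width <= maxWidth);
  return usable.length ? usable[usable.length - 1] : flows[0];
}

console.log(pickFlow(flows, window.screen.width).media_type);
```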