An Impending Accessibility Backlash
Over the past 15 years, software accessibility has matured from a niche specialty to a mainstream expectation. Until around 2010, believe it or not, it wouldn’t have been considered unusual for a student or employee who is legally blind to use completely different software from their peers while doing what would pass as the same tasks. Now, with the success of the Universal Design movement and improved standards in HTML and other technologies, that sort of accommodation would be considered at best a profound failure in procurement and at worst a form of illegal discrimination. Software developers are trained in accessibility issues for front-end development, and basic concepts like labeling control elements and reporting state changes to assistive technology such as screen readers are part of a professional developer’s code testing procedures. Despite this progress, two very different forces are gathering with the potential to push back on the trend towards better technological inclusion of people with disabilities.
The more short-term of those forces is the debate over “return-to-office.” Since 2020, the mass adoption of synchronous video conferencing platforms has proved that it is possible to participate in business meetings from anywhere. Previously it was possible, inadvertently or not, for someone in a wheelchair to be excluded from a meeting held in an older building without an elevator. That is no longer the case: either bring them in over Zoom, find a better location, or prepare for expensive and unflattering litigation. Many disabled employees learned how much decision-making they had been left out of due to these constraints, and they will fight tooth and nail to retain the inclusion they enjoyed when fully remote work leveled the playing field for them. Strictly imposed return-to-office policies can lead to a perception of unfair, differential treatment of employees with disabilities for whom accommodations are necessary, or to discriminatory hiring practices meant to avoid such conflicts.
The other threat to universal design and the trend towards better technological accessibility is the emergence of large language models (LLMs). I discussed the importance of LLMs in this year’s State of Education Video (go2sm.com/edstate), contending that the LLM-based ChatGPT was a much better generator of mediocre writing than anything we’d seen previously and an impressive demonstration of how far we’ve come with software interpretation of natural language prompts. I also contended that the more exciting LLM product OpenAI had released for the online video industry was the Whisper speech-to-text engine. After I wrote that, the GPT-4 model rolled out, generating better-than-mediocre text. Whisper remains a tremendously exciting tool, and the “why” of that supports this point.
Before Whisper, one of the open-source speech-to-text engines I’d hoped would pan out was Mozilla Foundation’s DeepSpeech. Back in 2018, we discussed the importance of language models in the speech-to-text problem with the example “Autumn Aided Cap Shins.” The limiting factor in the accuracy of speech recognition for automatic captions was the ability to predict what words were being spoken, a task performed by inference against a language model. DeepSpeech was constructed atop Mozilla’s Common Voice language database, an ethically pure data set in that all its content was voluntarily provided. It currently sits at 3,209 hours of speech data. Whisper’s data set was scraped from the internet and consists of 211x that amount of speech data. With a language model that much larger, Whisper dramatically outperforms DeepSpeech.
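To put that multiplier in absolute terms, the arithmetic can be sketched in a few lines of Python. The two input figures are the ones quoted above; the variable names are mine and purely illustrative.

```python
# Figures quoted in the text above.
common_voice_hours = 3209   # Common Voice speech data, in hours
whisper_multiplier = 211    # Whisper's data set is 211x that amount

# Implied size of Whisper's data set, in hours and in years of continuous audio.
whisper_hours = common_voice_hours * whisper_multiplier
whisper_years = whisper_hours / (24 * 365)

print(f"Implied Whisper data set: {whisper_hours:,} hours")
print(f"Roughly {whisper_years:.0f} years of continuous speech")
```

That works out to roughly 677,000 hours of speech.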
The LLM that Whisper is built on was not as ethically immaculate in its conception as DeepSpeech’s. Observations that ChatGPT is simply a plagiarism engine, generating text based on a language model pilfered from the creative writing people shared on the internet, apply equally to Whisper (although I’d argue that Whisper provides a substantial public good in providing accurate video captions). The backlash will come in how people try to prevent companies building LLMs from accessing their creative output. Proposals I’ve seen include a robots.txt type of solution with more teeth. But making text and video on the internet less accessible to machine reading (and thus to assistive technology) is another solution, one with severe consequences for the disabled. For educational video, the solution for protecting faculty intellectual property and student privacy is strict content security: platforms that better protect access to video (and captions) stand out from the field.
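For readers unfamiliar with how today's robots.txt works, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The crawler names, the rules, and the caption URL are all hypothetical, invented for illustration. The file singles out one crawler by name and leaves everyone else alone, which is the kind of targeting that avoids shutting out assistive technology along with the scrapers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) LLM crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


# Hypothetical caption file on a hypothetical site.
caption_url = "https://example.edu/lectures/week1.vtt"

for agent in ("ExampleLLMBot", "ExampleCaptionReader"):
    print(agent, "allowed:", allowed(agent, caption_url))
```

Note that this check runs on the crawler's side and compliance is voluntary, which is why the proposals call for something with more teeth.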