Using Axon & Nx (etc) to support accessibility?

In late 2022, @jnnks asked for help in finding a thesis topic: "Looking for interesting topics to write my thesis on, maybe distributed training with Elixir Axon?". The thread has gotten quite a few responses, including several from @josevalim. Clearly, @jnnks has been getting a lot of high-quality help!

However, most of the responses have been either academically oriented or related to infrastructure issues. I’d like to venture in a different direction, considering practical ways that Nx, Axon, and such might be able to assist with accessibility issues. Maybe someone will decide to start a project (:-).

Recorded Content

I really like the fact that I can “attend” assorted presentations on YouTube and other venues, listen to recorded podcasts, etc. Although I’m missing the “Hallway Track”, there are compensations:

  • I can adjust the speed, skip around, etc.
  • I can (re-)watch historic presentations.
  • I can read other viewers’ reactions.
  • I get social distancing by default.

In short, the ready availability of recorded presentations is a big win for me (and many others). Unfortunately, many of the recordings present substantial accessibility issues for people who are blind, deaf, or otherwise impaired.

I suspect that many of these problems could be mitigated substantially by current AI techniques, particularly since the processing wouldn’t have to be done “on the fly”. However, this is all way above my pay grade, so I’m just speculating below…

Audio Quality

All too often, the audio quality of recordings is compromised, making voices hard to understand. I’ve noticed problems with echoes, erratic volume, poor balance or tone control settings, etc.

Sometimes, this has to do with the A/V setup and/or recording equipment; other times, someone has decided to “improve” the audio track by adding music or sound effects. Obviously, none of this works well for the hearing impaired.
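Some of these defects could be tamed automatically in post-production. Purely as an illustration (not anything proposed in the thread), ffmpeg's loudnorm filter performs EBU R128 loudness normalization, which smooths out erratic volume; a minimal Python sketch that builds such an invocation (the target values are common defaults for spoken-word content, not recommendations):

```python
def loudnorm_command(src, dst, target_lufs=-16.0, true_peak=-1.5):
    """Build an ffmpeg command that applies EBU R128 loudness
    normalization (the `loudnorm` filter) to a recording's audio
    track, leaving the video stream untouched.
    """
    return [
        "ffmpeg", "-i", src,
        "-af", f"loudnorm=I={target_lufs}:TP={true_peak}",
        "-c:v", "copy",   # copy the video stream as-is
        dst,
    ]
```

The resulting list can be handed to `subprocess.run()` wherever ffmpeg is installed; echoes and poor tonal balance are harder problems, but loudness alone fixes a surprising amount.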

Slides, etc.

Although some presenters make their slides available online, most do not. And, even when slides are posted, they may not be in a form that is useful to the blind or visually disabled. For example, some PDF files just contain unindexed sequences of images.

In an ideal world, the slides would be available in a textual format, complete with semantic markup, time stamps, etc. This would allow them to be annotated, indexed, searched, summarized, etc. However, getting to that point from a presentation video would require some Real Work™, e.g.:

  • extracting time-stamped images from the video
  • recognizing and extracting slides from the images
  • recognizing formatting, graphics, text, etc.
  • preparing a table of contents, index, etc.
  • generating an integrated “companion” file
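The first two steps above can be sketched briefly. One common approach (again, just an illustration, with hypothetical helper names): sample the video at a fixed rate with ffmpeg, then keep only frames whose content differs substantially from the previously kept frame. In practice the per-frame "signature" would be a perceptual hash or downscaled histogram; here it is reduced to a single number to show the shape of the logic:

```python
def sample_frames_command(video, out_pattern="frame-%05d.png", fps=1):
    """ffmpeg invocation that writes one PNG per second of video;
    the frame number in each filename doubles as a time stamp."""
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", out_pattern]

def slide_changes(signatures, threshold=0.1):
    """Given one numeric signature per sampled frame, return the
    indices where a new slide appears to start, i.e. where the
    signature jumps by more than `threshold` from the last keeper.
    A deliberately crude heuristic, for illustration only."""
    changes = []
    prev = None
    for i, sig in enumerate(signatures):
        if prev is None or abs(sig - prev) > threshold:
            changes.append(i)
            prev = sig
    return changes
```

The kept frames (with their time stamps) would then feed into OCR and layout recognition for the remaining steps.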

And a pony…

Although closed captions are sometimes available, their quality varies from decent to dismal. It would be great to be able to get high-quality transcripts, preferably integrated into the sort of textual files discussed above.
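Modern speech-to-text models (e.g., Whisper; in the Elixir world, Bumblebee exposes comparable models on top of Nx and Axon) already emit time-stamped text segments. Turning those segments into a caption track is then mechanical; a minimal WebVTT formatter, purely as an illustration of that last step:

```python
def vtt_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def to_webvtt(segments):
    """Render (start_seconds, end_seconds, text) triples as the body
    of a WebVTT caption file."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)
```

The same time-stamped segments could just as easily be merged into the textual “companion” file sketched earlier, so that slides and transcript share one searchable index.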


I saw this a few days ago and thought of you, Rich! Apparently Microsoft’s Edge browser will have built-in support for scraping info from videos in a deeper way than just the transcript…



That could be a useful facility for any user who is trying to approach a large set of content. Blind, dyslexic, and visually impaired users, in particular, could use it to help with skimming material, etc. It could also be very useful for material that isn’t in one’s native language.

Sadly, Microsoft’s support for accessibility is a bit erratic, so I worry that any a11y-specific benefits of this facility will be largely coincidental and may not be well supported over the long term.

For example, Microsoft developed Soundscape as a research project. It has since turned into an extremely useful tool for Orientation and Mobility. However, they are now making a subset of it open source and removing the support infrastructure, so its prospects are unclear at best…
