Using Axon & Nx (etc) to support accessibility?

In late 2022, @jnnks asked for help in finding a thesis topic: "Looking for interesting topics to write my thesis on, maybe distributed training with Elixir Axon?". The thread has gotten quite a few responses, including several from @josevalim. Clearly, @jnnks has been getting a lot of high-quality help!

However, most of the responses have been either academically oriented or related to infrastructure issues. I’d like to venture in a different direction, considering practical ways that Nx, Axon, and such might be able to assist with accessibility issues. Maybe someone will decide to start a project (:-).

Recorded Content

I really like the fact that I can “attend” assorted presentations on YouTube and other venues, listen to recorded podcasts, etc. Although I’m missing the “Hallway Track”, there are compensations:

  • I can adjust the speed, skip around, etc.
  • I can (re-)watch historic presentations.
  • I can read other viewers’ reactions.
  • I get social distancing by default.

In short, the ready availability of recorded presentations is a big win for me (and many others). Unfortunately, many of the recordings present substantial accessibility issues for people who are blind, deaf, or otherwise impaired.

I suspect that many of these problems could be mitigated substantially by current AI techniques, particularly since the processing wouldn’t have to be done “on the fly”. However, this is all way above my pay grade, so I’m just speculating below…

Audio Quality

All too often, the audio quality of recordings is compromised, making voices hard to understand. I’ve noticed problems with echoes, erratic volume, poor balance or tone control settings, etc.

Sometimes, this has to do with the A/V setup and/or recording equipment; other times, someone has decided to “improve” the audio track by adding music or sound effects. Obviously, none of this works well for the hearing impaired.
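Some of these defects could be tamed automatically in post-production. Purely as an illustration (not anything proposed in the thread), ffmpeg's loudnorm filter performs EBU R128 loudness normalization, which smooths out erratic volume; a minimal Python sketch that builds such an invocation (the target values are common defaults for spoken-word content, not recommendations):

```python
def loudnorm_command(src, dst, target_lufs=-16.0, true_peak=-1.5):
    """Build an ffmpeg command that applies EBU R128 loudness
    normalization (the `loudnorm` filter) to a recording's audio
    track, leaving the video stream untouched.
    """
    return [
        "ffmpeg", "-i", src,
        "-af", f"loudnorm=I={target_lufs}:TP={true_peak}",
        "-c:v", "copy",   # copy the video stream as-is
        dst,
    ]
```

The resulting list can be handed to `subprocess.run()` wherever ffmpeg is installed; echoes and poor tonal balance are harder problems, but loudness alone fixes a surprising amount.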

Slides, etc.

Although some presenters make their slides available online, most do not. And, even when slides are posted, they may not be in a form that is useful to the blind or visually disabled. For example, some PDF files just contain unindexed sequences of images.

In an ideal world, the slides would be available in a textual format, complete with semantic markup, time stamps, etc. This would allow them to be annotated, indexed, searched, summarized, etc. However, getting to that point from a presentation video would require some Real Work™, e.g.:

  • extracting time-stamped images from the video
  • recognizing and extracting slides from the images
  • recognizing formatting, graphics, text, etc.
  • preparing a table of contents, index, etc.
  • generating an integrated “companion” file
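The first two steps above can be sketched briefly. One common approach (again, just an illustration, with hypothetical helper names): sample the video at a fixed rate with ffmpeg, then keep only frames whose content differs substantially from the previously kept frame. In practice the per-frame "signature" would be a perceptual hash or downscaled histogram; here it is reduced to a single number to show the shape of the logic:

```python
def sample_frames_command(video, out_pattern="frame-%05d.png", fps=1):
    """ffmpeg invocation that writes one PNG per second of video;
    the frame number in each filename doubles as a time stamp."""
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", out_pattern]

def slide_changes(signatures, threshold=0.1):
    """Given one numeric signature per sampled frame, return the
    indices where a new slide appears to start, i.e. where the
    signature jumps by more than `threshold` from the last keeper.
    A deliberately crude heuristic, for illustration only."""
    changes = []
    prev = None
    for i, sig in enumerate(signatures):
        if prev is None or abs(sig - prev) > threshold:
            changes.append(i)
            prev = sig
    return changes
```

The kept frames (with their time stamps) would then feed into OCR and layout recognition for the remaining steps.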

And a pony…

Although closed captions are sometimes available, their quality varies from decent to dismal. It would be great to be able to get high-quality transcripts, preferably integrated into the sort of textual files discussed above.
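Modern speech-to-text models (e.g., Whisper; in the Elixir world, Bumblebee exposes comparable models on top of Nx and Axon) already emit time-stamped text segments. Turning those segments into a caption track is then mechanical; a minimal WebVTT formatter, purely as an illustration of that last step:

```python
def vtt_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def to_webvtt(segments):
    """Render (start_seconds, end_seconds, text) triples as the body
    of a WebVTT caption file."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)
```

The same time-stamped segments could just as easily be merged into the textual “companion” file sketched earlier, so that slides and transcript share one searchable index.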


I saw this a few days ago and thought of you, Rich! Apparently Microsoft’s Edge browser will have built-in support for scraping info from videos in a deeper way than just the transcript…



That could be a useful facility for any user who is trying to approach a large set of content. Blind, dyslexic, and visually impaired users, in particular, could use it to help with skimming material, etc. It could also be very useful for material that isn’t in one’s native language.

Sadly, Microsoft’s support for accessibility is a bit erratic, so I worry that any a11y-specific benefits of this facility will be largely coincidental and may not be well supported over the long term.

For example, Microsoft developed Soundscape as a research project. It has since turned into an extremely useful tool for Orientation and Mobility. However, they are now making a subset of it open source and removing the support infrastructure, so its prospects are unclear at best…
