Hello, I’m a frontend engineer who is learning Elixir and Phoenix so I can contribute to a side project. I’m hoping to get some advice on how best to architect a feature that involves converting text to audio files, caching, and storage.
Here’s how the feature would/could work:
- The client sends a request with specific parameters (text string, language, dialect, gender, etc.) to the backend API.
- The API checks a cache to see if these exact parameters have been passed before.
  a. If the parameters have not been passed before, the backend API makes a call to a text-to-speech cloud service (Amazon Polly, Google Text-to-Speech) to generate an audio file. The audio file is cached or stored (or both) somewhere and then sent back to the client.
  b. If the parameters have been passed before, the audio file is retrieved from the cache or storage and sent back to the client.
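To make the flow concrete, here's a rough, self-contained sketch of the lookup-or-generate step I have in mind, using ETS as a stand-in cache; `synthesize/1`, `SpeechCache`, and the table name are placeholders I made up, and a real version would call Polly or Google TTS instead:

```elixir
defmodule SpeechCache do
  # Sketch only: an ETS table keyed on a hash of the request params.
  # A production version might use Cachex, Redis, or S3 instead.

  def start do
    :ets.new(:speech_cache, [:named_table, :public])
  end

  def fetch_audio(params) do
    key = :erlang.phash2(params)

    case :ets.lookup(:speech_cache, key) do
      [{^key, audio}] ->
        # Cache hit: skip the cloud TTS call entirely.
        {:cached, audio}

      [] ->
        # Cache miss: generate, store, then return the audio.
        audio = synthesize(params)
        :ets.insert(:speech_cache, {key, audio})
        {:generated, audio}
    end
  end

  # Placeholder for the real text-to-speech cloud call.
  defp synthesize(_params), do: <<0, 1, 2, 3>>
end
```

So the first request for a given parameter set would return `{:generated, audio}` and any repeat would return `{:cached, audio}` without touching the TTS service.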
Each audio file will average around 4 KB in size. The number of parameter combinations is theoretically infinite, but it’s unlikely there will be more than ~36,000 unique requests/files (so roughly 144 MB of audio in total).
My goal is to come up with the simplest implementation possible that limits the number of calls to the text-to-speech cloud service and serves media files to the client quickly. Should I cache the audio files in Redis? Store them in object storage like S3 and serve them through a CDN? Both? Are there any special considerations I should make given how Phoenix and Elixir work, or third-party libraries that might be available? Any best practices for a junior Elixir engineer to follow?
Any thoughts or advice would be really appreciated. Thank you!