Hello, I’m a frontend engineer who is learning Elixir and Phoenix so I can contribute to a side project. I’m hoping to get some advice on how best to architect a feature that involves converting text to audio files, caching, and storage.
Here’s how the feature would/could work:
- The client sends a request with specific parameters (text string, language, dialect, gender, etc.) to the backend API.
- The API checks a cache to see if these exact parameters have been passed before.
  a. If the parameters have not been passed before, the backend API makes a call to a text-to-speech cloud service (Amazon Polly, Google Text-to-Speech) to generate an audio file. The audio file is cached or stored (or both) somewhere and then sent back to the client.
  b. If the parameters have been passed before, the audio file is retrieved from the cache or storage and sent back to the client.
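To make the flow concrete, here's a rough, self-contained sketch of the lookup-or-generate step I have in mind, using ETS as a stand-in cache; `synthesize/1`, `SpeechCache`, and the table name are placeholders I made up, and a real version would call Polly or Google TTS instead:

```elixir
defmodule SpeechCache do
  # Sketch only: an ETS table keyed on a hash of the request params.
  # A production version might use Cachex, Redis, or S3 instead.

  def start do
    :ets.new(:speech_cache, [:named_table, :public])
  end

  def fetch_audio(params) do
    key = :erlang.phash2(params)

    case :ets.lookup(:speech_cache, key) do
      [{^key, audio}] ->
        # Cache hit: skip the cloud TTS call entirely.
        {:cached, audio}

      [] ->
        # Cache miss: generate, store, then return the audio.
        audio = synthesize(params)
        :ets.insert(:speech_cache, {key, audio})
        {:generated, audio}
    end
  end

  # Placeholder for the real text-to-speech cloud call.
  defp synthesize(_params), do: <<0, 1, 2, 3>>
end
```

So the first request for a given parameter set would return `{:generated, audio}` and any repeat would return `{:cached, audio}` without touching the TTS service.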
Each audio file will average around 4 KB in size. The number of parameter combinations is theoretically infinite, but it’s unlikely there will be more than ~36,000 unique requests/files (so roughly 144 MB of audio in total).
My goal is to come up with the simplest implementation possible that limits the number of calls to the text-to-speech cloud service and serves media files to the client quickly. Should I cache the audio files in Redis? Store them in object storage like S3 and serve them through a CDN? Both? Are there any special considerations I should make given how Phoenix and Elixir work, or third-party libraries that might be available? Any best practices for a junior Elixir engineer to follow?
Any thoughts or advice would be really appreciated. Thank you!