Image Processing API Architecture

Hello everyone, I'm starting an image processing API in Elixir that will have two main functions. The first is simply uploading images to an S3 bucket. The second is the critical one: this API will be consumed by a Ruby app that displays these digital assets in different sizes, with different transformations applied. My main concern is not the dependency that will process the images in Elixir, but the architecture behind the API. The Ruby app will be constantly fetching the images from S3 via the Elixir app. I've been researching several technologies, and one that caught my attention was Broadway. I've used GenStage before, but I'm not sure Broadway is exactly what I'll need. How can I know that my system can handle all of those requests? That is one of my main concerns. If you have any suggestions or tips for achieving what I've described, I'd appreciate it.


I usually see GenStage as a way to throttle requests into external systems with back pressure. Do you need throttling / back pressure for your use case?

Are the transformations constantly changing, or will the Ruby app be requesting relatively consistent sizes and transformations? If they don't change much, consider putting CloudFront in front of the requests to cache the assets after the first request with the specific size/transformation.

The advantages are much less load on your API (it only needs to be called when the asset doesn't exist yet) and reduced S3 egress costs, because CloudFront will cache the asset for a year (or whatever lifetime you set). The fallback URL for CloudFront would hit your Elixir app to do the initial resize or transformation, and the result would then be cached for subsequent requests. This might eliminate the need for any throttling/back-pressure in your Elixir app.
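Something like this for the origin endpoint, as a minimal sketch in Phoenix. The Mogrify resize, the ExAws S3 fetch, and the bucket/param names are all assumptions for illustration; the important part is the long Cache-Control header that lets CloudFront keep the variant:

```elixir
defmodule MyAppWeb.ImageController do
  use MyAppWeb, :controller

  def show(conn, %{"key" => key, "w" => w}) do
    # Fetch the original from S3 (ExAws assumed).
    {:ok, %{body: original}} =
      ExAws.S3.get_object("my-bucket", key) |> ExAws.request()

    tmp = Path.join(System.tmp_dir!(), Path.basename(key))
    File.write!(tmp, original)

    # Resize with Mogrify (ImageMagick under the hood); "300x" style geometry.
    resized =
      tmp
      |> Mogrify.open()
      |> Mogrify.resize("#{w}x")
      |> Mogrify.save()

    conn
    # The long max-age is what lets CloudFront hold the variant for a year.
    |> put_resp_header("cache-control", "public, max-age=31536000")
    |> put_resp_content_type(MIME.from_path(key), nil)
    |> send_file(200, resized.path)
  end
end
```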

If the resize/transformation requests vary, then the above won't help you much, but there are add-ons in AWS, such as Imagizer https://aws.amazon.com/marketplace/pp/Nventify-Nventify-Imagizer-for-Amazon-S3/B019YEIK7M, which are tailor-made for this (why reinvent the wheel?). Imagizer lets you use either their SaaS platform or run it on your own EC2 instances.

I need both of them for my specific case.

Haven’t seen that add-on you are describing. I’ll check it out!

There are a few SaaS offerings out there that do something similar (https://www.imgix.com/ is another one). The advantage is that they take care of supporting all of the new image types that come out (HEIC/HEIF, etc.) so you can focus on your core business logic. It all depends on what kind of 'transformations' you are doing and whether you need the Exif data maintained/updated in the images after modification (I think Imagizer strips the Exif out by default). If you can live with the limitations of these SaaS offerings, you may not even need the second app since they'd be doing the hard work… it would essentially just become an asset server.

Another benefit of CloudFront is that the cache keeps copies geographically close to the end users, so it improves their page load times, even if your API is deployed in a single availability zone.

Without knowing your entire usage model, I'm not sure whether Broadway, GenStage, or even Flow would actually solve your bandwidth problem. If the end users expect the image to be served right away, back-pressure on the asset delivery isn't the greatest solution: you'd need to build a queuing or retry system in the consuming app to deal with the back-pressure/delay that the Elixir server would be applying. If this is the case and you actually have a throughput problem, you might look at ways to alleviate the bottleneck before adding a lot of overhead managing back-pressure on both sides.
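For reference, if you do end up wanting Broadway, it fits the asynchronous side (e.g. pre-generating sizes from S3 upload events), not the synchronous serve path. A minimal sketch, assuming broadway_sqs and an SQS queue wired to S3 upload notifications (the queue URL is a placeholder):

```elixir
defmodule ImagePipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        module:
          {BroadwaySQS.Producer,
           queue_url: "https://sqs.us-east-1.amazonaws.com/000000000000/image-uploads"},
        concurrency: 1
      ],
      processors: [
        # Back-pressure: at most 10 messages in flight at a time.
        default: [concurrency: 10]
      ]
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    # message.data is the raw SQS body (an S3 event notification);
    # decode it and kick off the resize/transform work here.
    message
  end
end
```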


One way to do it is to use presigned URLs and have the user upload to S3 directly… only using Elixir to generate the presigned URL.
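Generating that presigned URL with ex_aws/ex_aws_s3 is only a few lines; the bucket, key, and expiry here are placeholders:

```elixir
# Sketch: hand the client a short-lived PUT URL so uploads bypass the API box.
config = ExAws.Config.new(:s3)

{:ok, upload_url} =
  ExAws.S3.presigned_url(config, :put, "my-bucket", "uploads/photo.jpg",
    expires_in: 300
  )

# The client then PUTs the file body straight to upload_url.
```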

Then you can point CloudFront at S3.

Then you can request the file directly from CloudFront… passing the CloudFront URL to https://github.com/imgproxy/imgproxy running on ECS/Fargate.

The advantage is there's not much code you have to write, and images are available as soon as the user has uploaded. If imgproxy or another app like it (there are a couple) has the transformations you need, it might be worth exploring.
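With this setup, the only Elixir code in the request path is building imgproxy URLs. A sketch following imgproxy's URL-signing scheme (HMAC-SHA256 over salt + path, base64url-encoded); the host, env var names, and resize options are placeholders:

```elixir
defmodule ImgproxyUrl do
  # IMGPROXY_KEY / IMGPROXY_SALT hold the hex-encoded values imgproxy runs with.
  def signed_url(source_url, width, height) do
    key = Base.decode16!(System.fetch_env!("IMGPROXY_KEY"), case: :lower)
    salt = Base.decode16!(System.fetch_env!("IMGPROXY_SALT"), case: :lower)

    # Processing options first, then the plain source URL (the CloudFront URL).
    path = "/rs:fill:#{width}:#{height}/plain/#{source_url}"

    signature =
      :crypto.mac(:hmac, :sha256, key, salt <> path)
      |> Base.url_encode64(padding: false)

    "https://imgproxy.example.com/#{signature}#{path}"
  end
end

# ImgproxyUrl.signed_url("https://d1234.cloudfront.net/uploads/photo.jpg", 300, 300)
```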


So I've really been kicking this idea around for some time.

I know this may not be the ideal solution for you, but I'm actively working on learning Rust so I can make a WebAssembly app that scales the images client-side; my client-side app will then upload them directly to S3 via signed URLs. I think cost alone will make this a much more effective solution long term.

https://silvia-odwyer.github.io/photon/ looks promising


So basically with Imagizer I just need an EC2 instance and a bucket, and that's it, right? The application just needs to deliver transformed images, so perhaps having a Phoenix application is an unnecessary intermediate step… What I need is pretty basic, and this feels like it can achieve what I need in terms of delivering the images. Have you used it (Imagizer) before, @drl123?

Yes, I've used it before and it performed quite well. If the files are really large and transforms are taking too long, you may need to scale them down first and then apply the transforms to the resized image. If you find you need more than one instance, you just set up a load balancer with multiple EC2s behind it and auto-scale them. There's an AMI for Imagizer in the AWS Marketplace; you spin up an EC2 instance using it, then just pass query string params for the transforms. I believe Imagizer's documentation explains all of the setup… it's been a while since I last looked at it (and I wasn't involved in the initial setup either).
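To illustrate the shape of such a request (the host and parameter names below are hypothetical placeholders; Imagizer's docs have the real ones):

```elixir
# Illustration only: transforms ride along as query string params on the
# asset URL, so there's no middle app in the request path.
base = "https://imagizer.example.com/uploads/photo.jpg"
url = base <> "?" <> URI.encode_query([{"width", 300}, {"height", 300}, {"quality", 80}])
```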

We actually used it without any middle app, doing transforms right from the JS front-end of the app. In that case, the main API app only managed the pre-signed URLs for upload and the access keys for reading, and Imagizer did the rest of the hard work.

CloudFronting the common requests eliminates much of the load from the main app, freeing up resources for the rest of the business logic. It also means you don't have to store the same image in multiple sizes… the other sizes are just transforms that pull from the cache, falling back to a new request only if they've expired, which just causes them to be re-cached. It's pretty efficient.

CloudFront is both super cheap and super performant for the end user, because it keeps copies geographically close to them, so time-to-glass is kept to a minimum (way faster than trying to do this on the fly, even with parallel transforms going on).

Technically, you could even use S3 Infrequent Access instead of S3 Standard with the CF caching and save on your storage costs too.

Again, it all depends on your application and performance needs. For our use case, it worked extremely well and was nearly maintenance-free. The only time we had to touch the mechanism was when a new version of the AMI came out… otherwise, it just worked. Depending upon your request volumes (and keep in mind that CF helps keep those minimal after initial caching), you could also use their SaaS offering and not have to deal with setting up the EC2 instances. With our volumes, EC2 was the cheaper solution, but you will need to evaluate that for yourself.

Hope this was helpful.


This has been very helpful. Having a middle Elixir application just for transforming really does seem like reinventing the wheel, even more so since we are using AWS for all of our projects. As a team we have decided that Imagizer is the path we will explore first, because it looks the most promising and we really liked the simplicity of this add-on. However, I have one last question regarding CloudFront: how necessary is this service? Is it absolutely a must?


Using CloudFront is not a must, but if you don't use it, you'll be making multiple calls to Imagizer whenever an image with the same transformation is re-requested (i.e., even on a page refresh). If you are running your own instances of Imagizer on EC2, that may not be an issue for you. If you go the SaaS route, you'll be paying for every request.

Keep things simple at first and move to CF if you find you need to.

You could also run a processing job right after upload to build the resized versions using Imagizer and store them in S3 with the original, then just serve the S3 versions so you only pay Imagizer once per size/transformation. If you do this, you may need a 'processing' state while the images are built and put on S3 before you can serve them. You can orchestrate a lot of that work with AWS Lambdas, which are also super cheap, and avoid burdening the API server… just post back to the API when the images are all built.
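A sketch of that pre-generate step from the Elixir side (the Imagizer host, the width param name, and the bucket/key layout are all placeholders; Req and ExAws are assumed for HTTP and S3):

```elixir
defmodule PreGenerate do
  # Sizes and the S3 key layout are placeholders.
  @sizes [{"thumb", 150}, {"medium", 600}, {"large", 1200}]

  def run(key) do
    Enum.each(@sizes, fn {label, width} ->
      # One paid Imagizer call per size; the stored copy is served from S3
      # (or CloudFront) from then on.
      %Req.Response{body: body} =
        Req.get!("https://imagizer.example.com/#{key}?width=#{width}")

      ExAws.S3.put_object("my-bucket", "#{label}/#{key}", body)
      |> ExAws.request!()
    end)
  end
end
```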
