Hey folks,
Does anyone have experience using Postgrex with AWS Redshift, preferably recently and/or in production?
My team is looking to use Elixir to be a broker for a lot of our ETL data engineering work, and we make heavy use of Redshift, among other tools that traditionally have very good support for JVM languages.
I know there has been work done on this front before, but I’m unclear as to where it sits now. For example:
i use postgrex with redshift internally. you need version 0.13.1 or later (not 1.0.0-rc.0 or 1.0.0-rc.1 which predate the 0.13 series) but it should work fine
i don’t use ecto with that particular system. it may generate sql that redshift can’t handle (as it supports only a subset of what postgres supports) but i think if you stick to queries that redshift can handle it would be fine
We’ve gone with using Ecto to run stringified/templated SQL against our Redshift cluster. The system is purely internal and only runs trusted queries, so we elected to go this route despite string interpolation in SQL being wildly insecure otherwise.
One thing working in our favor is that the queries we’re running against Redshift using this method are 1. largely administrative, 2. simple, 3. rarely change. For those reasons, we haven’t felt a great need to use Ecto’s query builder or anything on that level of sophistication.
Overall this works well enough for our particular workflow, but I would definitely not use this method if any piece of the system was public facing.
thanks! I started trying out postgrex as suggested by the other user. Just wondering do you do upserts on your redshift data? It sucks that in order to update records, i have to delete and re-insert as per redshift document.
Thanks! =)
redshift doesn’t support upsert and the redshift client library is actually just postgrex so if you try to use ecto upserts with redshift you’ll get sql errors in your queries
Thanks! I’m sticking with postgrex then. Do you have to use a pool to manage db connections. poolboy? I think Ecto repo has a pool of connections to work off of.
Since you are also using Redshift from Elixir you might be interested that we released RedshiftEcto (Redshift adapter for Ecto) yesterday. It aims to provide a good experience working with a Redshift database from Ecto. Features unsupported by Redshift will raise an error and the adapter supports table and column options specific to Redshift in migrations.
We’re in a VPC and I was trying to tunnel through a bastion box to connect from my machine. Otherwise nothing is really special, it’s phoenix 1.4, ecto 2.2, and that’s about it. We we just getting timeouts on any attempt to connect.
We may try to run this on regular Postgres, and see if that meets out latency goals.
We have a similar setup, but using a VPN instead of ssh tunnelling to connect to a private Redshift instance inside the VPC’s private network.
I just tested it with an ssh tunnel instead and that works for me. Can you otherwise connect to Redshift through the same tunnel with a regular client (e.g. psql or a desktop app)?
I appreciate the quick response given that you hadn’t posted in this thread in over four years!
I asked because my new team is working heavily with Redshift using Postgrex, but there are a lot of quirks that force us to just send raw parameterized SQL strings to Redshift. What we have kind of works but isn’t ideal.