At work, we have a process that captures small snapshots of data for an analysis routine that we run hundreds of times per day. The snapshots are analyzed by data science as well. We default to storing these snapshots in S3, but the data usually shifts over to Postgres as we mature how each snapshot is used. I don’t see any value in defaulting to S3. I don’t know how much more it costs to store a JSON object as an S3 file versus in a Postgres/RDS row, but I can’t imagine that being a compelling reason. Has anybody else faced this choice and chosen S3? If so, why?
The only times I find plain flat files compelling for JSON data are when you are sending data between organizations, which may not share access to a relational database, or when you want portable data sets that users can export and save individually for offline use.
Unless you need to manipulate or update the objects, I’d just stick with S3 (I think your intuition is right).
The difference is 5-10x depending on your IOPS needs. So if it would cost you $23/mo to store something in S3 (about 1 TB at standard rates), it’ll cost you around $230/mo to store it in RDS at the high end, and that isn’t even including the actual RDS instance, just basic SSD disk space. This may not matter to you, but it definitely adds up. If you need many terabytes, the difference becomes quite hefty.
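To make that arithmetic concrete, here is a rough sketch that applies the 5-10x multiplier quoted above to a monthly S3 bill. The function name and defaults are hypothetical, the multipliers are just the range from this comment (not official pricing), and the result covers storage only, not the RDS instance itself.

```python
def rds_storage_cost_range(s3_monthly_usd, low_multiplier=5.0, high_multiplier=10.0):
    """Estimate the monthly RDS storage cost range from an S3 storage bill.

    Assumption: the 5-10x multipliers come from the comment above and are a
    rule of thumb, not AWS list prices. Instance cost is NOT included.
    """
    if s3_monthly_usd < 0:
        raise ValueError("monthly cost cannot be negative")
    return (s3_monthly_usd * low_multiplier, s3_monthly_usd * high_multiplier)


if __name__ == "__main__":
    low, high = rds_storage_cost_range(23.0)
    print(f"S3 at $23/mo -> RDS storage at roughly ${low:.0f}-${high:.0f}/mo")
```

Plug in your own S3 bill and, if you have real quotes for your region and storage class, replace the multipliers with the actual ratio.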
Depends on the size of the objects. If the average size is <= 16 KB, then a database for sure. If it is >= 256 KB, then an object store for sure. If it is in between, pick whichever is more convenient for you.
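If you want to check where your snapshots fall, something like this sketch would do it. The helper names are made up for illustration, the thresholds are just the rule of thumb above, and I am assuming 1 KB = 1024 bytes and that size is measured on the UTF-8 encoded JSON.

```python
import json

# Rule-of-thumb thresholds from the comment above (assuming 1 KB = 1024 bytes).
DB_MAX_BYTES = 16 * 1024
OBJECT_STORE_MIN_BYTES = 256 * 1024


def average_json_size(snapshots):
    """Average size in bytes of the snapshots when serialized as UTF-8 JSON."""
    if not snapshots:
        raise ValueError("need at least one snapshot")
    sizes = [len(json.dumps(s).encode("utf-8")) for s in snapshots]
    return sum(sizes) / len(sizes)


def suggest_store(avg_size_bytes):
    """Map an average object size to 'database', 'object store', or 'either'."""
    if avg_size_bytes <= DB_MAX_BYTES:
        return "database"
    if avg_size_bytes >= OBJECT_STORE_MIN_BYTES:
        return "object store"
    return "either"


if __name__ == "__main__":
    sample = [{"id": i, "values": list(range(100))} for i in range(10)]
    avg = average_json_size(sample)
    print(f"average size: {avg:.0f} bytes -> {suggest_store(avg)}")
```

Run it over a representative sample of real snapshots rather than one or two, since a few large outliers can move the average a lot.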
Consider: Parquet format in S3 with S3 Select or Athena
RDS is very expensive. If you are not doing OLTP, you may find other solutions more affordable and appropriate.
Is it RDS or Postgres that is expensive? That is, would launching a container or EC2 instance with Postgres be much cheaper than RDS?
Yes, running your own Postgres on EC2 used to be cheaper (RDS gives you a lot of bells and whistles). You need the discipline and operational knowledge to run Postgres yourself, though; it is not a turnkey solution.
P.S. I don’t think Postgres and Docker ever mixed well; there is always one problem or another.