Where do you choose to keep your data, managed DB or a DIY solution?

Hi all,

I’m building a Phoenix API that will need a DB, so I was wondering where people tend to keep their data. I’ve seen some managed DB options, e.g. DigitalOcean, but they’re not that cheap. I was also considering rolling my own DB by renting a VM and running docker-compose on it with Postgres. I understand that scaling would be a bit more difficult, but if needed I should be able to copy my DB and move it to a different VM, which would still probably be cheaper than a VM + a managed DB.
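
For concreteness, the kind of DIY setup I have in mind is roughly this (written as a plain `docker run` for brevity, but the docker-compose equivalent is the same idea; container name, database name, password and version are just placeholders):

```bash
# Run Postgres in Docker on the VM, with its data in a named volume and the
# port bound to localhost only, so it is not reachable from the public internet.
docker volume create pgdata

docker run -d \
  --name my_api_db \
  --restart unless-stopped \
  -e POSTGRES_PASSWORD=change_me \
  -e POSTGRES_DB=my_api_prod \
  -v pgdata:/var/lib/postgresql/data \
  -p 127.0.0.1:5432:5432 \
  postgres:15
```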

I’d love to hear what you think!

1 Like

For the prototyping phase, and also for some finished side projects, I host everything myself. Just do backups if needed and you should be good. I have no issues at all and use https://dokku.com.

If it’s a serious commercial project and losing the DB would lose you money, a managed DB is probably better.

4 Likes

Just be very careful deploying this to an online server, because it’s easy to leave it open to the world:

Unless I’m misunderstanding the warning, it seems that we even have a window to be hacked while we set everything up. You can say that it’s very unlikely, and I will tell you that it can happen if an automated script hits your brand new server at the right time. These kinds of scripts run 24/7, they never stop, and they loop endlessly through all public IPs. Just launch a brand new server, immediately tail its logs, and you will soon see automated scripts hitting it to look for known vulnerable systems that can be hacked.
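
You can see this for yourself on any fresh VM (the log path assumes a Debian/Ubuntu image):

```bash
# Watch failed SSH login attempts start rolling in within minutes of the VM going online.
sudo tail -f /var/log/auth.log

# And check which ports the box is exposing to those scanners.
ss -tlnp
```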

5 Likes

@Exadra37 @egze thank you for commenting!

I’ve never used Dokku. I’ve heard good things about it but, admittedly, I still don’t quite understand how it is different from, say, just deploying Docker on a VM. The vulnerabilities @Exadra37 mentioned sound pretty horrific though! It seems like enough of a put-off to look for alternative approaches, unless I’m missing something here.

1 Like

First I want to make clear that what @Exadra37 wrote is not a vulnerability, just something to keep in mind. When you set up a VM there are also steps needed to make it secure. And nobody forces you to use a web installer; I didn’t.

What makes Dokku better than doing it on your own is ease of use. Things like adding a DB to a project, rotating SSL certificates, or hosting multiple sites on the same machine are just one command. It’s like Heroku or Gigalixir, but still under your control and cheaper.
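
To give a feel for what “just one command” means in practice (app and service names are examples; the Postgres and Let’s Encrypt bits come from the dokku-postgres and dokku-letsencrypt plugins, which need to be installed first — see the Dokku docs):

```bash
# Create an app and give it a Postgres database.
dokku apps:create myapp
dokku postgres:create myapp-db
dokku postgres:link myapp-db myapp   # injects DATABASE_URL into the app

# Point a domain at the app and get a Let's Encrypt certificate for it.
dokku domains:add myapp example.com
dokku letsencrypt:enable myapp
```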

3 Likes

Unfortunately our industry has this type of bad example everywhere: MySQL, MongoDB, Elasticsearch and many others are insecure by default (last time I checked), and if we don’t grok how to deploy them securely, we will shoot ourselves in the foot pretty damn easily.

Even the VM deployment tutorials out there tend to be weak in terms of security, or to not include any hardening at all.

This is all done to make software as easy as possible to adopt… aka developer convenience over security.

You know, we humans have a natural tendency to choose what’s easy to use, but in our industry this has a high cost in terms of security.

I am a strong believer that security MUST BE opt-out, not opt-in as it is now in the majority of released software.

2 Likes

From my point of view it’s a vulnerability in the way they deploy their software, but then I have never understood why this mentality of releasing software doesn’t change, despite all the data breaches occurring due to software that is insecure by default. Just use shodan.io to look up your software of choice and you will find some open installations.

In my opinion, software released nowadays shouldn’t ship a tool that makes it easy for us to shoot ourselves in the foot, and then put a huge warning in the docs as a disclaimer to shift the onus onto the end user of that software :slight_smile:

I don’t trust any database, PostgreSQL included, with its port open to the wild. For small-scale deployments, I always run the DB on the same VM and connect through a UNIX domain socket. This is dead easy with PostgreSQL.
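
A sketch of what that looks like when Postgres runs directly on the VM (the socket path is the Debian/Ubuntu default; Ecto/Postgrex can connect the same way via its `:socket_dir` option):

```bash
# Stop Postgres from listening on any TCP port at all; it will only
# accept connections on its local UNIX domain socket.
sudo -u postgres psql -c "ALTER SYSTEM SET listen_addresses = '';"
sudo systemctl restart postgresql

# Connect through the socket instead of TCP.
psql -h /var/run/postgresql -U postgres
```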

I use Pulumi to deploy the Bitnami Postgres helm chart to Digital Ocean.

Ok, so the data resides on the same VM then, right? That’s what I was thinking of doing, but I wasn’t sure if it’s bad practice.

Thanks for sharing! Man, I need to read up on all this infra tech. Right now, I can’t quite tell how Pulumi is different from Dokku. I’ve seen Pulumi compared to the likes of Terraform, which I think would be a bit of overkill for me if all I need is a single VM :slight_smile:

I just deploy Postgres and my app with Docker behind Traefik, a load balancer for Docker containers, as I mentioned to you in this thread:

I deploy with a small bash script from my computer.

I build the VM in DigitalOcean with an init script, so when it comes online it’s already secured, as I mention here:
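
A minimal sketch of that kind of init script (not the exact script from the linked post; it assumes Ubuntu and only shows the common hardening steps):

```bash
#!/bin/bash
# Passed as user-data, so it runs on first boot, before any app is deployed.
set -euo pipefail

apt-get update && apt-get upgrade -y
apt-get install -y ufw fail2ban unattended-upgrades

# Basic firewall: deny everything inbound except SSH (open 80/443 later for the app).
ufw default deny incoming
ufw default allow outgoing
ufw allow OpenSSH
ufw --force enable

# Disable SSH password logins; keys only.
sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
systemctl restart ssh
```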

I’m not an expert, but do you need a load balancer for a single VM?

It’s good for deploying with very little downtime, or no downtime at all.

In the case of Traefik, it allows me to run many apps in Docker and have Traefik manage TLS certificates automatically for me. I just need to start the Docker container with some Traefik labels :slight_smile:
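
For example, attaching an app container to a running Traefik instance looks roughly like this (router name, network, domain, entrypoint and port are all placeholders, and it assumes a certificate resolver called `letsencrypt` is configured in Traefik):

```bash
# Traefik watches the Docker socket and picks the app up from these labels:
# it routes example.com to the container and provisions/renews the TLS cert.
docker run -d \
  --name myapp \
  --network traefik_net \
  --label 'traefik.enable=true' \
  --label 'traefik.http.routers.myapp.rule=Host(`example.com`)' \
  --label 'traefik.http.routers.myapp.entrypoints=websecure' \
  --label 'traefik.http.routers.myapp.tls.certresolver=letsencrypt' \
  --label 'traefik.http.services.myapp.loadbalancer.server.port=4000' \
  myapp:latest
```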

Pulumi is a better version of Terraform, in my opinion. I use it for my one-VM deployments, which auto-deploy on every push to GitHub, and I use the same recipe for every project now, but indeed some will consider it overkill. The recipe is here in case anyone is interested.

2 Likes

Don’t think about scaling until you need to. One VM can get you pretty far. Most hosting companies let you upgrade your VM to a higher tier with very little downtime.

1 Like

In DigitalOcean and Linode you have floating IPs, so you can set up a new VM and, when it’s ready, just switch which internal IP the floating IP is associated with; the downtime is seconds.
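
With DigitalOcean’s CLI that switch is a single command (the IP and droplet ID below are placeholders; Linode has an equivalent for its own shared/floating IPs):

```bash
# Point the floating IP at the new droplet; clients keep using the same address.
doctl compute floating-ip-action assign 203.0.113.10 <new-droplet-id>
```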

1 Like

Possibly a silly question: would this new VM retain the data from the old VM? I’m just thinking about this in the context of a DIY DB solution where I’d spin up a Postgres instance pointing to a particular location on the VM.

Worst comes to worst, you have to shut down the VM, migrate the VM image to another physical machine in the same data center, boot up the new VM at the higher tier, and retain the old IP. The provider will handle all of this for you, and the whole process takes a few minutes.

If you want to migrate the data yourself, you can flip a switch in the Postgres config and reload the DB config (a few seconds), then replicate the data to a new DB server. The replication could take overnight in the worst case, but your service is up the whole time. Then you take down the old server, assign the old IP to the new DB server, and reboot the new DB server. In this case you will have two downtimes, each maybe 15 seconds.
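
A rough sketch of those steps with streaming replication (user name, addresses, password and paths are placeholders in the Debian layout; the pg_hba.conf line is the “switch” being flipped):

```bash
# On the old (primary) server: allow a replication user to connect, then reload.
sudo -u postgres psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change_me';"
echo "host replication replicator 10.0.0.2/32 scram-sha-256" | \
  sudo tee -a /etc/postgresql/15/main/pg_hba.conf
sudo -u postgres psql -c "SELECT pg_reload_conf();"

# On the new server: take a base backup into an empty data directory and start
# as a streaming replica (-R writes the standby settings for you). This runs
# while the primary keeps serving traffic.
sudo -u postgres pg_basebackup -h 10.0.0.1 -U replicator \
  -D /var/lib/postgresql/15/main -R -P
sudo systemctl start postgresql

# Once it has caught up: stop writes on the old server, promote the replica,
# and move the IP / connection string over to it.
sudo -u postgres psql -c "SELECT pg_promote();"
```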

All of the above are solved problems that have existed for years. You are not going to do this very often anyway.

For a new VM you need to copy the data over from the old VM, or you can use block-storage data volumes, which both DigitalOcean and Linode provide at additional cost. They are like external disks, so you can swap VMs and always point them at the same disk.
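
Roughly, with a volume attached to the droplet it looks like this (the device path and mount point are examples; each provider documents the exact device name):

```bash
# Format and mount the attached block-storage volume (formatting: first time only).
sudo mkfs.ext4 /dev/disk/by-id/scsi-0DO_Volume_pgdata   # device name is provider-specific
sudo mkdir -p /mnt/pgdata
sudo mount /dev/disk/by-id/scsi-0DO_Volume_pgdata /mnt/pgdata

# Point the Postgres container at the volume instead of the VM's local disk,
# so a replacement VM can attach the same volume and reuse the data.
docker run -d --name my_api_db \
  -e POSTGRES_PASSWORD=change_me \
  -v /mnt/pgdata:/var/lib/postgresql/data \
  -p 127.0.0.1:5432:5432 \
  postgres:15
```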