What are the disadvantages of deploying Elixir with Docker?

elie · January 25, 2017, 12:12am

I think I read somewhere that you shouldn’t deploy Elixir with Docker, but I can’t find the source now. Is there any truth to this?
And beyond that, what is the most hassle-free way of deploying Elixir, in your opinion?

mkunikow · January 25, 2017, 12:37am

There was one discussion

If you deploy containers in production, you mostly put them into some orchestration service
like … How does this orchestration service play nicely with the Erlang VM …

AstonJ · January 25, 2017, 1:29am

Speaking generally, there’s the extra overhead, which will equate to lower performance for the CPUs and R/W to disk. There are also possible issues with security, and with cases where your Docker container needs to speak to something else on the server (or another container, or the web). And of course it’ll mean there’s yet another layer of tech you need to learn and stay on top of.

I would ask why you want to use Docker? A server is more than capable of running numerous apps of different tech perfectly well. On this server, for example, I’m running PHP sites, Ruby sites, Docker sites (Discourse forums) and soon Elixir sites.

Docker makes sense for software providers like Discourse, IMO, where they can very precisely select the software their platform runs on, which means less headache for them when it comes to support. But for everyone else…

DianaOlympos · January 25, 2017, 8:11pm

You will need to use the Docker engine. Except if you deploy elsewhere.

In general, you will not lose a lot of things (hot code upgrade probably, that is doable with other container tech). But you will have to deal with network pain.

shanesveller · January 25, 2017, 8:40pm

Docker can provide system administrators with the encapsulation that is common (and usually considered desirable) among well-designed software code. It embraces the positive outcomes of immutability that many of us enjoy about languages like Elixir. It can isolate quirks in software packaging, mitigate differences between operating systems, and reduce the cognitive workload of deploying new or updated software. It is especially beneficial in polyglot organizations to unify the deployment process across many different project runtimes or languages, different versions of runtimes, etc. It can efficiently apply quotas for shared resources like CPU, memory, and disk IO, and isolate containers into one or several custom networks within the host.

It is certainly not a panacea, and honestly, if your workloads are small enough (or price-conscious enough) to deploy on a single VPS, you may not see huge wins.

CPU performance overhead on Docker is not statistically significant for most workloads, and is often less overhead than KVM virtualization, for example. Network I/O can be affected under basic port-forwarding but can be mitigated by using --net=host or other configuration tweaks. Similarly, disk I/O into the container filesystem can be slow under e.g. AUFS, but has little to no significant overhead if you move write-heavy workloads to output into a volume, which bypasses all the union filesystem shenanigans.

I’d like to see more supporting details here. The default behavior for outgoing traffic is a simple NAT over a bridge interface on the host, while incoming traffic is only exposed to the outside world by port-forwarding as part of the Dockerfile or the docker run command (or --net=host). This is both well-documented and fairly predictable, IMHO. As for security, you can run as non-root within the container, drop capabilities, constrain what ports you expose and what addresses you bind to, etc. I don’t find in practice that Docker is inherently more or less secure than running directly on the OS - you just need to establish good practices and you’re fine, same as without Docker.
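To make the flags mentioned above concrete, here is a minimal sketch that assembles a `docker run` command line with a non-root user, all capabilities dropped, a port bound only to one host address, and a named volume for write-heavy data. It is written in Python purely as a language-neutral illustration. The image name `my_app:latest`, the UID, the port and the volume path are all assumptions made up for the example, not anything from this thread.

```python
import shlex


def docker_run_args(image, *, uid=1000, host_ip="127.0.0.1", port=4000,
                    data_volume="app_data:/var/lib/app"):
    """Assemble a `docker run` argument list with conservative defaults.

    All default values are hypothetical; adjust them to your own app.
    """
    return [
        "docker", "run", "--detach",
        "--user", str(uid),                       # run as non-root in the container
        "--cap-drop", "ALL",                      # drop all Linux capabilities
        "--publish", f"{host_ip}:{port}:{port}",  # bind to one host address only
        "--volume", data_volume,                  # writes bypass the union filesystem
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(docker_run_args("my_app:latest")))
```

Building the list first and printing it keeps the sketch free of side effects; passing the same list to `subprocess.run` would actually start the container.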

jwarlander · January 25, 2017, 10:52pm

What many forget is of course that “good practices” for a well-maintained environment include things like security updates of OS packages, kernel updates, etc., even for your Docker containers.

Since Docker containers are immutable, this involves rebuilding and deploying updated containers not only when applications are upgraded, but also when security updates are released for e.g. the Arch, Ubuntu or CentOS environment your container is built with. Automation is key to making this feasible.

AstonJ · January 26, 2017, 1:03am

We had issues where the container would not connect to the internet so things like one-box, GitHub logins etc kept breaking. It required custom rules to the firewall.

There have been other anomalies too, where I had thought a container had been rebuilt but it actually hadn’t. We ended up losing a couple of weeks’ worth of posts because of it (on MetaRuby).

Personally I would avoid Docker unless you have a specific use-case for it.

krapans · January 26, 2017, 6:29am

How would you suggest scaling Elixir apps through AWS Elastic Beanstalk, then?

mkunikow · January 26, 2017, 10:41am

There is an option on Docker Hub to automatically rebuild your container if the image it is based on changes. I think this is good practice.

You need someone from IT operations who knows this stuff.
Personally I like containers very much:
you need Postgres, MongoDB, CouchDB, Jenkins … no problem

Also I like this setup:
code change → build → test → create Docker image → put into repository
GitLab can do this.

For Docker lovers, there are great free trainings made by the community:
https://training.docker.com/category/self-paced-online

hosh · February 3, 2017, 6:07pm

I have an Elixir app in use, though it is in a non-critical area. I’ve been giving some thoughts about this.

The app I wrote was trivial, though necessary. It is essentially an HTTP proxy, used to accept incoming GitHub webhooks and translate them into a form that go.cd understands. We don’t really get that much traffic on it. It is a stateless app. I could have used Ruby, but I wanted to play with Elixir.

I spent less than 1 hour on the code. I spent more than 8 hours trying to figure out how to get it into a container. I don’t regret that decision since my team is planning on using Elixir for microservices in the future.

There are some interesting gotchas:

(1) Figuring out how to accept configuration via environment variables. This is fairly well-documented, but it’s part of what you have to figure out when building a release.

(2) Figuring out how to build a release in a way that works inside the container. Further, since I chose an Alpine base image (which uses musl libc instead of glibc), I had to modify other people’s Dockerfiles to get that working.

(3) Figuring out how to build an exrm release into the container. This was shortly before Distillery went 1.0.

(4) Figuring out Unix signals and startup. Since I am starting this up with Docker and not with Kubernetes, I don’t have to translate SIGINT, SIGTERM, and SIGKILL into something the BEAM VM will accept.
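As an illustration of gotcha (1), here is a minimal sketch of reading configuration from environment variables when the app boots, rather than when the image is built. It is written in Python as a language-neutral example, not as the Elixir release configuration itself. The variable names `PORT` and `UPSTREAM_URL` and the `Config` type are hypothetical, chosen to resemble a small webhook proxy.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    port: int
    upstream_url: str


def load_config(env=None):
    """Build a Config from environment variables at startup.

    PORT and UPSTREAM_URL are assumed names for this sketch.
    """
    env = os.environ if env is None else env
    upstream = env.get("UPSTREAM_URL")
    if not upstream:
        # Fail fast so a misconfigured container dies at boot, not on first request.
        raise ValueError("UPSTREAM_URL must be set")
    return Config(port=int(env.get("PORT", "4000")), upstream_url=upstream)
```

The container would then be started with something like `docker run -e UPSTREAM_URL=... -e PORT=...`, so the same image can be promoted between environments unchanged.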

When people speak of containers these days, they don’t usually mean Docker itself. They mean what containers enable. The philosophical frame around this is still being worked out. Containers by themselves are not interesting, but what they enable is a greater degree of orchestration.

One of the key insights is that you treat containers like cattle, not pets. You name a pet. You feed a pet by hand. If the pet gets sick, you rush the pet to the doctor. You cry when your pet dies. Not so much with cattle. You number them, you don’t name them. If they die, that’s ok, as long as the herd is healthy. They are inherently transient, and the whole system is designed around containers being transient. (This works very well with what Mark Burgess is talking about with Promise Theory, which spawned the whole devops movement in the first place). Counter-intuitively, treating containers like cattle and assuming transience makes it easier to build in self-healing and makes the whole system more reliable.

The next key insight is this: treating containers like cattle is easy if you have stateless applications. Stateful applications are hard to treat like cattle. This is why, while people might run MongoDB or PostgreSQL with Docker, supervised by systemd, it’s only the brave that will try to run MongoDB on Kubernetes in production. The Kubernetes team has been working on identifying exactly what is necessary for stateful pods. Most of them need strong identity (pet names), stable IP addresses, and stable membership.

Even that is not necessarily enough. Each stateful app has its own characteristics. CoreOS introduced the idea of “Operators” (but this is really an older idea coming back from the Promise Theory / CFEngine days). The idea here is that there is an active piece of software that determines the current state of the stateful app and converges it towards the desired state. For example, there are some things you need to do if you want to increase the number of MongoDB replicas … as well as decrease them. Those things are specific to MongoDB and how it works. Capturing that in an Operator would allow a stateful app to work well in an environment that assumes transient containers.

Here’s the thing about Elixir and the BEAM VM: It does a pretty good job of running stateless applications, but its unfair advantage is that it opens up a lot of options for writing stateful applications. You don’t have to reach for a datastore like MongoDB or Postgres. You can choose an in-memory store if that is the engineering tradeoff that makes sense. You don’t have to reach for Redis or a pub/sub system to do background processing. You can do that within the application itself. This allows you to gracefully scale up the complexity of your platform.

But if Elixir’s unfair advantage is that it is easier to write stateful apps, how would that work on an orchestration system designed for stateless apps?

For example: If you wanted to do hot code reload, how would you do that? Someone from the perspective of cattle might say, oh, don’t bother. Just do a rolling upgrade. That would be true to the extent we can separate out stateless apps (such as HTTP). What about chat? Chat is the precursor to changes coming down (relating to voice interfaces and AIs).

Some of the things that Kubernetes does overlap with what you can do in memory with Elixir. Kubernetes has what is effectively supervision in the form of replicaset, replication, and daemonset controllers. These controllers watch for when something starts diverging from the desired state and try to bring it back in line. The specs you feed them define not only the desired state, but also under what circumstances something is considered dead or needs to get replaced.
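The "watch for divergence, then converge" idea can be sketched as a single reconcile step. This is a toy model in Python, not the Kubernetes API: the function name `reconcile`, the action tuples, and the `is_healthy` callback are all assumptions made up for illustration.

```python
def reconcile(desired, running, is_healthy):
    """Compute the actions that move the observed state toward the desired one.

    desired:    number of replicas we want
    running:    list of instance ids currently observed
    is_healthy: callable deciding whether an instance counts as alive
    """
    alive = [i for i in running if is_healthy(i)]
    # Anything considered dead is removed rather than nursed back to health.
    actions = [("remove", i) for i in running if not is_healthy(i)]
    missing = desired - len(alive)
    if missing > 0:
        # Too few healthy instances: start anonymous replacements.
        actions += [("start", None)] * missing
    elif missing < 0:
        # Too many: stop the surplus, keeping the first `desired` instances.
        actions += [("stop", i) for i in alive[desired:]]
    return actions
```

A controller would run this in a loop against freshly observed state. That is loosely the same shape as a supervisor restarting a crashed child process, only at the level of containers instead of BEAM processes.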

We can say that, sure, just go all the way with Elixir. Forget about the orchestration. The latest generation of orchestration is, in many ways, catching up to what BEAM already provides. So why bother? But I’m also inclined to think that, without some way for Elixir deployment to work with the current generation of deployment tools, it may go the way of Smalltalk. Smalltalk used to be big, before Java. You don’t really hear much about it anymore. The ecosystem actually has its own built-in version control system for installing packages … and none of that is visible on the web the way code distributed on GitHub is.

I’m thinking of ideas such as: what if an Elixir app could function as its own Kubernetes controller? (It just needs to talk to the API via REST or RPC; a service account token can be provided.) What if you could use the hot code update with a rolling update mechanism? What if the erl-to-erl meshing were better integrated with Kubernetes service discovery?

ericlathrop · February 3, 2017, 6:49pm

I’ve got a set of Elixir apps running in Docker (AWS ECS), and the main problem is distribution / clustering. The way that nodes find each other is through epmd, which runs on port 4369. When nodes try to connect, they will talk to port 4369 and ask where the “www” node is. This is a problem with Docker because you can only have one container exposed on port 4369 per host, so essentially you can only have a single container per host if you want to use clustering. If you’re running Erlang 19.0, you can write your own epmd replacement to work around this limitation, but it looks like a bit of work.
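The conflict described above can be modelled in a few lines. This is only a toy model in Python, not the real epmd wire protocol: the `Epmd` and `Host` classes, the node name "www" and the distribution port 9100 are hypothetical, and the model assumes each container publishes port 4369 directly on the host.

```python
EPMD_PORT = 4369  # the well-known epmd port mentioned above


class Epmd:
    """Toy name registry: node short name -> distribution port."""

    def __init__(self):
        self.nodes = {}

    def register(self, name, port):
        self.nodes[name] = port

    def lookup(self, name):
        # Returns None when no node with that name has registered.
        return self.nodes.get(name)


class Host:
    """Toy host where each port can be published by one container only."""

    def __init__(self):
        self.published = {}  # host port -> container name

    def publish(self, container, port):
        if port in self.published:
            owner = self.published[port]
            raise OSError(f"port {port} already published by {owner}")
        self.published[port] = container


if __name__ == "__main__":
    epmd = Epmd()
    epmd.register("www", 9100)
    print(epmd.lookup("www"))       # the port a peer would connect to

    host = Host()
    host.publish("app_1", EPMD_PORT)
    try:
        host.publish("app_2", EPMD_PORT)
    except OSError as err:
        print(err)                  # second container cannot claim 4369
```

In this model the lookup itself is trivial; the limitation comes entirely from every container wanting the same well-known host port.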

OvermindDL1 · February 3, 2017, 8:02pm

Can you not expose the epmd port (4369) of one ‘master’ container to all the others, so they just share a single epmd? I’m curious if epmd would even work that way, hmm…
Why not just run epmd inside each container and link to the other containers’ IPs?

DianaOlympos · February 3, 2017, 8:47pm

The other solution is… use SmartOS containers. Nothing stops you from running a “distributed stateful app”, or even a plain stateful app, in them. They deal with it nicely. There are other solutions than K8s out there.

OvermindDL1 · February 3, 2017, 8:53pm

Exactly this, these are awesome!