Choosing host for 1k concurrent users

First off, I need to thank everyone active here for their generosity.
I just got into Elixir a few months ago, and really, there are a couple of people here who make all the difference for the ecosystem. Coming from JS-land, the general feeling is one of being spoiled.

I’m trying to determine how/where to host a system that expects 1k concurrent users a few weeks after launch, and about 10 times that by the end of the first year. Being confident in these expectations, and solo on the dev team, a sacrificial architecture or infrastructure is undesirable. By sacrificial I mean swapping infra mid-flight, e.g. from Gigalixir/Render/Docker/bare metal.

Extra parameters to the system:

  • Monolith

  • Extra services: monitoring (Prometheus/Grafana), push notifications (Pigeon), email (Bamboo), background jobs (Oban), and more

  • db: Postgres

  • Budget: targeting a developing country, so bootstrapped for foreseeable future

  • Time spent: most likely solo on this for a while, but working double full-time

  • No preference wrt Docker or not.

  • Fairly new to infra/devops

Sorry for not asking a specific question (there probably isn’t a specific answer); I’m essentially looking for general insight on deploying such a system, in particular on the following:

  1. General app architecture/infra topology. 2 servers/VMs minimum? App, monitoring, and push notifications on different or the same servers?

  2. Hosted vs self hosted Postgres (most likely going for hosted)

  3. Static file hosting (s3/wasabi/hetzner)

  4. Costs (savings) to expect running on bare metal vs PaaS (e.g. Hetzner vs Render) at said scale

  5. Complexity of learning to configure and deploy to bare metal, i.e. setting up networking, security, patching, maybe CI/CD, etc.

  6. Amount of extra time to expect to spend per week for bare metal maintenance (vs PaaS)

Past experience, case studies, articles, insight, anything helps.

Cheers !


I had a similar situation before. I developed a system that handled around 500 req/s and spiked to around 2000.

It was about 4 years ago, so some tools have changed, but here’s what I used:

  • Bare metal servers
  • Ansible for provisioning
  • Self-hosted postgres, one master, provisioned using Ansible
  • Also self hosted Cassandra nodes using Ansible
  • Used distributed nodes to fan out work load
  • Used nginx as the front server, started with one and then increased
  • Used DNS to load balance between nginx instances

I provisioned servers as shared, meaning I used one server for multiple purposes; for example, a Postgres slave and a Cassandra node on the same machine. This can easily be automated using Ansible.

Anyway, Ansible (and similar tools) are great for doing devops. I didn’t use any container system, so Ansible was the natural choice. There are playbooks out there for almost anything you want, so give it a try.
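To give a flavor of what this looks like, here is a hypothetical minimal playbook sketching the split described above (the group names `front` and `db`, and the file name `site.yml`, are made up for illustration; Ansible’s `apt` module installs packages idempotently):

```yaml
# site.yml (hypothetical): provision an nginx front server and a Postgres box
- hosts: front
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true

- hosts: db
  become: true
  tasks:
    - name: Install PostgreSQL
      ansible.builtin.apt:
        name: postgresql
        state: present
        update_cache: true
```

You would run it with something like `ansible-playbook -i inventory.ini site.yml`, where `inventory.ini` lists which servers belong to which group.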

If you have any questions, I will be glad to answer.


Thanks for the reply !

Plenty ! :smiley:

  • Were you previously experienced in bare metal deployments/Ansible? If not, what should I expect in terms of time expenditure to get up to speed?
  • Were you previously experienced as a DBA? What factors led you to self-host your DBs?
  • What were the specs of the boxes? How many boxes did it take to handle those spikes? What did the bill for this setup look like?
  • How did you determine which services shared servers with others?
  • What were the security considerations/implications?
  • I was experienced with bare metal servers before. I always handle my own servers; I don’t like hosted solutions. I was not experienced with Ansible, but it was rather easy to jump in. If you have done any sysadmin/devops work before, it will be rather easy. Expect a week to a month, depending on your own experience, to get to the point where you can manage your servers with Ansible.

  • I was. I manage MySQL/Postgres on a daily basis, for work and for my own projects.

  • I don’t remember the costs exactly, but I think we had 4 boxes: 4/8-core CPUs with 64GB RAM, from Hetzner. You can look it up. We had a private network between them, and we also rented some rack space for future expansion, so you’ll have to check for yourself.

  • Based on expected load. Postgres slaves had minimal load, and Cassandra distributes load, so having them on one server was not an issue. Nginx uses the network a lot and a bit of CPU, while the Postgres master uses disk and RAM heavily, so they could be shared too. Our own app needed CPU much more than RAM or disk, so it could also share a box with a Postgres slave. We had Zabbix for monitoring, and it imposed no real load, so it could go anywhere.

  • We had the internal network fully open between all servers, and only 80/443 open on public IPs. Basic security practices in coding should be followed, e.g. against XSS, CSRF, injections, etc.


Server security is much more complex than this, unless you just decide to stick with the defaults, which may or may not already provide a decent level of security.

Mandatory for any server:

If your server doesn’t have enough entropy, and it probably doesn’t, then all the cryptography performed on it will be weak, and pretty much every app we write uses cryptographic operations, e.g. for user authentication and tokens.
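On Linux you can inspect the kernel’s entropy pool directly; here is a quick sanity check (the `/proc` path is standard on Linux; note that on kernels 5.18+ the value is pinned at 256 because the pool design changed, so low readings mainly matter on older kernels):

```shell
# Read the kernel's available entropy estimate.
# Persistently low values on old kernels suggest installing an entropy
# daemon such as haveged (which the script later in this thread does).
entropy=$(cat /proc/sys/kernel/random/entropy_avail)
echo "available entropy: ${entropy}"
```

If this is consistently tiny on an older kernel, cryptographic operations may block or draw on a weak pool until something like haveged is feeding it.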

Just to give an idea of what more should be done:

A proper secured Linux server takes a lot of effort and knowledge and needs ongoing attention.


Thanks for the links! A question, though: they all refer to securing individual servers/VMs. I may be misreading your reply, but is it not a thing to just secure the network containing the individual servers/VMs?


Security is always about layers, like in medieval castles.

So, if you have your servers inside a virtual private network, that doesn’t mean you can just leave them open.

What happens if anyone bypasses the network security firewall?

What happens if someone breaks out from within an application on one of the servers in this virtual private network?

The answer: a free lunch, because now they have free access to everything inside that network.

Do you know how jails are built with a huge number of layers of defense? Exactly to avoid the free-lunch scenario when one of the layers is breached :wink:


You’re totally right. I just mentioned what worked for us there. Security should always be measured against your application. That was enough for us, and we didn’t have a single attack.
It is also something you must always keep an eye on, not just a fire-and-forget thing.


That you were aware of :wink:

And just because we haven’t experienced an attack on our server(s), that doesn’t mean the default setup is enough… sometimes it’s only a question of being lucky; other times it’s just that we haven’t caught the bad guys’ attention yet, because once we do get their attention, we will have a very hard time keeping up with them.


I agree with that. Anyway, we were not of interest to those professional bad guys, or so we thought. Maybe luck, maybe that; either way, our security model was successful :slight_smile:


I get that, but what if you’re not building a jail? Security is ever-expanding; it can quickly become overwhelming. I may be wrong, but it seems most mere-mortal businesses get by without coming anywhere near exhaustively addressing their threat models. Balancing expenditure and business value is in fact the topic of this thread. Surely there is a sweet spot between sane defaults and some of the gov-level-security/PCI compliance checklists you linked above?


What I listed is far from gov-level security/PCI. For that you need to do even more :wink:

Not everything in the links I gave needs to be applied, nor can it be, because you will have to make choices between things, and some you may not need, depending on your deployment strategy and where you are deploying.

Also, as you said, you will need to balance the security model you adopt against your use case and legal requirements, like GDPR in Europe.

So, I am not saying that all of the above needs to be done, but the usual tendency of following setup guides for servers and leaving it at that will not be enough.


This is key: @Exadra37’s advice should not be taken as a list of must-dos, nor dismissed as some annoying security weirdo just wanting to show off how security-aware he is. It’s quite solid and reasonable advice.

A couple of years ago I did a security course with a security consultant. The key takeaway was that the security surrounding your servers should be the result of a process in which you balance security needs, the system’s usability, and what you’re able to do right now (budget, overall knowledge, time), but it should never be the result of neglecting security or deeming it unimportant. Techniques are important, but more important is the mindset/culture around them.

I understand security advice may come across as annoying, maybe because it’s something we don’t want to deal with but know is of critical importance, so we do it reluctantly. But the point is not to take it to the limit of logging directly to paper so attackers can’t read your logs, or to agonize over every “but what if someone does x???”, but to be aware of what you can do to improve your current security and which pitfalls to avoid.

You don’t send data to the database before sanitizing it, even if you’re not building a jail. The same applies to your infrastructure as a whole. You never know when that guy with lots of free time will stumble upon your server and start probing for well-known weak points in your walls.


Thanks a lot for your kind words.

I really loved the way you put your entire post. Many thanks for sharing your experience and insights :slight_smile:

You nailed it :tada:

Plus, a lot of automated tooling exists to try to breach servers automatically and/or to simply find interesting targets to breach manually afterwards. Just set up a server online, make the first thing you do tailing its logs, and you will immediately see requests coming in probing for vulnerable software.
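To make this concrete, here is a sketch with a fabricated access-log excerpt (IPs from documentation ranges, paths chosen as typical scanner probes) and a grep to surface the probe lines; the file path and contents are invented for illustration:

```shell
# Write a small sample access log resembling what a fresh public server sees
cat <<'EOF' > /tmp/access.log.sample
203.0.113.5 - - [01/Jan/2024:00:00:01 +0000] "GET /wp-login.php HTTP/1.1" 404 0
203.0.113.6 - - [01/Jan/2024:00:00:02 +0000] "GET /.env HTTP/1.1" 404 0
198.51.100.7 - - [01/Jan/2024:00:00:03 +0000] "GET / HTTP/1.1" 200 612
EOF

# Filter for common scanner probes (WordPress login, phpMyAdmin, leaked .env)
grep -Ei 'wp-login|phpmyadmin|/\.env' /tmp/access.log.sample
```

On a live server you would point the same filter at your real log with `tail -f`, e.g. `tail -f /var/log/nginx/access.log | grep -Ei 'wp-login|phpmyadmin|/\.env'`.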

What many don’t realize is that their server may have been breached for years, and used carefully so as not to raise the owner’s suspicion. Breached servers are used a lot as command-and-control servers, and the hacker who breached one will secure it very well, to make sure no other hacker gets access to their command-and-control server :wink:

The philosophy of following setup guides and doing nothing afterwards, because “it’s enough for everyone”, may end with your server listed here:

This is the Google of servers open to the internet, aka misconfigured ones, like those with open databases and other interesting stuff.


This is how, two months ago, I found out that I had left my Pi-hole open to the internet, and for about ~32 hours random people were using it as a DNS server. :003: (I later restored an older snapshot of the server just to make sure nobody had gotten in, plus added stricter rules to my router.)

Plus, it was a really dumb blunder with my router settings; I simply misclicked a field in my firewall/NAT settings.

Lesson learned! I periodically check Shodan for my IP addresses these days.


While you are setting up a server manually, you can be breached. This is not theoretical; it happens. Thus you should always use the automated mechanisms your hosting provider gives you to configure the server upon creation, like the cloud-init file.

The server firewall should be locked down by default, with only the ports strictly necessary for operation opened:

And in the spirit of showing how you can do it, I will share my current work in progress for a security guide, to be used as part of my Phoenix 360 Web Apps guide.

So, when configuring the creation of a Debian server add this bash script to the cloud-init file:


#!/bin/bash
set -eux


# * Set sudo to timeout in 1 minute
# * Set alert on .profile or .bashrc to send an email each time a login occurs???
# * Secure Docker with TLS
# * Uninstall or secure Perl

# @link

# @link

# @link

# @link

# @link

# @link

# @link
# @link
# * Check open ports: sudo ss -lntup

### ---> FIREWALL
apt install -y ufw

ufw default deny outgoing comment 'deny all outgoing traffic'
ufw default deny incoming comment 'deny all incoming traffic'

# ufw limit in ssh comment 'allow SSH connections in'
# Must match the "Port" value set in the sshd_config section further below
SSH_PORT=26928
ufw allow in ${SSH_PORT} comment 'allow incoming SSH connections'

ufw allow out 53 comment 'allow DNS calls out'

ufw allow out 123 comment 'allow NTP calls out'

ufw allow in http comment 'allow HTTP traffic in'
ufw allow in https comment 'allow HTTPS traffic in'
ufw allow out http comment 'allow HTTP traffic out'
ufw allow out https comment 'allow HTTPS traffic out'

ufw enable

# allow whois if installed
# ufw allow out whois comment 'allow whois'

# allow traffic out on port 68 -- the DHCP client
# you only need this if a DHCP server is installed and being used
# ufw allow out 68 comment 'allow the DHCP client to update'
### <--- FIREWALL

### ---> Iptables Intrusion Detection and Prevention with PSAD
# @link
# NOT SURE: it installs a lot of dependencies, and the one that puts me off
# is the mail server exim.
### <--- Iptables Intrusion Detection and Prevention with PSAD

### ---> Application Intrusion Detection And Prevention With Fail2Ban
# @link
# NOT SURE: needs investigation whether it makes sense for a server running only Docker containers
### <--- Application Intrusion Detection And Prevention With Fail2Ban

apt update
apt install -y --no-install-recommends git haveged

# @link
update-rc.d haveged defaults

git clone
cd debian-traefik-setup

cp -r ./traefik /opt

(
  cat <<'EOF'
# Traefik environment variables go here (contents omitted in the original post)
EOF
) > .env

mv .env /opt/traefik
chmod 660 /opt/traefik/.env


rm -rf /debian-traefik-setup

if grep -q :1000: /etc/passwd; then
  echo "---> Debian user exists and is being deleted."
  userdel -Z -r -f $(id -nu 1000)
else
  echo "Debian user doesn't exist."
fi
# Creates an unprivileged user, without sudo access, but belonging to the docker
# group to allow for programmatic launch of docker containers.
useradd traefik --create-home --uid 1000 --shell /bin/sh
usermod -aG docker traefik

# @link
# On Linux, you can disable password-based access to an account while allowing
# SSH access (with some other authentication method, typically a key pair).
# Using `*` as a placeholder for the password hash is just a convention to
# make the system think the user has a password, but it's an invalid one,
# because it's not a valid crypto hash, therefore the user will never be
# able to use the `*` as a valid password when prompted to input one.
usermod -p '*' traefik

cp -R /root/.ssh /home/traefik
chown -R traefik:traefik /home/traefik/.ssh
chown root:traefik /usr/local/bin/docker-compose

# Creates an unprivileged user with sudo privileges. Use only to perform
# administrative task in the server.
# @link
USER_PASSWORD_HASH='---> GENERATE ONE IN YOUR PC WITH: openssl passwd -6 your-password-string-here <---'
useradd traefik_admin --create-home --uid 1001 --shell /bin/bash --password "${USER_PASSWORD_HASH}"
usermod -aG sudo traefik_admin

cp -R /root/.ssh /home/traefik_admin
chown -R traefik_admin:traefik_admin /home/traefik_admin/.ssh

# rm -rf /root/.ssh

# Protocol 1 is insecure and must not be used.
echo "Protocol 2 # $(date -R)" >> /etc/ssh/sshd_config

# Unless we really need it, it's best to keep it disabled.
sed -i -E "/^#?X11Forwarding/s/^.*$/X11Forwarding no # $(date -R)/" /etc/ssh/sshd_config

# Can be used by hackers and malware to open backdoors in the server.
sed -i -E "/^#?AllowTcpForwarding/s/^.*$/AllowTcpForwarding no # $(date -R)/" /etc/ssh/sshd_config

# @link
# ---> Enhanced from the github repo
# sed -i -E "/^#?AllowStreamLocalForwarding/s/^.*$/AllowStreamLocalForwarding no # $(date -R)/" /etc/ssh/sshd_config
# sed -i -E "/^#?GatewayPorts/s/^.*$/GatewayPorts no # $(date -R)/" /etc/ssh/sshd_config
# sed -i -E "/^#?PermitTunnel/s/^.*$/PermitTunnel no # $(date -R)/" /etc/ssh/sshd_config
# sed -i -E "/^#?Compression/s/^.*$/Compression no # $(date -R)/" /etc/ssh/sshd_config
# sed -i -E "/^#?TCPKeepAlive/s/^.*$/TCPKeepAlive no # $(date -R)/" /etc/ssh/sshd_config
# sed -i -E "/^#?AllowAgentForwarding/s/^.*$/AllowAgentForwarding no # $(date -R)/" /etc/ssh/sshd_config
# sed -i -E "/^#?MaxAuthTries/s/^.*$/MaxAuthTries 2 # $(date -R)/" /etc/ssh/sshd_config
# sed -i -E "/^#?MaxSessions/s/^.*$/MaxSessions 2 # $(date -R)/" /etc/ssh/sshd_config
# sed -i -E "/^#?MaxStartups/s/^.*$/MaxStartups 2 # $(date -R)/" /etc/ssh/sshd_config
# sed -i -E "/^#?LoginGraceTime/s/^.*$/LoginGraceTime 2 # $(date -R)/" /etc/ssh/sshd_config
# <---

# @link

# ---> Disable root user login:
sed -i -E "/^#?PermitRootLogin/s/^.*$/PermitRootLogin no # $(date -R)/" /etc/ssh/sshd_config

sed -i -E "/^#?ChallengeResponseAuthentication/s/^.*$/ChallengeResponseAuthentication no # $(date -R)/" /etc/ssh/sshd_config

sed -i -E "/^#?PasswordAuthentication/s/^.*$/PasswordAuthentication no # $(date -R)/" /etc/ssh/sshd_config

sed -i -E "/^#?UsePAM/s/^.*$/UsePAM no # $(date -R)/" /etc/ssh/sshd_config
# <---

# ---> Disable password based login
echo "AuthenticationMethods publickey # $(date -R)" >> /etc/ssh/sshd_config
sed -i -E "/^#?PubkeyAuthentication/s/^.*$/PubkeyAuthentication yes # $(date -R)/" /etc/ssh/sshd_config
# <---

# ---> Limit Users ssh access
echo "AllowUsers traefik traefik_admin # $(date -R)" >> /etc/ssh/sshd_config
# <---

# ---> Disable Empty Passwords
sed -i -E "/^#?PermitEmptyPasswords/s/^.*$/PermitEmptyPasswords no # $(date -R)/" /etc/ssh/sshd_config
# <---

# ---> Change SSH Port and limit IP binding
# changing the default SSH port here means it must also be opened in the firewall
sed -i -E "/^#?Port/s/^.*$/Port 26928 # $(date -R)/" /etc/ssh/sshd_config

# listen only on IPv4 (inet); use inet6 instead to listen only on IPv6
#sed -i -E "/^#?AddressFamily/s/^.*$/AddressFamily inet # $(date -R)/" /etc/ssh/sshd_config

# optionally lock it down to your IPv6 address
#sed -i -E "/^#?ListenAddress/s/^.*$/ListenAddress YOUR.IPV6.HERE # $(date -R)/" /etc/ssh/sshd_config
# <---

# ---> Thwart SSH crackers/brute force attacks
# @TODO Install DenyHosts, Fail2Ban or similar

# ---> Rate-limit incoming traffic at TCP port for SSH and HTTP(DDOS attacks prevention)
# @TODO Configure the firewall with specific rules

# ---> Use port knocking (optional)
# @TODO Maybe with knockd

# ---> Configure idle log out timeout interval
# set for two minutes
sed -i -E "/^#?ClientAliveInterval/s/^.*$/ClientAliveInterval 120 # $(date -R)/" /etc/ssh/sshd_config

# disconnect unresponsive clients immediately after the interval above expires
# (no keep-alive retries)
sed -i -E "/^#?ClientAliveCountMax/s/^.*$/ClientAliveCountMax 0 # $(date -R)/" /etc/ssh/sshd_config
# <---

# ---> Disable .rhosts files (verification)
sed -i -E "/^#?IgnoreRhosts/s/^.*$/IgnoreRhosts yes # $(date -R)/" /etc/ssh/sshd_config
# <---

# ---> Disable host-based authentication (verification)
sed -i -E "/^#?HostbasedAuthentication/s/^.*$/HostbasedAuthentication no # $(date -R)/" /etc/ssh/sshd_config
# <---

# ---> Chroot OpenSSH (Lock down users to their home directories)
# @TODO This one depends on the operational needs for each user of the system.

# ---> Bonus tips from Mozilla
# @link
# Supported HostKey algorithms by order of preference.
echo "HostKey /etc/ssh/ssh_host_ed25519_key # $(date -R)" >> /etc/ssh/sshd_config

echo "HostKey /etc/ssh/ssh_host_rsa_key # $(date -R)" >> /etc/ssh/sshd_config

echo "HostKey /etc/ssh/ssh_host_ecdsa_key # $(date -R)" >> /etc/ssh/sshd_config

echo "KexAlgorithms curve25519-sha256@libssh.org,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,diffie-hellman-group-exchange-sha256 # $(date -R)" >> /etc/ssh/sshd_config

echo "Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr # $(date -R)" >> /etc/ssh/sshd_config

echo "MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com # $(date -R)" >> /etc/ssh/sshd_config
# <---

# All Diffie-Hellman moduli in use should be at least 3072-bit-long (they are used for diffie-hellman-group-exchange-sha256) as per our Key management Guidelines recommendations. See also man moduli. To deactivate short moduli in two commands:
awk '$5 >= 3071' /etc/ssh/moduli > /etc/ssh/moduli.tmp && mv /etc/ssh/moduli.tmp /etc/ssh/moduli

# LogLevel VERBOSE logs the user's key fingerprint on login. Needed to have a clear audit trail of which key was used to log in.
echo "LogLevel VERBOSE # $(date -R)" >> /etc/ssh/sshd_config

# Log sftp level file access (read/write/etc.) that would not be easily logged otherwise.
sed -i -E "/^#?Subsystem sftp/s/^.*$/Subsystem sftp \/usr\/lib\/openssh\/sftp-server -f AUTHPRIV -l INFO # $(date -R)/" /etc/ssh/sshd_config

sshd -t

systemctl restart ssh.service

systemctl status ssh.service

### ---> Perl Hardening
# Doing it as the last step, just in case something earlier adds more Perl
# packages
find /usr/bin -type f -name 'perl*' | xargs chmod 700
### <--- Perl Hardening

This is far from complete, but it already provides a good starting point in terms of security.

IMPORTANT: Don’t forget to manually generate the password hash to assign to the USER_PASSWORD_HASH variable in the above bash script. Follow the instructions in the comments.
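For reference, the hash generation mentioned in the script’s comments looks like this (the password below is just a placeholder; run it on your own machine, never store the plain password on the server):

```shell
# Produce a SHA-512 crypt(3) password hash suitable for `useradd --password`;
# the output starts with "$6$" followed by a random salt and the hash
openssl passwd -6 'replace-with-your-own-password'
```

Paste the resulting `$6$...` string into USER_PASSWORD_HASH before uploading the cloud-init file.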

If you don’t need/want Docker and Traefik, just remove them from the script :wink:

NOTE: Just to give you an idea, I spent one entire week of my holidays, full-time, researching and trying things out in order to come up with the initial draft of the above script, and I have since spent many more hours improving it. So security takes time and investment on your part, which you must be willing to make; otherwise you are just one more prey waiting for the day you will be hunted :wink:


Thanks to everyone for their insights!

Just to be clear, I wasn’t implying that you (@Exadra37) were advocating gov-level security/PCI compliance for systems that don’t need it. This thread got slightly hijacked :slight_smile: and my point was in line with its topic: there is surely a sweet spot where security investment for a critical mass of systems starts showing diminishing returns, and it would be highly beneficial for society at large if security-illiterate (or poorly literate) developers (which I suspect is the majority) were educated on what that point is (or how to define it for themselves), without having to get into an arms race to play cat and mouse with a potential

This is straight-up shlepping paranoia. Most teams do not have a dedicated person to play with that guy. If we all just focus on that guy, 99% of real-world devs will suffer from paralysis by analysis.

Admittedly, this is coming from someone (me) who hasn’t yet suffered the damage of getting hacked. But it really isn’t far-fetched to suggest that “awareness raised” on this topic amongst the less security-literate can quickly evolve into paranoia, which can seriously hinder productivity. That in turn would be ironic, because the vast majority of systems get by without it.

App developers want to build apps; they neither want nor have the time or money to play capture the flag. It is important for security that the security community manage its own expectations of what such devs should be implementing. IMHO, the security community’s job is to assist regular system builders in determining security baselines for various system topologies.

So please don’t take this the wrong way when I say that a kitchen sink of links addressing such a vast array of layers, cases, and compliance regimes does not help devs find a “sweet spot”. Personally, I spent 3 literal hours yesterday reading some of your links, and my takeaway was “I really don’t have time for this”; yet I budgeted 2 weeks of my time on this app for security.

EDIT: Also, I just want to add that I’m looking forward to this!


For completeness’ sake (and I will most likely have extra non-security-related questions down the line): what other tools/strategies did you use to secure the network?

Besides the somewhat basic security I mentioned, there was really nothing special. Close all the ports. Change the SSH port on all servers, allow SSH into one server only, and only with a key, and open HTTP(S) ports where needed. Keep the system updated. Hide software versions sent in headers. Sanitize everything. Pretty much the basics.
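As an aside, the “hide software versions sent in headers” point is a one-liner in nginx (the front server mentioned earlier in this thread); this is a fragment, not a complete config:

```nginx
# Stop advertising the exact nginx version in the Server header
# and on default error pages
http {
    server_tokens off;
    # ... rest of your http configuration ...
}
```

The `Server` header still says “nginx”, but the version number is dropped, which denies scanners an easy match against version-specific CVE lists.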

As I mentioned before, we were not really a target for professionals, and these measures were enough to keep out the script kiddies.