Just Heroku-ing the whole thing, even though I’d lose some BEAM-y goodness by doing so
@landric Would a Distillery/Docker deployment on Heroku be an option?
My guess is that your third option is a Mix-based deployment using Heroku’s buildpacks, which IMHO is not how Elixir code should be deployed… With a Distillery/Docker deployment you would have a 100% BEAM-compatible binary, plus the peace of mind of not having to manage instances!
Mix is a development tool; it was not meant to run code in production! Building releases directly from Mix is actually an ongoing effort by the Elixir core team…
Theoretically? But Docker is a whole other thing for me to learn. Could I take the time? Sure. And I’d be interested to hear more about the practical (as opposed to… philosophical? Not trying to be pejorative, just not sure I have the right word) benefits of Distillery -> Docker -> Heroku vs. just using the Elixir buildpack to run the app with Mix.
There are a few documented limitations in Phoenix’s Deploying on Heroku guide, which might not be relevant to your project. Running an Elixir project in production using the language’s development tool is clever, but it’s not what the tool was meant for.
This thread from 2017 actually explains it better than I ever could:
But above all, I like the self-contained, isolated nature of the release. You won’t pick up some unexpected module from somewhere else (e.g. the mix app is not included in an OTP release, so there is less chance of invoking Mix.env at runtime), and you can easily run multiple different systems (services) with different versions of Elixir/Erlang.
I also agree that the cost is marginal. But even if it takes longer to properly set up your own build and deployment, it’s a one-off cost. Pay the price once, reap the benefits many times.
Keep in mind this thread is almost 2 years old! Distillery 2.0 was officially released last August and simplified the setup by a full order of magnitude, especially the ENV handling, which is now almost as simple as using System.get_env/2 in Mix configuration.
Unlike source code, which requires that you have the build tools, fetch dependencies, and compile a new version (which may differ from the last time it was compiled, due to different tools or dependency resolution picking newer versions of your deps), OTP releases decouple development and deployment: you only need to build the artifact once and can then deploy it many times, rather than rebuilding it for every deployment.
You may want to check out Gigalixir for a Heroku-like experience (heavier on the command line) tailored for Elixir. I’ve had good luck with it so far. It’s based on Google Cloud by default, but I believe that’s flexible.
Its default settings are very generic: not optimized for a ‘specific’ load, but decent on average.
I’ve had to tweak some things on it on occasion, and some minor tweaks can make a significant difference for some workload types!
I find it easier just to use a plugin to replicate all changes to another backup server in real time. If the load ever got high enough, I could even have both serve ‘most’ queries without issue. PostgreSQL has some fantastic plugins!
Yeah, this is fairly true about PostgreSQL; changing memory settings is the main thing I do.
In my case it would have been more like $30k+/month. If your site’s load is low enough for $50 to be enough on AWS, then it is probably easier just to get a droplet or whatever they are called instead. ^.^;
And cost, they are SOOOO much cheaper!!
For me, a dedicated server has always been significantly cheaper than *aaS offerings.
Never an issue here either. My big server is replicated in near real-time to both another smaller server and my home server, so ‘if’ it ever goes down (it never has yet in 19/20 years! knocks on wood) I can reverse proxy to a backup in seconds (I could automate that but eh, never had a need yet…).
It doesn’t really take that much time, and for saving $30k I’ll happily do it myself; plus it’s enjoyable figuring everything out. ^.^
This is really the entire conversation for any aaS offering, though. Why use Sendgrid when you can set up your own Postfix, etc.?
When it comes to the database, it’s usually the most critical part of your system, and people tend to feel a lot better about trusting a platform used by many people over learning and setting up each line item themselves. I manage several largish PG instances now (a couple of TB each), and if we were running in AWS I’d have used theirs (RDS or Aurora) in a heartbeat.
The trade-off bonus of running our own is that I can install extensions that aren’t on their whitelist, which I do and appreciate.
Without RDS/Aurora I have to:
Set up WAL backups
Manually create a replica
Manually run version upgrades
Do without the automatic zero-downtime upgrades they provide through managed failover
Manually set up monitoring both within the DB and on the server
Manually monitor logs
Manually set up alerting and paging
Running on GCP, the PG offering they have isn’t really as polished, but Compute Engine persistent disks do make managing your own a lot easier. Running our own has been very stable, but I’d still sleep better with everything AWS has to offer.
I have. When time costs are factored in, Aurora PG is a clear winner (even over plain RDS PG, because that still requires a little bit of hand-holding). With large databases on RDS I still used to spend about 6 hours a month looking into things. Aurora took that to 0 hours, at lower service costs.
Aurora is the primary reason I’d choose AWS based platforms for anything serious. It’s that good.
Cost isn’t simple, it varies with a number of factors, including risk.
You’re talking about never having had a failure, which is fine, and if you’re considering a single node then banking on it can totally make sense. I’m considering 100K+ servers, 1K+ applications and 1K+ databases deployed across 1K+ physical locations. With a mean time to failure of various components in the 10-year range, I need to think in terms of continuous partial failure as the norm. Servers will fail every day, probably in cascades/clusters, because that’s how shit happens.
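To put “servers will fail every day” into numbers, here is a rough sketch. It assumes the 100K-server fleet from the post and treats the ~10-year MTTF as applying per server (the post says “various components”, so this is a simplification), with failures independent and spread evenly over time:

```python
# Back-of-the-envelope: expected server failures per day in a large fleet.
# Assumptions (not from any real fleet data): independent failures,
# evenly spread over time, MTTF applied per server.

def expected_failures_per_day(fleet_size: int, mttf_years: float) -> float:
    """Expected failures per day = fleet size / MTTF expressed in days."""
    mttf_days = mttf_years * 365.0
    return fleet_size / mttf_days

per_day = expected_failures_per_day(fleet_size=100_000, mttf_years=10)
print(f"~{per_day:.1f} server failures per day")
```

That comes to roughly 27 failures a day on average. Real failures cluster, as the post points out, so the independence assumption understates how bad the bad days get.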
You’re talking about being able to flip things over manually. I’m in a world where a critical failure at the wrong time can cost tens of thousands of dollars per second. How many seconds will it take you to solve the problem? I’m in a world where an increase in latency can cost millions of dollars as people abandon an app, or a site, or a shopping cart in a store.
Say I’m in a highly seasonal business, where our hardware utilization is high for a small part of the year and very low for the rest: *aaS services let me scale and pay for what I need, when I need it. That makes it cheaper in real terms.
Say I have processes that only run 5 * 8 (five days a week, eight hours a day). I can turn off my databases for the rest of the week and pay only for storage. That makes them much cheaper.
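The saving from a 5 * 8 schedule is simple arithmetic. This is a minimal sketch that assumes compute is billed per running hour while storage is billed all week regardless. Startup/shutdown time and actual provider pricing are ignored:

```python
HOURS_PER_WEEK = 7 * 24  # 168

def compute_hours_fraction(days_per_week: int, hours_per_day: int) -> float:
    """Fraction of the week a scheduled database actually needs compute."""
    return (days_per_week * hours_per_day) / HOURS_PER_WEEK

running = compute_hours_fraction(5, 8)
print(f"running {running:.1%} of the week, compute off {1 - running:.1%}")
```

40 of 168 hours is about 23.8%, so roughly three quarters of the weekly compute hours are never billed.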
Combine scaling only when I need it and scheduled shutdowns and I can reduce my costs from *aaS offerings by a really high percentage.
Can you build something that solves those problems yourself? Absolutely! Is it easy or cheap? No, it isn’t. It also isn’t easy to test it and make sure that it will actually work when you need it to. Cloud/*aaS offerings are much more capable of investing the time and energy necessary for that.
Not all money is created equal, either: in general, purchasing hardware is a capital expenditure, which means I need to carry the value of that hardware for years, pay taxes on its value, and either have the money or borrow it. SaaS services are all operational expenditure, which makes them more attractive from a taxation and net-present-value-of-capital perspective.
Say I’m in San Francisco, where the full cost of an employee is somewhere in the region of $250K per year - salary, pension, healthcare, training, insurance, holidays, office-space, travel, etc. etc. etc. If I’m considering 2000 staff, what percentage of our total productive output should we spend on database administration?
Overall, I want to minimize the total cost of ownership of our databases whilst creating an acceptable risk profile and the right performance characteristics. *aaS is sometimes the right option for that, and sometimes it isn’t.
I toss out these figures, not because I, or the company I work for are special, but precisely because we’re absolutely not. There are thousands of companies like us around the world. This is the reality of enterprise software; we’re not optimizing for cost of individual servers, we’re optimizing for organizational level concerns and we’re architecting for risk because - in a big enough corpus of possibilities - ten thousand to one risks happen dozens of times every day.
Your concerns are different from mine. I totally respect that your choices work for you, and I’m not suggesting you should do anything different. It doesn’t work for me, and it certainly isn’t always cheaper, in real terms.
For many ‘solopreneurs’ (quoting the OP here), time is something they might have, whereas they might not have limitless funds.
I’ll just speak to my own experience: time spent learning is available, though TBH (in year four) I’m past the point where the cost of an “aaS” is a limiting factor. In Heroku-speak, I’m still on Standard dynos and the first non-Hobby level of database.
For me, the biggest cost comes in terms of a recovery situation, since my escalation path is… myself learning something new under pressure.
So, “time to learn” is certainly there. “Time to fix” isn’t. Does that resonate?
For just over 8 years I worked at a place managing about 40,000 old SCO UNIX servers (they tried to replace them with Windows 2003 but never could, so they ended up running two servers, one of each, at most of the locations; I primarily managed the SCO UNIX ones). Very much not cloud: they were all physical servers at each physical location, and they had to work if the internet went down, if the central servers couldn’t push updates, etc… etc… And they were rock solid, unlike the Windows ones; we generally only lost 1 or 2 a day (we sent out replacements, but on-site staff usually had a backup in the city for immediate replacement as well).
Precisely! That is why we had to have such uptime, even in the case of Internet or even power failures; you can’t have that if you are completely cloud based. Even the low-yield places still brought in several thousand dollars a day, and the average was around $30k/day (the high-yield ones were… impressive…).
This is why physical servers are so important.
And then your Internet goes down and you are out, for example, $750 every minute. With our servers we still ran fine without Internet: I could still connect over dial-up if I needed to, and the servers used dial-up to verify credit/debit transactions. Even if dial-up was down (say the power is out in the city, so they are running on backup), they could still batch the transactions (we’d eat the cost if a debit/credit transaction ended up not going through; it was worth it to keep the clients’ goodwill, and it was usually a pittance if anything).
Or just pay $2500 for one of these SCO UNIX servers (~$5000 for full server/backup setup) per location just outright and it would last for 7-10 years on average (excepting a few outliers, had a fun issue where a magnetic field generated by a motor was killing one site much much faster until that was figured out…), through downtime, power outages, etc… Even with the cost of power that’s still generally less than a dollar of cost a day for reliability that is unachievable with *aaS.
Although these places were on and operational 18 to 24 hours a day, depending on the location and day of week.
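A minimal sketch of the straight-line amortization behind the hardware figures above. It covers the purchase price only. Power, staff time and replacement units are not included:

```python
def daily_cost(purchase_price: float, lifetime_years: float) -> float:
    """Straight-line amortization: purchase price spread evenly over the lifetime."""
    return purchase_price / (lifetime_years * 365)

# Figures from the post: ~$2500 per server, ~$5000 for the full server/backup setup,
# lasting 7-10 years on average.
for label, price in (("single server", 2500), ("server + backup", 5000)):
    for years in (7, 10):
        print(f"{label:>15}, {years:>2} years: ${daily_cost(price, years):.2f}/day")
```

On these numbers a single $2500 server comes to roughly $0.68 to $0.98 per day over 7 to 10 years, and the ~$5000 server/backup pair to roughly $1.37 to $1.96 per day.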
It was actually quite easy to manage, and it’s significantly cheaper than *aaS would have been, especially with the very common downtime (on average ~2-5% of all sites had Internet connectivity issues at any given time, generally for multiple days at a time because local ISPs tend to really suck, I have a special hatred for AT&T and Comcast because of that job…).
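Applied to a fleet of ~40,000 sites, a 2-5% outage rate means a large number of locations are offline at any moment. A small sketch using the approximate numbers from the post:

```python
TOTAL_SITES = 40_000  # approximate number of locations, from the post

def sites_with_issues(total_sites: int, share: float) -> int:
    """Expected number of sites with connectivity problems at any given time."""
    return round(total_sites * share)

for share in (0.02, 0.05):
    print(f"{share:.0%} of {TOTAL_SITES:,} sites: ~{sites_with_issues(TOTAL_SITES, share):,} offline")
```

That is roughly 800 to 2,000 sites at a time. This is why being able to keep running locally mattered so much in that setup.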
Except it’s also a service that you don’t own or control at all; you are at its whims if, say, AWS goes down for a period of time, as happened not long ago. We only have to worry about each site individually, not everything going down at once. There is no reason for a site to be unable to take credit cards for more than 15 minutes once a day (a certain report generation accessed and locked that system), though many locations went longer than that because they weren’t doing it right (and if customers complained to us at corporate, we would raise hell with the site, check their logs and all, and always got proof).
We had ~4 Level 3 people (the programmers), ~8 Level 2 people (system diagnostic and repair), and about 30-50 Level 1 people (limited access, they mostly walked the sites through most issues they could solve themselves or passed it up to L2 or L3 for us to handle), for all ~40k locations (China we didn’t handle, their locations were… another team because China is irritating, though they had a total of like 20 people altogether). That’s to handle the database, synchronizations, communication issues, failures, everything, and we handled it efficiently, fast, and properly.
And that is precisely my thing: with *aaS you have a lot of things outside of your control. Even a trivial internet outage for just a couple of days can put a site in the red; we can’t stand that, and I doubt most other places could either.
I run my own personal things on dedicated servers as well. I truly think it is the best way of running things, especially if you are reliant on the Internet being up and accessible (that’s why even my 4 current personal servers are hosted with 3 different companies: if any have issues, another can take up the slack immediately, and you can’t do that if your stuff is hosted at, say, AWS and AWS stops responding for 2 hours).
Oh, and as for the above servers, if the server went down then the individual client machines on-site would take over about 60% of its functionality, batching all the data until the server was set back up again and it would take all the accumulated information back.
In both of these sections I think you might be conflating two different things: edge devices which need to be in physical locations and servers which can be anywhere. Capable edge devices with durable on-board storage and fallback manual processes are definitely important to overall reliability. But that says nothing about where server infrastructure should be. In fact, the more capable the edge device the less important it is where the server infrastructure is.
I totally agree, you can’t use cloud services for edge devices, because they need to be at the edge, not in the middle.
The cloud offers me the option of multi-provider, multi-region, multi-availability-zone, global replication over dedicated links, giving me tested, hardened, reliable and fast redundancy against more types of failure than I can manage by myself.
The other consideration is that I’m not talking about either edge devices or internet access; I’m talking about both. I have physical locations, customer-facing web properties and mobile applications, as well as global supply-chain and logistics concerns.
I can’t pay for 1 server, because I want multiple redundant nodes, in multiple data-centers, in each of eastern-US, western-US, western-Europe, north-India, China, and Japan. I also want to have production, development and test setups that replicate my production environment. That means I need to buy 3 servers * 3 locations * 6 regions * 3 environments, giving me an ideal state of 162 servers. I don’t want to do that. Even if I decide I only want dev and test clusters near my engineers, because I don’t want to actually develop or test my global failover plans, my intra-region failover or my regional latency, I’m still buying dozens of servers.
With cloud options I don’t need any of that, I pay for exactly what I use, not what I might need. When I’m buying servers I always have to pay for what I might need.
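The 162-server figure is the Cartesian product of the four dimensions listed above. Here is a sketch using itertools.product. The node and data-center names are hypothetical placeholders, and only the regions and environments come from the post:

```python
from itertools import product

# Hypothetical placeholder names; regions and environments are from the post.
NODES = ["node-1", "node-2", "node-3"]   # redundant servers per data-center
DATA_CENTERS = ["dc-1", "dc-2", "dc-3"]  # data-centers ("locations") per region
REGIONS = ["eastern-US", "western-US", "western-Europe", "north-India", "China", "Japan"]
ENVIRONMENTS = ["production", "development", "test"]

# Each combination is one physical server that has to be bought up front.
servers = list(product(ENVIRONMENTS, REGIONS, DATA_CENTERS, NODES))
print(len(servers))  # 162
```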
Maybe loss-free, global replication of large numbers of petabytes of data and tens of millions of transactions per second, across 1K+ databases and 5K+ physical database servers, balancing consistency and availability concerns whilst providing consistent latency, is easy for you. But I suspect that most people don’t find it to be so.
There are tiers of concerns here, each of which has an on-prem analog:
Loss of a node.
Loss of part of a data-center (a rack, hypervisor, whatever…)
Loss of a whole data-center.
Loss of all of the data-centers in a region.
Loss of all of the data-centers in a country.
Loss of all of the data-centers of a provider.
Loss of all of the data-centers of all providers everywhere.
If the last one happens, we’re screwed whatever happens because it probably means the internet itself is down everywhere, for everyone.
All of the rest are solvable with a multi-cloud strategy, almost for free since you don’t need to pay for these features until you use them, unless you need to reserve capacity. They are only solvable with your own servers at great expense.
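One way to reason about the failure tiers listed above is as an ordered scale of blast radius. A deployment survives any failure no larger than the level it is replicated across. This is a hypothetical toy model: the FailureDomain enum and the tolerates function are illustrative and do not come from any library, and the model ignores capacity and cost:

```python
from enum import IntEnum

class FailureDomain(IntEnum):
    # Ordered by blast radius, following the tiers listed in the post.
    NODE = 1
    PARTIAL_DATA_CENTER = 2  # a rack, a hypervisor, ...
    DATA_CENTER = 3
    REGION = 4
    COUNTRY = 5
    PROVIDER = 6
    ALL_PROVIDERS = 7

def tolerates(replicated_across: FailureDomain, failure: FailureDomain) -> bool:
    """True if replicas spread across separate units of `replicated_across`
    survive losing one unit of `failure`. Losing every provider is never survivable."""
    if failure is FailureDomain.ALL_PROVIDERS:
        return False
    return failure <= replicated_across

# Hypothetical deployments: several data-centers in one region vs. multi-cloud.
multi_dc = FailureDomain.DATA_CENTER
multi_cloud = FailureDomain.PROVIDER

for failure in FailureDomain:
    print(f"{failure.name:<20} multi-DC: {tolerates(multi_dc, failure)!s:<5} "
          f"multi-cloud: {tolerates(multi_cloud, failure)}")
```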
I was talking about 2000 or so people just involved in the process of creating and maintaining software. The total headcount is a couple of orders of magnitude higher than that. Even 1% of 2000 * 1/4 million is… lots.
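Here is the arithmetic behind “1% of 2000 * 1/4 million”, using the $250K fully loaded cost and the 2000-staff figure from earlier in the thread:

```python
STAFF = 2_000                # people creating and maintaining software (from the thread)
COST_PER_EMPLOYEE = 250_000  # approximate fully loaded annual cost in San Francisco, USD

total_cost = STAFF * COST_PER_EMPLOYEE  # USD per year
one_percent = total_cost // 100         # integer arithmetic keeps the result exact
people_equivalent = STAFF // 100        # 1% of headcount

print(f"Total: ${total_cost:,}/year")
print(f"1% of it: ${one_percent:,}/year, roughly {people_equivalent} full-time people")
```

So “lots” is about $5M a year, or the equivalent of 20 full-time people.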
Defense in depth, my friend, defense in depth. Building that yourself is possible but expensive, since you have to buy the infrastructure before you need it. You get most of it with a single cloud provider in a trivial manner, and a multi-cloud strategy gives you additional layers - at additional cost and complexity.
You can only fallback to other hardware if you’ve pre-purchased the spare capacity to do so. If you’re running at 89+% utilization rates you can’t do that. If you’re not running at high utilization rates under peak-load you’ve over-provisioned.
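The utilization point can be made concrete for the simple case of N identical nodes with the load spread evenly. To absorb the loss of one node, the survivors must carry its share, so steady-state utilization can be at most (N - 1) / N. Here is a minimal sketch under those assumptions (even redistribution of load, and no queueing or latency effects near 100%):

```python
def max_safe_utilization(nodes: int, failures_to_survive: int = 1) -> float:
    """Highest steady-state utilization at which the surviving nodes can absorb
    the load of `failures_to_survive` failed nodes (assumes evenly spread load)."""
    if nodes <= failures_to_survive:
        raise ValueError("need more nodes than tolerated failures")
    return (nodes - failures_to_survive) / nodes

for n in (2, 3, 5, 10):
    print(f"{n:>2} nodes: keep utilization <= {max_safe_utilization(n):.0%} to survive one loss")
```

At 89% utilization, a pool needs at least 10 evenly loaded nodes before it can absorb a single failure without overload. With 9 nodes the ceiling is only about 88.9%.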
Like I said, I’m not judging the solutions that you’ve created to solve your problems, but they don’t solve my problems.
I also think I’ve said enough on this subject, so I’ll leave the last word to you.