Some of these issues, at least from the point of view of the runtime system, are addressed in Designing for Scalability with Erlang/OTP, especially in the last few chapters:
- System Principles and Release Handling
- Release Upgrades
- Distributed Architectures
- Systems That Never Stop
- Scaling Out
- Monitoring and Preemptive Support
Many of the others seem to depend heavily on your chosen hosting strategy. If you rely on a database service and deploy to a private or public cloud, perhaps with Docker or some kind of Heroku-like service, you wouldn't need to worry about things like backups, hardware, etc. On the other hand, you might then be unable to cluster in the most straightforward way, perform hot code upgrades, and so forth.
Most of what I know in this area comes from practical experience: 15 years as a sysadmin, DBA and ERP developer (the mix has varied over time), followed by a few years focused more on automation, "devops" and BI development. So, unfortunately, I'd be hard pressed to point to a specific set of learning resources.
One book that stands out, though, in the system administration area is The Practice of System and Network Administration (I own an earlier edition).