Do you use SSDs or standard HDDs in your production servers?

There seem to be mixed opinions on this for web servers, so I'm curious what you use in production.

On this server, I have 2x 2TB HDDs in RAID 1 for contingency (wondering whether SSDs would make much difference).

Are you IO bound on the drives? If yes then SSD, if no then don’t. ^.^
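One rough way to answer the "IO bound?" question on Linux is to look at how much CPU time is spent in iowait. The sketch below assumes the standard `/proc/stat` field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names and the 5 second interval are made up for illustration, and a high iowait share is only a hint, not proof, that the disks are the bottleneck.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_share(before, after):
    """Fraction of CPU time spent in iowait between two samples."""
    # Only the first 8 counters are summed: guest time is already
    # included in user/nice, so adding it would double count.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0  # index 4 is iowait


def measure(interval=5.0, path="/proc/stat"):
    """Sample /proc/stat twice and return the iowait share in between."""
    with open(path) as f:
        first = parse_cpu_line(f.read())
    time.sleep(interval)  # arbitrary sampling interval
    with open(path) as f:
        second = parse_cpu_line(f.read())
    return iowait_share(first, second)
```

Calling `measure()` on the server while it is under typical load gives a number between 0 and 1; tools such as `iostat` report the same counter with more detail per device.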

2 Likes

Depends on your workload. A Kafka server would probably prefer HDD to SSD. At the same time, an Influx server would want SSD. And an ElasticSearch one would just need a ton of RAM.

What is your use case? What is the shape of your workload? How big is the “hot” data set? If the data is mainly “cold”, more RAM plus ZFS on HDDs with a big FS cache will probably serve you better than anything else, because you will almost always hit the FS cache.

The only really important thing to keep in mind is that your storage should be local to your server if possible. No network/block storage if you can avoid it, please.
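The "how big is the hot set" question can be turned into a back-of-the-envelope check. This is only a sketch: the function name and all the sizes are hypothetical, and it ignores how the kernel or ZFS actually sizes its cache.

```python
GIB = 1024 ** 3


def hot_set_fits_in_cache(hot_bytes, ram_bytes, app_bytes):
    """True if the hot data could sit entirely in the RAM left for the FS cache."""
    cache_bytes = max(ram_bytes - app_bytes, 0)  # RAM not claimed by applications
    return hot_bytes <= cache_bytes


# Hypothetical box: 64 GiB RAM, 16 GiB used by applications, 30 GiB of hot data.
print(hot_set_fits_in_cache(30 * GIB, 64 * GIB, 16 * GIB))  # True
```

If the answer is True, most reads should be served from memory and the speed of the underlying drives matters much less.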

3 Likes

The only really important thing to keep in mind is that your storage should be local to your server if possible.

Network block storage can outperform local HDD by a couple of orders of magnitude. Plus it generally has advantages like instant snapshots that can be mounted on another server, easy backups etc. To just categorically exclude them from your options suggests you are extrapolating from a rather small data-set.

2 Likes

No. Quite the contrary. You can do instant snapshots that can be mounted on any server, easy backups and all that with local storage too. I would advise having a look at the ZFS feature set for an example.

But what you do not get with local storage is:

  1. Having to deal with NFS
  2. Losing network access
  3. Having to deal with the fallacies of distributed computing
  4. Having to deal with others’ mistakes
  5. Having to deal with NFS
  6. Losing your data randomly
  7. Basically that can be summed up by “having to deal with NFS”

Network storage, at the filesystem level, is fundamentally wrong. Thinking that it is acceptable in a production setup means that you are extrapolating from a rather small dataset. You can build something at the application level that works over the network. That is what any database does. That is what S3 does.

But the behaviour that every OS on the planet expects from a filesystem is fundamentally impossible to do right on a networked filesystem.

EDIT: For the pov of people who spent a lot of their lives designing networked file systems…

https://www.joyent.com/blog/network-storage-in-the-cloud-delicious-but-deadly

When I began to talk to Joyent, I was relieved to hear that their experiences so closely mirrored mine – and they had made the decision to abandon the fantasia of network storage for local data (root file systems, databases, etc.), pushing that data instead back to the local spindles and focussing on making it reliable and available. Yes, this design decision was (and remains) a trade-off – when local data is local, compute nodes are no longer stateless – and Joyent has needed to invest in technologies to allow for replication, migration and backup between nodes.

https://www.joyent.com/blog/magical-block-store-when-abstractions-fail-us

All is not lost, however. In my experience, it isn’t that we need “disks”, it’s that our apps need POSIX’s read() and write() and friends combined with behaviour that roughly approximates a physical disk. For the times we actually need a block interface vs. filesystem semantics, I would argue, the abstraction sitting on top (the filesystem, usually) is even more sensitive to the behaviour of the disk device than a POSIX application like a database is.

My opinion is that the only reason the big enterprise storage vendors have gotten away with network block storage for the last decade is that they can afford to over-engineer the hell out of them and have the luxury of running enterprise workloads, which is a code phrase for “consolidated idle workloads.” When the going gets tough in enterprise storage systems, you do capacity planning and make sure your hot apps are on dedicated spindles, controllers, and network ports.

BTW, from my pov and experience with that type of “enterprise workload”, the reason they have gotten away with it is that everything is shit and everyone accepts these types of problems as “normal”.

3 Likes

SSDs are generally advised for ElasticSearch, besides enough memory.

2 Likes

ElasticSearch uses and abuses mmap for all its indices by default. So in general, SSD or not will not matter that much, but a lot of RAM for the FS cache will matter a lot.
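For anyone unfamiliar with mmap, here is a minimal Python sketch of reading a file through a memory mapping. It is not how ElasticSearch itself does it (that happens inside Lucene on the JVM); the temporary file is just a stand-in for an index segment, and `read_via_mmap` is a made-up helper name.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Read `length` bytes at `offset` through a read-only memory mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


# Stand-in for an index segment file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"segment-data" * 1000)
    path = tmp.name

# Both reads go through the mapping; the kernel serves the pages from
# the FS cache whenever they are already resident in RAM.
first = read_via_mmap(path, 0, 12)
second = read_via_mmap(path, 12, 12)
os.unlink(path)
```

The point is that with mapped files the OS decides what stays in memory, so the amount of RAM left for the FS cache directly affects how often the drives are touched at all.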

2 Likes

If you can afford SSDs, they are by far superior to any spinning media. SSD-backed nodes see boosts in both query and indexing performance. If you can afford it, SSDs are the way to go.

see: https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html#_disks

Disks are usually the bottleneck of any modern server. Elasticsearch heavily uses disks, and the more throughput your disks can handle, the more stable your nodes will be. Here are some tips for optimizing disk I/O:

  • Use SSDs. As mentioned elsewhere, they are superior to spinning media

https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html#_storage

1 Like

I know, I supported ES in prod for a year. SSD or not, it is a piece of crap that is slow as hell. Also, do not believe the docs in that case…

3 Likes

Not sure how current this article is, but they are giving HDDs the edge in terms of reliability, which was a factor I was taking into account (thinking SSDs would be more reliable since there are no moving parts):

http://www.enterprisestorageforum.com/storage-hardware/ssd-vs.-hdd-performance-and-reliability-1.html

I think all things considered, particularly as it is easier to run Elixir apps in memory, I will probably opt for the greater storage of HDDs vs the speed of SSDs… for now at least - when SSDs are more readily available as 1TB drives I may give them a try :003:

Oh god no, SSDs have a short lifetime. You use them for their speed and as a cache, you do not use them for reliability in any form at all. O.o!

The reason is that, though they have no moving parts, they do have transistors, and ordinary transistors lose their state when their power vanishes (either immediately or a short time later). So a while back now some transistors were developed that hold on to their state when power goes down (via an interesting quantum locking quirk, it is quite cool actually); however, each write has a chance of damaging the locking mechanism of the transistor. The general ‘average’ life of modern high-quality SSDs is about 100,000 ‘state changes’ of a given transistor before a high chance of failure, and a state change is a write to it.

So if your SSD is storing a massive amount of never-changed read-only data, it should last you many many years.

But if you are making changes to it all the time then expect it to fail pretty rapidly, even within 6 months for a high quality SSD if you heavily write to it.
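A rough way to put numbers on this is to divide the total amount of data the drive can absorb by how much is written per day. The sketch below assumes writes are spread evenly over all cells; the function name and every figure in the example (capacity, cycle count, daily writes, write amplification) are hypothetical, so plug in the values from your drive's datasheet and your own monitoring.

```python
def estimated_lifetime_years(capacity_gb, write_cycles, writes_gb_per_day,
                             write_amplification=1.0):
    """Very rough SSD lifetime estimate under perfectly even wear."""
    # Total host data the drive can take before cells reach their cycle limit.
    total_writable_gb = capacity_gb * write_cycles / write_amplification
    return total_writable_gb / writes_gb_per_day / 365


# Hypothetical heavy-write case: 512 GB drive, 3,000 cycles per cell,
# 2,000 GB written per day, write amplification of 3.
years = estimated_lifetime_years(512, 3000, 2000, write_amplification=3.0)
print(f"{years:.2f} years")
```

The same function with a light write load gives many years, which matches the read-mostly case described above.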

2 Likes

Definitely give it a miss for servers then :lol:

My MBA is about 4 years old now and the SSD hasn’t given me any problem… I don’t think I’ve ever had a standard HDD last this long on my home computer.

You don’t write to it a lot then. ^.^

The firmware on modern SSDs ‘shuffles’ around where data is written, so writing to the same file repeatedly actually moves it around the SSD and different transistors are used each time. That lets modern SSDs get to generally a few million ‘overall writes’ before they fail, but when they fail they fail pretty spectacularly!
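A toy simulation shows why the shuffling helps. This is not how any real SSD firmware works; it only models a fixed number of physical blocks and a naive "write to the least-worn block" policy, and all names and sizes are made up.

```python
def simulate_writes(num_blocks, writes, wear_leveling):
    """Return per-block write counts after rewriting the same logical block."""
    wear = [0] * num_blocks
    for _ in range(writes):
        if wear_leveling:
            # Redirect the write to the least-worn physical block.
            target = wear.index(min(wear))
        else:
            # Always hit the same physical block.
            target = 0
        wear[target] += 1
    return wear


print(max(simulate_writes(8, 1000, wear_leveling=False)))  # 1000
print(max(simulate_writes(8, 1000, wear_leveling=True)))   # 125
```

Without leveling one block takes all 1,000 writes; with it the worst block sees 125, so the drive as a whole survives roughly as many rewrites as it has blocks times the per-block limit.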

But nah, I use a 512 gig SSD and a RAID of 6 1-tb drives on my main server. The 512 gig SSD doubles as a boot drive (the OS rarely changes after all, updates are not a heavy write load) and as a cache for the RAID (split into 2 256 gig chunks, one for each purpose).

Compiling on the SSD can take, say, 10 minutes for a complex project, whereas compiling the same one on the RAID can take over an hour (that’s C++ for you!). ^.^

So they definitely have uses on servers, just have to use them appropriately. :slight_smile:

And if you really have a high-IO workload then an SSD is great in general, just factor replacing them every 6 months into the running costs. ^.^;

1 Like

It depends on whether we think we’ll have time to keep those clusters in good shape. Often it’s cheaper to just throw money at cluster performance problems than it is to optimize an often-ignored set of nodes.

1 Like

Anybody have any thoughts on NVMe SSDs?

If you had to choose between two servers, one with an NVMe SSD, an Intel Xeon E3-1275 v5 Quad-Core (Skylake) and ECC RAM, or another with a standard hard drive, an Intel Core i7-8700 Hexa-Core (Coffee Lake) and non-ECC RAM, which would you go for?

This is interesting about NVMe drives (particularly the bit in bold):

Source: https://www.pcworld.com/article/2899351/storage/everything-you-need-to-know-about-nvme.html

Assuming the SSD is backed up via a RAID or so, I’d pick that: faster access, plus ECC RAM is awesome (there is on average 1 random bit flip per day or something like that otherwise).

I use SSDs in software RAIDs that mirror to another file-dedicated server and can hot-swap back in quickly if anything happens.

1 Like

There will be two SSDs in a RAID 1 array with daily backups of the apps/sites :003: