I'm aware of what purpose the HDD cache serves, but I'm curious whether it's really all that relevant in modern storage setups like ZFS, whether striped across a JBOD or in RAIDz1/2/3.
Homelabbers typically find two classes of drives: enterprise and consumer. I found these two, same brand, to use as an example:
Enterprise: 3TB, 7200RPM, SATAIII 6Gb/s, 1 year warranty, 64MB cache - $72
Consumer: 3TB, 7200RPM, SATAIII 6Gb/s, 1 year warranty, 128MB cache - $60
As you can see, the consumer drive has twice the cache and costs $12 less.
In my experience, most homelabs run somewhere between 6 and 24 HDDs in 1-2 RAIDz1 or RAIDz2 VDEVs.
My question is this: Do the larger cache and lower price of consumer drives outweigh the disadvantage of not being purpose-built for use in a NAS/SAN? When the drives are used in an array, does the on-disk cache affect I/O performance significantly?
Most articles I've come across tend to focus on raw sequential read and write throughput while ignoring random I/O (IOPS) altogether. My situation is that I'm building a NAS to be the backend for a compute node running multiple VMs and databases. Random I/O is very important!
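For anyone wanting to compare drives on the metric that actually matters here, fio can measure random mixed read/write IOPS directly instead of the sequential throughput most reviews report. This is a generic sketch, not taken from any particular article; the target directory and job parameters are assumptions you'd tune to your workload:

```shell
# Rough 4K random mixed-workload test (70% read / 30% write),
# approximating VM/database access patterns.
# --directory is a placeholder path on the pool under test.
fio --name=vm-sim \
    --directory=/tank/test \
    --ioengine=libaio \
    --rw=randrw --rwmixread=70 \
    --bs=4k --iodepth=32 --numjobs=4 \
    --size=1G --runtime=60 --time_based \
    --group_reporting
```

Comparing the reported read/write IOPS between a consumer and an enterprise drive (or between single-drive and array configurations) would answer the cache question far more directly than throughput benchmarks.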
...but so is keeping the cost down while retaining data integrity (losing all my storage would be bad, y'know?)
I'm currently looking to build a RAIDz1 pool of 5 of these drives with LZ4 compression enabled. This seems to be a good mix of capacity and throughput for my budget, and I'll eventually expand by adding a second identical pool when I can afford it. The part I'm unsure of is how to maximize my I/O, and how disk cache affects that.
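For concreteness, the pool described above would look something like this; device names are placeholders for whatever the five drives enumerate as on your system (by-id paths are generally safer than /dev/sdX):

```shell
# 5-disk RAIDz1 pool with LZ4 compression.
# ashift=12 aligns to 4K physical sectors, common on modern 3TB drives.
zpool create -o ashift=12 tank raidz1 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

# Enable LZ4 compression pool-wide.
zfs set compression=lz4 tank
```

One caveat relevant to the I/O question: a single RAIDz1 vdev delivers roughly the random IOPS of one member disk, so expanding later by adding a second vdev (or pool) should help random I/O, not just capacity.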
NOTE:
Enterprise drive: HGST Ultrastar
Consumer drive: HGST DeskStar
While both drives in my example are refurbished, you can also look at the two most common homelab drives used today (January 2017): the WD Red (shucked from a My Book Duo) and the Toshiba X300.
NOTE2: I think this would be a great topic for Wendell to cover in his ZFS storage series.