I need to set ashift values for SSDs in ZFS (Proxmox). What value should I use? (smartctl is confusing)

I’ve previously asked about how to utilize my various types of SSDs in ZFS/Proxmox, here: New Proxmox Install with Filesystem Set to ZFS RAID 1 on 2xNVME PCIe SSDs: Optimization Questions .

Thanks for all the help over there. I’m at the point of actually setting things up now, and running into issues figuring out what ashift values to use. I know that this flag is pool level, and getting it wrong will permanently contaminate the vdevs. I also know that some (all?) SSDs will lie about their sector size for legacy OS compatibility.

I’ve got 3 types of SSDs in play, and they’re all giving me different output formats in smartctl, with different reported sector sizes, so I would appreciate a sanity check and some advice on what to do. I don’t want to have to redo the install because I got this wrong. :slight_smile:

Question: What sector size should I set for each of the following drives inside Proxmox (ZFS settings for pool)? Each pool is made of one type of drive. smartctl output follows.

Drive Type 1: Sabrent Rocket 4.0 m.2 PCIe 4.0 NVME:


So, it looks like these are 512 bytes by default, but can support a 4096-byte LBA size?

So if I set ashift=12 (block size: 4096 bytes), Proxmox should format the drives that way, and all should be well?

Or do I need to do something else?
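
For reference, here’s roughly how I’m reading the supported LBA formats on these NVMe drives (a sketch assuming nvme-cli is installed; the device name is just an example):

# List the namespace's LBA formats; the one marked "(in use)" is the current one.
nvme id-ns -H /dev/nvme0n1 | grep "LBA Format"

# smartctl shows the same thing for NVMe under "Supported LBA Sizes".
smartctl -a /dev/nvme0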

Drive Type 2: Lexar LS100 500 GB 2.5" SSD

The only reported value here is “logical/physical” at 512 bytes. So, I’m guessing I should set ashift to 9 (512 bytes)?

Or should I use 4096 again even though nothing is reporting that?
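
For reference, on the SATA drives I’m just going by what the kernel and smartctl report; something like this shows both values at a glance (a sketch, device name is an example):

# Logical and physical sector sizes as the kernel sees them.
lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sda

# The same info straight from the drive's identify data.
smartctl -i /dev/sda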

Drive Type 3: Sandisk SDLKAC6M800 G5CA1

Logical is 512.
Physical is 4096.
Frustration is up to 11.

Since physical is 4096, I’m thinking there’s no risk in using 4096 for ashift?


For SSDs, set ashift to 12 (or 13) unless you know otherwise from testing. Any modern drive that reports a sector size of 512 is telling a horrible lie, meant for compatibility with ancient, poorly coded software/hardware that needs to be sent to a landfill. The actual sector size used in SSDs is a complicated mess that most companies won’t even share info on. The internal NAND page size of SSDs is generally much larger than 4096; Samsung, for example, used to use 8KiB, but probably uses 16 or even 32KiB sizes now. Various black magic controller bullshit hides this from the user and results in drives that are, in practice, optimized for ashift 12 or 13. At least for consumer drives. Enterprise drives may have some special caveats that allow for better performance in very specific and highly optimized situations, but you don’t realistically have to worry about that because you aren’t being paid to squeeze out 5% more database performance on million-dollar systems.

Note that ZFS has the “technically correct” default behavior of “trusting” these lies when creating a pool unless you specify it manually or the drive is in its internal list of corrected values. This is frankly awful and has fucked over a lot of people.

The only offhand exception may be optane, which is apparently byte addressable.

If ashift is too high, you lose a bit of space if you have a massive amount of very small files. If it’s too low, you get write amplification and performance degradation.
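
If you want to double-check what ashift a pool actually ended up with, something like this should show it (a sketch; “testpool” is a placeholder, and zdb -C assumes the pool is in the default cachefile):

# Per-vdev ashift from the cached pool configuration.
zdb -C testpool | grep ashift

# Newer OpenZFS also exposes ashift as a pool property (0 means auto-detect).
zpool get ashift testpool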


Ashift=12 is the way I go on flash.

I have a bunch of small files, but even if they are quadrupled in size, they still don’t amount to much space at all compared to things like media files.

I haven’t seen any 8k drives (yet), but I guess they are coming, maybe with PLC / 5-bit-cell SSDs?


Thanks! That clears it up rather nicely.

(It’s also nice to see someone as annoyed by this crap as I am, @Log. Hardware shouldn’t lie about what it is. Samsung T7s won’t even talk to smartctl, for instance.)

EDIT: Am I correct that ZFS does ashift at pool level? It seems like this is the sort of thing that should be able to be set by vdev.

It is set at pool creation time.

which means new drives added later might play fun and games with write amplification, if they use a different sector format.

That’s why I go 4k, even on rust, because AF / 4K is very common

That’s indeed a common misconception: ashift is defined per vdev, not per pool. It’s just set at pool creation for those initial vdevs, and cannot be changed once they are added. You can have mixed ashifts, though generally you do not want that, and should seriously reconsider your pool design and requirements if that seems like a realistic option.

Example

zpool create -o ashift=12 testpool mirror sda sdb
zpool add -o ashift=12 testpool mirror sdc sdd

However, now that you bring that up, I don’t offhand know what the default behavior is when adding a new vdev. Does ZFS use the previously defined ashift for the pool, or does it blindly trust the lying drives and contaminates your pool with a fucked up vdev, requiring destruction to fix? I also don’t know if mirrors or raidz would be handled different either.

Special vdevs also can have an ashift defined.

In order to remove mirror vdevs or special vdevs, I believe ALL the ashifts of everything must be the same value. I have not personally played around with this, so I’m not familiar with any of the gotchas. To remove mirror vdevs, I believe all vdevs must be mirrors, no raidz.
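
For what it’s worth, a special vdev takes an explicit ashift the same way (sketch; pool and device names are placeholders):

# Pin the special vdev's ashift explicitly so it matches the data vdevs
# instead of whatever the drives report.
zpool add -o ashift=12 testpool special mirror sde sdf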

Well, it looks like the QVOs might be 8k, so ashift=13 might indeed be the way to go, if they’re on the radar?

Ah! It seemed to make more sense to me that they should be definable at the vdev level. Glad to know my instincts were actually right for once (ZFS is hard, yo :stuck_out_tongue: ).

@Log , I wasn’t thinking of mixing ashift values, but I couldn’t help thinking of the possible scenario in the future where I needed to expand a vdev and things had progressed to the point I actually was using a type of drive where I needed to choose an ashift value aside from 12.

I honestly don’t see that happening anytime soon, if ever.

However, now that you bring that up, I don’t offhand know what the default behavior is when adding a new vdev. Does ZFS use the previously defined ashift for the pool, or does it blindly trust the lying drives and contaminates your pool with a fucked up vdev, requiring destruction to fix? I also don’t know if mirrors or raidz would be handled different either.

LOL. That’s a whole new pile of worry. :stuck_out_tongue: Luckily, I’m just trying to install Proxmox right now, so I should just end up with a root pool with the OS on it (ZFS RAID1, crappy consumer SSDs I will upgrade later), and a second pool for my VMs (ZFS RAID1, ridiculously overamped PCIe 4.0 NVMe drives).

Special vdevs also can have an ashift defined.

I’m not defining separate vdevs for slog or dedup at this time. My array is all SSD, so I don’t see any speed benefits in doing so, and I don’t have enough bays to set aside any drives just for dedup or slog. I’m depending on my UPS to save me in the event of power issues.

Well, it looks like the QVOs might be 8k, so ashift=13 might indeed be the way to go, if they’re on the radar?

As far as storage goes, I’m out of spleens to sell, so I’m thinking I won’t be upgrading in this direction for a while. :stuck_out_tongue:

I’m about to deploy 4 Seagate Exos X18 18TB Enterprise HDD (ST18000NM000J) drives, described as “512e and 4Kn FastFormat”.
Per the manual, the “Default shipping format is 512E”, but the drive “supports either 512E or 4KN logical sector size formats”.

Apparently, I can use SeaChest utilities to convert the default 512E logical sector size to 4k.

--setSectorSize [new sector size]	 (Seagate Only)
This option is only available for drives that support sector
size changes.... Use the --showSupportedFormats option to
 see the sector sizes the drive reports supporting. If this option
does not list anything, please consult your product manual.
This option should be used to quickly change between 5xxe and
4xxx sector sizes....
-- http://support.seagate.com/seachest/SeaChest_Combo_UserGuides.html

Is changing the logical sectors to match the physical something worth doing going into a modern ZFS host? Or would setting the ashift correctly be just as good?
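
For reference, the sequence I’m looking at would be roughly this (a sketch based on the SeaChest doc above; the exact binary name varies by release, and the device handle is just an example):

# Ask the drive which sector sizes it reports supporting.
SeaChest_Format -d /dev/sg2 --showSupportedFormats

# FastFormat the logical sector size to 4K, then verify what the drive reports.
SeaChest_Format -d /dev/sg2 --setSectorSize 4096
smartctl -i /dev/sg2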


It’d probably be good to change the setting, it certainly wouldn’t hurt anything. My hunch is that the firmware could potentially have some minor optimizations if it can expect to deal with one size vs another. You’d need to benchmark it to be certain.


Interesting.

EDIT: SB-ROCKET-NVMe4-500 - SB-ROCKET-NVMe4-1TB - SB-ROCKET-NVMe4-2TB - SB-ROCKET-NVMe4-HTSK-500 - SB-ROCKET-NVMe4-HTSK-1TB - SB-ROCKET-NVMe4-HTSK-2TB | Sabrent

It’s ridiculously difficult to find things on Sabrent’s website, but apparently, there’s a sector adjustment tool. I’ve already installed these drives into the server, and can’t easily get them out, so I’m going to have to figure out how to boot into Windows and run this thing.
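
If booting Windows turns out to be too much of a pain, nvme-cli can apparently do the same LBA switch from Linux, provided the drive exposes a 4K format (a sketch I haven’t tested on these drives; device name and format index are examples):

# Find the index of the 4096-byte LBA format (look for "Data Size: 4096 bytes").
nvme id-ns -H /dev/nvme0n1 | grep "LBA Format"

# WARNING: this wipes the namespace. --lbaf is the index of the 4K format found above.
nvme format /dev/nvme0n1 --lbaf=1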


@Log

Thought you (and others) might be interested in an update.

  1. Lexar’s utility, for my boot drives, has no option to change the sector size. I’m not so concerned with this, as I’ll be swapping these out before changing any other drives.
  2. Sabrent’s utility made switching from the default 512 to 4k extremely easy. I did that on two NVME Rocket 4 disks without a problem.
  3. SANDISK’s utility is a clown show.

The SANDISK utility only offers options for formatting in 512, 520, and 528. I’ve never even heard of those last two, and they’re not a power of 2, so I’m thinking I can’t use them, even if I wanted to.

Smartctl sees these drives as having a logical block size of 512 and a physical size of 4k. The manual for the drive utility doesn’t offer any options to reformat with 4k block sizes. Just the aforementioned 512, 520, and 528.

EDIT: I should mention that what I think is the spec sheet for these drives does not list 4k as an option, only the 5xx values, even though smartctl sees 4k as the “physical sector size.”

I’m very confused right now, and extremely frustrated.

The SANDISK utility only offers options for formatting in 512, 520, and 528. I’ve never even heard of those last two, and they’re not a power of 2, so I’m thinking I can’t use them, even if I wanted to.

There are definitely compatibility issues that come up, though I’ve never played with it myself. Basically, that extra 8/16 bytes is for storing checksum and other data on special systems; Netapp seems to be one such system. ZFS makes this redundant. So yes, just keep them at 512 if 4KiB isn’t an option. I wouldn’t worry too much about having logical 512-byte sectors; all that really matters is that the ZFS vdevs are set to 4KiB so it doesn’t cause write amplification on a disk that is really 4KiB behind the firmware anyway.

A brief summary: Other - Deciding what to do with 520byte sector size SSD | The FreeBSD Forums

Thanks for the link. I’m gonna check that out now.

I’m still trying to update the firmware on the disks. I doubt, from looking at the release notes, that I’ll suddenly get a 4KiB formatting option, but it’s supposed to make them more reliable at speed, so it’s worth doing.

When I have the instructions, I’ll post them somewhere.
