Kioxia also says, in that same blog post that you linked:
“T10 Protection Information is a feature first enabled on enterprise SAS HDDs and SSDs that has been included in the NVMe specification, and has been included in KIOXIA Enterprise class NVMe SSDs since CM5.”
We all would’ve liked it if Kioxia had written this clearly in the data sheet or specifications, instead of making us parse a blog post to extract the information…
The problem of “end-to-end protection” having no fixed meaning seems to be as widespread as “AES encryption” not implying that access to the drive’s contents is gated by any sort of authentication.
Perhaps there is some implicit agreement that if you’re an enterprise buying enterprise drives, T10-PI/DIF/DIX is as implicit as water being wet—something that we plebs wouldn’t know.
But we keep finding enterprise drives that don’t actually support it (e.g. Intel P4500s).
I recently acquired an enterprise grade NVMe with T10-PI support (a Netlist NS1952 7.68T) and got PI working (I think). I figured I’d post some notes here since there was some troubleshooting involved.
nvme id-ns listed the logical block formats that the drive supports, and LBAF 3 corresponded to 4096-byte sectors with 8 bytes of metadata. So I formatted the drive as follows:
nvme format /dev/nvme0n1 --lbaf=3 --pi=1 --pil=1 --ses=1
The --pil=1 flag was necessary on this device (the command fails, claiming lack of support, otherwise), but since, according to the nvme-format man page, it only determines whether the protection information occupies the first or last 8 bytes of the metadata, it should make no practical difference here: with 8 bytes of metadata, the PI is the entire metadata region.
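For reference, the LBA-format lookup can be scripted. A minimal sketch, assuming nvme-cli’s human-readable output; the sample lines below are hypothetical, modeled on what `nvme id-ns -H` prints:

```python
import re

# Hypothetical excerpt of `nvme id-ns -H /dev/nvme0n1` output (assumed layout).
ID_NS_OUTPUT = """\
LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good
LBA Format  1 : Metadata Size: 8   bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good
LBA Format  2 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format  3 : Metadata Size: 8   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
"""

def find_lbaf(output: str, data_size: int, meta_size: int) -> int:
    """Return the LBA format index matching the given sector and metadata sizes."""
    pat = re.compile(r"LBA Format\s+(\d+) : Metadata Size:\s+(\d+).*Data Size:\s+(\d+)")
    for line in output.splitlines():
        m = pat.search(line)
        if m and int(m.group(2)) == meta_size and int(m.group(3)) == data_size:
            return int(m.group(1))
    raise ValueError("no matching LBA format")

print(find_lbaf(ID_NS_OUTPUT, 4096, 8))  # 3 -> pass as --lbaf=3
```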
After the format, most interactions with the SSD triggered ref tag errors in dmesg, presumably because the format didn’t actually rewrite the metadata across the disk. I had substantial trouble creating partitions and LUKS on the device until I zeroed out the whole drive with dd. Note that blkdiscard did not work in my case; I had to write zeros to the whole drive. After that, I was able to proceed using the drive as normal.
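This is consistent with Type 1 ref-tag checking against stale metadata. A toy model (an illustration of the mechanism, not real NVMe behavior) of why rewriting a block clears the errors:

```python
# Toy model of Type 1 PI ref-tag checking. After a format, stored PI metadata
# is stale, so Type 1 reads fail the ref-tag comparison; rewriting a block
# makes the controller stamp fresh tags, which is (presumably) why
# zero-filling the drive made the errors go away.

class ToyType1Drive:
    def __init__(self):
        self.data = {}      # lba -> payload
        self.ref_tags = {}  # lba -> stored ref tag (absent == stale/invalid)

    def write(self, lba: int, payload: bytes) -> None:
        self.data[lba] = payload
        self.ref_tags[lba] = lba & 0xFFFFFFFF  # controller stamps a fresh tag

    def read(self, lba: int) -> bytes:
        if self.ref_tags.get(lba) != (lba & 0xFFFFFFFF):
            raise IOError(f"ref tag error at lba {lba}")
        return self.data[lba]

drive = ToyType1Drive()
try:
    drive.read(5)                # stale metadata -> ref tag error
except IOError as e:
    print(e)
drive.write(5, bytes(4096))      # rewriting regenerates the PI
print(len(drive.read(5)))        # 4096
```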
Note that at one point (before I figured out that I needed to zero the drive), I tried a format with --ms=1. This caused the drive to show up as not integrity capable at all.
Oooh good catch! fstrim on the mount point says “the discard operation is not supported”. I have xfs on lvm on luks2 on the raw partition, so I may have just missed a setting somewhere… looking into it.
Edit: the issue was that I didn’t use --allow-discards in cryptsetup open. I’ve rectified that and fstrim now works.
I’m not too familiar with reading smartctl outputs on nvme devices, but here’s smartctl -a. I don’t see anything which I can obviously correlate with my actions.
=== START OF INFORMATION SECTION ===
Model Number: NS1952UF17T6
Serial Number: [redacted]
Firmware Version: 283001K0
PCI Vendor/Subsystem ID: 0x1c1b
IEEE OUI Identifier: 0x38b19e
Total NVM Capacity: 7,681,501,126,656 [7.68 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.2
Number of Namespaces: 32
Local Time is: Mon Mar 3 10:25:29 2025 EST
Firmware Updates (0x17): 3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x000e): Format Frmw_DL NS_Mngmt
Optional NVM Commands (0x0034): DS_Mngmt Sav/Sel_Feat Resv
Log Page Attributes (0x02): Cmd_Eff_Lg
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 75 Celsius
Critical Comp. Temp. Threshold: 80 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 25.00W - - 0 0 0 0 100 100
1 + 24.00W - - 1 1 1 1 115 115
2 + 23.00W - - 2 2 2 2 130 130
3 + 22.00W - - 3 3 3 3 145 145
4 + 21.00W - - 4 4 4 4 160 160
5 + 20.00W - - 5 5 5 5 175 175
6 + 19.00W - - 6 6 6 6 190 190
7 + 18.00W - - 7 7 7 7 205 205
8 + 17.00W - - 8 8 8 8 220 220
9 + 16.00W - - 9 9 9 9 235 235
10 + 15.00W - - 10 10 10 10 250 250
11 + 14.00W - - 11 11 11 11 265 265
12 + 13.00W - - 12 12 12 12 280 280
13 + 12.00W - - 13 13 13 13 295 295
14 + 11.00W - - 14 14 14 14 310 310
15 + 10.00W - - 15 15 15 15 325 325
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 53 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 0%
Data Units Read: 1,585,022 [811 GB]
Data Units Written: 37,411,179 [19.1 TB]
Host Read Commands: 6,452,500
Host Write Commands: 2,641,022,858
Controller Busy Time: 748
Power Cycles: 7
Power On Hours: 38
Unsafe Shutdowns: 2
Media and Data Integrity Errors: 0
Error Information Log Entries: 3
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 64 Celsius
Temperature Sensor 2: 53 Celsius
Temperature Sensor 3: 48 Celsius
Temperature Sensor 4: 47 Celsius
Error Information (NVMe Log 0x01, 16 of 63 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS Message
0 3 0 0x0000 0x4004 0x028 - 0 - Invalid Field in Command
1 2 0 0x0000 0x4004 0x028 - 1 - Invalid Field in Command
2 1 0 0x0000 0x4004 0x028 - 0 - Invalid Field in Command
Self-tests not supported
I also tried nvme error-log; it did emit 63 entries in total, but the ones numbered 4 and higher are identical to entry 3. I don’t know how to interpret this yet, but will do some reading. I don’t think these are the PI errors I was seeing in dmesg, but I could be wrong.
Error Log Entries for device:nvme0 entries:63
.................
Entry[ 0]
.................
error_count : 3
sqid : 0
cmdid : 0
status_field : 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag : 0
parm_err_loc : 0x28
lba : 0xffffffffffffffff
nsid : 0
vs : 0
trtype : Fibre Channel Transport error.
csi : 0
opcode : 0
cs : 0
trtype_spec_info: 0
log_page_version: 0
.................
Entry[ 1]
.................
error_count : 2
sqid : 0
cmdid : 0
status_field : 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag : 0
parm_err_loc : 0x28
lba : 0xffffffffffffffff
nsid : 0x1
vs : 0
trtype : Fibre Channel Transport error.
csi : 0
opcode : 0
cs : 0
trtype_spec_info: 0
log_page_version: 0
.................
Entry[ 2]
.................
error_count : 1
sqid : 0
cmdid : 0
status_field : 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag : 0
parm_err_loc : 0x28
lba : 0xffffffffffffffff
nsid : 0
vs : 0
trtype : Fibre Channel Transport error.
csi : 0
opcode : 0
cs : 0
trtype_spec_info: 0
log_page_version: 0
.................
Entry[ 3]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(Successful Completion: The command completed without error)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
csi : 0
opcode : 0
cs : 0
trtype_spec_info: 0
log_page_version: 0
Looks like I spoke too soon about T10-PI working. After a power cycle, dmesg contained a whole lot of these:
[ 962.112578] Buffer I/O error on dev nvme0n1p1, logical block 1875366128, async page read
[ 962.113459] nvme0n1: ref tag error at location 15002931072 (rcvd 0)
[ 962.130956] nvme0n1: ref tag error at location 15002931072 (rcvd 0)
[ 962.131065] nvme0n1: ref tag error at location 15002931072 (rcvd 0)
[ 1001.063037] nvme0n1: ref tag error at location 15002931072 (rcvd 0)
[ 1001.063159] nvme0n1: ref tag error at location 15002931072 (rcvd 0)
[ 1145.538343] nvme0n1: ref tag error at location 15002931072 (rcvd 0)
[ 1145.538385] Buffer I/O error on dev nvme0n1p1, logical block 1875366128, async page read
[ 1145.539247] nvme0n1: ref tag error at location 15002931072 (rcvd 0)
[ 1145.556586] nvme0n1: ref tag error at location 15002931072 (rcvd 0)
[ 1145.556695] nvme0n1: ref tag error at location 15002931072 (rcvd 0)
[ 1149.762937] nvme0n1: ref tag error at location 15002931072 (rcvd 0)
[ 1149.763007] nvme0n1: ref tag error at location 15002931072 (rcvd 0)
The LUKS container failed to come up automatically. I was able to manually open it, and activate the LVM volume thereon, but I wasn’t able to mount the XFS filesystem (or even detect it with blkid) without disabling integrity validation via echo 0 > /sys/block/nvme0n1/integrity/read_verify. That allowed me to mount the FS.
I’m currently running an xfsdump so that I have an up-to-date backup, and will be reformatting the drive to remove T10-PI, at least for now. I don’t think this is expected behavior and I’m feeling mistrustful of the T10-PI implementation in the drive firmware.
At first I was confused because it looks like the protection bytes are all zero, but the man page, in the table describing the effects of the -p flag, notes that bit 3 causes metadata to be stripped on read. Trying again with -p 6 gives:
Looks like the guard tag is populated, the app tag is not, and the ref tag is populated. The strange thing is that the ref tag is correct here… Type 1 protection → ref tag should be the lower 32 bits of the LBA, i.e. 00b7e700. So either it got generated on the read by nvme-read itself, or I really don’t know what’s going on. But the behavior is the same if I omit the -r argument.
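For reference, the Type 1 expectation is trivial to compute; the expected reference tag is just the low 32 bits of the LBA:

```python
# Type 1 PI: the expected reference tag is the low 32 bits of the LBA read.
def expected_ref_tag(lba: int) -> int:
    return lba & 0xFFFFFFFF

print(f"{expected_ref_tag(0x00B7E700):08x}")  # 00b7e700, as seen above
print(expected_ref_tag(0x1_0000_0005))        # 5 -- the tag wraps past 2^32
```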
So the whole sector is full of zeros, and the metadata is also full of zeros. Therefore, the reference tag doesn’t match, and yet the nvme read completed successfully.
So, still confused, but at least I’m probably checking the right block now…
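One consistency check on the all-zeros observation: the DIF guard tag is a CRC-16 with polynomial 0x8BB7 (init 0, no reflection, no final XOR), and an all-zero sector hashes to 0x0000. So zeroed data plus zeroed metadata passes the guard check, leaving the ref tag as the only field that can mismatch. A sketch, assuming the standard CRC-16/T10-DIF parameters:

```python
def crc_t10dif(data: bytes) -> int:
    """CRC-16/T10-DIF: poly 0x8BB7, init 0x0000, no reflection, no xorout."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

print(hex(crc_t10dif(b"123456789")))  # 0xd0db, the published check value
print(hex(crc_t10dif(bytes(4096))))   # 0x0 -- an all-zero sector guards to zero
```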
Okay, I have my filesystem working again. I think I have an incorrect ordering dependency between LVM and cryptsetup, so I have to manually activate the LV after booting and then mount the FS, but that’s fine for now.
Speaking of the FS, XFS was refusing to mount due to a ref tag error on the last block in the LV (probably a backup superblock?). I was able to fix this by writing zeros to that region with dd. So perhaps trimming this SSD zeroes out the ref tags of free space, or something like that, and yet various stages of filesystem/partition probing, and even filesystem mounting, read blocks that aren’t allocated and therefore got trimmed.
Annoying.
I’m trying to decide whether I’m better off leaving T10-PI on or off… It appears to “work”, but in ways that the rest of the stack hasn’t necessarily accounted for. In the meantime, it’s on and I’m back to using the filesystem.