I have two almost-new WD20EFRX drives (WD Red, NASware 3.0, CMR). I was playing around with what various drives do when you pull the power in the middle of an I/O operation. Do you get old data? New data? A mix? If it's a mix, where is the boundary? What's really strange is that after plugging the drive back in after the shutdown, I see the following problems:
[12620.956669] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[12621.090251] ata1.00: ATA-9: WDC WD20EFRX-68EUZN0, 82.00A82, max UDMA/133
[12621.090565] ata1.00: ATA Identify Device Log not supported
[12621.090569] ata1.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 32), AA
[12621.091492] ata1.00: ATA Identify Device Log not supported
[12621.091508] ata1.00: configured for UDMA/133
[12621.091634] scsi 0:0:0:0: Direct-Access ATA WDC WD20EFRX-68E 0A82 PQ: 0 ANSI: 5
[12621.091902] sd 0:0:0:0: [sda] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[12621.091907] sd 0:0:0:0: [sda] 4096-byte physical blocks
[12621.091915] sd 0:0:0:0: Attached scsi generic sg0 type 0
[12621.091919] sd 0:0:0:0: [sda] Write Protect is off
[12621.091923] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[12621.091941] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[12621.100913] sda: sda1
[12621.101190] sd 0:0:0:0: [sda] Attached SCSI disk
[12639.232247] EXT4-fs (sda1): recovery complete
[12639.259366] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: disabled.
[12722.363242] ata1.00: exception Emask 0x0 SAct 0x80e08001 SErr 0x0 action 0x0
[12722.363260] ata1.00: irq_stat 0x40000008
[12722.363267] ata1.00: cmd 60/00:a8:00:70:09/02:00:00:00:00/40 tag 21 ncq dma 262144 in
res 41/40:00:60:71:09/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[12722.364041] ata1.00: ATA Identify Device Log not supported
[12722.364832] ata1.00: ATA Identify Device Log not supported
[12722.364843] ata1.00: configured for UDMA/133
[12722.364892] sd 0:0:0:0: [sda] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
[12722.364902] sd 0:0:0:0: [sda] tag#21 Sense Key : Medium Error [current]
[12722.364908] sd 0:0:0:0: [sda] tag#21 Add. Sense: Unrecovered read error - auto reallocate failed
[12722.364915] sd 0:0:0:0: [sda] tag#21 CDB: Read(10) 28 00 00 09 70 00 00 02 00 00
[12722.364919] blk_update_request: I/O error, dev sda, sector 618848 op 0x0:(READ) flags 0x80700 phys_seg 10 prio class 0
[12722.364957] ata1: EH complete
[12725.963238] ata1.00: exception Emask 0x0 SAct 0xf02000 SErr 0x0 action 0x0
[12725.963244] ata1.00: irq_stat 0x40000008
[12725.963245] ata1.00: cmd 60/08:68:60:71:09/00:00:00:00:00/40 tag 13 ncq dma 4096 in
res 41/40:00:60:71:09/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[12725.963976] ata1.00: ATA Identify Device Log not supported
[12725.964689] ata1.00: ATA Identify Device Log not supported
[12725.964691] ata1.00: configured for UDMA/133
[12725.964703] sd 0:0:0:0: [sda] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
[12725.964705] sd 0:0:0:0: [sda] tag#13 Sense Key : Medium Error [current]
[12725.964707] sd 0:0:0:0: [sda] tag#13 Add. Sense: Unrecovered read error - auto reallocate failed
[12725.964708] sd 0:0:0:0: [sda] tag#13 CDB: Read(10) 28 00 00 09 71 60 00 00 08 00
[12725.964709] blk_update_request: I/O error, dev sda, sector 618848 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[12725.964736] ata1: EH complete
[12729.463163] ata1.00: exception Emask 0x0 SAct 0x3c5000 SErr 0x0 action 0x0
[12729.463182] ata1.00: irq_stat 0x40000008
[12729.463189] ata1.00: cmd 60/08:70:60:71:09/00:00:00:00:00/40 tag 14 ncq dma 4096 in
res 41/40:00:60:71:09/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[12729.464412] ata1.00: ATA Identify Device Log not supported
[12729.465405] ata1.00: ATA Identify Device Log not supported
[12729.465416] ata1.00: configured for UDMA/133
[12729.465462] sd 0:0:0:0: [sda] tag#14 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
[12729.465471] sd 0:0:0:0: [sda] tag#14 Sense Key : Medium Error [current]
[12729.465478] sd 0:0:0:0: [sda] tag#14 Add. Sense: Unrecovered read error - auto reallocate failed
[12729.465484] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 00 09 71 60 00 00 08 00
[12729.465488] blk_update_request: I/O error, dev sda, sector 618848 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[12729.465520] ata1: EH complete
Those are physical read failures. The data is GONE. It's not that the drive is giving me old or corrupted data; it's refusing the reads outright. Consumer SSDs (a Samsung EVO, for example) return a mix of old and new data in 4 KiB-aligned 4 KiB blocks, but never a read failure or a smartctl error. I assumed it was a failing drive and tried the exact same experiment on an identical drive. Now both WD drives have nonzero Current_Pending_Sector counts, potentially reducing their lifetime. This seems so bizarre to me. All I did was unplug the drive. With all the marketing around how physically resilient these drives are, I wouldn't expect permanent drive degradation from a power outage. Here are the SMART attributes afterwards:
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 5
3 Spin_Up_Time 0x0027 172 170 021 Pre-fail Always - 4400
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 70
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 094 093 000 Old_age Always - 4830
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 32
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 12
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 637
194 Temperature_Celsius 0x0022 116 113 000 Old_age Always - 31
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0
Do WD drives have a “feature” where they refuse to return data from blocks they have detected as potentially half-written, for fear of sending “corrupt” data to the OS the way the consumer SSDs do? If so, how can I get my block back? You can't have it both ways: the drives can't be resilient to power failures and also permanently degrade every time there is a power outage under load. I can think of a few ways I could be wrong about what's going on here:
- The drive isn’t actually permanently degraded and there is some kind of reset I don’t know about
- This is actually a really nice feature and large companies don’t care about power outages because they almost never happen and drives are cheap to them. The data is more important.
- The filesystem somehow plays a role in all this that I'm not seeing. As far as I can tell, that dmesg output is a physical read failure, not a filesystem error. The only I/O on the disk was overwriting a pre-allocated file, so it's not as if the filesystem could be getting confused; only user data could be corrupted. The dentry info was always correct after the outage.
- Pulling the drive by hand causes enough vibration to physically crash the head into the platter. I’d be surprised given the supposed quality of these drives.
I could be convinced to repeat the experiment with block-level I/O to rule out the filesystem completely, but I would need a pretty convincing argument: it would take me a couple of days to write the software to do that correctly, since I'd need tools to track the expected state and more tools to verify the data after each power loss.
It probably doesn't matter, but the I/O load in question was random 1 MiB writes within a 1 GiB O_DIRECT | O_SYNC fallocate()'d file on ext4 with default journaling. That is, no fast_commit; metadata is journaled, but the user payload isn't.
I hate to sound arrogant, but in the interest of keeping the discussion on track: if the previous paragraph didn't make sense to you, please just DM me your comments or questions first. I'd be happy to discuss, educate, or be educated offline. I tried discussing this elsewhere first but kept getting derailed with “pick a better filesystem” or “caching is bad” comments.
I’m hoping to find some documentation on exactly what it is about a power failure that can cause block read failures.
I'd be willing to share the code I used to run this test if you have a drive you're willing to risk and want to compare results. It requires Linux 5.10+ and C++17 or later.