PSA: don't change sector size of external USB WD drives

risk · November 25, 2022, 10:15am

Hi, so I accidentally-ish bricked a brand new 20TB HDD.

Even the newest and biggest external 3.5" USB drives still ship with a 512e sector size (4096 bytes physical, 512 bytes logical).

If you’re thinking “isn’t that inefficient?” … or … “whyyyyy” … or … “I’ll just change it” … don’t.

Either the firmware on the drive or the firmware of the HDD enclosure that these large external drives ship in is dumb, and changing sector sizes will most likely brick your drive.

Most of the repair tools work on SATA drives. I haven’t tried shucking the drive to check, since I need/want the USB interface, the drive is a bit expensive, and Kapton tape might need to be applied over some of the pins.

Why might it be more efficient to use 4k sectors?

Well, you’re sending commands over USB, and if you need to write e.g. 1MiB of data, you can send either 2048 write commands (with 512-byte sectors) or 256 write commands (with 4k sectors). Fewer commands is more efficient.
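A quick sketch of that arithmetic, assuming one write command per logical sector as in the paragraph above. Real hosts usually batch many sectors into a single command, so treat this as the worst case. The function name is made up for illustration.

```python
def sectors_needed(payload_bytes: int, sector_bytes: int) -> int:
    """Number of logical sectors needed to hold payload_bytes, rounded up."""
    if payload_bytes < 0 or sector_bytes <= 0:
        raise ValueError("payload must be >= 0 and sector size must be > 0")
    return -(-payload_bytes // sector_bytes)  # ceiling division


MIB = 1024 * 1024

for size in (512, 4096):
    print(f"{size:>4}-byte sectors: {sectors_needed(MIB, size)} per MiB")
```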

So you have the latest hdparm and you run:

hdparm -I /dev/sdi

And it shows

...
        CHS current addressable sectors:    16514064
        LBA    user addressable sectors:   268435455
        LBA48  user addressable sectors: 39063650304
        Logical  Sector size:                   512 bytes [ Supported: 4096 512 ]
        Physical Sector size:                  4096 bytes
        Logical Sector-0 offset:                  0 bytes
        device size with M = 1024*1024:    19074048 MBytes
        device size with M = 1000*1000:    20000588 MBytes (20000 GB)
        cache/buffer size  = unknown
        Form Factor: 3.5 inch
        Nominal Media Rotation Rate: 7200
...

All good stuff - 4k logical sizes are supported.
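As a sanity check, the two "device size" lines in that output follow from the LBA48 sector count and the 512-byte logical sector size. A small sketch, with both numbers copied from the output above (the function name is just for illustration):

```python
LBA48_SECTORS = 39_063_650_304  # "LBA48 user addressable sectors" in the output above
LOGICAL_SECTOR_BYTES = 512      # "Logical Sector size" in the output above


def capacity_bytes(sectors: int, sector_bytes: int) -> int:
    """Total addressable capacity in bytes."""
    return sectors * sector_bytes


total = capacity_bytes(LBA48_SECTORS, LOGICAL_SECTOR_BYTES)
print(total // (1024 * 1024), "MBytes with M = 1024*1024")
print(total // (1000 * 1000), "MBytes with M = 1000*1000")

# With 4096-byte logical sectors the same capacity would be addressed
# with one eighth as many LBAs.
print(total // 4096, "sectors at 4096 bytes each")
```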

… and you read the hdparm manual and you issue:

hdparm --set-sector-size 4096 /dev/sdi

… and it shows you a warning that this will scramble your data, and since there’s no data on the drive you issue the following:

hdparm --set-sector-size 4096 --please-destroy-my-drive /dev/sdi

… what follows is a “success” message (I didn’t capture it, I’m sorry) and then an error message/crash that causes your USB stack to partially freeze, and your dmesg will have a bunch of “hung tasks” and “timeouts” talking to the device.

… plugging the disk into another machine makes the kernel log the following in dmesg:

[1733124.832014] sd 6:0:0:0: [sde] Read Capacity(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[1733124.832020] sd 6:0:0:0: [sde] Sense not available.
[1733124.832027] sd 6:0:0:0: [sde] 0 512-byte logical blocks: (0 B/0 B)
[1733124.832030] sd 6:0:0:0: [sde] 0-byte physical blocks
[1733124.832036] sd 6:0:0:0: [sde] Write Protect is off
[1733124.832040] sd 6:0:0:0: [sde] Mode Sense: 00 00 00 00
[1733124.832046] sd 6:0:0:0: [sde] Asking for cache data failed
[1733124.832048] sd 6:0:0:0: [sde] Assuming drive cache: write through
[1733124.942144] sd 6:0:0:0: [sde] Read Capacity(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[1733124.942155] sd 6:0:0:0: [sde] Sense not available.
[1733124.942178] sd 6:0:0:0: [sde] Attached SCSI disk
[1733128.012128] usb 1-1: new high-speed USB device number 7 using ehci-pci
[1733128.214080] usb 1-1: New USB device found, idVendor=1058, idProduct=25a3, bcdDevice=10.31
[1733128.214090] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[1733128.214095] usb 1-1: Product: Elements 25A3
[1733128.214100] usb 1-1: Manufacturer: Western Digital
[1733128.214103] usb 1-1: SerialNumber: 3~~~~~~~~~~~~~4B
[1733128.215383] usb-storage 1-1:1.0: USB Mass Storage device detected
[1733128.216550] scsi host6: usb-storage 1-1:1.0
[1733129.243066] scsi 6:0:0:0: Direct-Access     WD       Elements 25A3    1031 PQ: 0 ANSI: 6
[1733129.243911] sd 6:0:0:0: Attached scsi generic sg4 type 0
[1733129.247977] sd 6:0:0:0: [sde] Unit Not Ready
[1733129.247990] sd 6:0:0:0: [sde] Sense Key : Hardware Error [current]
[1733129.248001] sd 6:0:0:0: [sde] ASC=0x30 <<vendor>>ASCQ=0x81
[1733309.270105] sd 6:0:0:0: tag#0 timing out command, waited 180s
[1733489.286779] sd 6:0:0:0: tag#0 timing out command, waited 180s
[1733669.324287] sd 6:0:0:0: tag#0 timing out command, waited 180s
[1733669.324325] sd 6:0:0:0: [sde] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[1733669.324330] sd 6:0:0:0: [sde] Sense Key : Hardware Error [current]
[1733669.324337] sd 6:0:0:0: [sde] ASC=0x30 <<vendor>>ASCQ=0x81
[1733669.324343] sd 6:0:0:0: [sde] 0 512-byte logical blocks: (0 B/0 B)
[1733669.324346] sd 6:0:0:0: [sde] 0-byte physical blocks
[1733849.341196] sd 6:0:0:0: tag#0 timing out command, waited 180s
[1733849.341294] sd 6:0:0:0: [sde] Test WP failed, assume Write Enabled
[1734029.387442] sd 6:0:0:0: tag#0 timing out command, waited 180s
[1734029.387488] sd 6:0:0:0: [sde] Asking for cache data failed
[1734029.387493] sd 6:0:0:0: [sde] Assuming drive cache: write through
[1734029.469972] sd 6:0:0:0: [sde] Unit Not Ready
[1734029.469985] sd 6:0:0:0: [sde] Sense Key : Hardware Error [current]
[1734029.469995] sd 6:0:0:0: [sde] ASC=0x30 <<vendor>>ASCQ=0x81
[1734209.513860] sd 6:0:0:0: tag#0 timing out command, waited 180s
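If you want to pull the interesting bits out of a log like this, here is a minimal sketch. It only extracts the ASC/ASCQ numbers and adds up the time spent in timeouts. The kernel tags the ASCQ as vendor-specific, so this does not try to decode what 0x30/0x81 means. The regexes are written against the exact line shapes shown above and are an assumption beyond that.

```python
import re

# Matches e.g. "ASC=0x30 <<vendor>>ASCQ=0x81" as printed in the log above.
SENSE_RE = re.compile(
    r"ASC=0x([0-9a-fA-F]+)\s*(?:<<vendor>>)?\s*ASCQ=0x([0-9a-fA-F]+)"
)
# Matches e.g. "timing out command, waited 180s".
TIMEOUT_RE = re.compile(r"timing out command, waited (\d+)s")


def summarize(log_text: str) -> dict:
    """Collect distinct (ASC, ASCQ) pairs and total seconds spent in timeouts."""
    codes = set()
    waited = 0
    for line in log_text.splitlines():
        sense = SENSE_RE.search(line)
        if sense:
            codes.add((int(sense.group(1), 16), int(sense.group(2), 16)))
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            waited += int(timeout.group(1))
    return {"sense_codes": sorted(codes), "seconds_in_timeouts": waited}


sample = """\
[1733129.248001] sd 6:0:0:0: [sde] ASC=0x30 <<vendor>>ASCQ=0x81
[1733309.270105] sd 6:0:0:0: tag#0 timing out command, waited 180s
[1733489.286779] sd 6:0:0:0: tag#0 timing out command, waited 180s
"""
print(summarize(sample))
```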

… the drive might still be fine if shucked and connected to a decent SAS/SATA controller, where you might be able to format it, but this external drive seems bricked for all practical intents and purposes, … a pity for a 20TB drive to end its life like this (about to return it to Amazon using regular mail, RMA as broken).

n.b. I no longer have contacts in WD, the people I knew who worked there have moved on… there’s no one I know any more in a position to try to reproduce the issue at basically no cost to them other than time, or who can offer any tooling in the form of python scripts that send mysterious SCSI commands.
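In hindsight, a guard like the following might be worth running before touching sector sizes. It's only a sketch: it reads the logical/physical block sizes from sysfs and uses a heuristic (a `usbN` component in the resolved sysfs path) to guess whether the disk sits behind a USB bridge. The function names and the `sysfs_root` parameter are made up for illustration, and the heuristic is an assumption about how Linux lays out sysfs, not something hdparm does for you.

```python
import os
import re


def describe_disk(name: str, sysfs_root: str = "/sys/block") -> dict:
    """Read block sizes for a disk and guess whether it is USB-attached."""
    dev = os.path.join(sysfs_root, name)

    def read_int(rel_path: str) -> int:
        with open(os.path.join(dev, rel_path)) as handle:
            return int(handle.read().strip())

    # /sys/block/<name> is a symlink into /sys/devices/...; USB-attached
    # disks normally have a "usbN" component somewhere in that path.
    real_path = os.path.realpath(dev)
    behind_usb = any(
        re.fullmatch(r"usb\d+", part) for part in real_path.split(os.sep)
    )
    return {
        "logical": read_int("queue/logical_block_size"),
        "physical": read_int("queue/physical_block_size"),
        "behind_usb": behind_usb,
    }


def should_refuse_sector_size_change(info: dict) -> bool:
    """Conservative rule of thumb from this thread: don't do it over USB."""
    return info["behind_usb"]
```

Usage would be something like `describe_disk("sdi")` on the machine the drive is plugged into, and backing off if `should_refuse_sector_size_change(...)` returns True.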

n.b.b. … sector writing is never atomic, regardless of sector size … by that I mean that if you yoink power from a drive while it’s in the middle of writing a sector, you may end up with an unreadable/bad sector that’s recoverable by overwriting it, and that’s true for whatever sector size. This is only an issue if you use software that relies on individual random 512 byte writes not blurring nearby sectors. As long as you use a CoW filesystem (zfs, btrfs, bcachefs, …), or a filesystem that just happens to work in at least 4k chunks (ashift=12 or higher; realistically most filesystems read/write 4k blocks), or a 4k device-mapper volume, there’s nothing to worry about.

Personally, I align my stuff on 1M boundaries, and use 4k block size LUKS on top of LVM - it works fine.
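For what it's worth, checking that kind of alignment is just modular arithmetic on the byte offset. A minimal sketch, with hypothetical function names and start sectors picked as examples:

```python
MIB = 1024 * 1024


def is_aligned(offset_bytes: int, boundary_bytes: int) -> bool:
    """True if offset_bytes falls exactly on a multiple of boundary_bytes."""
    return offset_bytes % boundary_bytes == 0


def check_partition_start(start_sector: int, logical_sector_bytes: int = 512) -> dict:
    """Convert a start sector to a byte offset and test 4k / 1MiB alignment."""
    offset = start_sector * logical_sector_bytes
    return {
        "offset_bytes": offset,
        "aligned_4k": is_aligned(offset, 4096),
        "aligned_1MiB": is_aligned(offset, MIB),
    }


print(check_partition_start(2048))  # 2048 * 512 bytes = 1 MiB
print(check_partition_start(63))    # old DOS-style start sector
```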

Mach3.2 · November 25, 2022, 10:41am

Drive is probably fine…

Did you try reformatting the drive while it’s directly connected to your mobo’s sata controller?

risk · November 25, 2022, 11:43am

Nope, I didn’t try taking it out of the enclosure, sent back to Amazon (RMA as broken), and the replacement is already hooked up running badblocks.

GigaBusterEXE · November 25, 2022, 4:33pm

And thus the circle of life continues

New or like new condition as far as Amazon’s concerned

rcxb · November 25, 2022, 4:34pm

It’s not surprising at all that a USB → SATA controller (that shipped with a 512b sector drive) can ONLY handle 512b sectors and can’t read the drive once you’ve reformatted it to 4Kn.

Trooper_ish · November 25, 2022, 4:53pm

would it help if returning, to sharpie “DOA / ded on arrival” on the outside? then the next victim does not even open it?

or warehouse ppl not care

risk · November 25, 2022, 5:49pm

It came in a retail box, and I was told to put some returns barcode paper on the inside - and send everything back in retail box, which I did.

It’s surprising for the drive to advertise support for 512/4096 blocks when that doesn’t actually work; the USB->SATA controller can rewrite what it advertises.

…

I suspect the fields are bogus, because this is what the previous generation 18T model reports

...
ATA device, with non-removable media
        Model Number:       WDC WD180EDGZ-11B2DA0
        Serial Number:      xxxxxxx
        Firmware Revision:  85.00A85
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0; Revision: ATA8-AST T13 Project D1697 Revision 0b
Standards:
        Used: unknown (minor revision code 0x009c)
        Supported: 11 10 9 8 7 6 5
        Likely used: 11
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:    16514064
        LBA    user addressable sectors:   268435455
        LBA48  user addressable sectors: 35156656128
        Logical  Sector size:                   512 bytes [ Supported: 2048 256 ]
        Physical Sector size:                  4096 bytes
        Logical Sector-0 offset:                  0 bytes
        device size with M = 1024*1024:    17166336 MBytes
        device size with M = 1000*1000:    18000207 MBytes (18000 GB)
        cache/buffer size  = unknown
        Form Factor: 3.5 inch
        Nominal Media Rotation Rate: 7200

256b/2048b sector size? Really?
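One possible explanation, and this is a guess rather than anything confirmed in this thread: ATA expresses logical sector sizes in 16-bit words, so if the hdparm build on that machine printed the raw word counts instead of bytes, 2048 and 256 would simply be 4096 and 512 bytes. The conversion, as a sketch:

```python
BYTES_PER_ATA_WORD = 2  # an ATA "word" is 16 bits


def words_to_bytes(words: int) -> int:
    """Convert a size expressed in 16-bit words to bytes."""
    return words * BYTES_PER_ATA_WORD


# Values from the "Supported:" field above, ASSUMING they are word counts.
for words in (2048, 256):
    print(f"{words} words = {words_to_bytes(words)} bytes")
```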

diizzy · November 25, 2022, 7:14pm

Another day, another dishonest customer, and people are wondering why RMA sucks in the US…

thro · November 25, 2022, 9:49pm

I’d say the drive is dishonest first

risk · November 25, 2022, 10:17pm

Here’s a crazy thought, how about making drives that won’t advertise support for 4k blocks, only to brick themselves when you try to use the feature?

I doubt anyone at WD cares about making these better; if they did, they’d have taken care of what is basically a software issue.

..on the RMA / return to Amazon path

I’d considered going through the WD RMA process for a replacement, but as I live in Ireland, going down this route after Brexit has become complicated, because a lot of companies pre-Brexit used to have stuff around Reading, UK, … so now RMA stuff sometimes ends up entangled in customs, or ends up taking weeks between random warehouses. WD just wasn’t specific enough about what to expect ahead of time.

With Amazon it’s more hassle-free: I got a replacement ready to be dispatched and an “international shipping invoice” for a return within 20 minutes. Which worked, but the international-ness was ridiculous: the destination address is in another part of Ireland, the company on the shipping invoice is Amazon SARL (a Luxembourg limited company) with an address in the UK, and the tracking number starts with CH (Switzerland), despite the destination address being in Ireland.

There’s a reason they employ 1.4M people, I guess a significant fraction deal with this international logistics bs.

I did tell Amazon the drive is broken, and it never finished formatting before it stopped working and I couldn’t even start using it … their rep over chat asked if I tried a different cable.

I’d have preferred to send it to WD, I have an account on their support site for RMAs from before, I think maybe they could have recycled/refurbished it somehow, … but this time I bought stuff from Amazon and WD isn’t as convenient to deal with.

diizzy · November 25, 2022, 10:27pm

It’s not specified or supported; get another tier of drives if it’s important to you. That still doesn’t justify you bricking the device, or did you actually submit the actual cause of failure?

risk · November 25, 2022, 10:50pm

I guess I inferred the operation was supported, because hdparm returned 512b and 4096b as the two options, and trying to set it to 4096 returned success, just before the thing bricked itself?

I’m curious why that would be a wrong assumption in your view?

(Also, I checked various utilities and spent about an hour researching how to change the drive firmware before giving up and opting for some exchange)

The Amazon return web form had a comment box limited to about 100-ish chars where I filled something out, and the person I chatted to didn’t seem interested in the detail.

I’m assuming if the drive ever reaches someone who cares, they’d probably disassemble the enclosure and extract the drive, and if it reaches WD and they care, they’d probably notice the sector size change, and if they don’t care why should I?

gysi · November 25, 2022, 10:53pm

The drive by itself probably supports 4K blocks, but the USB driver/firmware doesn’t. I guess it would have worked if you had taken the drive out and connected it directly via SATA.

I also think now probably another customer will have a problem with the drive.

risk · November 25, 2022, 10:54pm

Why would they send a reported broken drive to another customer?

gysi · November 25, 2022, 10:59pm

Well, I’m just sceptical after what happened to Steve from Gamers Nexus.
Having someone around to actually check all the RMA stuff is probably too expensive

Quension · November 25, 2022, 11:15pm

A device bricking itself in response to a standard command it claims to support, on top of a standard protocol it’s literally designed to support, is absolutely the device’s fault.

The PSA is useful, and could have gone unreported for some other poor soul to make a mistake. Some of y’all are way off base here and could do to brush up on the forum FAQ.

jaskij · November 25, 2022, 11:37pm

USB3 packet size is 1024 bytes… So yes, you get less overhead, but only half the overhead, not the eighth you imply.
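A rough sketch of that estimate, assuming a 1024-byte maximum packet size and (as the numbers in the first post do) that every sector is sent on its own rather than batched. Real transfers batch sectors, so this is a worst-case illustration only, and the function name is made up.

```python
def packets_for(payload_bytes: int, sector_bytes: int, max_packet_bytes: int = 1024) -> int:
    """Packets needed if each sector is transferred separately."""
    sectors = -(-payload_bytes // sector_bytes)               # ceiling division
    packets_per_sector = -(-sector_bytes // max_packet_bytes)  # ceiling division
    return sectors * packets_per_sector


MIB = 1024 * 1024
small = packets_for(MIB, 512)    # one half-empty packet per 512-byte sector
large = packets_for(MIB, 4096)   # four full packets per 4096-byte sector
print(small, large, small / large)
```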

From the talk in Discord, I recall you actually went through Amazon’s RMA process, right? Might want to word that better next time, so people don’t jump you.

Might be that the drive does support 4k, and WD just dropped the ball and didn’t filter that out in the SATA <-> USB converter. I feel it likely even. The case would’ve probably been different if the HDD had a native USB controller, as the unshuckable drives do.

If risk went through Amazon’s RMA process, as seems to be the case, and Amazon fucks up and sends it to someone else, it’s on them. Not everyone religiously watches GN.

gysi · November 26, 2022, 12:46am

I’m not familiar with those WD drives (only have some SATA WD Ultrastars), but my guess is that just some usb firmware/config change to look for 4K sectors instead of 512b sectors would be necessary for the drive to work again. It might also be that some of the WD disk utilities might have resurrected the drive, or would have done the necessary changes on the usb side. I don’t know.

Anyways that’s a moot point as Risk no longer has the drive.

It seems to be an oversight by WD a la “this is marketed for consumers, we don’t expect anyone to change settings on Linux”. That would be my thought.

Yes, the PSA is useful for sure. That could easily have happened to me as well

jaskij · November 26, 2022, 1:00am

Exactly.

Personally, I don’t have enough data to require HDDs, and honestly started only having interest in this stuff after all my storage was SSD. I’m scared of HDDs now.

Trooper_ish · November 26, 2022, 1:47am

Not sure how much I like Amazon just slipstreaming known-bad drives back into circulation, if that happens at all. But it looks like Risk reported the problem, and Amazon agreed to the RMA.

Drive should be DOA, not recirculated, and if it is, Amazon are the assholes here.

Amazon could have chosen not to RMA / return? But they don’t much care, and go the path of cheapest resistance