Why are hard drive sectors still 512 bytes by default?

Probably all hard drives today have physical sectors that are 4096 bytes long (+metadata).

And similarly, practically all hard drives’ firmware emulates the old 512 byte sectors to the OS.

4K sectors (aka Advanced Format) are 15 years old at this point. Windows has had support since Windows 8. Linux since kernel version 2.6-ish.

Why is the default still 512e?


P.S. Drives will correctly report their physical sector size (tho many SSDs will lie about it).
fdisk -l on linux
fsutil fsinfo ntfsinfo c: on windows
if you wanna check.
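For anyone who prefers to script the check on Linux, here is a minimal sketch. It assumes the kernel exposes the sizes under /sys/block/<device>/queue/ (logical_block_size and physical_block_size); the function names and the sysfs_root parameter are made up for illustration, and the result is only as honest as what the drive reports.

```python
from pathlib import Path


def sector_sizes(device: str, sysfs_root: str = "/sys/block") -> dict:
    """Return the logical and physical sector sizes (bytes) Linux reports for a disk."""
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text().strip())
    physical = int((queue / "physical_block_size").read_text().strip())
    return {"logical": logical, "physical": physical}


def classify(logical: int, physical: int) -> str:
    """Map a (logical, physical) pair onto the usual format names."""
    if logical == 512 and physical == 512:
        return "512n"
    if logical == 512 and physical == 4096:
        return "512e"
    if logical == 4096 and physical == 4096:
        return "4Kn"
    return "other"


if __name__ == "__main__":
    # "sda" is only an example device name; adjust for your system.
    sizes = sector_sizes("sda")
    print(sizes, classify(sizes["logical"], sizes["physical"]))
```

sysfs_root is only a parameter so the function can be pointed at a fake directory tree for testing.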

P.P.S. I dare anyone to report they’re running 4Kn drives. :grin:

There was a space efficiency benefit on spinning disks when 512n sectors went to 4k sectors. The 512 emulation layer was added on top for various reasons, with performance and better compatibility being the two biggest.

Without getting into sector alignment, it basically boils down to 512e being more performant than 4kn on HDDs and for SSDs there is no difference.

History, plain and simple.

There used to be quite a bit of variety in the very early days; I think some 8" floppies might have had 128-byte sectors, too. Even earlier magnetic drums and fixed-head drives probably had crazy proprietary formats.

I remember some of the early attempts to switch to 2k and 4k sectors, mostly for swappable media: perhaps the Syquest SQ800 88MB 5 1/4" drives, but certainly magneto-optical drives. I used those a lot at the time, in 3 1/2" and in capacities from 128MB to 640MB, and they all used bigger non-512b sectors (probably 4k). The 550MB LIMDOW variants used the same physical density as the 640s but finally offered 512b sectors, and thus true compatibility with all kinds of operating systems, plus direct sector overwrites; earlier versions had to rewrite tracks, much like flash or SMR much later.

E.g. I distinctly remember having plenty of trouble with Solaris on non-512b media, while DOS and BSD seemed to have no issues with secondary media. I don’t think you can boot any OS with a PC master boot block on media that doesn’t have 512b emulation, because they all look for the 0x55 0xAA signature at bytes 511 and 512 of block zero (offsets 510 and 511 when counting from zero).

For HDDs the overhead of sector gaps motivated the transition to 4k some time ago, but even bigger sectors offered diminishing returns for tons of code breakage. As for the 512e emulation overhead, the effort is so low compared to what HDD controllers do these days that it’s simply not worth eliminating, and backward compatibility is king.

Always appreciate some interesting history.

But yea, I suppose that makes sense.

I got some new drives recently to make a NAS out of some spare parts and put them in 4Kn mode. They’re obviously not boot drives, but we’ll see how that goes.

Tho I’m not sure if it should matter for boot in any way - you can still put 0x55 AA in bytes 511,512 on a 4K sector too. And GPT puts a protective MBR in block 0, but most modern BIOSes (i.e. UEFI) support booting from GPT - block 1+.
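A tiny sketch of that point, under the assumption that the signature lives at zero-based offsets 510 and 511 of block 0 no matter how long the block is. The helper name is hypothetical, and it says nothing about whether a particular firmware will actually boot from such a disk.

```python
def has_mbr_signature(block0: bytes) -> bool:
    """True if block 0 carries the 0x55 0xAA boot signature at offsets 510-511."""
    return len(block0) >= 512 and block0[510:512] == b"\x55\xaa"


# A 512-byte block 0 and a 4096-byte block 0, both with the signature in place.
sector_512 = bytearray(512)
sector_512[510:512] = b"\x55\xaa"

sector_4k = bytearray(4096)
sector_4k[510:512] = b"\x55\xaa"

print(has_mbr_signature(bytes(sector_512)))
print(has_mbr_signature(bytes(sector_4k)))
print(has_mbr_signature(bytes(4096)))  # all zeros, no signature
```

The check itself does not care about the sector length; any trouble would come from boot code that hard-codes 512-byte reads.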

P.S. twin, that answer sounds like something ChatGPT would spit out.

lol, I was being a little vague because it’s super easy to say something wrong on an actually fairly complex topic.

I think something that shouldn’t be overlooked though is that for most workloads 512e is more performant than 4kn. There’s an STH thread with some actual benchmarks on this, where 512e has a fairly strong performance lead over 4kn in a random fio workload; as I understand it, the 512e HDD’s controller can coalesce read/write requests that a 4kn HDD can’t, causing the performance disparity.
On the flip side there is a small benefit to 4kn drives in sequential workloads, but not nearly as big as the 512e random speedup.

Feel free to link that thread. And get into sector alignment. This is a technical forum. We can take it.

Otherwise (apologies if this sounds unkind) it just seems like you’re talking out of your bum.

Here you are:

The findings align with my experience too which is why I wince when I constantly hear internet advice to format to 4kn for maximum performance.

I’m going to need to read up on the sector alignment before I say anything meaningful, I just remember they add a lot of caveats.

Thanks. Will check it out.

Btw, if by sector alignment you mean allocation unit/block misalignment issues on 512e - you may do so for the benefit of other readers, but I myself am pretty familiar with the topic.

Random rant:
FUCK serve the home. Registration on that site is impossible. I got like 10 captchas in a row. Gave up. Burn in hell.

yup, that’s it.

Here goes a layman’s explanation for others:
Sector misalignment can happen on any drive that has larger physical blocks than logical ones.

Under ideal conditions the logical block addresses align with the beginning of the physical blocks and all is well in the world.

Under very non-ideal conditions, when the LBA is offset from the physical block and small writes are requested of the HDD, the HDD is forced to shuffle data around: it has to read the whole physical sector, modify a small portion of it and write the whole sector back, kind of similar to how SSDs need to dump an entire page, modify a small portion of data on it and then write the entire page back to flash. The HDD incurs a performance penalty in this situation.

cool picture that brings more understanding to the issue than my explanation:
[image]

Quick alternate take with no fancy graphics.

There are 3 properties to pay attention to here:
Physical sector size: The actual magnetic sectors on the platter.
Logical sector size: What the drive tells the OS its sectors are.
Block size: The minimal piece of data the OS will write.

Problems happen when blocks misalign with physical sectors, which the OS can’t notice when the drive is lying about its sector size (the OS only sees logical sectors). A misaligned block straddles 2 physical sectors, so the drive has to touch 2 of them for every 1 in an aligned scenario.
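The arithmetic behind that can be sketched in a few lines. This assumes 512-byte logical and 4096-byte physical sectors; the function names are made up, and the start sectors 63 and 2048 are just illustrative partition offsets (63 being the old DOS-era default, 2048 the usual 1 MiB alignment).

```python
def is_aligned(start_lba: int, logical: int = 512, physical: int = 4096) -> bool:
    """True if a partition starting at this LBA begins on a physical sector boundary."""
    return (start_lba * logical) % physical == 0


def physical_sectors_touched(offset_bytes: int, length_bytes: int,
                             physical: int = 4096) -> int:
    """Number of physical sectors a write of length_bytes at offset_bytes overlaps."""
    first = offset_bytes // physical
    last = (offset_bytes + length_bytes - 1) // physical
    return last - first + 1


for start_lba in (63, 2048):
    offset = start_lba * 512  # byte offset of the partition's first 4 KiB block
    print(start_lba, is_aligned(start_lba), physical_sectors_touched(offset, 4096))
```

With the partition starting at sector 63, every 4 KiB block straddles two physical sectors; starting at sector 2048, each block maps onto exactly one.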

The 512e drive is able to consolidate various 512b updates fio produces in its internal in-flight buffers, e.g. for blocks 1,2,3 and 4 in a sequential or very localized write.

For a 4kn drive fio would send 4x4k writes and the drive would have to figure out that the data in 3 out of 4 “quadrants” did not actually differ, to coalesce the writes: very few hard drives are paid to operate on content like that, even if WD was trying to sell such technology recently (still do?), trying to take advantage of all that RISC-V intelligence.

So as long as fio writes smaller data blocks, 4kn is sure to lose, but that’s basically “holding it wrong”. Even on random 512b reads, the 4kn hard disk would always transfer 7 blocks fio didn’t ask for, keeping the bus busy.

With the type of controller power modern hard drives have (hey, they are much more powerful than my early 80386 and 80486 based Unix workstations and easily 1000x a VAX 780!), the 512e emulation overhead probably counts in tiny fractional Joules. Sure, globally the difference might heat my home and run my car, but overall…

fio was reading/writing 16k and 4k blocks when showing superior 512e performance over 4kn:

RUN 1    fio --filename=fiotest --size=4GB --rw=randrw --rwmixread=70 --rwmixwrite=30 --bs=16k --ioengine=libaio --iodepth=16 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
RUN 2    fio --filename=fiotest --size=4GB --rw=randrw --rwmixread=70 --rwmixwrite=30 --bs=4k --ioengine=libaio --iodepth=32 --runtime=120 --numjobs=8 --time_based --group_reporting --name=iops-test-job --direct=1 --end_fsync=1
| Run | 512e read IOPS | 512e read MiB/s | 512e write IOPS | 512e write MiB/s | 4kn read IOPS | 4kn read MiB/s | 4kn write IOPS | 4kn write MiB/s |
|---|---|---|---|---|---|---|---|---|
| RUN 1 | 359 | 5.62 | 156 | 2.45 | 298 | 4.67 | 130 | 2.04 |
| RUN 2 | 374 | 1.46 | 162 | 0.63 | 305 | 1.19 | 132 | 0.52 |
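A quick sanity check on the units, as a sketch: if one "io" in the table is one request of the --bs size from the fio command lines, then IOPS times block size should roughly reproduce the MiB/s columns. The function name is made up, and the small differences are presumably rounding in the reported IOPS.

```python
def mib_per_s(iops: float, block_size_kib: int) -> float:
    """Throughput implied by an IOPS figure if every I/O moves block_size_kib KiB."""
    return iops * block_size_kib / 1024


# (label, read IOPS, write IOPS, block size in KiB) taken from the table and the --bs flags
rows = [
    ("RUN 1 512e", 359, 156, 16),
    ("RUN 1 4kn", 298, 130, 16),
    ("RUN 2 512e", 374, 162, 4),
    ("RUN 2 4kn", 305, 132, 4),
]

for label, read_iops, write_iops, bs in rows:
    print(f"{label}: read {mib_per_s(read_iops, bs):.2f} MiB/s, "
          f"write {mib_per_s(write_iops, bs):.2f} MiB/s")
```

For RUN 1 on 512e this gives about 5.6 MiB/s for reads, in line with the reported figure, which suggests the table counts bs-sized requests rather than 512-byte sectors.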

I am starting to understand the plight of high school physics teachers.
No one ever cares about units. Everything becomes ambiguous and all you can do is guess.

Does fio ask for the sector (512 bytes), or for the block data structure the operating system works with? Because I read Linux really likes its blocks 4K sized.
In Linux, drives are presented as block devices. So I will assume we are talking about OS-level blocks.
(See? Lots of ambiguity and assumptions.)

So fio literally cannot ask for anything smaller than 4K. It will always get what it asks for, even if it is interested in only a chunk of that.

What does iops mean? What is 1 io on a 512e drive? The (emulated) hardware-level unit of 512 bytes? The os-level unit of 1 block (4K)?

These results make no sense to me.

I can already see where this is going. I’m gonna end up reverifying the results. (Replication studies, yay! :tada::tada::tada:)

Unfortunately I sent my drives off to bootcamp for a week.

I might have some results around this time next week.

I think some of the perceived IOPs reporting discrepancy is because of user level vs kernel level data access statistics.

There’s an NVIDIA deep learning developer blog that briefly discusses this:
Storage Performance Basics for Deep Learning | NVIDIA Technical Blog.

I had thought fio creates 4k-normalized IOPS numbers, but after reading that I’m not so sure. This wrinkle doesn’t discredit fio’s use for comparative performance tests, however.

Please do! It’d be nice to have something to point to when advice is given.

In the real old days of HDDs, the electronics were slow compared to the rotational speed of the platter. They had “sector skewing”, that is, the sectors were not consecutive, one after the other. Sectors were numbered, e.g., 1,5,3,4,2, and were 512 bytes in size because that was the largest number of bytes (plus overhead and CRC checking) the electronics could reliably read and process before the next sector was under the read/write head.

Of course, the electronics and CPUs are now more than fast enough to read much larger 4k sectors laid out sequentially.

Anyhow, that’s my take on it. I used to write HDD formatting software for CP/M in Z80 assembly.

@BruceD Neat historical perspective.

@twin_savage
I have completed the tests!

Here is the raw data:

fio sector test.zip (7.3 KB)

fio spits out a lot of numbers, but from what I can tell 4K sectors exhibit a consistent lead over 512 in an apples-to-apples comparison (same drive). A slight lead in random IOPS and quite substantial in more sequential workloads.

I’d appreciate help collating and interpreting the data. A table like the one above would be nice, if you know which numbers to put in there. :slight_smile:

Methodology:

I chose not to test both drives because switching sector size is a pretty dangerous operation (or so the firmware utilities warn me) - an untimely power outage or something might brick the drive, and every time the command is run, there’s a slightly greater chance something might happen. (This is a small home NAS I cobbled together from spare parts, a UPS would be the height of luxury.)

Misc note: /dev/sdc proved slightly slower over the course of the burn-in test, finishing 1h 11min slower than /dev/sdb


I would post my results in the thread linked above, but I cannot register on that site, nor contact the admins about it.
Those with a working registration, feel free to post in my name.

The results are interesting; they clearly show 4kn to be superior to 512e.
Now this has me wondering if this is universally true now for all drives (maybe 4kn performance was hampered in drives around the time it was a new thing?) or if this is something specific to the Seagate firmware on this drive, because I’ve had anecdotal evidence in the past that 512e beat 4kn on all random workloads (I think I picked this thinking up not too long after 512e became popular).

Seagate ST18000NM000J:

| Run | 512e read IOPS | 512e read MiB/s | 512e write IOPS | 512e write MiB/s | 4kn read IOPS | 4kn read MiB/s | 4kn write IOPS | 4kn write MiB/s |
|---|---|---|---|---|---|---|---|---|
| RUN 1 | 479 | 7.677 | 207 | 3.323 | 486 | 7.780 | 210 | 3.368 |
| RUN 2 | 550 | 2.202 | 237 | 0.951 | 569 | 2.278 | 245 | 0.983 |
| RUN 3 | 205 | 206 | - | - | 257 | 258 | - | - |
| RUN 4 | - | - | 186 | 187 | - | - | 245 | 245 |
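To put a number on the lead, here is a rough sketch that turns the IOPS columns above into percentages. The helper name is made up; the figures are copied from the table, and a single drive with single runs is a thin basis, so treat the output as indicative only.

```python
def pct_lead(base: float, other: float) -> float:
    """Percentage by which `other` exceeds `base`."""
    return (other - base) / base * 100


# (512e IOPS, 4kn IOPS) pairs from the Seagate ST18000NM000J table
results = {
    "RUN 1 read": (479, 486),
    "RUN 1 write": (207, 210),
    "RUN 2 read": (550, 569),
    "RUN 2 write": (237, 245),
    "RUN 3 read": (205, 257),
    "RUN 4 write": (186, 245),
}

for name, (iops_512e, iops_4kn) in results.items():
    print(f"{name}: 4kn leads by {pct_lead(iops_512e, iops_4kn):.1f}%")
```

The random runs (1 and 2) come out at a lead of a few percent, while RUN 3 and RUN 4 land in the 25-30% range.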

I’m registered there. I’ll post it to see what kind of response it’ll incur, hopefully more drives’ll be benchmarked to either show a trend that 4kn is better across the board now, or that different vendors have firmware optimized for 512e or 4kn.

Neat.

I believe the point some people were trying to make in the other forum is that OP tested with 4K on one drive and 512e on another drive.
Hence the results not being especially valid in the first place.
Like trying to compare a Toyota on 98 fuel vs a Honda on 95 fuel and trying to draw conclusions about the fuel’s performance from that.

I’m really tempted to buy a 4kn version of the 512e drives I’m running to test now. The difference between the Toshiba MG08SCA16TE (512e) and the MG08SCA16TA (4kn) is more than just logical sector length. The actual firmware is different too; for example, the 512e version of the drive implements a persistent write cache technology whereas the 4kn one does not.

I was immediately curious why they’d make that choice; reportedly it’s to buffer the Read-Modify-Write cycles for 512 emulation, so unaligned performance doesn’t hit a wall when the standard write cache is disabled.