HDD sectors - equal in radians or in length?

I'm currently taking a closer look at block storage and how filesystems actually store data, but I have stumbled while trying to figure out whether disk sectors are equal in radians or equal in length.

This is best described by the two following images. One shows equal lengths:

and the diagram from the Wikipedia page shows sectors equal in terms of radians (pie slicing):

This passage from tldp.org has both in it:

The surface area of your disk, where it stores data, is divided up something like a dartboard — into circular tracks which are then pie-sliced into sectors. Because tracks near the outer edge have more area than those close to the spindle at the center of the disk, the outer tracks have more sector slices in them than the inner ones. Each sector (or disk block) has the same size, which under modern Unixes is generally 1 binary K (1024 8-bit bytes). Each disk block has a unique address or disk block number.

If the tracks towards the outer edge have more sectors than those closer to the center, then the disk is not pie sliced. If the sectors are the same length in terms of circumference instead of radians, then surely this means that the data towards the outside of the disk can be read sequentially a heck of a lot faster than the data towards the center if the disk maintains a constant speed of 7200 or 5400 RPM. This might legitimately be the case, in which case I assume disks are written from the outside in to take advantage of this so that the first partition (usually for boot) is fastest?

If the tracks towards the outer edge have more sectors than those closer to the center, then the disk is not pie sliced. If the sectors are the same length in terms of circumference instead of radians, then surely this means that the data towards the outside of the disk can be read sequentially a heck of a lot faster than the data towards the center if the disk maintains a constant speed of 7200 or 5400 RPM. This might legitimately be the case, in which case I assume disks are written from the outside in to take advantage of this so that the first partition (usually for boot) is fastest?

The above quoted part is very true. As my main testing tool I use HD Tune, and it has a basic test that demonstrates just that:

http://www.hdtune.com/

To the point that one can argue (or actually calculate) whether the old VelociRaptor 10,000 RPM SATA disk was actually worth its price, given that it had only 1/5 of the capacity of a normal SATA 7200 RPM disk (on which you could just create a partition 1/5 of the size at the beginning).
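A rough back-of-envelope sketch of that comparison (the spindle speeds are the only hard numbers here; everything else is an illustrative assumption, not a measurement of any particular drive):

# Hypothetical comparison: a 10,000 RPM drive vs. a 7200 RPM drive that is
# "short-stroked" to a partition covering only its first (outermost) 1/5.

def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency = time for half a revolution, in ms."""
    return (60_000 / rpm) / 2

print(f"10,000 RPM: {avg_rotational_latency_ms(10_000):.2f} ms average rotational latency")
print(f" 7,200 RPM: {avg_rotational_latency_ms(7_200):.2f} ms average rotational latency")
# 10,000 RPM -> 3.00 ms, 7,200 RPM -> 4.17 ms. The raptor wins on latency,
# but the short-stroked 7200 RPM partition keeps all of its data on the fast
# outer tracks and shortens seeks, which narrows the gap considerably.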

As for sector size in radians vs. size in length: both pictures are correct. They simply reflect different points in time as the technology evolved. The naming changed too; see LBA for example:

On modern hard drives, the sectors are equal length - so there are more sectors around the outer edge, and those will indeed have a faster transfer rate than the ones near the centre.

Floppy drives tended to have a fixed number of sectors per track, but even then some systems (e.g. the Amiga) found ways of formatting their disks to squeeze more data into a track (in the case of the Amiga, by omitting inter-sector gaps). This of course made their disks incompatible with other systems, including PCs.

The name of this is Zone bit recording btw.
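A toy model of zone bit recording, just to make the geometry concrete (all constants below are invented assumptions, chosen so the outputs land near typical HDD speeds; real drives use firmware-defined zones rather than a smooth formula):

import math

# Toy ZBR model: sectors per track grow with track radius, because the linear
# bit density along the track is roughly constant.
RPM = 7200
BYTES_PER_SECTOR = 512
BYTES_PER_MM = 7_000     # assumed linear density, picked to look HDD-like

def sectors_per_track(radius_mm: float) -> int:
    circumference_mm = 2 * math.pi * radius_mm
    return int(circumference_mm * BYTES_PER_MM / BYTES_PER_SECTOR)

def sequential_mb_per_s(radius_mm: float) -> float:
    # At constant angular velocity one revolution sweeps a whole track past
    # the head, so sequential throughput scales with sectors per track.
    revs_per_s = RPM / 60
    return sectors_per_track(radius_mm) * BYTES_PER_SECTOR * revs_per_s / 1e6

for radius in (20, 30, 45):   # inner, middle, outer track radii in mm
    print(f"radius {radius} mm: {sectors_per_track(radius):5d} sectors/track, "
          f"~{sequential_mb_per_s(radius):4.0f} MB/s sequential")
# radius 20 mm -> ~1718 sectors/track, ~106 MB/s
# radius 45 mm -> ~3865 sectors/track, ~237 MB/s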

No.

/rant start

Disk controllers live in their own little world, completely divorced from reality, and they intentionally lie to us about their contents in every conceivable way. In relation to the actual physical media, there is no way to instruct a disk controller to put any data anywhere specifically, or to identify where on the media the data has been placed (except experimentally). We only know that we write to, say, cylinder 0, head 0, sector 0, and that somehow the disk returns that data back later on during reading.

wmic diskdrive get interfaceType,model,bytesPerSector,sectorsPerTrack,totalCylinders,totalHeads,totalSectors,totalTracks,size,name /format:list

gives

BytesPerSector=512
InterfaceType=IDE
Model=Samsung SSD 850 EVO 250GB ATA Device
Name=\\.\PHYSICALDRIVE5
SectorsPerTrack=63
Size=250056737280
TotalCylinders=30401
TotalHeads=255
TotalSectors=488392065
TotalTracks=7752255

My SSD does not have any "heads" and physical hard drives of similar size definitely do not have 128 spinning platters (heads/2).
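In fact, the "geometry" above is just arithmetic run backwards from the sector count. A quick check using the numbers from that very output (a sketch; 255 heads and 63 sectors/track are the conventional legacy maximums, not anything physical):

# Values copied from the wmic output above.
bytes_per_sector  = 512
sectors_per_track = 63
total_heads       = 255
total_cylinders   = 30401
total_sectors     = 488392065
size              = 250056737280

# The fake CHS geometry multiplied out reproduces the LBA totals exactly:
assert total_cylinders * total_heads * sectors_per_track == total_sectors
assert total_sectors * bytes_per_sector == size
# 30,401 * 255 * 63 = 488,392,065 sectors; * 512 B = 250,056,737,280 B.
# The cylinder count is simply whatever makes the product come out right,
# which is why an SSD "has" 255 heads.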

My point is, if you are trying to learn about computers, learn to think of everything in terms of stacks, interfaces and compatibility. That will help you understand technology in a more tangible way than trying to understand ZBR.

An operating system should not have to understand the physical media: if it did, the media could not change without the OS changing too, and OS design would gain unnecessary complexity (consider how storage capacity on floppies stagnated while HDDs kept growing). Neither can the storage driver know anything about the physical media, for the same reason. The only thing on this planet that understands how the data was written to the disk is the PCB on the disk itself, and the only thing anyone else needs to understand is the interface: IDE, which means presenting heads/cylinders/tracks, regardless of whether or not the media (HDD/SSD/flash drive) actually has any such characteristics. The media itself does not dictate how it gets used (optimal configurations are not guaranteed). It merely functions in a way that is compatible with the layers above.
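A tiny sketch of that "only the interface matters" idea (hypothetical classes invented here for illustration, not any real driver API):

from abc import ABC, abstractmethod

class BlockDevice(ABC):
    """What the layer above sees: numbered, fixed-size blocks. Nothing else."""
    block_size = 512

    @abstractmethod
    def read_block(self, lba: int) -> bytes: ...

    @abstractmethod
    def write_block(self, lba: int, data: bytes) -> None: ...

class SpinningDisk(BlockDevice):
    """Stand-in for an HDD: internally it would seek heads and wait on rotation."""
    def __init__(self):
        self._store = {}
    def read_block(self, lba):
        return self._store.get(lba, b"\x00" * self.block_size)
    def write_block(self, lba, data):
        self._store[lba] = data

class FlashDrive(BlockDevice):
    """Stand-in for flash media: internally a translation layer remaps pages."""
    def __init__(self):
        self._store = {}
    def read_block(self, lba):
        return self._store.get(lba, b"\xff" * self.block_size)
    def write_block(self, lba, data):
        self._store[lba] = data

def copy_block(src: BlockDevice, dst: BlockDevice, lba: int) -> None:
    # The caller neither knows nor cares whether heads or NAND pages are
    # involved; it relies only on the BlockDevice contract.
    dst.write_block(lba, src.read_block(lba))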

/rant end


Yes. That's exactly how spinning rust works. Example:

While HDD controllers do lie a bit about the drive's real physical geometry, disks do get filled from the outside in.

Here's a good read about how it works (although they talk about it from a very... special perspective):


and there's a link to hddscan with further low-level explanations.

Thanks for the links. Hopefully they will help programster understand disks better. Your post is what I was referring to when I said "experimentally."

Naturally hard disk manufacturers will want to make sure their devices operate as efficiently as possible and, in light of the medium, that means designing controllers that will write to every outermost track of every platter before moving to the inner ones. That cannot be guaranteed, and disks are not required to operate this way. Nor can they be instructed (except experimentally) to specifically write from the outside in or vice versa.

My core point is that, to understand technology in general, a stack-based approach is more appropriate than getting mired in the details of any one layer, unless that is actually necessary. The forensics link explains the bad sector replacement table part of modern disks, a concept heavily utilized in more modern SSDs, and further emphasizes the point that we do not really know what disks are actually doing on an individual sector basis.

The hddscan.com link is more... an engineer's knowledge of exactly how disk controllers write data. Scary stuff o.o

that means designing controllers that will write to every outermost track of every platter before moving to the inner ones.

I think you are going too far in assuming that the remapping feature is used so randomly. Yes, bad sectors are remapped. Yes, special areas exist (and we cannot find out by normal means where they are). But other than those exceptional breaks, the sector order of user data is predictable, and this is in fact used to advantage.

There is one huge reason for that, and it comes from the already mentioned characteristics of disk storage. Unlike other media such as SSD/NAND, any change in the natural (physical) order would mean a heavy performance penalty (the head needs to relocate, and the farther it moves, the worse it gets). Any remapping could potentially turn any read/write operation into the worst case scenario.

All software (the OS) almost always uses this predictable nature of HDDs to its benefit, leaning towards keeping all data at the beginning of the partition if possible. And all software (OS) actually tries to keep files together and also sequentially written (or at least in bigger chunks).

As with any abstraction layer, of either hardware or software nature, that tries to hide the details of the stuff behind it, some characteristics carry through from top to bottom and are actually used to advantage. But as you said, no guarantees (just a 95% probability that such an approach will work).

I do agree with most of what you said, but disagree on the ramifications of the underlying technology, specifically in regard to the intention of software engineers and the ability of the software itself to understand the hardware.

SSDs are addressed in exactly the same way HDDs are. Essentially that means USB flash drives are constantly being asked to read/write from cylinders/heads/tracks. This highlights the level of discontinuity between what software actually understands and the media itself.

A piece of software that is optimized to read from HDDs can and will write to said flash media using that same "HDD optimized" code. The only way it does not is if the software engineer specifically tries to detect the nature of the underlying media and deactivate/disable that code path. This is... literally the last thing on the mind of modern engineers. "Send it to the disk, give it back" is about the level of interaction software people want with the details of hardware engineering.

For OS or storage driver level engineers, it has become exceedingly difficult to identify where optimizations that take into account the media characteristics can actually be performed, because there are so many layers between the software and modern physical media. Basically, they are only found because of "weird" performance drops. Wendel posted an interview with a ZFS enthusiast that highlighted at one point that Intel (as a storage controller manufacturer) funded some research into why certain ZFS configurations, relating to the number of drives used in certain settings, were not getting the expected throughput. The point being that engineers who write the drivers for storage controllers don't even care what the media is actually doing unless they have to, and that software must be intentionally crafted to take media characteristics into account.

This is exactly wrong, and the word you were looking for was "contiguous." Also: files are written to "file systems," an abstraction typically placed on top of a partition, both of which are logical structures not comparable to disks, which are physical structures. So anyway, if what you are saying were true, then fragmentation would never be an issue because "all software (OS) actually tries to keep files together and also sequentially written." Yet, fragmentation is always an issue exactly because files are not guaranteed to be written in a "contiguous" manner to file systems, let alone written that way to the actual disk. Writing to a file system/partition != writing to a disk.

From a software engineer's perspective, a file system is just a big square block of free space. Obvious algorithm is obvious: if a file needs to be written, then write it to the file system starting from the first available cluster. If the file is large and a "used" cluster of sectors is encountered, then skip over the used clusters and continue writing starting at the next available cluster. NTFS/EXT4/FAT32 and the software that reads them typically work this way. I am not sure about HFS+/OSX, and ZFS is an entirely different can of worms.
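A minimal first-fit sketch of that "obvious algorithm" (a toy model invented here for illustration, not the actual NTFS/EXT4/FAT32 allocation code):

def allocate(bitmap: list[bool], clusters_needed: int) -> list[int]:
    """First-fit: hand out the lowest-numbered free clusters, skipping used ones."""
    assigned = []
    for cluster, used in enumerate(bitmap):
        if not used:
            bitmap[cluster] = True
            assigned.append(cluster)
            if len(assigned) == clusters_needed:
                return assigned
    raise OSError("file system full")

def fragments(assigned: list[int]) -> int:
    """Count the contiguous runs in the list of assigned clusters."""
    return 1 + sum(1 for a, b in zip(assigned, assigned[1:]) if b != a + 1)

# A partly used toy volume: clusters 2-4 and 8 already belong to other files.
bitmap = [False] * 16
for used in (2, 3, 4, 8):
    bitmap[used] = True

file_clusters = allocate(bitmap, 6)
print(file_clusters)             # [0, 1, 5, 6, 7, 9] - written around the gaps
print(fragments(file_clusters))  # 3 - the new file is already fragmented

Nothing in that loop knows where those clusters land on the physical media, and fragmentation falls out of it for free.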

Hard drive engineers are motivated to initially give their drives the best performance possible, and file system engineers write files in the way that makes the most logical sense. It is a happy coincidence that the sequential reads/writes that are easy to perform on the media just so happen to (initially at least) match the most logical way to write files to file systems. How do we know this is a coincidence and that software is not intentionally designed to write files contiguously?

  1. Because files are written starting from the first available cluster or set, not actually written to file systems as contiguous, logically coherent "files," which precludes the possibility of them being written to disks contiguously.
  2. Software engineers only care about writing to file systems, not to disks. Having to care about the storage media characteristics for optimizations means a lot of extra work that no one, not even Intel, will devote any resources to unless they can identify a bottleneck.

This is exactly why fragmentation is a problem: because OS-level engineers typically do not use any storage media characteristics when writing software. Even the basic notion of keeping files contiguous relies on extra software (defragmenters) to clean up the problem after it has been created, because modern software is so dumb when it comes to the basic properties of the storage media.

The idea here, as it was in my first post, and my message to programster, is to look at computers in terms of stacks, with each layer distinct from the layers above and below. Any one layer, by design, should not know the details of the inner workings of another layer, only the interface used to communicate with it. This is exactly why writing to the first platter of a USB thumb drive works and why software cannot reasonably expect, much less guarantee, that any particular datum will be written to an HDD in any particular way, only that it can expect to get it back when it asks for it later (probably).

Edit: clarified first sentence of second to last paragraph

Yes. By LBA.

No. CHS addressing hasn't been used for a very long time now. PIO, DMA, and SCSI read/write commands use LBA, which is basically just a block number. Controllers do all the work of translating LBA into CHS (for HDDs) or into whatever flash memory uses (bank-row-page, probably).
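For reference, the classic translation between the two schemes (the standard CHS-to-LBA formula, shown here with the usual legacy 255-head / 63-sector fiction):

# Standard CHS <-> LBA translation using the legacy "fake" geometry.
# Sector numbers are 1-based in CHS; LBA is 0-based.
HEADS_PER_CYLINDER = 255
SECTORS_PER_TRACK = 63

def chs_to_lba(c: int, h: int, s: int) -> int:
    return (c * HEADS_PER_CYLINDER + h) * SECTORS_PER_TRACK + (s - 1)

def lba_to_chs(lba: int) -> tuple[int, int, int]:
    c, rem = divmod(lba, HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    h, s = divmod(rem, SECTORS_PER_TRACK)
    return c, h, s + 1

print(chs_to_lba(0, 0, 1))   # 0 -> the very first sector on the drive
print(lba_to_chs(2048))      # (0, 32, 33) -> a typical modern partition start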

Hey guys,
Thank you all for your contributions. I definitely have plenty to go on here. Some of you have really put in a lot of effort/material. I particularly like the Tom's Hardware graph showing performance against how full the HDD is, with the clear steps.


Seconded, I hope they continue. I'm learning.

I think you pay too much attention to details like heads/cylinders/clusters instead of to overall characteristics, like semi-/non-random-access storage vs. random-access storage.

In addition to that, at one point you acknowledge that there is software dedicated to defragmentation and optimization of file placement on the disk. At another point you state that no software engineer would ever write such software.

Having to care about the storage media characteristics for optimizations means a lot of extra work that no one, not even Intel, will devote any resources to unless they can identify a bottleneck.

It takes absolutely no effort to try to write a growing file into sequential blocks. Actually, it is easier than selecting a completely new place (which is only needed when the next sequential block is already occupied). A top-level example: why is the swap file in the OS either a preallocated file (at least partially) or even a separate partition? Sure, for one it skips the file system allocation (as allocation at random moments might introduce random delays), but it also gives potentially good sequential access (which is important for HDDs). That becomes less and less important with SSDs, but we all started this discussion about HDDs.

I'm not sure what exactly you are missing from the whole picture, or maybe you wrongly assume that we are missing something from ours (e.g. that we all think all files are always sequential, or that we are not aware that the current APIs used by applications do not operate on cylinders/heads/...).

/sigh My post was carefully worded. Read it again. A programmer is not a hardware engineer.

There are different engineers for OS-level structures (file systems and related LVM software) than for application-layer software (defrag), for disk controller driver software (Intel), and for the hardware controller PCB itself (WD/Seagate).

The existence and relative popularity/usefulness of defrag software proves that OS-level engineers typically do not care about how files actually get written to storage media.

Writing a single file as one sequential block potentially requires effort by every engineer type that I listed above. This effort is non-trivial due to the many associated layers that must somehow do that AND remain backwards compatible, hence requires substantial incentives to even try such a feat.

Except swap files are neither preallocated nor contiguous, only in part. They can dynamically grow, fragmenting them without any regard for their contiguity on the storage media. In addition, preallocated files are not guaranteed to be contiguous in the first place because OS-level engineers can't be bothered to care. Their solution? Just move it to a different partition if you actually care about that.

I just checked right now, my swapfile is in 957 fragments and that is after making sure it was in a static 2048 MB allocation after I did a fresh install, to prevent it from becoming even more fragmented. The reason is of course that even static swap files are actually dynamic in the sense that they get re-allocated at every boot. And, since allocation cannot guarantee either sequential placement or contiguity, even when starting from a large group of empty sectors, this just naturally happens as the disk fills up over time and the OS is rebooted constantly.

It is the answer to this question:

so where should the OP go next to learn more about how exactly computers work and technology next?

Ultimately it is his choice, and he seems to have benefited from the discussion, so it is worthwhile.

Anyway, my answer is to think of things in terms of stacks, and different engineers designing different layers that have to interact in a compatible way, necessarily black-boxing every layer from every other layer.

I do acknowledge that sometimes you can glean information through multiple layers, like defrag software getting a picture of a file system that it assumes has some passing relationship to sector allocation on the actual disk. There is no way to know that, because there are too many layers between a file system and bits on a disk. It is an assumption, not guaranteed, that can be checked experimentally, and it should be treated as such. My USB flash drive does not have platters, but my OS thinks it does. Check for yourself:

wmic diskdrive get interfaceType,model,totalHeads /format:list

Your answer seems to point more towards focusing on APIs/addressing schemes, believing they relate to the inner workings of the media. (?)

My core disagreement is that your approach focuses too much on the internal workings of whatever structures exist at whatever layer is being discussed, rather than on the numerous layers involved in the overall design. Frankly, that is less important than understanding the entire system overall.

The second disagreement is more specific: layers normally obfuscate internal workings, and that obfuscation is deeply magnified when crossing from the software controller layer in the OS to the PCB controller on a physical disk. My SSD does not have platters! Layers lie to each other, every day, all day long. These deliberate lies and their necessity in the name of compatibility are a core take-away from understanding computers, first and foremost, as stacks. We should not pretend to know, and cannot reasonably expect to know, what lies beyond a layer from the perspective of any other layer. We can make reasonable assumptions in light of how the technology was engineered and run experiments to falsify our conclusions, but, from the perspective of another layer, we cannot know.

This is why computers work as well as they do. As long as a layer gets its request fulfilled by the layer below it, everything just works. An OS does not care about disks and fragmentation and contiguity because it does not have to, and therefore should not. Specific applications can be created that care, but the OS doesn't have to; that's the point. An OS, from the perspective of an application requesting a file, only reads to fulfill requests from/to file systems. As long as the LVM, given that request, returns the file, what does it matter to the application if that file is in a directory that is part of local media (HDD or SSD), USB flash media, or a network share speaking a compatible protocol, on a GPT disk or an exFAT or ZFS file system? Myprogram.exe does not have to care where cat.jpeg actually is in order to display it, just like OS-level software does not have to and should not take into account storage media characteristics. It cannot reasonably expect the storage media to always be the same, work the same, or benefit from the same optimizations, and doing so necessarily adds non-trivial complexity to the existing engineering design due to the numerous layers involved.

Edits 1 & 2: their vs there. Grammar so hard :(


mmm windows or linux

@Peanut253, what you're talking about is called "virtualization". Yes, there are layers upon layers of logical entities. Yes, it generally means that you don't care how exactly it's done on the layer below. But in reality, we do align partitions to 1 MB (and I still remember aligning them to actual cylinders when disks were much smaller). If you're working on high-load infrastructure, you have to care about best practices set by vendors/manufacturers, which tell you not to mix different types of I/O on the same parity group, to change queue length depending on which storage you're using, to set the I/O scheduler to noop or deadline. You have to care about what's down there, under all of these layers. Because otherwise every layer will add to latency or subtract from throughput, a tiny penalty which adds up with every layer of virtualization.
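The 1 MB alignment mentioned above is just rounding a partition's starting LBA up to a 2048-sector boundary (assuming 512-byte sectors). A quick sketch:

# Align a partition start to 1 MiB = 2048 sectors of 512 B, the convention
# modern partitioning tools use instead of the old cylinder alignment.
SECTOR_BYTES = 512
ALIGNMENT_SECTORS = 1024 * 1024 // SECTOR_BYTES   # 2048

def align_up(lba: int, alignment: int = ALIGNMENT_SECTORS) -> int:
    """Round an LBA up to the next alignment boundary."""
    return ((lba + alignment - 1) // alignment) * alignment

print(align_up(63))     # 2048 -> the old "start at sector 63" moves to 1 MiB
print(align_up(2048))   # 2048 -> already aligned
print(align_up(4100))   # 6144 -> next 1 MiB boundary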

And none of this argument has anything to do with the original question: Are HDDs (again: spinning magnetic platter data storage) written from the outside in? The answer is still the same: Yes, they are.

PS Oh, almost forgot: your FreeNAS must have direct access to physical disks, apparently. I guess it's just cranky for no reason. Right? =)

I just checked right now, my swapfile is in 957 fragments and that is after making sure it was in a static 2048 MB allocation

Actually, you proved my point. You do not have ~524,000 fragments (assuming a 4K cluster counts as one fragment); the OS did what it could to keep them together (in roughly 1 MB chunks).
Look, my 32 GB swap file on an SSD partition is one contiguous file on the partition. The fact that you do not know under which conditions, and how, such a file can be created does not mean they do not exist.
And to be fair, I was expecting this file to have many fragments.
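The arithmetic behind that point, using the numbers quoted above (the 4 KiB cluster size is an assumption; it depends on how the volume was formatted):

# Worst case vs. observed fragmentation for the 2048 MB swapfile mentioned above.
swapfile_bytes = 2048 * 1024 * 1024
cluster_bytes  = 4 * 1024                 # assumed cluster size

clusters = swapfile_bytes // cluster_bytes
reported_fragments = 957                  # the figure reported above

print(clusters)                                     # 524288 clusters in total
print(swapfile_bytes / reported_fragments / 1024)   # ~2191 KiB average extent
# 957 fragments out of a possible 524,288 means the allocator still kept the
# file in runs of roughly 2 MB on average - far from randomly scattered.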

This effort is non-trivial due to the many associated layers that must somehow do that AND remain backwards compatible, hence requires substantial incentives to even try such a feat.

The only compatibility and effort required is the ability to count from 0 to N, and, before falling back to the possibly worst case scenario, to check whether the next cluster is free at the file-system level. And I assume most systems do that.

Except swap files are neither preallocated nor contiguous, only in part.

How far will you go in taking your single example and projecting it onto every possible OS there is? I have a partition with a preallocated, fixed-size swap file. Every Linux I installed suggested creating a dedicated swap partition.

We should not pretend to know, and cannot reasonably expect to know, what lies beyond a layer from the perspective of any other layer.

The what? If that is your standpoint in participating in the discussion then you just shattered the glass plate you are standing on.

My USB flash drive does not have platters, but my OS thinks it does.

And this proves what exactly? The point that nobody argued?

This is why computers work as well as they do.

No, they would work very badly with such complete ignorance of how the technology works. The abstraction that is provided by each layer is a very good thing.

Myprogram.exe does not have to care where cat.jpeg actually is in order to display it.

And who said that an EXE program must care about such things?

OS-level software does not have to and should not take into account storage media characteristics.

Yes, especially the capacity should never ever be used by any layer higher than the PCB on the disk. All the user should see is infinity :stuck_out_tongue:

Look, in regard to software and hardware engineering, abstraction is good, but total ignorance of the potential best/worst case scenarios is plain stupidity.
And when you see that the best case scenario does not always happen, it does not mean that everything happens at random.

Home users do not usually need to care about this, but those who ignore it completely are usually also the ones reinstalling the same Windows version because "it was about time / it was performing badly" (again, not so much with SSDs nowadays).

So he's like Seagate? Or was it Matrox? One of them had to recall a lot of HDDs back in... 2001, I think? Glass platters shattered because of a temperature gradient inside the drive, iirc.

No, I do not think so (if you are referring to virtualization like VMware, Xen, etc.).
The abstraction layers @Peanut253 mentioned are within a single OS and the hardware itself. Full virtualization, or even simple containers, is simply yet another set of abstraction layers.

And while I understand @Peanut253's point of view well, I simply fail to explain that it is not 100% true.

So he's like Seagate? Or was it Matrox? One of them had to recall a lot of HDDs back in... 2001, I think? Glass platters shattered because of a temperature gradient inside the drive, iirc.

I did not know that. But that would be a clever reference if it was used intentionally :smiley:

Abstraction and virtualization go hand in hand. When you go up these layers, say from CHS geometry to an LBA map, that is technically virtualization. But once you're up there, you use an abstract "write" command, and it doesn't matter to you whether the LBA will be converted into CHS or bank-row-page in the end.

You have a set of entities. Creating new logical entities based on those is virtualization. Using the same instruments on the final entities without caring what's down there is abstraction.
Another example: you have a bunch of disks. You combine them into parity groups, then combine the parity groups into a pool, then split the pool into logical units. That's virtualization. You work with the logical units as if they were physical disks - that's abstraction.
I hope that makes sense.