Position of files on disk in XFS file system

The story:
I am running unRAID on my home media server. I was moving tens of TBs of data from one server onto unRAID and over-filled some of the drives. I have plenty of space on the pool (Total: 175 TB, Free: 58.3 TB), so now I would like to move some files off those over-filled drives and onto the emptier ones.
All drives on my unRAID have XFS file system on them.

The goal:
All those over-filled disks were mostly filled in one swoop - as in I used Midnight Commander and moved files from one disk onto another until that one was “full”.
When I was moving the files onto those drives, I assume the files were written to the disks from the fastest area to the slowest. If that’s true, I would like to move the latest files, as they would be in the slowest area of the disk (I assume nearest to the center of the platters).

I am a total Linux newb, so lemme say sorry in advance if my questions are silly…

Questions:

  1. Am I right in assuming that XFS filled up the disks like I’ve described: fastest-to-slowest area of the disk?
  2. If yes, how can I find out which files were written to the disk last?
  3. One idea I had is maybe using the inodes? Does XFS create inodes sequentially?
    I mean, if it does, I could sort files by inode number (from smallest to largest), and that way I could find out which files were written to the disk last, and which ones I should move off…
  4. Maybe there is some other, better/easier “linuxy” way to achieve this? A bash command, or script?
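For what it’s worth, sorting files by inode number is easy to try, even though XFS gives no guarantee that inode order matches write order or physical placement. A small sketch with made-up throwaway paths:

```shell
# Hypothetical demo (paths are made up): list files sorted by inode
# number. Note: XFS does NOT guarantee that inode order matches write
# order or on-disk placement, so treat this as exploration only.
demo=/tmp/inode_demo
mkdir -p "$demo"
touch "$demo/a.mkv" "$demo/b.mkv" "$demo/c.mkv"

# %i = inode number, %p = path (GNU find)
find "$demo" -type f -printf '%i %p\n' | sort -n
```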

I hope I am making some sense… :slight_smile:

Thanks in advance for any help.

The only filesystem with sequential inodes is vFAT.

All the Unix filesystems spread data around the disk.

As to the second part of your question, I am not sure.

The “linuxy” way of moving “older” data off to a slower store would be to use utilities like “find” to identify directories and files that are older than “N”.

This does require that you’ve preserved dates on your files/dirs when you’ve moved things around. It’s always a good idea to use cp -a or rsync -a when moving lots of data around, to ensure that you preserve such things for just this reason. -a is “archive” mode, which generally lumps in other flags like -p to preserve permissions, timestamps, and ownership.

p.s. rsync is generally superior to cp as well, since you can start/stop it and avoid duplicating work. Both support preservation of sparse files (cp --sparse=always and rsync -S). rsync also supports check-summing to catch corruption when it happens, rather than later when it’s too late.

rsync -avchS --> archive, verbose, checksum, human-readable, preserve sparse


Does this basically mean that there is no way to actually discern which files were written to the slowest part of the disk by any type of file time-stamp?

If I’m not mistaken, when using find I could search by access, modified, or created time… which would not help me find what I want.
I want to find the files that were written to the disk last. Or rather, I would like a way to sort all the files on the disk by the time they were written to that disk.

Is any time-stamp of a file changed if the file is simply moved from one disk to another? Because that’s what I did: I simply moved files with mc from one unRAID disk to another. Would that have changed either the created or modified time? I may be wrong, but simply moving files from one disk to another does not change these times, no?

Unless there is another type of linuxy time-stamp I could use that I do not know about… I am a Windows user, sadly…

As it stands, all the files should have their original created and modified times, as pretty much none of them have ever been actually modified.

Basically, if I can get a list of all the files on a disk with find or any other command, with any type of time-stamp which represents when the files were written to that disk - that would be the thing I need, I guess.
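A sketch of such a list, using GNU find’s -printf and a throwaway directory standing in for a real unRAID disk mount point (the paths are made up; note this sorts by mtime, which reflects when the content was last modified, not necessarily when it landed on this particular disk):

```shell
# Sketch with a throwaway directory in place of a real mount point
# (e.g. /mnt/disk1): list every file sorted oldest-to-newest by
# modification time. %T@ prints mtime as seconds since the epoch
# (GNU find, as shipped on Linux/unRAID).
demo=/tmp/mtime_demo
mkdir -p "$demo"
touch -d '2020-01-01' "$demo/old.bin"
touch -d '2023-01-01' "$demo/new.bin"

find "$demo" -type f -printf '%T@ %p\n' | sort -n
```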

I thought created and modified times do not change when moving files, by default? Or am I mistaken?
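For reference, this is easy to test with a throwaway file - a plain mv preserves the modification time (for a same-filesystem rename, and for a cross-filesystem move, where GNU mv copies and then restores the timestamps; ctime does change, and atime depends on mount options):

```shell
# Demo with made-up paths: mv keeps the original mtime.
touch -d '2018-03-04 05:06:07' /tmp/mv_demo.txt
mkdir -p /tmp/mv_demo_dir
mv /tmp/mv_demo.txt /tmp/mv_demo_dir/
stat -c '%y' /tmp/mv_demo_dir/mv_demo.txt   # still reports the 2018 date
```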

Thanks for the tips on rsync and for the commands…
I am trying to learn Linux bit by bit, although very slowly. Bash is still the biggest hurdle, coming from a Windows-only background. I will try to get into rsync eventually. ATM, I am using the unBALANCE plugin for unRAID, which actually uses rsync to move the files.

Whether or not the creation time-stamps were preserved depends on the tool.

Easiest way to check is to just look at your files and see if the timestamps appear to have been retained. If everything has your copy day as its date, then they haven’t.

If they have, you can find by creation or access time, or use a benchmark file (pick a file/dir that will serve as the cut-off - anything older is moved, anything newer is preserved).
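Concretely, the benchmark-file approach can be sketched like this (all paths are throwaway examples made up for the demo):

```shell
# Create a reference file stamped with the chosen cut-off date, then
# use find's -newer test to split files around it.
demo=/tmp/cutoff_demo
mkdir -p "$demo"
touch -d '2019-06-01' "$demo/older.dat"
touch -d '2021-06-01' "$demo/newer.dat"
touch -d '2020-01-01' /tmp/cutoff_ref        # the benchmark file

find "$demo" -type f -newer /tmp/cutoff_ref      # newer: kept in place
find "$demo" -type f ! -newer /tmp/cutoff_ref    # older: candidates to move off
```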

This would find all dirs newer than 2 months:
find ./ -newermt $(date +%Y-%m-%d -d '2 months ago') -type d -print

You can replace month with day, week, year, etc…

You can replace -print with
-exec ls -la {} \;

Where “ls -la” could be any command - such as cp -aR {} slow_store/.

“{}” is the current match…

find has a pretty rich set of tests based on all manner file stats.

find /path/to/check -type d -mtime +5 -exec mv {} slow_store/. \;

This would find all dirs with a modification time older than 5 days and move them to “slow_store”, whatever that needs to be…


Thank you for your help. I will try playing with the find command. It’s a long read :slight_smile:
find(1) - Linux manual page

It is… I typically just google what I need to do if I don’t remember it…

Then use -exec ls -la {} \; as a test action, to confirm that I’ve found the right files before committing…

p.s. it looks like the forum was removing the backslash prior to the semi-colon in my examples above… So, take care when copying…

find /path -type d -mtime +5 -exec ls -la {} \;


Thanks again.
Linux is awesomely powerful when you know the commands. Although I feel sooo stupid ATM. I feel like ~25 years ago, when I bought my first ever PC and didn’t know how to do anything… :slight_smile:


Indeed - it is a steep learning curve - steeper if you really use all it has to offer.

As an example, this use-case you are talking about (relocating files) quickly has you in the world of using sed, awk, or perl to modify the path as a regular expression, so that you can preserve directory structure in the target file-system.

In my examples above - investigate the use of depth control (-maxdepth) to avoid recursively moving/copying things… That might keep you away from more complex sed/awk/perl transforms of paths…
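As a sketch of avoiding those path transforms entirely, GNU cp’s --parents flag can rebuild the source tree under the target (the directories and file names below are throwaway examples standing in for real unRAID mounts):

```shell
# GNU cp's --parents recreates the matched path's directory structure
# under the destination, so no sed/awk/perl rewriting is needed.
src=/tmp/reloc_src
dst=/tmp/reloc_dst
mkdir -p "$src/tv/show1" "$dst"
echo data > "$src/tv/show1/ep1.mkv"

# Run find from inside the source so matches are relative paths;
# cp --parents then rebuilds tv/show1/ under the destination.
cd "$src"
find . -type f -name '*.mkv' -exec cp -a --parents {} "$dst" \;
```

rsync can do the same with -R/--relative or --files-from, which also lets you resume an interrupted move.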

This is because you are not putting your code in markdown tags.

In-line, use backticks.

`var code = 'goes here';`

Yields:

var code = 'goes here';

Multi-line uses triple backticks.

```
this is a multi line
example!!!!!
```

Yields:

this is a multi line
example!!!!!

Note, for multi-line you will have to use spaces for indentation.


I am unsure. There might be something you can check within the XFS metadata. Although, Linux filesystems are smarter than Windows-y ones about file placement, which is why you do not (usually) need to defrag hard drives with linuxy filesystems.

Although, in my situation, I was moving files onto a completely empty, freshly formatted XFS drive. I filled the whole drive in one “swoop”, as in one move command with Midnight Commander (I selected exactly the right amount of folders to move).

Basically, it was an empty 10TB drive, and after one move command it had only ~800GB of free space. The question is, how did XFS write the files onto the drive? Sequentially, or did it scatter them around, even though the drive was empty with 0% fragmentation?

If XFS did use some tricks to scatter the files around, there seems to be no point in me even trying to find the latest-written files by any time-stamp.

At this point, this whole experiment feels pointless :slight_smile:

Check out this post:

And since XFS journals its metadata, file timestamps are recorded with sub-second accuracy.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/ch-xfs

I do not know how to access this information, though, but hopefully this points you in the right direction.
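One hedged way to poke at it: XFS v5 inodes store a creation (“birth”) time, and reasonably recent kernels and coreutils can display it via statx. Whether your particular unRAID kernel/coreutils combination exposes it is something to verify - stat prints “-” when it can’t:

```shell
# %w prints the birth (creation) time, or "-" if the kernel or
# filesystem doesn't expose it - check the output before relying on it.
touch /tmp/birth_demo.txt
stat -c '%w' /tmp/birth_demo.txt
```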

Also, you can actually defrag XFS; check that out here:

# xfs_fsr /dev/sdXY
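Before defragmenting, it may be worth checking whether fragmentation is actually significant; xfs_db has a read-only frag report (the device name below is a placeholder - substitute your actual partition, and run as root):

```shell
# Read-only fragmentation report; /dev/sdX1 is a placeholder device.
# The report compares actual vs. ideal extent counts and prints a
# fragmentation factor at the end.
xfs_db -r -c frag /dev/sdX1
```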

With the arrival of SSDs and NVMe/Optane, your time is likely better spent setting up cache drives to accelerate frequently and recently used items, rather than trying to optimize where they fall on the disk.

I haven’t spent enough time with unRAID specifically, but most NAS configurations have provision for SSD cache drives, and they are very effective for such things.

@Dynamic_Gravity @cekim
Thanks for all the info, guys …

This was a simple one-time deal/idea. I had 2 hard drives that I had overfilled - not even that (they still had ~10% of free space left) - but I had so much free space on the other drives, I decided to “explore” the possibility of “finding” the files that are on the slowest part of the drive(s) and moving those files.

Once again - this was a one-time, specific situation - I had 2 completely new, empty, freshly formatted 10TB drives. I filled them up at once, not over time. Then I decided to move some files off, and got the idea to “find” the files that are in the slowest places on the actual physical disk. And having a little bit of free time, I decided to ask if and how I could do it.

TBH, after one day I had already decided to drop it. I had already moved some files off the drives. This was taking too much time, and being a non-problem, I had to drop it.

Anyway, as I am actually just starting to learn Linux, any and all information and tips are always welcome and appreciated.

Once again, thank You, guys, so much for your time and replies.


You’re welcome. And I thought it was interesting - I know this is something people used to do back in the day to improve performance.