0node filesystem

TL;DR: a hybrid filesystem that reduces inode wastage.

The 0node filesystem project for #devember2022 is the result of the need to have IPv4 (and later IPv6) addresses recorded in the filesystem as filenames, where they may or may not contain actual data.

The Problem

Similar to the username and hostname directory entries on web server hosting platforms from the days of yore, this is really wasteful in filesystem usage (with a 512 byte minimum per single file/dir write), but with modern hardware it is really, really wasteful (with a 4KiB minimum per single file/dir write). Each of those block sizes, in a filesystem context, is referred to as an inode.

Some sort of simplified fs (filesystem) that undoes this inode waste would be really useful (especially when you consider that a full IPv6 structure, as a filesystem tree or as actual filenames, would take up 16 Billion inode entries).

And so 0node-fs was born.

A Real World Scenario

Roll forward to 2022, and my firewall project from last year has 14633 IPv4 addresses recorded as filenames, each file taking up 1x 4KiB minimum inode entry (of which only a 25 character date format is used), and a maximum directory entry name length of 15 characters (wasting 17 bytes of each 32 byte entry).

So let's just calculate some data usage here:

  • 14633 IPv4 as text = 206307 bytes (201KiB)
  • 14633 IPv4 as 32 byte directory entries = 468256 (457KiB)
  • 14633 files with 25 bytes each = 365825 bytes used (357KiB)
  • 14633 files as 4KB inodes = 59936768 bytes reserved (57MiB)

Yeah! That last number is not a typo! That's 57 Megabytes used on disk to record 357 Kilobytes of information!

And the information is only recorded in 1 directory. What happens when you need to have information spread across a categorized sub-tree directory structure …

Another Real World Scenario

You have to allocate either a Username or Hostname (domain name) or both on the filesystem. If you pre-allocate one directory level for each letter, that is 26 directory entries pointing to 26 different inodes, each with further directory entries. That consumes 106496 bytes or 104KiB on disk, with 0 files and 0 sub-directory entries in each.

As you can see, each sub-folder level increases disk usage exponentially: the first level costs 104KiB, and every deeper level multiplies the directory count by 26. If you take this to the extreme (26^26), it is quite easy to consume Gigabytes in this scenario before any data is even recorded.

In fact, this very scenario happens every week somewhere around the world, as an unsuspecting developer, analyst or manager breaks down a word list into a structured filesystem catalog, only to find out they no longer have any disk space on said device.

In RAM Scenario

So, in theory, you could save that wasted space by constructing the filesystem in RAM, right? Well, almost, but even tmpfs still uses inodes in the true sense of their allocated size. ramfs is the same (without the benefit of being sparsely allocated), but does have a compression option, which makes sense speed-wise, because it is all held in RAM, so functionally it is “not slow” by comparison to “on-disk compression”.

Possible Solutions

What's really needed is a filesystem that only uses the space it needs, and where multiple directory levels can be compacted into a single physical inode, whether used on actual hardware or in RAM.

One possible way to sidestep the problem, at least for an “in-RAM filesystem”, is to use a storage-backed Memcache or Redis server. The problem here is that, by their very nature, they are not designed for permanent or long-lived data, and you still need a filesystem driver to use them “on the filesystem”. That's not really an issue, as there are FUSE filesystem drivers already, and developing our own is also not too difficult (though debugging can be a different story).

Another possible solution is to use an existing compacted filesystem inside a disk or partition image, and mount that.

It turns out there are a few options (as you can see from the files I collected at the time I originally created the following 0node-fs.txt file). What none of these options addresses is the two (or similar) “Real World Scenarios” mentioned above.

Also, what if your current filesystem structure cannot be changed (for whatever reason), but you still want to employ a sub-directory structure to simplify access (and, in theory, speed up lookups)?


Required features:
  • interface and interact with OS filesystem
  • bind a filesystem (it self) in between 2 endpoints on an existing filesystem
  • reduce inode wastage “on disk”
  • compress multiple sub-directory level entries into a single inode
  • function “in RAM”, as “disk or partition image” and on real hardware.

Optional features:
  • provide high-speed lookup
  • provide lookup shortcuts
  • cater for “0 length files”
  • cater for “directory only” filesystem structures
  • cater for optional “re/create on demand”


Quite ambitious at first glance. But I had some very intense thinking sessions back before I outlined (in my own mind) what is written above. Since that time, I have had a full year-and-a-half to mull it over, take in some other optional use cases that might be useful, and consider some (at least memory-related) lookup schemes (including compression, which I have not mentioned here).

So, is it possible to complete in 2 months? Well, for those two initial scenario outlines (one of which I need), I think yes. Some of the optionals will need a testing application of some description, or at least a test-bed setup, to fully flesh out and find those all-important “corner cases” which, when uncaught, are the usual failure of a good idea.

Do I think it has possible use in the real world? Yes, I need it. Yes, it does have other practical uses. Will anyone else use it? Well, that is yet to be seen.

There is an open-source distributed filesystem fork that recently (late last year) destroyed its user base because of a programming decision one person made. People still use that filesystem, even though you have to back-door the write facility.

I think various configurations of 0node-fs have a very usable future, especially with “0 length files”, especially if “lookup shortcut” and “high-speed lookup” are a reality, and especially when that is combined with someone showing that the “bind endpoint” on a single “big data” directory is practical in the real world.

NOTES: here is the directory listing of the files I compiled at the time of creation

total 20
-rw-r--r-- 1 pi pi   7177 Aug 28  2021 0node-fs.txt
-rw-r--r-- 1 pi pi 263079 Aug 29  2021 fsz.pdf
-rw-r--r-- 1 pi pi 202750 Aug 28  2021 fysfs.pdf
-rw-r--r-- 1 pi pi 118485 Aug 29  2021 IPC-Assurance.ps
-rw-r--r-- 1 pi pi  75067 Aug 29  2021 LEAN file system specification 0.7.html
-rw-r--r-- 1 pi pi  72477 Aug 28  2021 sfs-BenLunt.pdf
-rw-r--r-- 1 pi pi   9252 Aug 29  2021 sfs.html.gz
-rw-r--r-- 1 pi pi  25476 Aug 28  2021 SFS - OSDev Wiki.html
-rw-r--r-- 1 pi pi  24940 Aug 29  2021 sfs-V00.01.pdf
-rw-r--r-- 1 pi pi  54320 Aug 29  2021 SimpleFS-2009.html
-rw-r--r-- 1 pi pi  55132 Aug 28  2021 SimpleFS.html
-rw-r--r-- 1 pi pi 367068 Aug 31  2021 the_vfs_and_initrd.tar.gz
-rw-r--r-- 1 pi pi  25020 Aug 29  2021 vdisk-source.tar.gz

total 36
-rw-r--r-- 1 pi pi  3882 Feb 18  2022 FuseApp-2.0.tar.gz
-rw-r--r-- 1 pi pi  6829 Feb 18  2022 FuseApp-2.0.zip
-rw-r--r-- 1 pi pi 12060 Feb 18  2022 FuseApp-3.0-hackaday-master-23cf354-220218.zip
-rw-r--r-- 1 pi pi  8408 Feb 18  2022 FuseApp-3.0-master-4a457ac-220218.zip

total 19304
-rw-r--r-- 1 pi pi      269 Aug 28  2021 fysfs-modified.txt
-rw-r--r-- 1 pi pi   102188 Aug 28  2021 leanfsgui-0.6.1-squeeze.7z
-rw-r--r-- 1 pi pi    73811 Aug 28  2021 leanfsgui-0.6.1-src.7z
-rw-r--r-- 1 pi pi  2001687 Aug 28  2021 leanfsgui-0.6.1-win32.7z
-rw-r--r-- 1 pi pi  4532011 Aug 28  2021 lean_image.7z
-rw-r--r-- 1 pi pi 13042639 Aug 28  2021 usb_vol8.zip

NOTES: this is 0node-fs.txt, originally conceived on 28th August 2021 (which is now not 110% understandable)

0 Node FileSystems

0400: IPv4 as file, size is counter only (0 content)
0401: IPv4 regular file content (content are log entries)
0410: IPv4 as directory, counter entries (0 content)


100000 32

1410: directories only, sub-nets only (max 4 levels)
1412: directories only, variable sub-nets only (<= 4 levels)
1415: directories only, full IPv4, no sub-net mask
1418: directories only, with sub-net mask

SFS based 0node-fs with unix/posix attributes

SFS superblock

@off 0x0194 (SFS is of non-CRC type)
@off 0x018E (SFS is of CRC type)
struct S_SFS_SUPER {
  bit64s time_stamp;   // Time Stamp when volume has changed
  bit64u data_size;    // Size of Data Area in blocks
  bit64u index_size;   // Size of Index Area in bytes
  bit8u  magic[3];     // signature ‘SFS’ (0x534653)
  bit8u  version;      // SFS version (0x10 = 1.0, 0x1A = 1.10)
  bit64u total_blocks; // Total number of blocks in volume (including reserved area)
  bit32u rsvd_blocks;  // Number of reserved blocks
  bit8u  block_size;   // log(x+7) of block size (x = 2 = 512)
  bit8u  crc;          // Zero sum of bytes above
};
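A minimal sketch of the “zero sum” crc field: the byte is chosen so that all superblock bytes (including crc itself) sum to 0 modulo 256. `zero_sum_crc` is an illustrative helper name, not from the spec:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the byte that makes bytes[0..len-1] plus the result sum to
 * 0 modulo 256 (the "zero sum" described in the struct comment). */
static uint8_t zero_sum_crc(const uint8_t *bytes, size_t len) {
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += bytes[i];        /* wraps modulo 256 */
    return (uint8_t)(0u - sum); /* complement so the total wraps to 0 */
}
```

A verifier can then simply sum the whole superblock, crc included, and check the result is zero.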

(superblock block_size changed to bit8s from bit8u:
 “+” equals 256 byte min block, so 0x02 = 2 = 512, log(x+7), 2^(x+7);
 “-” equals 4096 byte min block size, so 0xFE = -2 = 8192,
 log(|x|+11), 2^(|x|+11))
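The signed block_size interpretation in the note above can be sketched as follows; `decode_block_size` is an assumed helper name, and the behaviour for x = 0 is not specified in the notes (treated as positive here):

```c
#include <assert.h>
#include <stdint.h>

/* Decode the signed block_size byte per the note above:
 *   x >= 0 : block = 2^(x+7)    (256-byte minimum scheme, 0x02 -> 512)
 *   x <  0 : block = 2^(|x|+11) (4096-byte minimum scheme, 0xFE -> 8192) */
static uint32_t decode_block_size(int8_t x) {
    if (x >= 0)
        return 1u << (x + 7);
    return 1u << (-x + 11);
}
```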
In SFS CRC format, there are 2x bit8 bytes after the superblock structure, before the volume signature. 0node uses these to store compression/CRC (+1 = CRC) plus the “index entry size” (default 64).

“start entry” inode number from end of volume (inverse LBN format - size of root table in “sector size” ), (–in SFS non-CRC–) this 16bit byte entry is located before the superblock, next to the timestamp.
( https://www.epochconverter.com/ )
(FIXME: move next, next to others that don't get updated)
2x 8bit bytes (1x 16bit byte) before “start entry inode” store 8bit bytes sector size (as opposed to default block of 256, or kernel block of 512), which allows for custom “sector” sizes. ie 64Bps read through kernel (512 bytes min) will read in 8x 64 byte index entries from the root table. 0 indicates no fixed size, read according to type, ie. index entry=64 bytes, file inodes=size bytes starting at LBA (not LBN) with implied “sector size” of 1x 8bit byte. This is to allow efficient ram based SFS disks “squeezed” into memory more efficiently.
8x 8bit (1x 64bit) before “block size in 8bit bytes” store index inode of root table (the inode that contains “start entry”) based on “sector size”

SFS extended superblock

0x0200 4 8bit bytes (for jump instruction - CPU agnostic)
0x0204 ??534653 (“SFS?” = LE, where ?? = extended superblock type)

ID’s for root table/index section/index data block:

(sub directories are optionally implied, ie OS/tool dependant)
(hard-links are implied, check all entries before free inodes)
(sparse files are also implied. new ID should be bitmap clean)
(hash should be of type “search optimized lookup” not of type)
( encryption, unless needed, eg: $6$ see Owl/tcp for Blowfish)
(dir entries only end with 0x00, if not 54/118/186 characters)
(file entries only end with 0x00, if not 30/94/158 characters)

                       0x00 // 0node-fs defined attributes

#define SFS_ENTRY_VOL_ID 0x01 // volume ID
#define SFS_ENTRY_START 0x02 // start marker
0x03 // (?) lookup marker ID
0x04 // (?) lookup marker (4)
0x05 // (?) hash lookup type ID ($6$)
0x06 // (?) text lookup marker (4+2)
0x07 // (?) hash lookup marker
0x08 // (?) hash/bad inodes
0x09 // (?) hash type ID ($6$)
0x0A // (?) hash lookup marker (8+2)
0x0B // (?) text lookup table
0x0C // (?) hash lookup marker (8+4)
0x0D // (?) hash lookup table
0x0E // (?) hash lookup marker (8+4+2)
0x0F // (?) attribute table (see 1F)
#define SFS_ENTRY_UNUSED 0x10 // unused
#define SFS_ENTRY_DIR 0x11 // optional directory entry
#define SFS_ENTRY_FILE 0x12 // file entry
0x13 // (?) block device/lookup
0x14 // (?) character device/lookup
0x15 // soft-linked directory
0x16 // soft-linked file
0x17 // (?) block/character device
#define SFS_ENTRY_UNUSABLE 0x18 // unusable/bad sector(s)
#define SFS_ENTRY_DIR_DEL 0x19 // optional deleted directory
#define SFS_ENTRY_FILE_DEL 0x1A // deleted file
0x1B // (?) socket/named pipe
0x1C // (?) socket/named pipe
0x1D // (?) directory share
0x1E // (?) sparse file
0x1F // (?) socket/named pipe/attr
0x20+ // continuation entry (UTF8)
0x20+ // continuation entry (UTF16)
0x20+ // continuation entry (UTF32)
(UTF-8 vs UTF-16 Encoding | Which One is Prevalent? - TechDim)
" (double quote, 0x0022)
* (asterisk, 0x002A)
: (colon, 0x003A)
< (less than sign, 0x003C)
> (greater than sign, 0x003E)
? (question mark, 0x003F)
\ (backward slash, 0x005C)
DEL (delete, 0x007F)
NBSP (no break space character, 0x00A0)
0x22 // meh, allow these
0x2A // IP(v4/v6) continuation
0x3A // IP(v4/v6) continuation
0x3C // endpoint/mount entry
0x3E // link/endpoint entry
0x3F // (?) something 8.3 entry
0x5C // (hmm) meh, allow these
0x7F // bad inode continuation
0xA0 // attribute continuation
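The name-length limits noted earlier (dir entries end with 0x00 if not 54/118/186 characters; file entries if not 30/94/158) suggest a first-entry name capacity of 54 bytes for directories and 30 for files, with continuation entries carrying more name bytes. A sketch of the continuation count, assuming each 64-byte continuation entry carries up to 64 name bytes (which matches the 30/94/158 file series; `continuations_needed` is an illustrative name):

```c
#include <assert.h>

/* How many 64-byte continuation entries a name of name_len bytes needs,
 * given the capacity of the first index entry (54 for dirs, 30 for
 * files per the notes above). */
static int continuations_needed(int name_len, int first_entry_cap) {
    if (name_len <= first_entry_cap)
        return 0;
    return (name_len - first_entry_cap + 63) / 64; /* ceiling division */
}
```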

SFS Partitions

If an MBR Partition Table is used, with one or more of these entries pointing to an SFS partition, each should be assigned the value 0x53 in the ID field of the partition entry. If a GPT is used, the original author has assigned the GUID:
4EBF0E06-11BF-450C-1A06-534653534653 - SFSSFS
4EBF0E06-11BF-450C-1A06-313030534653 - 100SFS (SFS+FIX=v1.00)
4EBF0E06-11BF-450C-1A06-313130534653 - 110SFS (SFS+0ND=v1.01)
4EBF0E06-11BF-450C-1A06-314130534653 - 1A0SFS (SFS+CRC=v1.10)
4EBF0E06-11BF-450C-1A06-314230534653 - 1B0SFS (SFS+CRC+v1.11)
4EBF0E06-11BF-450C-1A06-xxyyzz534653 - xyzSFS (SFS)
x = major version 10 for original
y = minor version + 10 for CRC + 01 for 0node FIX (CRC in cnt)
z = (0node-FS attributes)
+01 for tables + entries (otherwise just entries)
+02 for (unix/posix) attributes
+04 text lookup
+08 hash lookup
+10 extended superblock
+20 special devices
+40 IP(v4/v6) entries (0node entries)
+80 Accordian FS (has endpoint)
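The z attribute byte above reads as a set of bit flags; a sketch of how it could be composed (the enum names are assumptions for illustration, the bit values are from the list):

```c
#include <assert.h>
#include <stdint.h>

/* z attribute byte of the xyzSFS GUID scheme above. */
enum znode_attr {
    ZND_TABLES       = 0x01, /* tables + entries (otherwise just entries) */
    ZND_POSIX_ATTRS  = 0x02, /* (unix/posix) attributes */
    ZND_TEXT_LOOKUP  = 0x04, /* text lookup */
    ZND_HASH_LOOKUP  = 0x08, /* hash lookup */
    ZND_EXT_SUPER    = 0x10, /* extended superblock */
    ZND_SPECIAL_DEV  = 0x20, /* special devices */
    ZND_IP_ENTRIES   = 0x40, /* IP(v4/v6) entries (0node entries) */
    ZND_ACCORDIAN_FS = 0x80  /* Accordian FS (has endpoint) */
};

/* Compose the z byte from OR'd flags. */
static inline uint8_t make_z(unsigned flags) { return (uint8_t)flags; }
```

For example, a variant with posix attributes and IP entries would carry z = 0x42.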