0node filesystem

TL;DR: a hybrid filesystem that reduces inode wastage.

The 0node filesystem project for #devember2022 is the result of the need to record IPv4 (and later IPv6) addresses in the filesystem as filenames, where the files themselves may or may not contain actual data.

The Problem

Similar to usernames and hostnames stored as directory entries on web server hosting platforms from the days of yore, this is really wasteful in filesystem usage (with a 512 byte minimum per single file/dir write), but with modern hardware it is really, really wasteful (with a 4KiB minimum per single file/dir write). Each of those mentioned block sizes, in a filesystem context, is referred to as an inode.

Some sort of simplified fs (filesystem) that undoes this inode waste would be really useful (especially when you consider that a full IPv6 structure, as a filesystem tree or as actual filenames, would take up 16 Billion inode entries).

And so 0node-fs was born.

A Real World Scenario

Roll forward to 2022 and my firewall project from last year has 14633 IPv4 addresses recorded as filenames, each file taking up 1x 4KiB minimum inode entry (of which only a 25 character date format is used), with a maximum directory entry name length of 15 characters (wasting 17 bytes of each 32 byte directory entry).

So let's just calculate some data usage here (there is a small sanity-check program after the list):

  • 14633 IPv4 as text = 206307 bytes (201KiB)
  • 14633 IPv4 as 32 byte directory entries = 468256 (457KiB)
  • 14633 files with 25 bytes each = 365825 bytes used (357KiB)
  • 14633 files as 4KiB inodes = 59936768 bytes reserved (57MiB)
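For anyone who wants to verify those last three figures, here is a small throw-away C program that reproduces them from the numbers above (the 206307 byte "as text" figure depends on the actual address list, so it is not recomputed here):

#include <stdio.h>

int main(void)
{
    const long count  = 14633;   /* IPv4 addresses recorded as filenames */
    const long dirent = 32;      /* bytes per directory entry            */
    const long record = 25;      /* bytes of useful data per file        */
    const long inode  = 4096;    /* minimum on-disk allocation per file  */

    printf("directory entries: %ld bytes (%ld KiB)\n",
           count * dirent, count * dirent / 1024);
    printf("useful data:       %ld bytes (%ld KiB)\n",
           count * record, count * record / 1024);
    printf("reserved on disk:  %ld bytes (%ld MiB)\n",
           count * inode, count * inode / (1024 * 1024));
    return 0;
}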

Yeah! That last number is not a typo! That's 57 Megabytes used on disk to record 357 Kilobytes of information!

And the information is only recorded in 1 directory. What happens when you need to have information spread across a categorized sub-tree directory structure…

Another Real World Scenario

You have to allocate either a Username or Hostname (domain name) or both on the filesystem. If you pre-allocate 1 directory level with an entry for each letter, that is 26 directory entries pointing to 26 different inodes, each ready to hold further directory entries. That consumes 106496 bytes or 104KiB on disk, with 0 files and 0 sub-directory entries in each.

As you can see, each sub-folder level adds at least another 104KiB of disk usage, and pre-allocating every branch makes the growth exponential. If you take this to the extreme (26^26), it is quite easy to consume Gigabytes in this scenario before any data is even recorded.
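To put a number on that growth: assuming the same 4KiB per directory inode, a quick back-of-the-envelope loop shows what fully pre-allocating every letter branch costs at each level (it only counts the directories created at that depth, not the entries inside their parents):

#include <stdio.h>

int main(void)
{
    const double inode = 4096.0;   /* bytes reserved per directory */
    double dirs = 1.0;

    for (int level = 1; level <= 5; level++) {
        dirs *= 26.0;              /* 26^level directories at this depth */
        printf("level %d: %12.0f dirs, %10.1f MiB reserved\n",
               level, dirs, dirs * inode / (1024.0 * 1024.0));
    }
    return 0;
}

Level 4 (456976 directories, roughly 1.7GiB reserved) is already deep into Gigabyte territory, which is exactly the trap described above.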

In fact, this very scenario happens every week somewhere around the world, as an unsuspecting developer, analyst or manager breaks down a word list into a structured filesystem catalog, only to find out they no longer have any disk space on said device.

An In-RAM Scenario

So, in theory, you could save that wasted space by constructing the filesystem in RAM, right? Well, almost, but even tmpfs still uses inodes in the true sense of their allocated size. ramfs is the same (without the benefit of being sparsely allocated), but does have a compression option, which makes sense speed-wise: because it's all held in RAM, it is functionally “not slow” by comparison to “on-disk-compression”.

Possible Solutions

What's really needed is a filesystem that uses only the space it needs, and where multiple directory levels can be compacted into a single physical inode, whether used on actual hardware or in RAM.

One possible way to sidestep the problem, at least for an “in-RAM-filesystem”, is to use a storage-backed Memcache or Redis server. The problem here is that, by their very nature, they are not designed for permanent or long-lived data, and you still need a filesystem driver to use them “on the filesystem”. That’s not really an issue, as there are FUSE filesystem drivers already, and developing our own is also not too difficult (debugging can be a different story there, though).
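To give a feel for how little scaffolding that actually takes, here is a minimal sketch of a FUSE driver (my own illustration, not 0node-fs code), assuming libfuse 3 is installed. It does nothing but expose an empty root directory, yet every callback a real driver would flesh out hangs off the same fuse_operations table:

#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>

/* Report "/" as a directory and everything else as missing. */
static int zn_getattr(const char *path, struct stat *st,
                      struct fuse_file_info *fi)
{
    (void) fi;
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
        return 0;
    }
    return -ENOENT;
}

/* List an empty root directory ("." and ".." only). */
static int zn_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                      off_t offset, struct fuse_file_info *fi,
                      enum fuse_readdir_flags flags)
{
    (void) offset; (void) fi; (void) flags;
    if (strcmp(path, "/") != 0)
        return -ENOENT;
    filler(buf, ".", NULL, 0, 0);
    filler(buf, "..", NULL, 0, 0);
    return 0;
}

static const struct fuse_operations zn_ops = {
    .getattr = zn_getattr,
    .readdir = zn_readdir,
};

int main(int argc, char *argv[])
{
    return fuse_main(argc, argv, &zn_ops, NULL);
}

Compiled with something like gcc skeleton.c $(pkg-config fuse3 --cflags --libs) and pointed at an empty mount point, that is enough to watch the kernel call into userspace; the interesting part is deciding what the callbacks do with the data.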

Another possible solution is to use an existing compacted filesystem inside a disk or partition image, and mount that.

It turns out there are a few options (as you can see from the files I collected at the time I originally created the following 0node-fs.txt file). What none of these options address is the two (or similar) “Real World Scenarios” mentioned above.

Also, what if your current filesystem structure cannot be changed (for whatever reason), but you still want to employ a sub-directory structure to simplify access (and, in theory, speed up lookups)?

0node-fs

must:

  • interface and interact with OS filesystem
  • bind a filesystem (itself) in between 2 endpoints on an existing filesystem
  • reduce inode wastage “on disk”
  • compress multiple sub-directory level entries into a single inode (see the sketch after these lists)
  • function “in RAM”, as “disk or partition image” and on real hardware.

optionally:

  • provide high-speed lookup
  • provide lookup shortcuts
  • cater for “0 length files”
  • cater for “directory only” filesystem structures
  • cater for optional “re/create on demand”
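None of this exists yet, but as a purely hypothetical illustration of the “compress multiple sub-directory level entries into a single inode” requirement, take the IPv4 case: the four dotted-quad directory levels can collapse into a single 4-byte key, so a whole sub-tree becomes a table of fixed-size records that fits inside one inode rather than spanning four directory levels (pack_ipv4_path below is an invented name for this sketch, not a real 0node-fs function):

#include <stdio.h>
#include <stdint.h>

/* Collapse a path such as "192/168/1/1" into a single 32-bit key. */
static int pack_ipv4_path(const char *path, uint32_t *key)
{
    unsigned a, b, c, d;
    if (sscanf(path, "%u/%u/%u/%u", &a, &b, &c, &d) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    *key = (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | (uint32_t)d;
    return 0;
}

int main(void)
{
    uint32_t key;
    if (pack_ipv4_path("192/168/1/1", &key) == 0)
        printf("192/168/1/1 -> 0x%08X (4 bytes instead of 4 directory inodes)\n",
               (unsigned) key);
    return 0;
}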

Conclusion

Quite ambitious at first glance. But I had some very intense thinking sessions back before I outlined (in my own mind) what is written above. Since that time, I have had a full year-and-a-half to mull it over, take in some other optional use cases that might be useful, and consider some (at least memory-related) lookup schemes (including compression, which I have not mentioned here).

So, is it possible to complete in 2 months? Well, for those two initial scenario outlines (one of which I need), I think yes. Some of the optionals will need a testing application of some description, or at least a test-bed setup, to fully flesh out and to find those all-important “corner cases” which, when uncaught, are the usual downfall of a good idea.

Do I think it has possible use in the real world? Yes, I need it. Yes, it does have other practical uses. Will anyone else use it? Well, that is yet to be seen.

There is an open-source distributed filesystem fork that recently (late last year) destroyed its user base because of a programming decision one person made. People still use that filesystem, even though you have to back-door the write facility.

I think various configurations of 0node-fs have a very usable future, especially with “0 length files”, especially if “lookup shortcuts” and “high-speed lookup” become a reality, and especially when that is combined with someone showing that the “bind endpoint” on a single “big data” directory is practical in the real world.

NOTES: here is the directory I compiled at the time of creation

0node-fs:
total 20
-rw-r--r-- 1 pi pi   7177 Aug 28  2021 0node-fs.txt
-rw-r--r-- 1 pi pi 263079 Aug 29  2021 fsz.pdf
-rw-r--r-- 1 pi pi 202750 Aug 28  2021 fysfs.pdf
-rw-r--r-- 1 pi pi 118485 Aug 29  2021 IPC-Assurance.ps
-rw-r--r-- 1 pi pi  75067 Aug 29  2021 LEAN file system specification 0.7.html
-rw-r--r-- 1 pi pi  72477 Aug 28  2021 sfs-BenLunt.pdf
-rw-r--r-- 1 pi pi   9252 Aug 29  2021 sfs.html.gz
-rw-r--r-- 1 pi pi  25476 Aug 28  2021 SFS - OSDev Wiki.html
-rw-r--r-- 1 pi pi  24940 Aug 29  2021 sfs-V00.01.pdf
-rw-r--r-- 1 pi pi  54320 Aug 29  2021 SimpleFS-2009.html
-rw-r--r-- 1 pi pi  55132 Aug 28  2021 SimpleFS.html
-rw-r--r-- 1 pi pi 367068 Aug 31  2021 the_vfs_and_initrd.tar.gz
-rw-r--r-- 1 pi pi  25020 Aug 29  2021 vdisk-source.tar.gz

FuseApp:
total 36
-rw-r--r-- 1 pi pi  3882 Feb 18  2022 FuseApp-2.0.tar.gz
-rw-r--r-- 1 pi pi  6829 Feb 18  2022 FuseApp-2.0.zip
-rw-r--r-- 1 pi pi 12060 Feb 18  2022 FuseApp-3.0-hackaday-master-23cf354-220218.zip
-rw-r--r-- 1 pi pi  8408 Feb 18  2022 FuseApp-3.0-master-4a457ac-220218.zip

FYS:
total 19304
-rw-r--r-- 1 pi pi      269 Aug 28  2021 fysfs-modified.txt
-rw-r--r-- 1 pi pi   102188 Aug 28  2021 leanfsgui-0.6.1-squeeze.7z
-rw-r--r-- 1 pi pi    73811 Aug 28  2021 leanfsgui-0.6.1-src.7z
-rw-r--r-- 1 pi pi  2001687 Aug 28  2021 leanfsgui-0.6.1-win32.7z
-rw-r--r-- 1 pi pi  4532011 Aug 28  2021 lean_image.7z
-rw-r--r-- 1 pi pi 13042639 Aug 28  2021 usb_vol8.zip

NOTES: this is 0node-fs.txt, originally conceived on 28th August 2021 (which is now not 110% understandable)


0 Node FileSystems

0400: IPv4 as file, size is counter only (0 content)
0401: IPv4 regular file content (content are log entries)
0410: IPv4 as directory, counter entries (0 content)

0nType=IPv4
0nTypeID=04
0nSubTypeID=00
0nAsciMin=7
0nAscii=15
0nAsciiMax=18
0nHex=(4x8)+1

84218421
100000 32

1410: directories only, sub-nets only (max 4 levels)
1412: directories only, variable sub-nets only (<= 4 levels)
1415: directories only, full IPv4, no sub-net mask
1418: directories only, with sub-net mask


SFS based 0node-fs with unix/posix attributes

SFS superblock

@off 0x0194 (SFS is of non-CRC type)
@off 0x018E (SFS is of CRC type)
struct S_SFS_SUPER {
  bit64s time_stamp;    // Time Stamp when volume has changed.
  bit64u data_size;     // Size of Data Area in blocks
  bit64u index_size;    // Size of Index Area in bytes
  bit8u  magic[3];      // signature ‘SFS’ (0x534653)
  bit8u  version;       // SFS version (0x10 = 1.0, 0x1A = 1.10)
  bit64u total_blocks;  // Total number of blocks in volume (including reserved area)
  bit32u rsvd_blocks;   // Number of reserved blocks
  bit8u  block_size;    // log(x+7) of block size (x = 2 = 512)
  bit8u  crc;           // Zero sum of bytes above
};

(superblock block_size changed to bit8s from bit8u, + equals)
( 256 byte min block, so 0x02 = 2 = 512, where - equals 4096)
( byte min block size so 0xFE = -2 = 8192, log(x+11) 2^(x+11))
( log(x+7) 2^(x+7) )
In SFS CRC format, there are 2x bit8 after the superblock structure, before the volume signature. 0node uses this to store compression/crc (+1=CRC, + “index entry size” (default 64)).

“start entry” inode number from end of volume (inverse LBN format - size of root table in “sector size” ), (–in SFS non-CRC–) this 16bit byte entry is located before the superblock, next to the timestamp.
( https://www.epochconverter.com/ )
(FIXME: move next, next to others that don't get updated)
2x 8bit bytes (1x 16bit byte) before “start entry inode” store 8bit bytes sector size (as opposed to default block of 256, or kernel block of 512), which allows for custom “sector” sizes. ie 64Bps read through kernel (512 bytes min) will read in 8x 64 byte index entries from the root table. 0 indicates no fixed size, read according to type, ie. index entry=64 bytes, file inodes=size bytes starting at LBA (not LBN) with implied “sector size” of 1x 8bit byte. This is to allow efficient ram based SFS disks “squeezed” into memory more efficiently.
8x 8bit (1x 64bit) before “block size in 8bit bytes” store index inode of root table (the inode that contains “start entry”) based on “sector size”

SFS extended superblock

0x0200 4 8bit bytes (for jump instruction - CPU agnostic)
0x0204 ??534653 (SFS? in LE, where ?? = extended superblock type)
0x0208

ID’s for root table/index section/index data block:

(sub directories are optionally implied, ie OS/tool dependant)
(hard-links are implied, check all entries before free inodes)
(sparse files are also implied. new ID should be bitmap clean)
(hash should be of type “search optimized lookup” not of type)
( encryption, unless needed, eg: $6$ see Owl/tcp for Blowfish)
(dir entries only end with 0x00, if not 54/118/186 characters)
(file entries only end with 0x00, if not 30/94/158 characters)

                       0x00 // 0node-fs defined attributes

#define SFS_ENTRY_VOL_ID 0x01 // volume ID
#define SFS_ENTRY_START 0x02 // start marker
0x03 // (?) lookup marker ID
0x04 // (?) lookup marker (4)
0x05 // (?) hash lookup type ID ($6$)
0x06 // (?) text lookup marker (4+2)
0x07 // (?) hash lookup marker
0x08 // (?) hash/bad inodes
0x09 // (?) hash type ID ($6$)
0x0A // (?) hash lookup marker (8+2)
0x0B // (?) text lookup table
0x0C // (?) hash lookup marker (8+4)
0x0D // (?) hash lookup table
0x0E // (?) hash lookup marker (8+4+2)
0x0F // (?) attribute table (see 1F)
#define SFS_ENTRY_UNUSED 0x10 // unused
#define SFS_ENTRY_DIR 0x11 // optional directory entry
#define SFS_ENTRY_FILE 0x12 // file entry
0x13 // (?) block device/lookup
0x14 // (?) character device/lookup
0x15 // soft-linked directory
0x16 // soft-linked file
0x17 // (?) block/character device
#define SFS_ENTRY_UNUSABLE 0x18 // unusable/bad sector(s)
#define SFS_ENTRY_DIR_DEL 0x19 // optional deleted directory
#define SFS_ENTRY_FILE_DEL 0x1A // deleted file
0x1B // (?) socket/named pipe
0x1C // (?) socket/named pipe
0x1D // (?) directory share
0x1E // (?) sparse file
0x1F // (?) socket/named pipe/attr
0x20+ // continuation entry (UTF8)
0x20+ // continuation entry (UTF16)
0x20+ // continuation entry (UTF32)
(UTF-8 vs UTF-16 Encoding | Which One is Prevalent? - TechDim)
" (double quote, 0x0022)
* (asterisk, 0x002A)
: (colon, 0x003A)
< (less than sign, 0x003C)
> (greater than sign, 0x003E)
? (question mark, 0x003F)
\ (backward slash, 0x005C)
DEL (delete, 0x007F)
NBSP (no break space character, 0x00A0)
0x22 // meh, allow these
0x2A // IP(v4/v6) continuation
0x3A // IP(v4/v6) continuation
0x3C // endpoint/mount entry
0x3E // link/endpoint entry
0x3F // (?) something 8.3 entry
0x5C // (hmm) meh, allow these
0x7F // bad inode continuation
0xA0 // attribute continuation

SFS Partitions
If an MBR Partition Table is used, with one or more of these entries pointing to an SFS partition, each should be assigned the value of 0x53 in the ID field of the partition entry. If a GPT is used, the original author has assigned the GUID:
4EBF0E06-11BF-450C-1A06-534653534653 - SFSSFS
4EBF0E06-11BF-450C-1A06-313030534653 - 100SFS (SFS+FIX=v1.00)
4EBF0E06-11BF-450C-1A06-313130534653 - 110SFS (SFS+0ND=v1.01)
4EBF0E06-11BF-450C-1A06-314130534653 - 1A0SFS (SFS+CRC=v1.10)
4EBF0E06-11BF-450C-1A06-314230534653 - 1B0SFS (SFS+CRC+v1.11)
4EBF0E06-11BF-450C-1A06-xxyyzz534653 - xyzSFS (SFS)
x = major version 10 for original
y = minor version + 10 for CRC + 01 for 0node FIX (CRC in cnt)
z = (0node-FS attributes)
+01 for tables + entries (otherwise just entries)
+02 for (unix/posix) attributes
+04 text lookup
+08 hash lookup
+10 extended superblock
+20 special devices
+40 IP(v4/v6) entries (0node entries)
+80 Accordian FS (has endpoint)