Automated Network Threat Response

TL;DR:
SS:FragWhare - Super Simple (fake?) Fire Wall (tweaked) -
Nginx blocked urls technical details -
On up-streaming address block requests -
A Simple hits/sec Algorithm (tweaked) -
On inode thrash & inode non-use -
On 0 inode use filesystems -
logd quirks, log quirks, GH and Devember 2021 -
Update to OP points & current status -
Devember2021 + GitHub link -


I’m wondering if anyone (incl. LevelOneTechs) has any sort of automated network threat response set up between their web server (or other services) and router/firewall boxes.

I’m in the process of writing a custom threat response system on my (hosted) web server that can automatically escalate (and de-escalate) a threat response based on log file entries, and wondered what others might be doing.

I know there is stuff like PSAD that could be used; it’s designed to do threat analysis and could probably be used to automate something. But I am specifically targeting URI intrusion, so I am (initially) targeting HTTP 404 response logs.

I have really low traffic to the server ATM, and there is nothing else on this yet, because I need to have enhanced security protection and threat management in place before I build the service I got the hosting for. So even though sshd is also a target, it’s not a priority yet.

http://psad.disloops.com - has a piece on “better installation” of PSAD (specifically on an RPi), but it uses UFW as well as iptables, and I need the least amount of services running on this box (it’s only 1G), so I am happy using ip-rules directly instead (for rule generation at least).

Because I don’t have access to the network firewall/router, once I start a threat response it becomes harder (towards impossible) to see if the threat is still there or has gone away.

The response levels go (something) like this:

  1. capture bogus URI intrusions, redirect to a 4Gb ISO as a 200 response @1Kb/s rate limited
  2. dynamically add “deny IP” to web server config (and reload service or config)
  3. add ip-rule to interface

At levels 2) and 3) I want to drop responses, as opposed to sending back responses; blackhole, as it were. At level 2) I can still see what’s coming in on the interface, even when access via the server is blocked. But at level 3) I can no longer see if the IP is still trying to target my server. On a LAN or internal network, I could call out to query something (router/firewall) and see if the threat traffic is still there, and/or still high volume.
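
For anyone curious, here is a minimal sketch of what levels 2) and 3) boil down to on the command line (the include path is an assumption for illustration, not my actual layout):

#!/bin/sh
# sketch: level 2 - add a "deny" entry to an Nginx include and reload the config
# assumes /etc/nginx/deny.conf exists and is included from the server block
IP="203.0.113.45";
grep -q "deny $IP;" /etc/nginx/deny.conf || {
  echo "deny $IP;" >> /etc/nginx/deny.conf;
  nginx -s reload;
}

# sketch: level 3 - blackhole the address at the routing level (nothing goes back at all)
ip route add blackhole $IP/32 2>/dev/null;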

When I am done with this I hope to be using a GitHub repository to dynamically build a “location” level “/etc/nginx/block.conf” include file for Nginx, and have the level 2) and 3) threat responses automatically de-escalate. Level 1) responses get logged to “access.log” as well, in a separate file, which allows IP volume and response time analysis to assist escalation and de-escalation.

I’m using Nginx + thttpd + PHP + crontab to automate everything. (incl. dumping http://server.com/favicon.ico connections if there is no HTTP REFERER passed along)

3 Likes

I don’t have any setup running right now, but take a look at Fail2ban and iptables (or another firewall depending on the OS).
Very straightforward to set up, and it allows multiple actions to be configured.

I’d recommend setting banaction to a dummy mode for starters, and ensuring your own IP is in a whitelist if possible (so you don’t lock yourself out of ssh for eg. 30 minutes).

I am trying to keep everything as low-footprint as possible. I am using Alpine Linux, which uses BusyBox, so all shell scripting fits into that criteria. I am also looking into Owl & vzLinux.

There are some caveats that go with this process though:

Like the fact that just about nothing supports --long-option-names, which just makes reading the shell scripts a bit more cloudy if you have never seen a command before.

Also, in the process of creating an automated block.conf (and in order to make the result more readable and searchable), I found that sort --unique uses quite a bit of memory, and has to wait until the end of the piped stream to output anything.

It is faster (and lighter on memory) to instead pipe the results to the filesystem, using touch to manage the unique part, and ls -1 to automatically sort the urls (ls does so by default) for input into the final stage. The optimal filesystem would be a customised RAM disk of some sort, as I am not storing any file contents (and if I were it would only be hit counts, so still way less than 1k per file is needed).
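
A rough sketch of that filesystem-as-deduplicator idea (the work directory is an assumption, and the slash-to-underscore mapping is purely for illustration; real URLs would need a reversible mapping):

#!/bin/sh
# sketch: de-duplicate and sort via the filesystem instead of `sort --unique`
# reads URLs on stdin, one per line; duplicates cost nothing because touch is idempotent
WORK=/dev/shm/urls;   # assumed RAM-backed path, so no disk/inode thrash
mkdir -p "$WORK";
while read -r URL; do
  touch "$WORK/$(echo "$URL" | tr '/' '_')";
done
ls -1 "$WORK"   # ls outputs sorted by default, ready for the final stage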

The BusyBox date command does not output %N (nano-seconds), but at least it does not break either, so in-script timings with date +%s%N still function correctly.


So now I am looking at processing hits per second, and I don’t think shell scripting will cut it this time. But I have already decided it’s ok to process a folder of IP addresses (as files) against ip-rules, as this allows crontab @reboot to function also.

But I do think something that is easily editable/updatable is needed here too, so maybe not C/C++ binaries (which also need build-essentials etc.), unless something like vbcc would work. Alpine does come with Perl & Python3, but they are relatively slow (see: Dave’s Garage - software drag racing), so some thought is needed here …

In part this development process was started because of what I saw in log files, and the many days my hosting provider sent me emails warning me of 10 Mb/s outbound traffic, 25 Mb/s outbound traffic, and the one that really pissed me off: 82 Mb/s outbound traffic - those were Mega Bytes per second too, not Mega Bits per second.

So after writing all the manual automation and some web based review pages, I can trigger generation of a new /etc/nginx/block.conf file and restart Nginx - result: max 25 Kb/s outbound traffic (and falling, see stats at end of post) and a reduction of 90%+ in inbound bogus urls.

It seems neither the automated scripts (no matter how much they limit the returned data - as little as 4096 bytes) nor the individual users like an ISO being served as HTML; the users who try grabbing /.env give up after 14 Mega Bytes of pure Windows 11 ISO served as text/plain @1Kb/s.

So I guess it’s working how I thought it might. Now I have also started processing the /var/log/messages logs, as it appears most of that 25 Kb/s outbound traffic is actually failed sshd attempts. Counting IP addresses for invalid user, the web page review of that data shows me a 50 hit limit per 24 hours on almost all attempts (I saw only one at 52), which indicates scripted automation of some sort. I record each IP address as a file with its content being the number of failed attempts, and I wrote another script to process the folder and insert any IP with >39 hits into (eg:) ip route add blackhole 167.99.133.28/32, where the IP address gets recorded in another folder with the date as its contents (so I can remove them after a month - if I feel like it). I can use the same ban script to add the odd HTTP intruder who “doesn’t get the hint” - have a total of 2x HTTP IPs banned atm.
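
For reference, a trimmed-down sketch of that second script (folder paths are illustrative, not my exact layout):

#!/bin/sh
# sketch: walk a folder of per-IPv4 hit-count files and blackhole anything over the limit
HITS=/var/lib/ssfw/hits;      # assumed: one file per IPv4, content = number of failed attempts
BANNED=/var/lib/ssfw/banned;  # assumed: one file per banned IPv4, content = date it was banned
LIMIT=39;
mkdir -p "$BANNED";
for F in "$HITS"/*; do
  [ -f "$F" ] || continue;
  IP=$(basename "$F");
  if [ "$(cat "$F")" -gt $LIMIT ]; then
    ip route add blackhole "$IP"/32 2>/dev/null;
    date -Iseconds > "$BANNED/$IP";
  fi;
done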

I don’t restart the server very often; maybe I’ll start doing it once a month. At least now I can easily re-add any IP address to ip-rules after a restart. The only real issue with my server banning an IP address at the front face is that it can no longer see what sort of threat presence that IP address has (nothing gets logged), so it’s going to be hard to determine a threat de-escalation response.

The other thing that has irritated me is not being able to pipe a name/ip to a file whose contents increment per pipe encounter. This would be a lot easier and cleaner than a shell script having to iterate files, grab their contents, and add 1 to them. I decided to cat /var/log/messages | grep $IP | wc -l > $IP instead, to speed things up, but it still has to do that multiple times, so it might take a while on a large or heavy use system. I would rather make something that can be used in the place of .. | xargs touch that can also be useful in a pipe (incl. the next pipe command).
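
In the meantime, awk can fake that behaviour in a single pass, which at least avoids re-grepping the log once per IP (the output folder here is an assumption):

# sketch: count every IPv4 in one pass, then write one count-file per address
mkdir -p /tmp/hits
grep "Failed password" /var/log/messages \
  | grep -E -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" \
  | awk '{ c[$1]++ } END { for (ip in c) print c[ip] > ("/tmp/hits/" ip) }'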

Anyway, I am about to stick this up (modified for general use) on GitHub, and play test it for the rest of the month to see if I find any other edge cases that might break any of the scripts, then I’ll look at the full automation parts. But I would also like to address the inode thrash that will affect some systems (eg. RPi sd-cards, other SSD & emmc media).

stats:
max 20 Kb/s outbound traffic | max 1% CPU usage | max DISK I/O 1 block/s

Python (especially modern Python with type checking) is really easy to start with - it’s easy to iterate and refine your parsers and rules. You can always rewrite things in C++ down the road.
… or you can try Go or Rust, either has a decent standard library of creature comforts.

Python’s probably ok doing basic accounting for a few thousand things a second.

With nginx, access.log is written to after the request has finished serving. Instead, you can deploy a local http server in Python alongside Nginx (look at quart library examples) and then have nginx fire off a sub request with all the useful info you’d be getting from the logs before it ends up in logs. This is useful if you want to do stuff with existing TCP and/or QUIC connections. For example, you could use the kernel ipset module and add/remove IP:PORTs from sets using Python, and you could set fwmarks on packets to prioritize or throttle connections (slow lane/fast lane) with tc/htb qdiscs based on these fwmarks.
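
For example, the shell side of that (set names, marks and rates here are placeholders) could look something like:

# rough sketch: a "slow lane" using ipset + fwmark + an htb qdisc
ipset create slowlane hash:ip timeout 600         # entries auto-expire after 10 minutes
iptables -t mangle -A OUTPUT -m set --match-set slowlane dst -j MARK --set-mark 9

tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 100mbit    # normal traffic
tc class add dev eth0 parent 1: classid 1:20 htb rate 64kbit     # throttled traffic
tc filter add dev eth0 parent 1: protocol ip handle 9 fw flowid 1:20

ipset add slowlane 203.0.113.45                   # what the Python side would add/remove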

1 Like

There is a built-in throttle & deny system in Nginx, but I’d like to not target one single server if possible. Nginx is quite speedy at processing the 407 urls I am already catching, and I know Apache & others (thttpd, Lighttpd, etc) can also do that, while maintaining relatively low overheads.

But not every webserver has a throttle & deny system, and ideally the final “binary” needs to be able to sample and then flag at least 4 invalid attempts in less than a second per IP request (just like a router or firewall does).

The principle being: the more useless traffic blocked upstream, the cleaner (and hence faster) the network in between. The result being vaguely similar to what users experience after installing (and using) PiHole for bogus DNS traffic.

And in reality this actually means 4 denied attempts. So after a response has been made, the idea is to send small requests based on that info to an internal endpoint on a router/firewall/switch that will then instigate an IP block/ban/deny/blackhole for a short period of time on the internet side of an internal lan, ideally pushing the same sort of data upstream to the next hop device. So (in an ideal world) blocking would eventually be done at the first hop where the offending IP address enters the internet, not the last hop before entering an endpoint gateway (and definitely not the targeted server, which may already be under load serving legitimate content or services).
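
None of that exists yet, but the upstream request itself would only need to be tiny; something like the following (the endpoint, token and parameters are purely hypothetical, just to show the shape of the data):

# purely hypothetical sketch of a one-shot upstream block request
IP="203.0.113.45";
wget -q -O - --header "X-Auth-Token: $TOKEN" \
  "https://gateway.lan/api/block?ip=$IP&ttl=3600&reason=http-404-flood"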

There is nothing wrong with Perl or Python dealing with this sort of information at the targeted host, but ideally you want something that can handle 10K hits per second, or 100K hits/s, or maybe even 1M hits/s.

So far what I have (which isn’t sampling live) appears to have already reduced things to a more than bearable outbound traffic rate (AVG <10Kb/s and falling), and yet it only consists of some 400+ lines of code in 10 scripts, of which only 4 are run every hour, in less than 1 sec, while consuming less than 1% of a single core CPU.

1 Like

Here are some more technical details, for those that may be interested:

(add 63Mb of memory usage and 112Mb buff/cache usage to the above stats)
( server was rebooted 7 days ago - 127 blocked IP addresses )

in /etc/nginx/block.conf:

        location ~ \.env$ {
                try_files $uri          /.env;
        }

        location = /.env {
                access_log              /var/log/www/haxors-access.log;
                default_type            text/plain;
                index                   .env;
                limit_rate              1k;
                sendfile                on;
                aio                     threads;
        }

The aio threads directive offloads the transfer to a non-blocking “download” thread.

One drawback of this method is that 400 & 405 errors are now also in the new access log file (maybe they always were in the access log?). 406 errors for the same url still end up in the error log (I don’t have POST set up properly yet).

All the captured try_files $uri are linked to / (web server root) dot files, and those actual files are sym-linked to the “substitute content”. Although I process logs every hour for web review, I still only manually trigger a new block.conf, as after a week I only have 16 new bogus urls in the error log, 13 of which are targeting phpunit (so that’s going to be a custom regex matched entry now, along with print.css).
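
The wiring for those dot files is nothing more than symlinks (paths here are illustrative, not my actual layout):

# sketch: the web-root dot file that try_files falls back to is just a symlink
ln -s /srv/iso/Win11_x64.iso /var/www/html/.env
# block.conf then does "try_files $uri /.env;" and serves the ISO as text/plain at limit_rate 1k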

On the other hand I do see a lot more 400 & 405 in the access log, and the 2 messages logs get rotated or filled up after about 3 hours; in hindsight, processing once an hour to block IP addresses with ip-rules seems ok, especially considering the current ram usage.

https://nginx.org/en/docs/http/ngx_http_core_module.html#aio

1 Like

Well, I don’t know this stuff that well, but this looks to be very interesting. Something I would like to have to increase security… then again I am paranoid. Took me a while to look up a bit here, but it seems like a great project.

2 Likes

On up-streaming address block requests:

Most servers are hosted in some way, so the first hop will usually be the actual physical hardware server (or final routing hardware controller), so any request from a hosted server will protect all servers behind that “gateway”.

The second hop will be something within the LAN that manages network segments, while the third hop will probably be a “network gateway” of some sort (router/firewall/DMZ/managed switch), that may or may not also be the DNS/DHCP server.

So within three hops an entire LAN can be protected from bogus incoming traffic.

Each up-stream block request needs to be parsed as coming from an internal-only source, and verified so that no internal networks are being blocked due to a malicious user, or a targeted server that has already been compromised.

Due to the way hosting works (at least some hosting that already has SSH pipes onto the host, and/or control panel statistics review - graphs), a pull process could be used instead of a push process. Collated pushing directly to the front facing “gateway” device/service from an already trusted source should then still be block-address reviewed again. The physical connection could even be network segregated through the hardware management port to help free up internal network traffic, especially in the case of high volumes ranging into DDOS limits.

A list of addresses would be smallest, however the up-stream destination could require log entries for verification, which could then be cross-referenced with previous/current up-stream activity. On large hosting networks a collation server may also handle up-stream analysis management and could also (at the same time) be used as an internal network honey pot if an internal VPN were used to handle blocking management.

The reality of the world today is that an attacker may be coming from within, and the last thing you need on a network is a propagated domino effect that started “at your end”, but that should not stop you from “trying to clean up the internet” at the same time.

Thanks @HaaStyleCat.

The caution here stemmed from the same server (with a different IP address) having already been compromised, and the lack of ability to do anything else than “bandaid the hell out of the server”.

Like I mentioned earlier, those repeated 82 Mb/s outbound warnings really infuriated me, considering how quickly the server was shut down previously when it was compromised.

In most cases, the onus is on the final destination server to protect itself. But when that (hosted) server is only a single core CPU with limited memory and resources, (especially in the modern day) it is unrealistic to expect said server to be loaded down with processes that should actually be managed upstream in the network, so the entire network is protected, not just one virtual server.

The only real issue is to get hosting providers to provide a simple management process that isn’t part of a Pro Tier service.

One thing I found interesting is the URLs that are HTTP targets, and the failed user names that are SSHD targets.

In some ways these sorts of network “issues” remind me of how Microsoft handles things. Instead of fixing the problem at the source, they sprouted an entire industry backed certification environment and ecosystem, which proves very profitable to this day.

Coming from an 8bit/16bit platform orientation, trying to communicate over and with the modern internet, speed, size and resource limited constraints are a real issue. The upcoming DNS changes will mean that even just the KEY used in part of a handshake will be bigger than some hardware’s total memory capacity.

IoT devices especially, need to be able to protect themselves better.

I have another idea for better internet accessible device connection access management called PortNames, but that’s on a whole other level, and requires fundamental changes to infrastructure (similar to the GnuDNS changes, but for ports, not addresses).

1 Like

A Simple hits/sec Algorithm

Both my /var/log/messages logs rotate collectively on average once every 3 hours, so I have a script that runs at 58 minutes of every 3rd hour. Now although the script works fine from console, it does not work completely via crond. Mostly this is down to root permissions, with the log file having 0650, and sometimes the ip command was not working. Is crond checking the context of ALL commands in a script, not just the script permissions? I don’t know, but once every 3 hours to block sshd fails > 4 is a bit useless (after the fact), as when they come they are between 3 & 6 per second (I saw one IP try every 4ms for 2 hours).

So it appears the lightest way to do a very basic (or simple) analysis algorithm is to use a (shell) script to tail /var/log/messages | grep "auth.info sshd" | cut -d \: -f 4 | grep -E -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | uniq and then process that output in a for loop, and blackhole the IP if it appears more than 4x. Here there is no problem with uniq because it will be a maximum of 10 lines, and that grep only outputs IP addresses (so a max of 15 characters per line). The biggest part of the check is still then grepping each IP out of the log file (again).

But I also added a check to see what the last tail match produced, and incrementally add a default sleep time (eg 2 sec) to produce an incremental delay between checks, up to a max check delay (eg 60 sec), so if nothing else is going on, it will still check once every minute. That should probably be every 10 secs based on other info mentioned elsewhere, but hey, just the blocked IP addresses already added (50 in the last 12 hours) already mean I have to wait over 2 minutes before someone new tries to hit sshd (some interesting user names there too - someone tried every user name any net facing service might use).

Then I run this script as a background task, after adding logging and PID capture. By using date -Iseconds as log timestamps, I can easily look back over the last hour with a grep like date -Iminutes --date="@$(($(date +%s) - 3600))" (as long as I clip the appended timezone adjustment).
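
i.e. something along these lines (a sketch, assuming each log line starts with the timestamp and the log name is whatever the script writes to):

# sketch: everything logged in the last hour - ISO timestamps compare lexically,
# so awk can do the "since" filter once the marker is computed
MARK=$(date -Iseconds -d "@$(($(date +%s) - 3600))");
awk -v m="$MARK" '$1 >= m' monitor.log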

So this SSHd fail IP block and the Nginx block.conf really hammer what’s coming in. I have not had to even process another block.conf for a week now. ATM I manually add the odd dimwit repeatedly trying to retrieve /.env, which is about once every 3 days. Yeah there are some new urls, but they are mostly random ../phpunit/.. paths, which I’ll add to the custom include. I would really like to be able to return 0 bytes when an HTTP request explicitly asks for 0 bytes, but there is not enough granularity in most web servers to do that.

Initially I was really worried about how to implement an analysis algorithm (what language), as well as trying to develop said algorithm (checking hits per second failures). However, after a few nights “sleeping on the problem” it was pretty obvious.

When attacks come, they are always recorded at the end of the messages log file, so tailing that made sense; theoretically that’s low overhead even on a big file. A grep from the bottom of the file to the top, with an abort after a set number, would be more optimal, but maybe tricky on heavy load systems, especially where there are other services also hammering the messages log. The crond script was fine, because it piped out entries from both message logs to a separate file (every 3 hours) and worked on that.

While thinking about the above, my mind kept coming back to the issue of storing info, and the problem of inode thrash that would result from using “off the shelf filesystems”. So I’ll probably talk about that next …

EDIT:
Turns out the >4 check is pretty brutal, because for every 1 failed message there are 2 disconnect messages :slight_smile:

That is:
2 failed SSH root user attempts will get your IP address blocked
(because that is 6 matching IP addresses)
2 failed SSH invalid user attempts will get your IP address blocked
(because that is 8 matching IP addresses)

EDIT 2:
RAM usage is up by 3Mb, 1.5Mb for the script sh process, and 1.5Mb for the sleep sub-process. I think that is due to the BusyBox multi-binary, because it’s the same for the ash shell, crond and syslogd.

EDIT 3:
log rotation is down from every 3 hours to every 7 days due to lack of sshd entries, of which there were still 144 attempts, from 105 IPv4, maximum attempts per IPv4 = 2. However (over that same 7 days) there are still about 25 IPv4 being blocked every 24 hours from Nginx logs.

EDIT 4:
grep is tweaked from . to \.

Super Simple SSHd IP blocking

So I don’t have everything up on GitHub yet, and in fact there is some small redesign needed for what I have, purely for tracking what’s been blocked from where. In the meantime I thought someone looking through this thread might just want something fairly simple (without IP tracking/logging), so here is SS:FragWhare:

#!/bin/sh
### SS:FragWhare (run as root)
### Super Simple (fake) Fire Wall
### (Whare is Maori for "house")
### - to verify everything is working:
### ./ss-fragwhare.sh > verify.ssfw
### - to start as a "service", use:
### ./ss-fragwhare.sh false &

# default interval & increment, in seconds
I=1;

# maximum time between checks, in seconds
M=10;

# default fails before blocking an IP address
K=4;

# easier option to change "tail lines"
# default for tail is 10, dont go above 20
N=10

# where to record the Process ID
#echo "$$" > "ss-fragwhare.pid";

# verbose(-ish);
Q=true;
if [ "$1" = "false" ]; then
  Q=false;
fi;

# main loop;
O=""; S=""; T=0;
while true; do
O="$S";
# we can use `uniq` here because its only 10 lines max
S=$(tail -n $N /var/log/messages | grep "auth.info sshd" | cut -d \: -f 4 | grep -E -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | uniq);
if [ "$S" = "$O" ]; then
# relieve pressure if no change;
  T=$(expr $T + $I);
  if [ $T -gt $M ]; then
    T=$M;
# dont do other things here, as it will exponentially use more processor time
# this script will only consume about 0.16% of CPU on a single core system
# BusyBox will increase memory usage by 1.5MB x2 (1x for script, 1x for sleep)
  fi;
  $Q && echo "$$:ss-fragwhare: waiting: $T";
else
for xADDR in $S ; do
  # minimum length of an IPv4 address is 7 
  X=$(echo "$xADDR" | wc -c);
  # but input might be broken, so double check
  Z=$(echo "$IP" | cut -d \. -f 4);
  if [ $X -lt 7 -o -z "$Z"]; then
    $Q && echo "$$:ss-fragwhare: ..hmm.. '$xADDR'"; 
  else
    # make sure this is the correct name for your system
    C=$(grep "$xADDR" /var/log/messages | wc -l);
    if [ $C -gt $K ]; then
      # silence "wrong input"
      ip route add blackhole $xADDR/32 2>/dev/null;
      $Q && echo "$$:ss-fragwhare: added $xADDR";
    fi;
  fi;
done;
  T=$I;
fi;
sleep $T;
done;

### - to work across reboots:
### ip route show | grep blackhole > blackhole.ssfw
### cat blackhole.ssfw | xargs -n 1 -I {} ip route add {}
### - after 400-500 IPv4 are blocked, SSHd will be "quiet"

Whare - said “furry” (or “farry”), as in ferry, not furry as in the animal - is Maori for house, so SS:FW for short implies FireWall, but is it really? After all, it just blocks some IPv4 addresses from sshd entries in a tail /var/log/messages (10 lines) list. It does not record anything, and needs to be restarted after a reboot. It’s not a cron job, because it needs to run every second (well, it could).

The only difference between this and what I use (yes, I will be putting it up on GitHub too) is that all my scripts pass to a dedicated ./block-this-ipv4.sh IPv4-address script, which means I can offload (or consolidate) some of the checks.
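
For completeness, a trimmed-down sketch of that kind of wrapper (not the exact script, and the log path is an assumption):

#!/bin/sh
# sketch: ./block-this-ipv4.sh <IPv4> [origin-tag]
IP="$1"; FROM="${2:-manual}";
# consolidated sanity check: exactly four dotted decimal groups
echo "$IP" | grep -E -q "^([0-9]{1,3}\.){3}[0-9]{1,3}$" || exit 1;
ip route add blackhole "$IP"/32 2>/dev/null \
  && echo "$(date -Iseconds) $FROM $$ blocked $IP" >> /var/log/ssfw-blocked.log;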

You can try to use ip route save 1> dump.rt and ip route restore < dump.rt, but that produces a binary format suitable for manipulation with rtnetlink format tools. Better to just use ip route show | grep blackhole > dump.blackhole, and then after a reboot use something like cat dump.blackhole | xargs -n 1 -I {} ip route add {}/32, which is human readable.

EDIT:
grep was tweaked from . to \.

More observations since SSHd monitor

So after about 500+ IPv4 addresses were blocked, on the SSHd side I noticed a heavy slow down in log activity, which showed up some other issues - kex (key exchange) errors. These are (failed) attempts to use SSH key exchanges to gain access to root, instead of trying to brute force root. So I wrote a script to deal with them; it’s manual, but then the IPv4 addresses were always in pairs from the same range, so after the 1st run I get maybe 1x kex IPv4 block per 24 hours.

I did some manual analysis of the IPv4 for those kex attempts, and most of them are from a small group of commercial websites. They say they offer a service, but in actual fact what they are offering is an easier way in for hackers, who join as fake clients to gain access to IPv4 address ranges, so they don’t get caught from their own systems. I can guarantee that “service” they provide does not contact any host that is compromised.

The other issue I noticed is that the average rotation time for /var/log/messages (Alpine Linux rotates on 200KB) went from 3 hours to 4 days (and growing - it’s only half full atm). This does allow patterns to show up over longer periods, and what I saw was some obvious stealth botnet activity (I would say a proper botnet as opposed to a kiddie script botnet). And because I process /var/log/messages* into a folder as IPv4 addresses, I can easily see IPv4 block ranges.

ip route can handle address ranges, so I might look into that. For example, of the 144 “Failed password for” SSHd IPv4 addresses I currently have in the logs, 105 of them are unique. Because the above script is pretty brutal (hence the name frag house) there are some 40 odd addresses with 2 hits each. Of these 105 addresses, there are at least 2 blocks of 5 from the same sub-net. And when I cross-reference them against the log files, there are subtle patterns in the timing of their attempts.
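
For the record, blackholing a whole sub-net is just a prefix length change (range taken from the documentation address space):

# sketch: one /24 entry instead of up to 256 individual /32 entries
ip route add blackhole 198.51.100.0/24
# it shows up the same way in:  ip route show | grep blackhole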

On the Nginx side, I have not added any more URI since the 1st of the month. There are a couple of new ones, but what has been showing up are weird GET requests that return 400 errors in the ‘access.log’, not the ‘error.log’. So there is something else being parsed in the header, and that may be similar to the daily \x?? strings being attempted too (without a GET/POST/HEAD/OPTION header). Somehow one of the custom $uri is causing most 404 errors to be reported in the access.log as well, so there is a “too generic” match being made somewhere; at least there are only a couple to review, and they are in a separate include.

With everything said and done, at about 500 blocked IPv4 I was getting about 40 odd new blocks per 24 hours, while at 600 I get about 20 per hour. That’s also with around 430 captured $uri in Nginx.

Anyway, the above has allowed me to consider more on the inode issues as well …

So if someone used an octet-looking username, they could ban your server’s gateway address? …

You could also try something like this (example from stack overflow).

-A INPUT -p tcp --dport 2020 -m state --state NEW -m recent --set --name SSH
-A INPUT -p tcp --dport 2020 -m state --state NEW -m recent --update --seconds 120 --hitcount 8 --rttl --name SSH -j DROP

as a form of greylisting, and you could whitelist on successful connections using ipset from your profile

@risk thanks for the info - stuff like that is always useful to know.

Those octets are in HTTP headers, so they already get through to the web server, but it does not process them; 400 errors are “not supported”, so maybe if the web server is set up to support them, yeah it might have an effect.

Thanks for bringing up whitelists too, because although I have not mentioned it, that’s part of what I want to add to what I already have. But I wanted to look at using ranges with ip route first, and come up with a simple way to parse an IPv4 in a range, without the need to create file entries for every part of a subnet (*.1 => *.255). In the particular case of these scripts (that use the shell for algorithms) it may be simpler to dump them all (including each subnet entry) into a file, and use that against a grep. I’ll have to see; I’d prefer something file based due to not having to sort an ls -1, like I have to with other scripting languages (like PHP output).
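
The grep idea would probably look something like this (the whitelist file name and its prefix format are assumptions):

# sketch: whitelist check without one file per address
# whitelist.txt holds anchored, escaped prefixes, one per line, eg:
#   ^192\.168\.1\.
#   ^203\.0\.113\.
if echo "$IP" | grep -q -f whitelist.txt; then
  echo "whitelisted, skipping $IP";
fi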

Cheers, and thanks for keeping an eye on things :slight_smile:

On inode thrash & inode non-use

Besides the use case I have, an IPv4 as a filename containing little or no content, the other big use case is stepped folder names, for either user names or domain names, as used by large web server installations.

To sum up:
inode thrash is where heavy writes of non-inode-block-size happen. On SSD and SD-card media this can wear out the “drive” very quickly (which is why you turn off update-access-time on SBC/RPi linux setups).

inode non-use (probably has an official name) is where a standard filesystem entry soaks up at least 1 inode of size X (used to be 512 bytes, now 4096) for every directory entry, each of which contains no data.

On a system that has user names and/or domain names of the type /path/u/s/e/r/n/a/m/e/ or /path/w/w/w/./g/o/o/g/l/e/./c/o/m/, you will exponentially use more inodes the further down the folder structure you go, to the point of running out of available inodes on a lot of systems, which require the folder structure to be in place before you can use it.
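
(That stepped layout is literally one directory per character, which is easy to see with a one-liner; the /srv/users prefix below is just an assumption for illustration:)

# sketch: every character of the name becomes a directory level
echo "username" | sed 's/./&\//g'
# -> u/s/e/r/n/a/m/e/
mkdir -p "/srv/users/$(echo "username" | sed 's/./&\//g')"   # 8 new directory inodes for one 0-byte entry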

For example, there is a lengthy discussion on how and why not to do it in Nginx (or any file/path base network server).

In my case I only really need the filename, not the space for its contents, which could be bound to its size entry in a directory (inode). I don’t actually need to store any content, just an integer number which can (also) be used as a counter. Valid dates can be manipulated via regular filesystem techniques (ie touch).

After looking at the Ext2 filesystem, I am wondering if there is a way to manipulate the inode size at setup time (superblock), or even use hard links, to reduce the amount of inodes used by files (and/or folders) in the filesystem.

I am looking at off the shelf Ext2/4 filesystems so they will integrate easily, with Ext4 also providing journaling for those that might want/need it (over Ext2). I can get away with 0 content files if I can manipulate the file size. So the only part that is getting updated is the entry in the directory inode.

At first glance, the simplest way to cut down on inode thrash is to mount the filesystem as an image file, and let the underlying filesystem handle things through its write back cache, which are (nowadays) optimized for block size writes. Meaning that only after 4096 bytes have changed per directory inode does the “medium” actually get updated.

For example I have 711 blocked IPv4 addresses as filenames, that amount to a total of 10158 bytes. At the moment I store the date they were added to ip route blackhole, but I don’t need to; I can use one of the time areas for that, saving (in my case) 25 bytes of a 4096 byte inode which I don’t even need for file allocation. That inode allocation could then be contributed towards further directory inode needs.
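
Using the time areas is already doable with a stock filesystem; a sketch (paths illustrative):

# sketch: keep the block date in the file's mtime instead of its contents
touch -d "2021-11-01" /var/lib/ssfw/banned/167.99.133.28
# read it back, and month-old entries become a find one-liner
stat -c %y /var/lib/ssfw/banned/167.99.133.28
find /var/lib/ssfw/banned -type f -mtime +30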

As a note here, with Ext based filesystems, a directory inode is mostly composed of 00 padding. Because an IPv4 address has a minimum of 7 characters and a maximum of 15 characters, a lot of entries can be compacted into each directory inode of size 4096, and the file entries would also not need the initial file inode recorded either, further extending what can be squashed into a directory inode.

The only real downside to this type of approach (without modifying an Ext2/Ext4 driver) is the need for stat instead of cat at the filesystem level; however internally stat is called every time you access a file anyway, so this may be a moot point.

For username/domainname examples, where each directory folder entry still needs to point to its own inode, this can be alleviated to a certain degree by an intermediate mount, where the beginning of the path and the end of the path are in the physical filesystem media (as normal), while the stepped path structure could be compressed into a filesystem image mount.

In these two (basic) scenarios, it is not possible to use hard links for folders, but it is possible for files of 0 content length. I have used these before, and did manage to remove the filename without removing what it was linked to, which is the default way soft links work.

Another possibility, at least for files (of static content), is to link their contents into a .inode file present in that directory. But this, like the directory folder name problem, would likely require a customised driver. There is a caveat here with linked files: by default fsck (filesystem check) will try to “fix” them (on most modern linux setups), so if there is a custom inode layout, the file size and initial file inode in a directory inode will have a bearing on that.

I am trying to think laterally about this issue, as I want to implement IPv6 in the filesystem too (which could have an issue with ::). It may be more sensible to create a file backed zipfs or a file backed sysfs system instead, but then those would lose the journaling option too, as well as needing custom drivers, so 50/50 on those atm.

As far as creating a filesystem driver goes, it may even be simpler to write a fuse driver and flat-file the “media” (as a file), something that could (possibly) still be edited by a text editor. I am not sure about this though, as (at least in my case) what I am dealing with needs to be handled at the kernel level anyway (the ip route endpoint at least), so having the data available to the kernel (because 99% of filesystem drivers in modern OS’s are in the kernel) will keep things fast.

(k-rap, just going into LVL4 lockdown and I have to move my vehicle, so might have to leave it here)

On 0 inode use filesystems

There are various ways to handle a 0 inode use filesystem; obviously the first detail being that no filesystem can get away with actual 0 inode use, but in this instance we are talking about file and folder entries that use 0 inodes in their data section (compared to most other filesystems).

After spending a bit of time looking over the filesystems described on the OSDev.org wiki and in the bootboot project, I found SFS (SimpleFileSystem) which could be used as a basis. Ideally it needs some extensions, specifically POSIX ones (soft/hard links, attributes, time slots), at least for multi-part directory levels, but this is only for easier administration (and maybe for better security).

The SFS is considered an ideal starting point for various reasons, the main ones being:

  1. there is no sub-directory architecture, every entry is in one root table (the index section).
  2. it can support implied directories, where a directory entry only consists of a file endpoint.
  3. can easily support (because of reduced filesystem structure usage) search enhancements.
  4. the root table extends inversely from the end of the volume on the media, back towards the superblock and data sections.
  5. can support small (1.44Mb floppies), large (4Gb volume) and huge (176 ZetaBytes) media allocations, because default blocksize and inode are described with 64bit numbers.

SFS is essentially a write once filesystem, that can support updating by default (as opposed to say ISO9660). The root table is simple enough to be relaid back to the media (in an optimised way), without the need to reorganise the associated data inodes or disturb index inodes.

The simple idea of one place for everything also describes its primary drawback: lookup speed. (At least on Linux) the VFS setup allows creation of a hash to help with lookups that are already held for an open file/folder. But from the driver side, any regular (nay advanced) search or sort algorithm will greatly improve initial lookup of new (ie not in cache) items.

That (traditional search/sort) lookup, combined with a segmented hash of index entries (in the root table), would greatly speed up the physical media lookup and access times. With 0 inode use this can be stored easily in the data section, or any other free space section on the volume (after the superblock). There is already an allowance defined in the SFS specification for reserved blocks, which are traditionally (in the use of SFS) used for boot loader stages. This area could be co-opted for a segmented hash lookup table if the index were optimized with contiguous entries (from a text/human point of view).

Reminder: here we are specifically looking at IPv4 (later IPv6) file entries that use 0 inodes (no content), and multi-segment directory structures (/a/b/c/abc.com) as used by a service for data, user, domain segmentation (and security). So besides my use case (IPv4 counters), there are fileserver domain path uses (FTP, Nginx, Apache, etc), user path uses (same + others), and (linux) user password segregation uses (Owl tcb).

That said, before I do anything here I’ll be finishing the github repo for the firewall part done so far, which has been in use for 1 month as of tomorrow, and got its final tweak earlier today.

As far as development of a 0 inode filesystem goes, I can see a suite of filesystems to cater for various use cases, as some are not 0 inode (like I am using atm, where each IPv4 records the date it was blocked), and others would only use them if journaling was available.

FWIW: I’ll be proto-typing on RPi OS for simplicity and portability, and testing on ATARI ST EmuTOS+MiNT (ARAnyM) (for size and speed constraints) - at least that’s the plan ATM.

Cheers

Paul


FYI reference: an old Owl tcb article

After running the monitoring script for almost a month now (a slightly integrated version of the script posted above), I found some quirks with logd, and after playing around with the sleep time I find I can’t eliminate the issue, only make more or less impact on the response (ie the reason the script exists, or what it’s doing).

My analysis shows that when sshd gets hit hard, 10 seconds can allow 2-5 hits per second to be recorded in the log files, so the script is sleeping too long to be useful for what it’s there to do.

But at 3 or 5 seconds, it can take up to 40 mins before the tail registers as not equal to the last tail. From a single IP address, even at a 4 second delay, there will still be 50 hits before said IPv4 gets blocked.

But with 10 seconds, it can (often) take 20 mins before the tail registers as not equal to the last tail. From a single IP address, even at a 4 second delay, there will still be 20 hits before said IPv4 gets blocked.

Have I found a bug in Alpine’s logd implementation, or simply a limitation of it? Or is it a limitation of the VPS I’m using? I don’t have a way to test/verify either. I would like to be able to use an RPi locally to do this, but the main reason for developing on a VPS is because I can’t guarantee power to said RPi, so I would be limited in what I could do, even when I could do it, without risking a lost OS+sdcard.

Apart from that everything is functioning well, IPv4 range blocking was a non-issue, and everything has been thoroughly tested in a live environment, so it’s (mostly) ready to go up on GH, to the point that I will be taking what I currently have functioning (as opposed to the overall targets of this OP project) and slipping it into the Devember 2021 challenge.

The interesting side effect of the IPv4 blocks is the resulting IPs seen in the logs, ie hundreds of unique IPv4 per 3 days that only hit the sshd once per month.

A similar thing is happening with the nginx logs; just about all nginx-haxors have disappeared, except those unique IPv4 hitting /.env, boafrm, /setup.cgi?... and random \x0X string connections, all of which have said IPv4 culled every hour with a separate crond script.

One other side effect of the nginx part is my inability to get php functioning with POST, which is a blessing ATM, because it produces lots of 403 & 402 errors in the logs, for which the IPv4s also get culled once per hour. The quirk with the \x0X strings and 400 errors is that they don’t appear in the nginx-errors.log file, but rather in the nginx-access.log file. I don’t know what to make of this whole situation, whether it is a result of the custom regex I have in the Nginx configuration or what.

So I am hoping that if I put everything up as a Devember 2021 project, I will get a solution to some of these questions at some point, besides making this sub-project more widely available at the same time.


With regards to the use of OWL (OpenWall), from their mailing list it appears someone tried building an RPi port, so there is hope there too, and @wendell just posted:

That pertains to the initial project which resulted in this OP project, containers and non-systemd OS’s - read post #7 (my second reply) to get an indicator of what that use case is, and my next post to see what OS + container combinations are available now.

Cheers

Paul

Update to OP points

  1. done, extremely successful, most haxors don’t come back after a couple of tries
  2. done, but not via /etc/nginx.conf, as the above location regexes are taxing for a 1 core CPU
  3. done, ranges are still manual atm (difficult to create a valid range mask for said IPv4)

Threat de-escalation amounts to manually assessing IPv4 blackholed via a bunch of shell scripts (from both a block dir containing IPv4 and ip r directly) that do customised validation of IPv4 addresses, with the resulting output of ip route add blackhole 127.0.0.127 piped to a separate log file (along with script origin, pid, and timestamp).

After running the first 2 Nginx location blacklist generations (which post-process the removal of specific development urls), I now find I really need a tool that can incrementally add to what’s already there (because the information is now spread around multiple log files), which is some 400 specific urls + 2 dozen customized regexes, that are logged to nginx-haxors.log.

So I have enough for a good /etc/nginx/nginx-block.conf that will be uploaded to GH. But I also think some of the custom regexes need tweaking (maybe). That alternate log allows extra IPv4 processing, which consists of shell scripts run by www specific cron jobs, where the resulting IPv4 needs to be patched into the blackhole interface by a root cron job that runs directly after it.

All the shell scripts are kept inside the www tree with www ownership, meant to allow a web interface management tool via thttpd proxied through nginx port 80 (which does not exist yet). The scripts’ current security model in this space is security by obscurity (SBO), where their construction, location and naming conventions are designed in such a way as to only be useful if used with an install script that allows the SBO technique to remain unique.

This installer script is the only thing missing before the complete set of scripts can be made publicly available via GH. The shell scripts are paired with a few convention-named PHP scripts and output folders to allow data analysis and log files to be easily viewed, which also require the install script to maintain their SBO.

After much study of web server haxor attempts, this particular type of SBO model seems like the only real practical solution ATM. This collection, along with a multi-device favicon example, will be released as a standalone project for the Devember 2021 challenge.

As a by-product of this decision, there is NO upstream interaction. And there may never be, as I have since realised it’s too open to abuse (and stupidity), although I will probably look at it again after doing the new filesystem driver mentioned elsewhere in this thread (but that won’t be till after the standalone project is more complete). It does seem totally doable on a private LAN, say with a PiHole box, but I don’t have reliable power locally, so that’s just another reason to put it off.

It is hoped that the results of the Devember 2021 challenge will produce (at least) Apache and Lighttpd block list config files as well, hopefully tested via RPiOS (ie Debian +systemd based). It is also hoped that a practical way to handle specific 403 and 402 HTTP/PHP POST attacks will also result from testing on another OS configuration.

Cheers

Paul

This is a bump for the GitHub link, and the fact I finally posted something for the #devember2021 project “thingy”.

I am not going to post a link to the Linode server, as all that will do is get you banned :slight_smile:

1 Like