After fluffing around with some CIDR and AS scripts, I realised that the extra data I wanted to see is also the same extra data needed to concoct valid DShield logs for submission to the ISC (SANS Internet Storm Center).
A lot of the SSH attacks are singles from a unique IPv4, and that's it. Sometimes they also probe the web server (and possibly other services and ports, though Alpine Linux with Nginx, and the port 443 service off by default, is very light on attack surface). However, when I check these IP addresses on ISC I often see “one report from six days ago” type results.
I got hit by a subdomain probe: 420 incoming connections in less than 1 minute, to about 5-6 URLs each, and each connection was a unique subdomain. Again, when I checked ISC: “one report from six days ago”.
These (and other web probes) prompted me to reinvestigate the available DShield options. Turns out you can actually get single-file Perl scripts, and (as it happens) Alpine Linux has Perl installed by default. This info was hard to find (dated 2004-2007), outside of the FrameWork and RPi Honey Pot options.
The thing is, the DShield logs require(-ish) TCP or UDP transport per log entry, and details like SYN flags, DNS, etc. From the current data I already know all my stuff is TCP, and I know the ports (there are only 2 atm). The thing with Nginx logs, though, is they don't store the source port.
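Nginx can be told to record it, though. A minimal sketch, assuming a custom log_format - the name `dshield` and the field order are my guesses at something DShield-friendly, not the official layout; `$remote_port` is a standard Nginx variable:

```nginx
# custom format capturing the source port Nginx normally drops
log_format dshield '$time_iso8601 $remote_addr $remote_port '
                   '$server_addr $server_port '
                   '$request_method "$request_uri" $status';

server {
    # ... existing config ...
    access_log /var/log/nginx/dshield.log dshield;
}
```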
Part of these issues with modern attack surfaces and routes is addressed by expanding how and what can be logged, based on the progress and outcome of the RPi Honey Pot. Which means (atm the only way forward is) I have to install the Honey Pot somewhere (it's no longer RPi-only BTW) and look at how and what sort of data it is collecting.
But it also means I now have a reason to move the IPv4 range block list entries over to the filesystem, and extend the data I am storing in log files based on where it came from and who logged it, and intervene with some analytics to dump some extra data too (like: was it a root or user attack on SSH, and was the user unknown - at least one person has seen the SSFW username, but I know they don't know that).
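For that root/user/unknown split, something like this might do. A sketch only, assuming BusyBox syslogd is putting sshd lines into /var/log/messages and that sshd logs its usual “Failed password for [invalid user] NAME from IP” lines:

```sh
#!/bin/sh
# Hypothetical classifier for SSH attack types. The log path and the
# sshd message formats are assumptions about a default Alpine setup.
LOG="${1:-/var/log/messages}"

grep 'sshd' "$LOG" | awk '
    /Failed password for root from/    { root++ }
    /Failed password for invalid user/ { unknown++ }   # user not in passwd
    /Failed password for/ && !/invalid user/ && !/for root/ { user++ }
    END { printf "root: %d  known-user: %d  unknown-user: %d\n",
                 root+0, user+0, unknown+0 }'
```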
Outside of the Gerkas, the scripts and the organisation of them are only slightly flawed in their current v1 state. I have come to realise that the individual webserver configuration folders require the presence of associated log analysis and blocking scripts, to maintain easy porting for new servers and services (I want to add FTP at the same time I add Apache).
However, in providing a custom log for Nginx to capture haxors, I noted it is not the same format as either the Nginx Access or Error log files, and those are different too, with Access storing less info but also logging Error entries as well as my haxor entries. To make things even worse (from my point of view), because I don't have PHP set up yet, there are a lot of 405/406 “unsupported” log entries that don't make it into the Error log (POST & PUT).
Strangely, there is one empty item in the default custom log in Nginx that has never been filled in all the 6 months I have been collecting and analysing data.
I am concerned that providing custom Nginx Error and Access logs will break most 3rd-party collection and analysis tools and/or interfaces. It's annoying to think I will probably have to run 2 sets of log files to do DShield properly.
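That said, running both sets may not be too painful: Nginx allows multiple access_log directives in the same context, so the stock combined format can stay for the 3rd-party tools alongside a custom one (reusing the hypothetical dshield format from the sketch above):

```nginx
server {
    access_log /var/log/nginx/access.log combined;   # 3rd-party tools keep working
    access_log /var/log/nginx/dshield.log dshield;   # custom DShield-ish feed
}
```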
Nginx, especially when set up as a reverse proxy (as I have it), can handle connection management and access denial, but that does not help SSFW logging, blocking and analysis; it only mitigates the problem (and maybe? it would not help DShield logs - maybe it could? by improving log output).
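For reference, the kind of mitigation I mean is along these lines - a sketch only, with a made-up zone name and rates:

```nginx
# in the http{} block
limit_req_zone $binary_remote_addr zone=probes:10m rate=10r/s;

server {
    location / {
        limit_req zone=probes burst=20 nodelay;   # throttle floods
    }
    location /wp-login.php {
        deny all;   # common probe target; returns 403 before SSFW gets a say
    }
}
```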
It may be that I need to write some proper kernel integration code, and look at something like RSysLog as a basis (which claims to handle on the order of a million log entries per second). It seems a bit much for a (supposedly) “super simple firewall”, but it might be useful as the basis for the upstream end point of what the OP proposes.
I am also concerned about the default log rotation management of the syslogd “messages” files: 200 KB rotation and .0 naming, whereas Debian/RPiOS is weekly rotation with .1 naming, and gzipped above that (4 kept).
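Those BusyBox defaults can at least be loosened. A sketch, assuming Alpine's OpenRC conf file - check `busybox syslogd --help` on your version before trusting the flags:

```sh
# /etc/conf.d/syslog
# -s = max size in KB before rotation (BusyBox default 200)
# -b = number of rotated files to keep
SYSLOGD_OPTS="-s 4096 -b 4"
```

Followed by `rc-service syslog restart`.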
I will probably need to log some query, issue and/or bug report regarding the lack of file writing with high-volume syslogd log messages on BusyBox/Alpine.
I'd prefer to scavenge an .xz package into the log archive when I do my monthly backups/rotations. Speaking of which, the perils of manual grep && sed -i on log files mean I lost all the SSFW log entries for the 30th of January (it was only 47 entries - and they were logs, not blocks).
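A copy-before-edit habit would have saved those 47 entries, and it folds nicely into the .xz archiving. A hypothetical sketch - the /var/log/ssfw paths are made up for illustration:

```sh
#!/bin/sh
set -e
STAMP=$(date +%Y-%m)
LIVE=/var/log/ssfw/ssfw.log        # assumed live log location
ARCHIVE=/var/log/ssfw/archive

mkdir -p "$ARCHIVE"
cp "$LIVE" "$ARCHIVE/ssfw-$STAMP.log"   # grep/sed the copy, never the live file
xz -9 "$ARCHIVE/ssfw-$STAMP.log"        # produces ssfw-YYYY-MM.log.xz
: > "$LIVE"                             # truncate live log (small race window)
```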
Some sort of automated log management is called for. I see how some of the older DShield Perl scripts handle it (by storing the last entry they processed). The annoying part is that it really needs to be checked every 10 seconds on a default Alpine setup, because of its 200 KB limit. Actually, maybe it's just a “wakeup, do some maths, adjust sleep timer, sleep” type problem, and (on Alpine) you can test and store the size/date of messages.0 and then copy it when it changes (sort of accurate).
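That loop might look something like this - a sketch, assuming BusyBox stat and the .0 rotation name; the spool path is made up:

```sh
#!/bin/sh
WATCH=/var/log/messages.0          # BusyBox's rotated name on Alpine
SPOOL=/var/log/ssfw/spool          # hypothetical holding area
LAST=""

mkdir -p "$SPOOL"
while :; do
    # mtime:size as a cheap change signature (BusyBox stat supports -c)
    SIG=$(stat -c '%Y:%s' "$WATCH" 2>/dev/null || echo none)
    if [ "$SIG" != "$LAST" ] && [ "$SIG" != "none" ]; then
        cp "$WATCH" "$SPOOL/messages.$(date +%s)"
        LAST="$SIG"
    fi
    sleep 10
done
```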
Anyway, for some random or unknown reason SSFragWhare: the super simple firewall with brutality bonus made it into the #devember2021 winners circle as a finalist, and it only took me about 8 hours to comprehend what I saw (guess that's what happens when you only get 2 hours sleep in 2+ days). I think it might have been my “beside the seaside” post that did it.
When I get some more Gerkas going, and after I have made these log changes and got the DShield submissions running, I would like to pair up either the DShield Honey Pot on an RPi, or PiHole on an RPi - actually maybe both: use the PiHole server to watch and react to the Honey Pot server. I need a PiHole here when I change my networking over. This will (at least privately) allow me to investigate the original target of this OP - Automated Network Threat Response.
Argh. Well … I guess that's it for this update (let's hope the Judges find a juicy 12v/5v device for me).
Cheers
Paul