S6 mega thread

Splitting off the discussion about s6 from the neoflex thread. This thread is dedicated to everything s6.

Expanding a bit on @PhaseLockedLoop's request: I mentioned that s6-rc takes daemontools up to 11.

daemontools and runit are pretty similar in nature. So is s6-linux-init.

I'd suggest you read all the links below, because I can't explain everything as well as they do, but I'll give a high-level overview.

links

s6: why another supervision suite
s6: an overview
s6-linux-init - tools for a Linux init system
s6-linux-init: why?
s6-rc - a service manager for s6
https://skarnet.org/software/s6-rc/why.html
skarnet.org: a word about systemd

The general idea under linux (and most unixes) is that you have 3 layers to the OS start-up process.

  1. init
  2. pid 1 (the process that starts the userland up)
  3. service supervisor

The init part can be handled by anything; init can even be a shell script. This is the piece of code that the kernel loads first, tasked with early system initialization. The init can be short-lived and exec into something else after it's done with its early tasks. This is where s6-linux-init lives. I won't go into much detail here. The only thing I'll mention is that s6-linux-init launches the s6-svscan process, the PID 1 of an s6-based OS.

PID 1 is the process tasked with handling all long-lived system processes (daemons) and restarting them if they die. This is the job of "s6", i.e. s6-svscan. Other tools that would live here after being launched by the init are runit or daemontools. PID 1 is also the place where all orphaned processes go (and get reaped), but I won't get into that.

So think of PID 1 like a daemon-lord that constantly respawns daemons after they die. That's its sole purpose. It must be minimal, able to handle OOM scenarios, and very stable. If PID 1 dies, your whole system crashes.

Where the new concept comes into play is service supervision and dependencies (and that's s6-rc). It's not exactly a "new" concept: OpenRC does it too, but in a very serialized way. Sysvinit also had a service supervisor that nobody used. Systemd does service supervision as well (via the unit files).

So what the heck is service supervision? For that, I'll first have to come back to PID 1 and explain more about daemontools-style tools.

So in a traditional daemontools-based OS, you have no dependency handling: all services are started in parallel by PID 1. Services can be scripts, and you can add checks in them to verify that a certain process is alive before you exec into the main service (or sleep if it is not), but sleeping is not a good way of handling dependencies.

So if you have sshd depending on the networking service and they are both started in parallel by PID 1, sshd will die at least once until networking is up, and only then will sshd stay up. PID 1 makes sure it gets restarted every time it dies. This puts strain on your CPU and wastes cycles.

To fix this, you have to be able to tell PID 1 to only start services if their dependencies are met. So s6-rc is the one handling that. Everything starts off with a “down” file, meaning everything has to be down.

The only thing that s6-svscan (PID 1) starts is the s6-rc service, with an argument telling it to start a bundle of services (by default it's called "default", and it's what the kernel passes to it, just like the "default" runlevel in openrc or runit). Then s6-rc checks its database for how all the services should start up. It first sees services with no dependencies, like udev, so it removes udev's "down" file and thus udev gets started. Once s6-rc confirms udev is up (and not in some weird "starting" state nonsense), it checks which services depend on udev.

In this case, file-system mounting and kernel module loading depend on udev. But the FS also depends on some kernel modules, like the zfs or btrfs modules. So s6-rc removes the down file for the modprobe service. Once that has finished, it removes the "down" file for the FS service, which starts mounting the file systems.
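To make this concrete, here's a rough sketch of what those dependency declarations look like in s6-rc's source format (the service names are illustrative, borrowed from later in the thread): every service directory has a dependencies.d directory containing empty files named after the services it needs, and s6-rc-compile bakes that into the database s6-rc consults at boot.

# ls udevd/dependencies.d/

# ls init-modules/dependencies.d/
udevd

# ls mount-fs/dependencies.d/
init-modules  udevd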

Traditional PID 1s, like runit and openrc, handle this with runlevels. You must first script your way through the early boot (init launches "runlevel 1") to get everything started serially: udev → modprobe → fs and so on. Then the actual supervision stage begins (runlevel 2), which starts everything up (serially in openrc, in parallel in runit). For runit, since it's simpler to explain: it just starts launching everything at once, some things crash, they get restarted, and eventually you end up with a properly started system.

Does this sound insane to you? Well, it is, but it works. But s6-rc (and systemd) have dependency resolution to prevent things from ever crashing from their dependencies not being up.

So s6-rc kinda removes the need for runlevels in general. After the init starts PID 1, which starts the service manager, everything is handled automatically and every service starts as soon as its dependencies are met.

Systemd solved this issue a long time ago, and it's why it became the de-facto standard on Linux. But the way it was done is insane: PID 1 in systemd is huge and I wouldn't really call it stable, and the service supervision side is complicated and poorly thought out. Why would you have an "After=" if your service doesn't explicitly depend on another service? And why would you have a "Requires=" if you're going to start both services in parallel anyway, unless you also add "After=" to the service definition?

But as most of you can attest, systemd is heavy and it's not portable (it can't run on non-glibc libcs, and it's hard to port to different CPU architectures, although most of that work has already been done). I'm not here to bash on systemd, we can make another thread for that. This was just an example of how s6 improves on it.

s6-linux-init is light and only launches s6-svscan (PID 1), and then the service manager takes over system initialization. And if s6-rc crashes (although that's very unlikely), PID 1 will just revive it.


The reason I got into s6 was that I needed serious supervision when rebooting Linux servers (now that I've actually shut down most of my lab, although I plan to keep stuff up longer once I get some homeprod going). My example, which I complained about in my blog on the forum, was iSCSI.

When the system started, all I needed was a sequence where networking started, followed by the iscsid service, followed by a one-shot script that did the iSCSI login, followed by mounting a file system, followed by starting VMs or containers. This worked using some service status checks in a sleep loop, and, for the login script, an infinite sleep afterwards (to make PID 1 believe the "service" was up).

It worked, for what it's worth, but on shutdown I had no dependency resolution at all, meaning everything got killed at the same time, and some services might die earlier than others. If iscsid dies before iscsi-logout, that's a problem: we get a hung iSCSI connection (which thankfully the FreeBSD target always handled beautifully). If the file system goes away before containers and VMs are shut down, you get FS corruption (say, a VM was writing something to disk but got interrupted mid-write by the FS being unmounted).

That’s how I got a new appreciation for systemd actually (and I don’t mind anymore that nixos uses it, although I wish I could learn how to make the flakes use s6, there’s already “vpsAdminOS” that runs on runit and nix, somehow). I was contemplating whether to switch to nixos on my container box, or stick to something lighter (to allocate as many resources to containers, as I’m running everything on arm sbcs).

Any questions, appreciations or frustrations you have for s6, put them in this thread.


This is an awesome overview of s6 tbh. I knew some things from wikis, but there were some nuances I did not understand.


So wrapping back a week later after studying. For me, runit is enough. It's fast and does all I need, but my god, s6 should be the way forward, not systemd, if it can be made as friendly to learn as systemd is for other people.

That said, s6 very much feels like runit extended, and I think that's super cool. It's not nearly as lightweight, looking at the code base. One of the runit project's principles is to keep the code size small. As of version 2.0.0 of runit, the runit.c source contains 330 lines of code, runsvdir.c is 274 lines, and runsv.c is 509. This minimizes the possibility of bugs introduced by programmer error, and makes it easier for security-minded people to proofread the source code. The runit core programs have a very small memory footprint and do not allocate memory dynamically. The programs run as services don't use dynamic memory allocation either, so it's quite secure.

I think both can be compiled to be used with musl instead of glibc?

Man, I think it's time this forum had a real post talking about the differences between init systems: where to find good support for each, and links to in-depth posts like yours (I may write one on runit). If there is interest in this, please let me know.

I think it's a disservice to the open source community that everyone just eats systemd when they may or may not need that complicated capability set. Let alone that I don't believe systemd has parallel startup like s6 and runit?


I’m almost finished with the WIP one on how to deploy. I’m still a bit behind with the documentation on how to use it (improving the old Starter Pack I made back in 2021).

Same for s6, but because s6 has a lot of different functions, some code is necessary. In s6, the equivalent to runit (the init) is s6-linux-init and the equivalent of the runsvdir is s6-svscan.

You can literally run s6-rc under runit, as long as you modify s6-rc to work with runsvdir instead of s6-svscan. And it shouldn't take much modification (I think); the processes would just need the "down" file touched in their runit service directories, so they're all down and s6-rc can start them up according to the dependency tree. In addition, you'd have to stop runit from killing s6-rc on shutdown and give it a custom exit command (s6-rc -bDa change, then SIGTERM), which I'm not sure is possible in stock runit.

Ok, maybe running an s6-svscan (in its own servicedir) supervised by runsvdir might make more sense. Then, you’d abandon most of runit services and have s6-svscan launch most of these (they can literally be the same run files, but you’d have to add service definitions for s6-rc).

The only limitation (that I can think of) to this setup is that you'll still depend on runit's serialized startup process, and I find runit less powerful in that regard. All inits can be categorized into 3 stages: stage 1 (init), stage 2 (basically multi-user.target, kinda) and stage 3 (shutdown). For runit, you need to script a lot of stuff in stage 1 (in Void, you can look at /etc/runit/{1,2,3}, each corresponding to a stage).

Runit's stage 1 invokes a lot of scripts from /etc/runit/core-services that are needed for runit to start up. Stage 2 runs from the moment runsvdir starts until the system shuts down (basically almost the whole "uptime" of the system). And weirdly enough, stage 3 does cleanup that happens after runsvdir is already dead (which doesn't make much sense if runit is supposed to take care of the system state).

I still have an old Void ISO, although I think we could find the same info on GitHub. The older runit/1 was way messier, but somewhat recently it got converted into serialized scripts under core-services, which makes things a bit cleaner, though it's still hacky. You have no way of undoing these changes via a service (machine state) change, and some things could be run in parallel (like mounting dev, sys and proc, instead of mounting them one by one).

And it's not just that. On startup, runit requires that you run a udevd daemon unsupervised, before runsvdir starts, which defeats the whole purpose of supervising longruns (explained better in s6-rc/why). But udevd also requires oneshots to run after the daemon is active, which runit can't handle, and which is where the init scripting comes into play.

The s6-rc runtime does a much better job at handling all the services and their dependencies, and it's the reason I like s6-rc so much. You can make services depend on plain scripts (they're called oneshots). Heck, even systemd can do that (albeit in a very non-optimal way). And s6-rc (when s6 runs as PID 1) handles udevd fantastically. s6 also handles logging better, but that's beside the point (it's hard to lose logs from crashed services in runit anyway, but it's theoretically possible).

With s6-rc, I translated all the core-services that Void provides into s6-rc services. And the only things stage 1 in the s6 suite does are mounting a /run tmpfs (for s6-svscan), launching s6-svscan, creating the /run/service directory (for s6-svscan and s6-rc) and launching s6-rc. Can't get simpler than that. Stage 2 then starts all the services needed (like mounting the fs, starting ttys etc.).

Neither does s6. On the skarnet site, it's explicitly mentioned in a few places that no malloc is used (i.e. no dynamic memory allocation). Not sure about s6-rc (I doubt it does, but I haven't checked). In the s6 "why" page, it's mentioned that neither s6-svscan nor s6-supervise allocates heap memory (same thing). I think there's a very good reason skarnet got sponsored by Alpine Linux to work on s6.

Both work with glibc, musl, llvm, uclibc and I believe dietlibc too. Systemd only works with glibc.

I didn't find any better runit guide than Gentoo's wiki on it. It's a bit hard to digest, but you can skip most of it and go straight to what you need (like custom down commands for services).

https://wiki.gentoo.org/wiki/Runit

I’ll be honest, I didn’t do a ton of homework on runit before switching to s6 suite, but that’s because I was already kinda sold on it. I did a lot of research on startup sequence and trying to define service dependency (which I made in the run service file for runsvdir services, but I couldn’t figure out the shutdown sequence, so I gave up on runit).

It does start things in parallel and based on a (weak) dependency tree, but it's a hackfest (Requires=, Wants=, PartOf=, BindsTo= and friends - why so complicated? You should never have a service with soft dependencies on other services).
https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html

If you don't use "Wants=" and "After=" together, systemd will literally want to start a dependent service in parallel with the one getting started. Say you need to start nginx, which depends on mysql. If you only have Wants=, they are started at the same time. If you only have After=, nginx is merely ordered after mysql when both happen to be started, nothing pulls mysql in, and ordering alone doesn't guarantee mysql is actually ready when nginx starts. You must use both…
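For reference, the combination being described looks roughly like this in nginx's unit file (a sketch; the unit names and the ExecStart path are illustrative, not taken from any distro):

# /etc/systemd/system/nginx.service (fragment)
[Unit]
Wants=mysql.service
After=mysql.service

[Service]
ExecStart=/usr/sbin/nginx -g 'daemon off;'

Wants= pulls mysql in whenever nginx is started, and After= delays nginx until mysql's own startup has finished.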

With s6-rc, a dependency is something set in stone. Dependency not up → service not up. If your website can function without mysql, but more features are enabled by having mysql running, then don’t set mysql as a dependency for nginx and allow them to start in parallel (or if mysql fails to start, you still start your web server, but you’ll get db connection errors).

I don't remember if I mentioned it here: I need a service manager that handles the shutdown sequence properly. I don't want everything killed in parallel. If A depends on B, which depends on C, then on startup the order must be C, then B, then A, and on shutdown A must go down first, then B, then C. Systemd actually does this. Runit doesn't; it kills everything in parallel.

For runit, the easiest way to provide service dependencies is to write the run file to do an infinite wait / sleep (or fail after n attempts or seconds) in which it checks whether the dependency is up and running. Otherwise, a service will just crash and get started back up, until all the services are eventually up.
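A minimal sketch of that pattern, assuming runit's sv is on the PATH and "my-a" is the name of the dependency (both the dependency and the daemon here are hypothetical):

# cat /etc/sv/my-b/run
#!/bin/sh
# bail out until the dependency reports up; runsv will restart us shortly
sv check my-a || exit 1
# only reached once my-a is up; the daemon must stay in the foreground
exec my-b-daemon --foreground

The exit 1 is cheap, so the expensive daemon never even gets loaded until my-a is actually running.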

On shutdown, I’ve never looked deep into it in runit, but I don’t think there’s a way to stop a service before another one gets stopped.


I'm sharing your awesome thread on Discord with those who want to get off sysd


Please make sure to pass any questions here.


So, I do not want to get off systemd, @PhaseLockedLoop may have exaggerated a bit, but it’s still a great and interesting thread. Reading the thread, I realized that while I’m no expert, I’m probably deeper into systemd than most, so I’m happy to answer questions on that side.

I am happy we’re seeing a new service manager, to perhaps challenge systemd.


It’s a little more complicated than that, because some modules are loaded from the filesystem. AFAIK it’s more like: load modules from initrd, mount filesystems, load modules from filesystem.

AFAIK, it's not that Poettering is actively hostile to other libc implementations, he just doesn't care about anything except glibc. I have seen a post from him on Mastodon saying he will accept PRs removing this dependency if they don't introduce bugs or compromise functionality.

This part I do admit seems overcomplicated. I'm pretty sure there is some niche use case for it, though. Now that you made me think about it, the whole thing resembles a state machine more than a plain dependency tree. Which… has its uses, but is complicated. Also: Wants= is still a soft dependency.

When systemd considers a service started is also an interesting topic, which ultimately led to the libzma fiasco (where, I believe, the primary party at fault was whoever wrote the patch that Debian used).

Consider a service which takes a long time to start up. When does the service manager consider it started? Is it immediately after it’s started? After it forks? If it forks at all? That’s one of the reasons for sd_notify(). Which you should reimplement (it’s like fifty lines of C), instead of linking to libsystemd to bring it in (that caused the libzma hack).


Before I offer my own perspective, I’d like to note that I’m in the unique position of a person who both writes services running on embedded Linux, and is responsible for preparing the final image for deployment.

Now to stuff I have actively used in systemd, that you didn’t mention in the context of s6 @ThatGuyB , and I’m interested to compare notes:

  • log management: I don’t need to implement it, I just spew stuff out on stdout and have it taken care of for me
  • no need to fork when starting
  • clear indication when startup is done, if the service implements it
  • hardening the system by using the restrictions described in systemd.exec(5), anything from limiting memory usage, through disk quotas, to filesystem restrictions, to disabling network connections, and other stuff I have yet to use
  • systemd --user, which I use on my workstation to manage the SSH agent
  • not directly systemd, but show me a network manager that can configure a CAN interface, or an embedded switch, that isn’t systemd-networkd

Looking back at the stuff above, the most basic advantages are not to the system administrator, but to the service author.


But first and foremost: I’m the new breed which, despite having spent nearly a decade maining Linux as my desktop and working with it professionally, has a severe aversion to shell scripts. I will take systemd over anything that requires shell scripts, always.

Bash is an absolutely awful language I do not want to use. Every time I write a script in it, it’s a cop out.

Okay, that was not strong enough: I loathe bash.


I'm glad to see my zeal has caused you to return after 2 years :yay:

:joy: sorry couldn’t resist the comment

You sharing your perspective as the new breed, so to speak, regarding systemd will be helpful to all sides imho, since it will also help s6 people understand where to make it more user-friendly, or rather service-author-friendly (s6-rc)


btw, would a separate thread with a summary of WTF happened with SSH, libzma and how systemd even comes into the picture be worthwhile? I'd need to look up some stuff first, but it's on the cards.


Going through some s6 docs, which answers some of the points I wrote earlier. But it also reminded me of one more feature: delays and backing off in restarting a service that is down.

I haven’t encountered a system (yet) where root isn’t mounted as read-only by the bootloader. Even with encrypted root-on-zfs, ZFS-Boot-Menu asks for a password, decrypts the dataset and then mounts root in the loaded initramfs and then init starts.

Not here to bash on systemd, we can make another thread for that.

I was discussing with @Marandil on either my own blog or his just today regarding this. I personally love having systemd gurus get involved, because they can actually point out pain points and maybe potential advantages to systemd.

I’ll put it in other terms here than our discussion. If it’s a niche need, you should be able to make the service itself account for that, instead of having the whole service manager do it. If 1/100 services makes use of After=, then that service can handle its soft dependencies inside the service and not have to bog down the other 99 services. This and the fact that generally you want hard dependencies anyway (even if you can get away with soft ones) is why s6-rc takes the hard dependency stance AFAIK.

There's notification-fd in s6 for that, but, just like sd_notify, it needs to be implemented in the daemon (so just like Debian does their patches, a distro like, idk, Devuan but with s6, would have to patch the notification support in). And unlike sd_notify, this doesn't need any s6 libraries or anything: the daemon just writes a newline to the notification file descriptor to tell everyone it's ready.
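For illustration, a rough sketch of the whole mechanism, using sleeps as stand-ins for a real daemon: the notification-fd file holds a descriptor number, and the service writes a newline to that descriptor once it considers itself ready.

# cat mysvc/notification-fd
3

# cat mysvc/run
#!/bin/sh
sleep 2          # stand-in for slow initialization work
echo >&3         # a newline on fd 3 tells s6-supervise the service is ready
exec sleep 9000  # stand-in for the real daemon, staying in the foreground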

s6 has its own logger, but you can use anything for logging. You just need to write a logger service and define it as the consumer for your service. Make sure your service sends its output to stdout (and redirect stderr into stdout if you want that logged too: in bash it's 2>&1, in execline, which is preferable, it's fdmove -c 2 1), make sure it doesn't fork, and you're golden: all the output (or just stderr, if you don't want to log stdout) will be saved to the defined log.

s6's logger is even more powerful, because you can aggregate output from multiple services (a service can be a producer for only one service, but a consumer can read from multiple producers). And you can have basically an infinite amount of log processing between services until you reach the final consumer. I find logging in s6 really powerful, I just didn't mention it much (I switched to it from socklog).
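As a sketch of what that looks like in s6-rc source format (all names here are made up): the daemon's definition gets a producer-for file naming its logger, and the logger is an ordinary longrun running s6-log.

# cat mydaemon/producer-for
mydaemon-log

# cat mydaemon-log/type
longrun

# cat mydaemon-log/consumer-for
mydaemon

# cat mydaemon-log/pipeline-name
mydaemon-pipeline

# cat mydaemon-log/notification-fd
3

# cat mydaemon-log/run
#!/bin/execlineb -P
s6-log -d3 T s1000000 n10 /var/log/mydaemon

s6-rc wires the producer's stdout to the consumer's stdin, and s6-log handles timestamps and rotation on its own (here: ISO 8601 timestamps, 1 MB files, 10 archives kept).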

You must run in foreground, otherwise, it won’t work. Besides, backgrounding or forking is kind of an old way of doing things anyway, in s6, ideally you wouldn’t fork anything.

This one's almost guaranteed not to be a thing in s6. That's not the job of a service manager. If you need such restrictions, use other tools like cgroups, bubblewrap, SELinux, fapolicyd and so on, inside the service itself, when you start your service.

Not sure about this one. I know s6 provides options for services to set up their env as another user, group, or both, through s6-envuidgid (which works almost like daemontools' envuidgid). But your daemon must know how to run as a certain user. For example, chronyd implements the -u user flag, to run chrony as a non-privileged user. s6 will only assist with setting the env accordingly, but that's it.

There are probably other ways of doing this if your service doesn't support it outright, like su - user -c "/path/to/command --params".

IDK what CAN is (I assume a bus interface for embedded stuff), so I can't even begin to provide an alternative for that. I know the network management part of Linux is pretty horrendous, with only NetworkManager and systemd-networkd providing adequate basic functionality for some scenarios.

What I’m doing on my own systems is just using the built-in ip command to set up stuff (from bridges, to vlans, to bonds, to ip addresses, to routes).

One thing I think s6 lacks is a "conflicts" option. Say you have a networking service that sets up wlan0 with a certain SSID and a certain static IP, and another one that sets it up with a different SSID and IP. If you launch one or the other, all good, but if you launch both, you'll have issues with the second one. You have to either code stop-gaps, or force the other services to stop when this one launches, which isn't ideal when you have to ensure that more than a dozen services are stopped, in each service. But then again, even if you had a "conflicts" option, you'd have to write it into just as many services anyway.

I like bash, but I don't like writing bashisms (I like my systems to be portable, so I'm ideally aiming for POSIX compatibility, though even then I don't always succeed). I always use either oksh or dash (sometimes ash when I'm on Alpine). Lately, with the advent of s6, I've gotten into execline, which is certainly interesting for service scripting. You can use execline outside the s6 ecosystem (which is why it doesn't have the s6-* prefix; s6-networking does have the prefix, but it's arguably mislabeled, as you can also use that outside s6).

You can add new details here. I can edit top post to link to your post lower in the comments.

Not sure what you mean by delays. Delays between attempts? You can program a manual delay in the "finish" part (finish is a script that always runs after a service dies, whether it crashed or was killed).

We've got all the "timeout-*" stuff (timeout-up, the time in ms a service gets to come up before the transition is considered failed; timeout-down, the same for going down; etc.). There's also timeout-finish. By default, finish shouldn't run for more than 5 seconds, but you can increase its timeout and have a sleep 30 or something in between restarts, and nothing else in the finish script (the idea of finish is that it's usually used to clean up after a daemon dies, no matter whether it was killed or crashed, but just like most things in s6, everything is highly customizable).
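A rough sketch of that delay trick (the 30-second back-off and the 40-second timeout are arbitrary numbers, not defaults):

# cat mysvc/finish
#!/bin/execlineb -P
s6-sleep 30

# cat mysvc/timeout-finish
40000

The finish script does nothing but sleep between restarts, and timeout-finish (in milliseconds) is raised above the 5-second default so the sleep doesn't get killed early.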

There's max-death-tally (the number of deaths of a service that s6-supervise remembers), mostly used for tracking, which might be a way to integrate a maximum number of deaths before a service is stopped completely. This option is apparently also used to throttle restarts of the service (somehow, idk how). The default value is 100 (if you restart your service more than 100 times, something could kick in, but by default it just means s6-supervise forgets restart attempts older than the last 100).

I'm not yet aware of any built-in limit on the number of up attempts before a service is stopped. You have the "once" option in s6-svc to start a service once, and if it crashes, it doesn't get restarted. I'm pretty sure there are ways to implement a maximum start count in s6 before a service is given up on, I just didn't look into it.

Not a guru, just an above average user who actually likes the thing.

That's a very fair argument. Personally, I'd prefer something in the middle. systemd is overcomplicated, while s6-rc is a bit too barebones for me. Between the two, I'd rather navigate the overcomplicated one that lets me express my needs than an opinionated barebones one that doesn't.

I read the docs three times, and only clicking the example afterwards made it understandable how to implement it. I gather the content written can be anything?

That’s another thing: watchdogs. With sd_notify() and systemd I can, if I so desire, connect the hanging of my service, through systemd and the kernel, all the way to a hardware watchdog outside the processor.

sd_notify() does not need a library. All you have to do is read an environment variable, open the socket in that variable, and then write the notification(s) there. Linking the library to avoid reimplementing, while always an option, is IMO a mistake.

If you want to see the implementation, there’s an MIT-0 version pasted towards the end of the docs.

Yeah, I don’t have much experience with logging yet, so can’t really speak to the capabilities of systemd. Although iirc it can quite easily plug into syslog infrastructure if desired.

On this we agree.

I assume that by service, you mean the s6 service definition? Yeah, on this we do not agree. I’ll get back to this one later.

Something I have noticed just now:

There’s at least two parties which will change the service definition, and the changes will at times conflict. You have the definition coming in from upstream (software authors, as modified by your distro), and then there is the stuff you introduce yourself as the system administrator. How are you going to manage the two, merge them?

This seems like an important drawback of s6. The package from your distro will install one version. You modify it, perhaps some minor detail. Then you need to take manual action on every subsequent update which touches these files. s6 could use some sort of two-tiered system, although I'm not sure how it would be implemented, since you're dealing with freeform scripts.

systemd --user is not running a service as a specific user (although there is that capability). It’s systemd itself running as the user. It’s started when a user logs into a machine and kept alive as long as they have at least one session active. It’s useful for maintaining stuff that you want to start automatically and kept running, like ssh-agent.

CAN was originally an automotive bus, connecting the various electronics in your car, and it's used in this role to this day. Have you heard of OBD-II? It uses CAN. It has since expanded to be used in industrial stuff in general. Under Linux the standard way to implement it is using SocketCAN network interfaces.

Re: scripts with ip, I’ll get to this later too.

Later.


Interim:

Wrong hack. It’s not regreSSHion. It’s the xz/libzma hack a few months ago.

I’ll have to dig in the details, but AFAIK the TLDR on that is:

Debian patched in systemd support into OpenSSH, badly. Unnecessarily linking libsystemd. Which loaded, among others, libzma to support something. And libzma was the target of slow-roll takeover by a threat actor.

The whole thing was discovered only because the hack injected by the threat actor into libzma was faulty, and would segfault at times, causing the delays that the original reporter noticed.

What may have spooked the attacker is that systemd started moving from hard-linking dynamic libraries (which loaded them early in process startup and therefore excellent place to inject stuff) to on-demand loading of libraries through dlopen(), which would have completely negated the injection.


Back to later:

Not sure if you have noticed, but in every single one of these "later" instances, where I mentioned having something I liked in systemd, your reply was, more or less, "Oh, you can script it in yourself". I do not want that.

I do not like, nor want, to write shell scripts. I’ll have to do some soul searching to find the actual reasons, but something deep in me believes it fundamentally wrong to manage a system with a pile-of-shell-scripts.

You keep misspelling liblzma. LZMA is the compression algorithm.

This is the right thread:

Btw IIRC sd started moving towards a standalone lib for sd_notify, outside of libsystemd.


@ThatGuyB
On the wants/requires/after debacle, I need to check something and will reply in your thread later, but @jaskij’s comment reinforced my opinion that the fundamental difference, and in my opinion one of the sources of systemd success, is that it gives more flexibility towards service authors and package maintainers.

These kinds of soft dependencies are crucial if you have an ordering requirement, but at the same time don't know enough about the deployment environment.

I guess they feel natural to me, because I have some background on compiler engineering, and it’s natural there to have both hard dependency (i.e. one value depends on another) and ordering dependency (e.g. the prints you write in code need to execute in the right order, even if they don’t depend on each other).

Ah, thanks for that.

Not sure if you caught that in the long comments, but right now, there is a standalone, MIT-0 licensed, reimplementation of sd_notify() towards the bottom of the manpage.


Maybe that’s why they never stood out to me - I have full control of my deployment environment.

Separating ordering and dependency, as you say, makes some sense, although is probably less applicable to init systems than to compilers.

But going back to the initial example from @ThatGuyB of a website - sure, minimal functionality is available without the database, but it’s a last resort, I want the database started too if it’s possible. But if the database fails, I’m still left with a basic website. There is value in showing an error page instead of rejecting connections.


Just comparing the notification systems - maybe it's just familiarity, but s6's requires a deeper understanding of the Linux system than someone writing a service nowadays will have. It feels arcane. Most people (me included) just don't deal with file descriptors directly anymore. And the documentation isn't that good either. By comparison, systemd's is simple: check for an environment variable, open the socket, write to it.

Another reason for systemd’s popularity is that there really weren’t good service managers at the time it first launched - best I can tell, only Upstart predates it, and it got dropped a few years after systemd came out.


Edit:

btw, sorry for hijacking the thread. It was supposed to be about S6 but I turned it into S6 vs systemd. Should we split?

I'm still new to s6, so I'm not sure. I think it'd be the readiness codes that you put in the file. The s6 logger has this notification-fd file and its content is always 3. I've been using it without knowing its meaning yet, but it works, sooo… oh right, both s6 and 66 recommend getting into it without reading everything first, since you might not even run into most of the advanced stuff (unless you write services yourself, in which case, since there's nothing good out there, I have to).

Yes, inside the service definition.

The way I propose handling this is allowing the user full control over the system. You never change a user's source files and never mess with the user's system state (the s6-rc database). You ship services from the repo in an "examples" folder (I currently have that delivered in /etc/s6-rc/s6sv), and if you don't want to have to think about a service, you symlink it from the examples into the source folder (/etc/s6-rc/source, basically the source of truth for your system state).

Because (in the way I envision it) everything goes into the examples folder (when managing a repo, you'd have a service definition bundled with the program's package that only gets added when it's installed), a user who wants their own definition or a custom folder either copies the example service under a custom name in the same location and symlinks the normal name to the custom one (e.g. /etc/s6-rc/source/chronyd → /etc/s6-rc/s6sv/chronyd-custom-service), and s6 will see it as just the normal name (no matter what the link points at, as long as the folder is a valid service definition), or copies the service directly from the examples into the source folder and modifies it there.

The above is kinda hard to digest. Let me rephrase it. Everything goes to examples, users should never touch examples directly (because they’ll be always overwritten from the repo). They can symlink the example if they don’t want a custom service. If the example updates, they just have to update their s6-rc db and nothing else (to get the new service definition, but that’s not mandatory).

If users want custom services for their own needs, they either make a new “example” and symlink to it, or make the folder directly in the sources folder for s6-rc-compile. The repo maintainers will never touch the s6-rc source folder and will only deliver examples. The only exception would be on first system setup, when some low-level services (init-modules, mount-fs, tty etc.) are set up as symlinks to the examples in the source folder. That’s the only thing users should generally avoid modifying, but they can remove all of them and use their own custom options (busybox tty instead of agetty, added modules etc.).

Yes, I know systemd has the "systemctl edit something.service" option, I've used it myself. I find it to be too much on the chaotic side (why have built-in functionality that could introduce bugs, instead of using something like examples?).

I prefer the Fedora approach: you don't enable services automatically when someone installs something. The way Ubuntu and Debian do it might make things easier (automatically setting everything up), but that's insane when it comes to user preferences. Just like you have to manually run "systemctl enable service", you'd have to enable a service in s6 (which is currently a bit involved, but I'll be working on a helper script to make it easy - that's the point of s6 being simple, to allow frontends to integrate with it).

If you look at the larger picture, they're just services. Actually, something not said in many places: both the run and finish files can even be binaries, they don't have to be scripts. If you have a custom, idk, Java program that you just execute and it does its magic, then you can literally copy the binary into the source folder, name it "run" and go with it. It won't be ideal, because it bloats the s6-rc db, but it's workable. Ideally you have a very simple execline script that only calls that binary from somewhere else on the system (like, idk, /opt/program or /usr/bin), so you keep your s6-rc db lean.

This is especially important if you want to update your binary (get a new version) without having to rebuild your system state (that’s why you’d invoke it from somewhere else). In some places, actually having it in the db might make sense (like in embedded), so everything just runs from the db (s6 even has a data and env folder for services, that also never get touched).

You can run a nested s6-svscan as a user, which can have its own s6-rc database. You can even run s6-svscan + s6-rc under runit or systemd. You can even use s6 as the "glue" between your services if you have a large unix suite composed of multiple programs that need to be executed in a certain order.
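As a sketch of the nested case (runit's chpst, the user name and the scandir path are all assumptions here, nothing prescribed by s6):

# cat /etc/sv/oddmin-s6/run
#!/bin/sh
# a runit service that supervises a per-user s6-svscan;
# everything under that user's scandir is then s6-managed
exec chpst -u oddmin s6-svscan /home/oddmin/.local/service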

s6 is flexible when it comes to that (the fact that systemd can’t run if it’s not PID 1 is total insanity to me - if I chroot into my rootfs from another system and I want to disable a broken systemd service, or want to start a systemd service that I know should work just fine, then I shouldn’t get an error that systemd isn’t running as pid 1 and should just start the dang service, dangit!).

I’m not into those kinds of embedded systems, sorry. :sweat_smile:

Oops.

You didn’t need to explain to me, but I still read the explanation regardless. :slight_smile:

I noticed, that’s what I wanted to write before you wrote this, but I got busy with other stuff and you got ahead of me :stuck_out_tongue:


With more and more discussions, I find that this is becoming a trend. This is not criticism of systemd users. It seems most of them like having the service manager do things for them, especially when they’re the ones writing the service (unit) files.

I can see why one would prefer built-in directives, as opposed to the stability of a minimal program, where you’d have to program (probably repetitive) functions. In regards to s6 stack, the default is s6-rc, but there’s another contender, 66, which makes things a bit more like systemd definitions (they look similar to unit files in some aspects, but you still have some scripting involved, like the start command and I think there’s no soft dependencies either).

This is a big philosophical difference between s6 stack and systemd. Systemd does a lot of things (and arguably not well, but “good enough” for most people). The advantage of s6 is its extensibility. I can see why s6-rc maybe isn’t for you. And maybe 66 won’t be for you either, as it still maintains some basics that s6-rc also does (66 used to be a bunch of wrapper scripts around s6-rc, but it became its own thing later on).

But there's a possibility that someone might write a new service manager around s6-supervise that acts more like systemd and offers all (or most of) the built-in functionality (you probably won't see stuff like networkd, gummiboot or nspawn though, which should've been separate projects anyway).

The author of s6 actually wrote an article on how to reimplement systemd in s6. It’s insanity.


I prefer execline rather than shell, but again, you can have a service be anything, even a c or java binary. The reason I prefer to write even the basic service file in execline is to keep my s6-rc db minimal.

Here’s my simplest service definition.

# cat cronie/run 
#!/bin/execlineb -P
envfile -I /etc/s6-rc/config/cronie.conf 
importas -uD "" OPTS OPTS

exec cronie-crond -n ${OPTS}

In sh, it’d translate to something like this.

. /etc/s6-rc/config/cronie.conf
export OPTS

exec cronie-crond -n ${OPTS}

If cronie-crond defaulted to running in the foreground and not daemonizing (backgrounding) itself, you could literally invoke cronie-crond without any arguments. The OPTS argument is currently empty, but it's there so an admin can add extra flags (like enabling debug output) and restart the service without having to recompile the s6-rc db, because the system state hasn't changed; you're just running the same service with more verbosity.
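For completeness, the file envfile reads is just plain variable assignments; a rough sketch (any actual crond flags you'd put in there are up to you, check its man page):

# cat /etc/s6-rc/config/cronie.conf
# extra flags for cronie-crond; edit and restart the service,
# no s6-rc db recompile needed since the system state hasn't changed
OPTS =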

One of the more crazy services I’ve written is the nfs-server (because nfs has a lot of moving parts). You won’t need to write this yourself once I’m done (which is why I have a git for s6_services), but you could edit it (once I finish the commit, nfs is not done yet).

I’m a sysadmin, s6-rc comes natural to me, because I have to modify services from time to time. In systemd I’ve used the edit option, as mentioned before. Sometimes I even have to create them from scratch (mostly did this on nixos, but not with the native nix stuff, just the systemd unit file itself). I never saw practical places where a soft dependency makes sense. Maybe I’ll get confronted with that at some point, but until then, I need something that works for me and I don’t mind having to use just hard dependencies.

I'm really enjoying our discussion. I'd prefer if we can keep it here. I want to see all points of view from the systemd side and I'm glad jaskij is here as well.

Do you even deal with notifications in systemd? I doubt most people need to think about these things. It’s up to software developers and at best maintainers to add these things in. Once defined, a user will just know to set up the service (or it’ll come from the maintainers anyway and a user won’t have to even think about it, just like with systemd).

But as far as file descriptors go, many people (probably even a majority, or at least a large minority in unix) still deal with them (stdin, stdout, stderr). Just that rarely anyone deals with custom fd and most people just pipe stuff (well, pipes and network sockets are also fd - don’t ask me to give you an in-depth overview, lol, I only know the surface / basics).

Slightly off-topic, but Upstart’s design is sane. Its implementation is the definition of insanity (ptracing a PID to monitor it).

I’m pretty sure the why answers most of the s6 questions on the design choices. Stuff like “how to do X in s6” is what I can (hopefully) answer.

No. As long as it's still s6-adjacent and about how s6 could handle stuff, it's all good.

I want to correct this. I just found out about the s6-applyuidgid program. It actually applies the uid/gid for you.

I ran a few quick test services and they work. Keep in mind that this isn’t the equivalent of su -, because you won’t be getting the environment of the user (like its shell vars). It’s more similar to just su (without the -). A script (or service) will always run with the same env if it’s a service, it must be reproducible, so it won’t take vars from a user’s shell.

The bonus is that, even if the user's shell is set to /sbin/nologin or /bin/false, su will fail, while s6-applyuidgid will succeed.

# s6-rc -u change test-longrun
# s6-rc -u change test-longrun-ok
# s6-rc -u change test-oneshot
# s6-rc -u change test-oneshot-ok
### Note: when starting a oneshot, your shell will stay locked
###    until the "up" script finishes; I ran the "ps" and then
###    ctrl+c'ed immediately; just bg, &, or nohup if you want to
###    start a very long-running oneshot

# ps -ef | grep -v "grep\|s6-rc" | grep "test\|sleep"
root      6781     1  0 23:47 ?        00:00:00 s6-supervise test-longrun
3000      7174  6781  0 23:48 ?        00:00:00 s6-sleep 9000

root      6783     1  0 23:47 ?        00:00:00 s6-supervise test-longrun-ok
oddmin    7190  6783  0 23:48 ?        00:00:00 s6-sleep 9000

root      7217  7216  0 23:48 pts/1    00:00:00 s6-rc -u change test-oneshot
3000      7220  7219  0 23:48 ?        00:00:00 s6-sleep 9000

root      7250  7249  0 23:49 pts/1    00:00:00 s6-rc -u change test-oneshot-ok
oddmin    7253  7252  0 23:49 ?        00:00:00 s6-sleep 9000


# cat test-longrun/run 
#!/bin/execlineb -P
s6-applyuidgid -u 3000 -g 3000 -G 3001,3002,3003 s6-sleep 9000

# cat test-longrun-ok/run 
#!/bin/execlineb -P
s6-applyuidgid -u 1000 -g 1000 s6-sleep 9000

# cat test-oneshot/up 
#!/bin/execlineb -P
s6-applyuidgid -u 3000 -g 3000 -G 3001,3002,3003 s6-sleep 9000

# cat test-oneshot-ok/up 
#!/bin/execlineb -P
s6-applyuidgid -u 1000 -g 1000 s6-sleep 9000

You can see how services either forcefully run as ID 3000, or when set to uid 1000, as the known user with that id. So your service doesn’t need to know how to switch user / go down in privileges, you can run a service as a user directly (this would probably make sense for writing postgres and mariadb service files, I’ll need to remember this).

In the shell output above, you can also see there's no supervise process for oneshot services, and that some in-between PIDs are missing. Here are more details.

In s6, all oneshots are handled by the s6-ipcserverd, which is handled by the s6rc-oneshot-runner, a built-in s6-supervise service. The latter calls all the (one)shots, he’s the hotshot.

The (s6 built-in function) “s6-sudod” longrun service calls upon the oneshots (one instance for each oneshot). This service is kinda the equivalent of s6-supervise for oneshots, but dies after doing some finish-up work once the “up” oneshot finished running (and does something depending on whether the exit code is 0 or another code).

The s6-applyuidgid function does nothing but lower the privilege and won’t show as a process, unlike s6-sudod or s6-supervise.

# ps -ef | grep -v grep | grep "s6-sleep\|12653\|533\|510"
root       510     1  0 Jul05 ?        00:00:00 s6-supervise s6rc-oneshot-runner
root       533   510  0 Jul05 ?        00:00:00 s6-ipcserverd -1 -- s6-ipcserver-access -v0 -E -l0 -i data/rules -- s6-sudod -t 30000 -- /libexec/s6-rc-oneshot-run -l ../.. --
root     12653   533  0 00:17 ?        00:00:00 s6-sudod -t 30000 -- /libexec/s6-rc-oneshot-run -l ../.. --
oddmin   12654 12653  0 00:17 ?        00:00:00 s6-sleep 9000

# cat test-oneshot/up 
s6-applyuidgid -u 1000 s6-sleep 9000

Here we can see services in action when the shell is false or nologin, which wouldn’t work with su in a shell script.

# ps -ef | grep -v grep | grep "sleep\|test"
root      9500     1  0 00:07 ?        00:00:00 s6-supervise test-longrun-chrony
chrony   10391  9500  0 00:10 ?        00:00:00 s6-sleep 9000

root      9501     1  0 00:07 ?        00:00:00 s6-supervise test-longrun
nobody   10370  9501  0 00:10 ?        00:00:00 s6-sleep 9000

# grep "nobody\|chrony" /etc/passwd
nobody:x:99:99:Unprivileged User:/dev/null:/bin/false
chrony:x:980:997:chrony unprivileged user:/var/lib/chrony:/sbin/nologin



# cat test-longrun/run    
s6-applyuidgid -u 99 s6-sleep 9000

# cat test-longrun-chrony/run 
s6-applyuidgid -u 980 -g 997 s6-sleep 9000 

This also shows as an example of how easy it is to write an s6 service. The only other thing you need for a longrun or oneshot is:

  • top level dir = name of the service
    • file “run” / “up” = contents of the actual longrun or oneshot service
    • file “type” = contents either “longrun” or “oneshot” (basically defines what the service type is - pretty straight-forward, right?)
    • optional directory “dependencies.d” = containing files named after other services (either atomic services or bundles) that define which services need to start before this service and stop only after this service.

I feel it's by far easier to write s6 services than systemd services. Maybe that's just me, since I'm a script aficionado, but even then, writing services isn't difficult. It's harder to write scripts or programs that actually do things than to write a service file that just calls a daemon.

Wrong again! Kinda. There's s6-softlimit built into s6, but it's not as feature-rich (and some things are better handled by other tools anyway).
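A rough sketch of what it looks like in a run script (the limits and the daemon are made up):

#!/bin/execlineb -P
# cap memory-related limits at ~128 MB and open file descriptors at 64,
# then exec into the daemon
s6-softlimit -m 134217728 -o 64
my-daemon --foreground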

The systemd unit files conversion page really is a good way to dig deeper into s6 (it’s not very detailed, but has a few hints of good stuff here and there).

If you will

So, first of all, After= is not a dependency per se, it’s just an ordering requirement.
Wants= and Requires= are dependencies.

Let’s take the following “base” services:

# cat my-a.service
[Unit]
Description=My A

[Service]
RemainAfterExit=yes
Type=oneshot
ExecStart=echo "Starting My A"
ExecStart=sleep 3
ExecStart=echo "Started My A"
ExecStop=echo "Stopping My A"
ExecStop=sleep 3
ExecStop=echo "Stopped My A"

[Install]
WantedBy=multi-user.target
# cat my-b.service
[Unit]
Description=My B
After=my-a.service

[Service]
RemainAfterExit=yes
Type=oneshot
ExecStart=echo "Starting My B"
ExecStart=sleep 3
ExecStart=echo "Started My B"
ExecStop=echo "Stopping My B "
ExecStop=sleep 3
ExecStop=echo "Stopped My B"

[Install]
WantedBy=multi-user.target

By default, both services are disabled, but you can start them on demand:

Now let’s see what happens if I add to my-b.service:

  1. After=my-a

    As expected, only setting After= doesn’t affect my-a at all. Let’s check enable:

    As expected, it’s not affected.

BTW, you can start them both at the same time and ordering should be preserved:

  1. Wants=my-a

    As expected, my-a started, and moreover started in parallel, because we didn’t specify that it needs to be running to start my-b (no After=).


A bit questionable if stopping my-b should stop my-a since it brought it up as a requirement, but I can kinda sorta understand that.


my-b can work without my-a (weak dependency)

  1. Wants=my-a but my-a fails to start

    Again, weak dependency so my-b is fine with starting even if my-a fails.

One reason this is the case is that you use Wants= or rather WantedBy= to specify targets. If you used Requires= or RequiredBy= your whole startup process could fail if one component failed. And - in my opinion - it’s better to start in a degraded state and debug than bail out because your VPN couldn’t connect or something :wink:


Oh, and one more thing, this is the same in both cases with Wants=, but:


IIRC that’s because enable/disable only looks at the [Install] unit section.
If a unit is pulled via the .wants/ drop-in, sysd can read the [Unit] section and pull the remaining dependencies on its own.

  1. Wants=my-a and After=my-a


    As you can see, my-b starts only after my-a finished startup. As expected.

  2. Requires=my-a


    No difference from Wants= here, however stop works a bit differently:

Breaking my-a has an interesting effect on my-b:


My guess here is that since they started in parallel my-b got into the active state before its dependency failed. IMO the service should stop at this point.

  1. Requires=my-a and After=my-a and broken my-a

    This time, as expected, the service didn’t even begin startup because its dependency failed.

Note that in all of these cases, both my-a and my-b were disabled in systemd terms, or rather were not enabled. Besides enable/disable, there is also mask and unmask, which work differently. Masking creates a symlink in /etc/systemd/system/ pointing to /dev/null, making it impossible to load the unit. I can't mask my-a because its file is already there, but I can mask my-c and change my-b to point to my-c instead (for science).
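To make the mechanics concrete, masking is roughly equivalent to doing this by hand, which is also why a unit that already has a real file in /etc/systemd/system/ can't be masked:

# roughly what "systemctl mask my-c.service" does
ln -s /dev/null /etc/systemd/system/my-c.service
systemctl daemon-reload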


I hope this research answers all your questions regarding Requires/Wants/After.
Oh and one more thing (vm using Ubuntu focal):

marandil@mdl-vm:~$ egrep "^\[Unit\]" -R /usr/lib/systemd/system | wc -l
417
marandil@mdl-vm:~$ egrep "^After=" -R /usr/lib/systemd/system | wc -l
263
marandil@mdl-vm:~$ egrep "^Wants=" -R /usr/lib/systemd/system | wc -l
63
marandil@mdl-vm:~$ egrep "^Requires=" -R /usr/lib/systemd/system | wc -l
83

IMO, that’s the most insane thing in systemd. You need to combine wants and after, or requires and after, in order for your service to actually start up properly. The latter is particularly egregious. If you have just “sshd requires networking” and they’re started in parallel (the default if you don’t also specify after), sshd dies, then networking starts, then sshd gets restarted and starts properly.

It’s the definition of a race condition. Literally no different than if you had a daemon-tools style process supervisor (just restart everything until all your dependencies are met and if one service is broken, restart it ad nauseam - well, both systemd and s6 have limits on how many times you should restart a service). With s6, you don’t have to worry about that, because all dependencies are hard.

Process supervisors only get a pass because they're mostly an exercise in simplicity. They don't care about dependencies at all (so they don't handle them), and it's on the service writers to account for that. Most runit scripts that make use of dependencies have an sv check my-a || exit 1 in the my-b run file. That way, you stop the service before the actual daemon starts, and your service gets restarted until its dependency is met.

This is useful for heavier daemons that do a lot of work at startup: you could just let the daemon start and die (because of unmet dependencies), but that would bog down your system. It takes more CPU to load a heavy daemon than to just mark the service failed and have it restarted over and over; that way, at least, you prevent resources from getting exhausted and allow your other services, including the my-a dependency, to start up properly.

But service managers don't get such a pass. s6 passes the test with flying colors, but systemd decided not to treat hard dependencies seriously and has allowed (by design) a scenario where race conditions can happen. So that leaves more work for the service writer, just like in process supervision suites (i.e. they must add sv check my-a || exit 1 or After=my-a.service to their service definitions).

If you put your VPN as an early dependency, like “don’t start your login manager until your VPN is connected,” then that’s on you. Which is the reason I went with a sane ordering system in my s6_services repo. It’s basically following (more or less) a runlevel / target method. Load all the low-level stuff first (the first bundle ok-init), then load the mid layer (ok-rc-local) where the static IP configuration goes, then load multi-user (ok-multi-user) where you have dhcpcd, sshd, ntpd and so on (oh gawd, I just remembered systemd-timed also screwed me over).

You can have additional dependencies (and even higher-level bundles) after multi-user (or you can dump all of them inside multi-user, which is my recommendation, unless you have massive amounts of services - you can still have dependencies between atomic services in a single bundle). Besides, if you keep your s6-rc verbosity at the default of 2, you’ll see services that fail to start on the tty and you’ll know exactly what failed (nothing will ever be started until their dependencies are actually met).
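For readers who haven't met bundles yet: in s6-rc source format a bundle is just a named group of services, so something like the following (contents illustrative) is all there is to ok-multi-user, and s6-rc -u change ok-multi-user brings everything in it up in dependency order.

# cat ok-multi-user/type
bundle

# ls ok-multi-user/contents.d/
cronie  dhcpcd  ntpd  sshd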

If your threat-model is so high that you need your vpn or tor to be started before you even begin launching programs (like, before ntpd and what-not), then you can do that (well, you can do that in systemd too).


IMO, regarding this, I’m on the complete opposite side. I guess it’s a matter of schools of thought. My stance on this is that a system shouldn’t find itself in a half-working state, that’s worse than a broken system. At least with a broken system, you know something’s wrong. But with a half-working system, you might not actually know something’s broken, which can get really weird and make it difficult to troubleshoot.

And then you need to add more tooling on top of your system to find out systems are half-broken (like check when a service has died and got restarted too many times, like a counter, or fancy monitoring agents that use more resources than your minimal web server - although at some point, when you manage hundreds of systems like this, a monitoring software becomes kinda mandatory).

I already knew all this, but I hope it answers some other reader’s questions. It’s useful information.