Composing a regimen to maintain a clean and stable Debian desktop system for 15+ years

I’ve been using Arch for nearly 6 years now, and I’ve come to the realization that I have a life and cannot spend an enormous amount of time just maintaining a barely functioning computer that can break at any time.
After some research I chose Debian stable. I’ve been testing it out and the experience is great.
However, I’m planning to buy a PC that I’ll probably keep around for 15 years, and the accompanying software should also last me that long without getting too cluttered or unstable.

What I have in mind right now is to rely on a few package managers:

  1. Debian stable
  2. Flatpak
  3. Nix

The latter two (Flatpak and Nix) seem to be compartmentalized, would not affect system stability in a major way, and would be easy to install, update, and remove.

For applications that are not available through the above-mentioned package managers, I want to create a directory named ~/programs/ in which I will store everything related to each package; if any files get installed outside that directory, I will document them and write them to a file.
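
To make that documentation step less manual, one possible sketch (the search paths and file names below are just placeholders I made up) is to drop a timestamp before an install and look for newer files afterwards:

    touch /tmp/pre-install.stamp
    # ...run the third-party installer here...
    # record anything it created or changed outside ~/programs/
    find "$HOME" /usr/local /etc -xdev -newer /tmp/pre-install.stamp \
        -not -path "$HOME/programs/*" 2>/dev/null >> ~/programs/stray-files.log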

For other applications and development environments that interact with the main system too much, I’ll set up a Docker container.

I still have to read more about backing up a system efficiently (just the important files, so I can rebuild it in case of a fatal incident).

What is your opinion on this? Do you have any suggestions or experiences?

3 Likes

I ran Debian sid for 15 years before needing to reinstall. It broke from time to time, but I was able to fix most of that. The problem became accumulated cruft that eventually made it worth reinstalling. Debian says to expect breakage with sid, so there’s that. Debian stable should run at least that long; you will have to do an apt dist-upgrade from time to time, but it should be better than sid was for me.
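
For reference, the stable-to-stable upgrade roughly follows the pattern in the Debian release notes; a sketch, with the codenames here only as examples:

    # back up first, then point the sources at the new release
    # (check /etc/apt/sources.list.d/ for extra entries too)
    sudo sed -i 's/bookworm/trixie/g' /etc/apt/sources.list
    sudo apt update
    sudo apt upgrade --without-new-pkgs   # minimal upgrade first
    sudo apt full-upgrade                 # then the full upgrade
    sudo apt autoremove --purge           # clean out leftovers afterwards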

1 Like

Good-quality optical media kept in a cool, controlled environment can easily last many decades, but their lifespan is not usually the issue.
Drive and computer technology develops at a rate where the media may be unusable long before it actually deteriorates.
Maintaining a compatible drive is an option but electrolytic capacitors can chemically degrade and fail.
So at minimum you are looking at replacing the capacitors every 20 years, just to be on the safe side.
It might also be wise to store a complete machine just to be able to access the disks.

Been many moons since I used Debian. I use Mint with Timeshift snapshots. I back those up weekly to a dedicated external USB 3.0 NVMe drive and they’ve saved my bacon a few times. Timeshift can be installed on Debian/Ubuntu as well.

15 years for a system is ambitious, regardless of OS. You need to stay on top of deprecated packages or better still, never install them. Debian has a project to reduce the footprint of their install and subsequent bloat associated with the packages. This may be of use.

https://wiki.debian.org/ReduceDebian

1 Like

What do we mean by deprecated packages?

Is this the kind of cruft which can be removed by apt autoremove --purge or…?

My two cents:

For a system to survive over more than a decade, the chosen distro is probably less important than the strategies for mitigating any issues you may have down the line.

That said, choosing Debian stable or testing is probably a good idea; it should work nicely. But any distro that can reasonably be expected to be around in a decade should do fine.

As for mitigation strategies, I find that filesystems with snapshots make backups and error recovery very easy. On my home laptop I have a super simplistic layout with an EFI partition and a LUKS container, inside that an LVM with two logical volumes, one for swap and one for btrfs, which in turn is subdivided into a root subvolume and a home subvolume. You could use ZFS here as well; it’s more mature but less integrated into the Linux kernel.
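
To make the btrfs part concrete, a minimal sketch of how such a layout might be created once the LUKS/LVM stack exists (the device and subvolume names are just assumptions):

    # assuming the logical volume for btrfs is /dev/vg0/root
    mkfs.btrfs /dev/vg0/root
    mount /dev/vg0/root /mnt
    btrfs subvolume create /mnt/@          # root subvolume
    btrfs subvolume create /mnt/@home      # home subvolume
    umount /mnt
    # then mount with -o subvol=@ for / and -o subvol=@home for /home in fstab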

My strategy here is to snapshot both volumes regularly (particularly before running package updates) and to btrfs send the latest snapshots to an external disk regularly. You can of course automate that, and send to a home NAS or to cloud storage.
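
A minimal sketch of that snapshot-and-send loop, assuming the subvolumes are mounted at / and /home, a snapshot directory at /.snapshots on the root subvolume, and a btrfs-formatted backup disk at /mnt/backup (all of these paths are placeholders):

    ts=$(date +%F)
    # read-only snapshots are required for btrfs send
    btrfs subvolume snapshot -r /     /.snapshots/root-$ts
    btrfs subvolume snapshot -r /home /.snapshots/home-$ts
    # full send the first time; later runs can add -p <previous snapshot> for incremental sends
    btrfs send /.snapshots/root-$ts | btrfs receive /mnt/backup/
    btrfs send /.snapshots/home-$ts | btrfs receive /mnt/backup/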

Should something happen during an update, or the system get hosed for a different reason, you can roll back to a previous snapshot using some live system, and be back at it in no time. Should the disk get fried, you can recreate the basic structure on a new disk, btrfs receive from a backup, tweak the fstab/crypttab if needed, and be done as well.
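
As a rough sketch of that rollback from a live USB (again assuming the @/@home naming and the /.snapshots directory from above; the device names and snapshot date are made up):

    # unlock and activate the LUKS/LVM stack, then mount the top-level btrfs volume
    cryptsetup open /dev/nvme0n1p2 cryptroot
    vgchange -ay
    mount -o subvolid=5 /dev/vg0/root /mnt
    # make a writable copy of a known-good snapshot and swap it in as the new root
    btrfs subvolume snapshot /mnt/@/.snapshots/root-2025-06-01 /mnt/@good
    mv /mnt/@ /mnt/@broken && mv /mnt/@good /mnt/@
    umount /mnt && reboot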

That should take you most of the way there, the rest is fine-tuning.

2 Likes

I just did an update and grabbed this screenshot. Over time these will build up. They may eventually become deprecated and/or unsupported, not to mention a security risk if they have any vulns. Good to keep them at a minimum to reduce both disk usage and exposure.
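
If you want to inspect that cruft before removing anything, a couple of ways to list it (the second assumes aptitude is installed):

    apt-get -s autoremove      # simulate: shows what autoremove would drop
    aptitude search '~o'       # packages no longer available from any configured repo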

1 Like

Oh yeah. I always run autoremove --purge after I see dangling packages like that. One warning: the purge flag also removes the packages’ system-wide config files (the conffiles dpkg tracks, usually under /etc), so you’ll be starting from scratch if you reinstall some of those packages. It doesn’t touch per-user configs in your home directory, though.
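
Related cruft: packages that were removed without --purge leave their config files behind in the “rc” state. A quick way to find and clean those up, if you want to:

    dpkg -l | awk '/^rc/ {print $2}'                              # removed-but-not-purged packages
    dpkg -l | awk '/^rc/ {print $2}' | xargs -r sudo apt-get purge -y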

I really like this idea. btrfs ensures disk integrity, and in case of a failure I will just have a backup on a secondary ZFS pool. How would btrfs affect performance on an NVMe SSD? The benchmarks on Phoronix do not look very good.

In case of a GRUB failure, is it possible to chroot into the system and roll back the changes with btrfs?

Do I have to do a full disk backup onto another disk? Is there a way to reduce the size?

I don’t know what your workload is like but I use a LUKS-encrypted BTRFS on nvme (gen2) as a daily driver (fedora, kde) and don’t notice any issues with r/w… bare metal though, not virtualized.

Can’t comment on snapshots.

Intriguing, but I have more questions than answers :-)

What workload(s) are you running?

Why 15 years?

What data are you keeping on the computer, if any? You mention programs, but nothing about data. Is it personal data, or professional data that maybe requires a retention policy eg tax records or regulatory compliance?

My own personal data experience suggests that a data management (incl. backups) strategy, to ensure the data stays error-free and usable by your chosen workloads, is the bigger challenge. E.g. I’ve learned to be careful to store photos as either DNG or JPEG, or at worst TIFF, as these are well-supported formats. I’ve lost some early dSLR RAW files when they became unsupported by the manufacturer. I’ve lost a music collection to some quirk of the walled garden of Apple; it’s still on my wife’s phone, but there’s literally no way to get it off now. I now keep a lot of my important text as Markdown, as it’s basically a text format. Many docs have been lost over the years as the doc format has become obsolete, although in many cases it’s no great loss: do I need that CV from 10 years ago?

Well, I am a researcher in the field of deep learning, so I work with a lot of random pieces of software.
I write a lot of scripts and config files to keep my daily flow going. I’ve configured the hell out of a lot of software, and it usually becomes nearly impossible to replicate after years of piling configs on top of configs. Plus, updates kept rendering my scripts useless on Arch and I had to keep rewriting them.
A lot of the tools I use also require a lot of configuration and I do not want to lose that. These are not simply config files I can copy; they live within the application, and the ones I can copy are too scattered.
Rebuilding my system would take weeks.
I rarely use formats other than plain text files to store information locally.
As for the 15-year mark, I just want my software to outlast the hardware. At that point I’ll consider upgrading lol.
I do need backups for my project files; I usually find myself re-using pieces of code that I wrote a couple of years ago. It’s not every day that you stumble upon a 2 GB PDF and want to compress it, but when it happens, you want to be able to.
I’ve lost many photos to dead hard drives; I don’t really care anymore. I don’t really have critical data. I just want my system to function the way I want it to with minimal effort.

1 Like

For what it’s worth, it sounds like you would benefit greatly from documenting your configs as a standard operating (i.e., research) procedure. I don’t have much else to say other than you’ve picked the correct distro for your use case (Debian). Can’t speak to how well this would work, and I think you’ll reach security update end of life before the 15 year mark. But that hasn’t stopped many a mission critical server from running…

One issue I can imagine is that near the end of this period you will run into problems installing the latest (deep learning) software due to all the missing (and uninstallable) dependencies on your system. Sure, you can compile the target package… but what about when its 15 dependencies also need compilation? It could be circumvented with Flatpak or other self-contained packages, but this may not be feasible if you are on the bleeding edge, where I suspect researchers ought to be.

2 Likes

Yep, exactly. I’m trying to write procedures for everything I do on my computer from now on, and I’ve already written functions to automate part of it.
I will update Debian to the latest stable version, but only when I can spend time fixing things.
Arch, however, does not like going without updates for long periods of time. Not at all.

1 Like

I agree here.
Documenting updates, configurations, installs, removals, and especially backups is more important than people think.
Archived information makes a historical record of any changes.
Keeping logs is important.
Just how detailed you choose to be, to suit your needs, is your choice.

1 Like

That is what containers are for. Anything AI is a horrible mess of dependencies as it is, to the point that using virtual environments or containers is basically a must.

And in general, using containers, Flatpaks and the like would be very helpful for keeping a system clean and maintainable.

On the other hand, that would be a lot of overhead, no? One thing I don’t necessarily love about containers is that you can end up with a MySQL in every container if you’re not careful (which may or may not be your goal). Plus there is just the general overhead of containerization itself. Not everyone builds their images on Alpine :stuck_out_tongue_winking_eye:

Thnx for the detail. Makes more sense now.

I spent a few years in a deep learning startup building systems and workflows for the researchers and software devs, and rebuilding the dev and prod deep learning servers. I started off with no idea how the servers were built, and everyone was afraid to touch them: everything was built from the latest source with the bleeding-edge features enabled, but no one had any idea what the gcc or make flags were anymore, so they couldn’t be touched! We got it under control in the end.

Some things we learned that might help you, starting with easy ones:

  1. Document your build config. Loads of advice already on this. Literally a markdown log file with some links to the key websites you followed, key concepts in the build, screenshots of gui config and a bash history dump can be the difference between success and failure in a rebuild or recreating a training run. More detail is better, a tight procedure is best.

  2. Git. Getting your scripts, text files and build configs/logs into some git repos not only gives you an easy backup method (just git push at the end of each day), but also an easy version control system for retrieving previous versions of scripts, or known-working versions if you give the commits useful commit messages. And git is going to be around for a long time.

  3. Get your training workloads off the host and into VMs or docker. This not only allows you to keep the host computer clean and reliable, but also lets you fully separate workloads into their own VM/docker container. Want tensorflow 1 with numpy version X for one training run, but tensorflow 2 with numpy version Y for another? It’s then easy: you just switch to the other VM or container, rather than trying to rebuild all your workload dependency chains. And if that experimental build goes wrong, you just delete the container/VM and all your existing working ones are safe.

If you use docker, nvidia provides loads of prebuilt docker containers in their NGC container registry, and docker hub has lots of handy prebuilt containers e.g. general data science, or Jupyter notebook. Once you have the VM or container pulled locally, and built how you like, you can keep it for as long as needed, assuming you have the storage.

For a lot of stuff, I pull an nvidia docker container that has basic deep learning libraries in it, then I either customise it with my own docker file, or often just attach to it with a bash terminal and do the necessary config and installs. And keep a note of them in the build log.
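
As a rough example of that flow (the image tag below is just a placeholder; pick whatever NGC currently publishes for your framework, and note that --gpus needs the NVIDIA container toolkit installed):

    docker pull nvcr.io/nvidia/pytorch:24.01-py3        # example tag
    docker run --gpus all -it --name dl-work \
        -v "$HOME/projects:/workspace/projects" \
        nvcr.io/nvidia/pytorch:24.01-py3 bash
    # ...configure and install inside, note it in the build log, then keep the result:
    docker commit dl-work my-dl-env:2024-01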

  4. If your build procedures are very detailed, learn some ansible or docker/vagrant to automate the training workload build. This not only allows you to rebuild workloads in the event of a failure, but ensures that any time a workload needs repeat runs, the build is the same run to run. The great thing about ansible is that you can start with shell or command tasks that just run the bash build/install commands you were using before, but now you have a repeatable workload build process that can live in version control (there’s a small sketch of this after the list). You can always learn “proper” ansible later and get the whole idempotency thing going if you want. This was a genuine revelation to our researchers and devs: I would build them a basic training workload with vagrant VMs, configure and install the software with ansible, and then they could just make their own mods to the ansible for each different training run.

  5. It sounds like you already know your workflow really well, so I hope you don’t mind the suggestion, but I really like Jupyter notebooks. If you don’t know them, they are a great electronic notebook that lets you run code with full GPU access, plus create graphs and keep markdown notes. They are really stable, so no problem for multi-day training runs. They can also live in git for backup and version control, as well as being natively supported in GitHub for viewing. They have a text-based file format, so they will be readable for a long time.

  6. We ran stock Ubuntu, as I do now for my personal deep learning workstation. Nothing fancy in the OS or filesystem, default install with encryption enabled. We did experiment with some high-speed filesystems and storage for the dev servers, but they were so specialised that the admin headaches and downtime negated any speed increase. It’s different if you are running a data centre or a large cluster, where this specialisation is worth it, but for a workstation I wouldn’t bother.
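
To illustrate the ansible point above: a minimal sketch of the “start with shell/command tasks” idea, run against the local machine (the module arguments and the playbook name are just made-up examples):

    # quick ad-hoc start: -b means become (sudo), -c local runs on this machine
    ansible localhost -c local -b -m ansible.builtin.apt -a "name=build-essential state=present"
    # once the steps settle down, move them into a playbook and keep it in git
    ansible-playbook -i localhost, -c local train-env.yml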

I’ve probably suggested too much. I also value simplicity, longevity and stuff that just works; the above helps me every day, and it helped my researchers and devs get on with their work and spend less time messing around with the computer.

2 Likes

There are some really bloated Docker containers! “Data science container with Jupyter notebook server plus plus”!!! Which include every data science lib ever written!

But there are also plenty of well-built and lightweight containers out there. Nvidia NGC has some tightly scoped builds. Most of the Dockerfiles are on GitHub, so you can see which ones have bloat.

I’ve never actually benchmarked a container- or VM-based deep learning workload against running directly on the host, but my understanding is that the performance overhead is minimal. If I compare the time to manually configure a server for a specific deep learning workload (0.5-1.0 days, depending on what is needed, what might need building from source, and my own knowledge) against the time to either pull a prebuilt docker container (minutes) or start one of my own docker images (seconds), then docker is the bigger time win for getting the training workload processed.

As always, it depends on your own workload needs. Maybe for multi-day or multi-GPU/cluster training workloads there is a notable docker performance hit? I’ve no experience with training longer than a day or two, or on more than a small number of GPUs.

Edit: had a google, and this recent paper compares containers to running directly on the machine: Applied Sciences | Free Full-Text | Performance Analysis of Container Effect in Deep Learning Workloads and Implications.

1 Like

vagrant+ansible is exactly what I can and should use. I was having problems with a piece of software that used docker as part of it while the rest ran on my system. I also had to manage different versions of it; I could not containerize it within another docker container (that couldn’t work), and running an entire VM was just too inefficient. This would totally solve my problem! Thank you!!

1 Like