Level1Linux - Fireside Chats

tar is too slow :frowning:

At least in some cases I had, when there were gigabytes of data spread across a huge number of small files.
tar and untar took hours for a backup and restore during a hardware migration.

Next time I will try to start early and use rsync in the days before and then just do a final sync on the migration date.
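
A minimal sketch of that two-phase plan, assuming rsync is installed on both machines. The helper name `rsync_command`, the path `/srv/data`, and the host `newhost` are made up for illustration. The pre-sync passes copy everything while the old box is still live; the final pass adds `--delete` so files removed on the source in the meantime also disappear on the target.

```python
import shlex


def rsync_command(src, dest, final_pass=False):
    """Build an rsync argument list for a pre-sync or for the final sync."""
    # -a: archive mode, -H: preserve hard links, --numeric-ids: keep raw uid/gid.
    cmd = ["rsync", "-aH", "--numeric-ids"]
    if final_pass:
        # On migration day, also drop files that were deleted on the source
        # since the last pre-sync.
        cmd.append("--delete")
    # A trailing slash on the source means "copy the contents of this directory".
    cmd += [src.rstrip("/") + "/", dest]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths and host name; print the commands instead of running them.
    print(shlex.join(rsync_command("/srv/data", "newhost:/srv/data")))
    print(shlex.join(rsync_command("/srv/data", "newhost:/srv/data", final_pass=True)))
```

Since rsync only transfers what changed, the final pass should be much shorter than a full tar and untar, provided services are stopped first so nothing changes mid-copy.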

But I think this could be an interesting topic:
What other tools/practices are there for creating backups and restores, that could perform better than tar?
snapshots (if filesystem supports it), squashfs, rsync, a multithreaded / recursive / map-reduce like tar?, others?

2 Likes

I am using ARM as an easy and cheap way to get into this. Imagine if home routers were to use x86, the costs would be pretty big and you wouldn’t see home routers so widespread. Same goes in this case I am envisioning.

Regarding backups, this is another story. I am saying that there are setup tutorials, like “how to configure apache” and “how to set up gitlab” and “how to install and use postgres,” but none of those show you how to back them up. GitLab has its own backup tool; if you show someone how to set it up, also show them how to back it up as part of the setup process. If you configure a DB, show people how to dump it, or how to enable transaction logs and back those up.

What I was saying was that in my tutorials to come, I will include backup methods towards the end, after the setup is complete. I will likely go through my previous posts and add this towards the end if required. For example, setting up the roadwarrior VPN router doesn’t need a backup scheme, just copy the 2 scripts and your keys / certificates and you’re done. But something like a DB needs at least a few words said about how to back it up.

3 Likes

bzip2 is the fastest I tried when it comes to compression and decompression. And bzip2 by default does -9 anyway. I found that rsync was unreliable when sending big files (200GB+) over the ocean, so what worked for me was to bzip2 to standard output, pipe that over ssh to another box, and run bzip2 there to extract from standard input. Not sure why that worked and not rsync.

1 Like

zstd and lz4 are much faster, lzma2 is more efficient but also slower :wink:

2 Likes

The problem isn’t necessarily the compression, but in this case the sequential reading/processing of each file. If you have a huge number of files, the sequential processing takes a lot of time. If all the files are on a single HDD, parallel processing probably doesn’t make sense (because of the read head). But if the files are on SSD(s) or a storage array, I think there could be some speedup if you do it in multiple threads, choosing the number of threads with the IOPS limit in mind (like when you do filesystem performance tests with fio with multiple threads, different queue depths, etc.). I don’t know if such a tool already exists; I haven’t really dug into this.
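
To make the idea concrete, here is a rough sketch, not an existing tool: split the file list into shards and write one uncompressed tar per shard from a thread pool, using only the Python standard library. All function names and the `shard-NNN.tar` naming are invented for this example. It skips empty directories and symlinked directories, and whether it is actually faster than a single tar depends on the storage, so it would need measuring.

```python
import os
import tarfile
from concurrent.futures import ThreadPoolExecutor


def list_files(root):
    """Return every file under root as a path relative to root, sorted."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)


def split_round_robin(items, n):
    """Deal items into n lists and drop any list that ends up empty."""
    shards = [items[i::n] for i in range(n)]
    return [shard for shard in shards if shard]


def write_shard(root, rel_paths, archive_path):
    """Write one uncompressed tar archive containing the given files."""
    with tarfile.open(archive_path, "w") as tar:
        for rel in rel_paths:
            tar.add(os.path.join(root, rel), arcname=rel)
    return archive_path


def parallel_tar(root, out_dir, workers=4):
    """Archive root into up to `workers` tar shards written concurrently."""
    os.makedirs(out_dir, exist_ok=True)
    shards = split_round_robin(list_files(root), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(write_shard, root, shard,
                        os.path.join(out_dir, f"shard-{i:03d}.tar"))
            for i, shard in enumerate(shards)
        ]
        # result() re-raises any error that happened inside a worker.
        return [future.result() for future in futures]
```

Restoring means extracting every shard into the same target directory. The output directory has to live outside the tree being archived, otherwise the shards could end up archiving each other.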

1 Like

Why does pulseaudio work worse out of the box (crackles, distortion) on every motherboard I’ve owned for the past 10 years than linux audio did in 1996?

RE: RAID. I made a RAID IOPS calculator spreadsheet some time ago.

I’ll sanitise it and dig it out if anyone is interested. The tricky thing with modern storage though is it is really difficult to take caching into account.
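
For anyone who wants the gist before the spreadsheet turns up, the usual rule-of-thumb calculation looks roughly like this. This is not the spreadsheet itself: the function name is made up, the write penalties are the commonly quoted textbook values, and caching is ignored completely, which is exactly the part that is hard to model.

```python
# Commonly quoted back-end I/Os needed per front-end write (the "write penalty").
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_iops(disks, iops_per_disk, level, read_fraction):
    """Estimate front-end IOPS for an array, ignoring every cache."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[level]
    write_fraction = 1.0 - read_fraction
    # Each read costs one back-end I/O, each write costs `penalty` of them.
    return raw / (read_fraction + write_fraction * penalty)


if __name__ == "__main__":
    # Example numbers only: 8 disks at 150 IOPS each, 70% reads.
    for level in WRITE_PENALTY:
        print(level, round(effective_iops(8, 150, level, 0.7)))
```

The per-disk IOPS figure is an input you have to measure or look up, and real arrays with controller cache or an SSD tier will behave differently from this estimate.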

3 Likes
  • XCP-NG with passthrough gpu to windows
    • passthrough their original windows boot drive too might help some people try it out
  • Deeper dive into xcp-ng xostor/linbit, compare/complement with raid (is hardware raid dead?), compare how much network speed makes it go vroom, dedicated backchannel network vs shared, how to effectively use multiple nics
  • Wireguard deep-dive (explain headscale, innernet, netmaker, compare with openvpn, nebula)
  • How do I build my own hyperscaler in my basement (maybe xcp, terraform, maybe touch on openstack, openshift, nested virtualization, finish off with a linode backup server or load balancer connected into the infra. clouds rely on huge swathes of similar hardware, how do you handle just a bunch of compute from your basement that might generally support all the same things, ddr3/4, vt-x, vt-d, aes-ni, but different generations or specs, maybe some are pi4 or rockpi or whatever)
  • Where are the lanes why can’t we have more lanes, is it because enterprise segmentation or is there a tech reason you need a threadripper/xeon to be able to plug more than 1-2 things in at once
  • Terraform and github actions have done great things at my work, how do i take that and transform my baremetal computers i have laying around into something neat and durable and easy to maintain.
  • Don’t say k8s, k8s is none of those things. A k8s rant video would be great.
  • Also a deep-dive into k8s would be useful because it is great when you need it.
    • maybe show off some ui dashboards alongside the built-in docker k8s
    • maybe show off podman and k8s
      • idk about going full redhat openshift, but maybe
  • is nomad from hashi any good? what is the next kubernetes?
  • what is something useful that isn’t crypto mining that i can do with my computers that isn’t a benchmark
    • cpu based, gpu based
    • things that can actually benefit from local cluster
      • ideally more than parallel copies of boinc or f@h with team code or whatever for points
  • explain RAFT
  • dynamic dns in 2022, lets encrypt, some sort of securely opening a port on pfsense
  • all those 2.5gbe aliexpress boxes? find a thing to do with them
  • ‘how do i make sure my traffic goes down the fast 10gb nic for the lan, but knows how to reroute through the 1gb if the 10gb goes down’
  • ‘i want lan stuff on 10||2.5 nic, but i want external traffic only through the 1gb nic directly to the router segmented from regular traffic’
  • thunderbolt 4 2-port devices,
    • daisy chain 2 linux, 2 windows, 2 mac machines
      • shared 40gb ?ip? network
      • a bridge on each device.
    • macs do it automatically, not sure what happens if there’s non-mac involved.
    • could show off that new super-long thunderbolt 4 cable from apple
  • any more examples of plug-and-play ipmi replacements for non-ipmi boards
  • can you make all the old computers you have work together to make dall-e pictures for you in hi-res
  • how fast can you render if you cluster, does bandwidth matter, 4 cards 2 per pc vs 2 twice as powerful cards in one pc vs 8 much older cards in 1 or 2 card/cpu setup
    • and why results are as they are, is it pci bandwidth, peak ghz in the card, the amount of dram in a single card, network bandwidth, stitching together the final result on the host pc, some other single thread thing or something
  • I’ve seen some resistance regarding best practices for containerizing large, old (decades), especially windows, processes. Some of it is the concern that they weren’t built for containers and boot times are measured in minutes.
    • I think there has been a(t least one) cultural shift in tech and I’m not sure everyone is on the same page.
  • Which link aggregation will actually work and not break my network setup randomly sometimes?
  • Wifi 6E
  • Why is Optane good, what happened to consumer Optane, why did Optane dimms not pan out (afaik), some real 101 on latency iops bandwidth. At this point the samsung 980 pro seems like the best go fast option but when would you want a 905P?
  • I go to amazon and type “64 gb ecc ddr5 udimm” and there are no results. When will that not be the case? How about at least parity with ddr4? Also why is nemix ecc ram so expensive? (Other than being apparently the only option, at least it’s fast with jedec profile)
    • is there someone at intel that we can threaten with a cosmic ray gun
  • what is the absolute cheapest way to acquire a ton of spinning disks, cables, compatible cpu/motherboards, cards to just get just a bunch of disks, say 54 drives, without needing a tech sponsor. not a fucking 4-drive nas running on a potato and preferably not the loudest dell powerwhatever from 2005 with equivalent cpu to the potato at 10x the power usage. power can be whatever, but not while sounding like a jet engine and also being useless.
  • talos is a neat way to setup a k8s cluster, uses an api – no ssh allowed. would be a fun video
  • yes, rust. but like, in detail about things like memory and stacks, borrowing, and exactly what they mean by safe and how you can still write some bad code. maybe show cargo, something with a rhai script, maybe a little bevy game, spin is a neat wasm framework that’s been going around
  • what things absolutely need just a single thread, the fastest single thread, why is it, that different things can’t? maybe examples that are because code architecture and examples that are because math it just doesn’t work.
  • cheapest backup for say 3tb of data with full retrieval say every 2-5 years.
    • chain-mail encrypted 4tb drives with your 2-4 closest friends every 6 months?
  • have you ever looked at secure scuttlebutt protocol, gossip protocols in general
  • what is observability, how is it not just syslog or splunk with more steps?
    • is there an open source observability
  • literally just talk into the camera uncut for an hour about software licensing
  • is github bad, actually? is microsoft good? also microsoft owns linkedin
    • copilot, also what’s that aws ai thing
    • github alternatives, sourcehut, sourceforge, gitlab, gitea, bitbucket, etc
  • why is c++ like this
  • why is hyperthreading good, why is it bad
  • what is a cuda core anyways, how is that compared in intel or amd world? can i do science with amd? more details on vgpu and pci passthrough
3 Likes

I realize that i know nothing after i’ve read all the answers…

9 Likes

You’ll learn pretty quick around here. Welcome!

2 Likes

So Hey; Not a programmer. Apologies from a mostly windows only Luddite. So much of my time has just been trying to figure out / find the holy grail of stacks.

Been trying to learn the basics of Rust. I want to create games on a JIT IR with GPU compute, and have been trying to devise how I would do that with the same level of “run-anywhere” as Java. Looked at LLVM/SPIR-V/Vulkan/WGPU/WASM as vectors and the same problems come up. The driver walled garden.

So here is what I am asking for: how does someone set up something like RDMA over loopback to stream something like vulkan/ash or any other buffer on your own browser, essentially as an equivalent to winit?

Nothing against winit, but the level of interop that I’m hoping comes to market is essentially host-level Stadia on PlayStation V, using a pure Vulkan IR that can run on anything.

I’ve only recently seen you guys’ stuff on the VM side of things, and done some reading on TCP offload. But this stuff is usually implemented over enterprise networks, and the level of driver access usually hampers this kind of thing. From the reading I have done so far, I don’t know what standards have been fleshed out, or what the situation is between software and the KVM/Hyper-V implementations. Now that I am catching up with what standards are actually being worked on, I’m planning to see what isn’t already there in Rust’s std::net library.

I should have mentioned earlier: I’ve started reviewing the technical stuff posted on YouTube for Looking Glass.

1 Like
  1. Something I am interested in doing myself is trying to install Bedrock Linux on top of Void Linux, because Void avoids systemd, which is good for a simpler base system, and that is a good fit for embedded devices with fewer resources. However, I still want access to the AUR… so Bedrock seems like it might be an interesting experiment.

The only thing is that Bedrock does not work with ZFS, apparently. So it’s more for smaller devices like an rpi, home routers, etc.

  2. Might be kind of a cool experiment to create an alternative to pfSense, but on Linux using containers. Docker container networking would then manage each service running in docker on the router, so that the different components are kept more securely compartmentalized and better isolated from each other. And of course the main router platform framework could then use other regular services as optional 3rd party addon projects, thereby avoiding so much of the burden of maintaining those pkgs. I.e. not needing their own explicit GUI pages and options in a centralized pfSense config etc… just ordinary separate docker containers (or kubernetes services… but that would probably require a lot more complexity and service overhead).
1 Like

Virtualization, APU/GPU setup passthrough (if it even is remotely possible), uh, nothing else is really coming to mind at 11pm right now… Maybe I’ll drop something else tomorrow.

2 Likes

If you are running Void, you don’t need the AUR. Void has such a great build system using xbps-src. And besides, the main repo contains most stuff you would need in the first place. And if there’s something you can’t find, there’s Flatpak too, although I avoid them.

Void is still quite heavy compared to something like Alpine. I mean, the resource consumption is small, it is comparable to Alpine, but Alpine 1. has a smaller footprint and 2. can run from RAM (frugal install) in just around 90 MB of RAM total.

Not sure if that’s a good idea. I mean, nothing really beats pf, really. But most people who do Linux routers just go with the CLI on Ubuntu or Debian, or something in the RHEL family. Alpine is another good distro for Linux routers, because of how small it is and because it is a distro built with security in mind. And you probably want the packets to be filtered as close to bare metal as possible; using docker adds a layer of abstraction between the container and the kernel, meaning potentially less performance. And then, there’s OpenWRT.

Not to discourage you though. You can create the GUI in a container and control the host that way.

I agree that focusing on “why” is often more important than on the “how”.

“Why does this not work / has stopped working?!” is the first one that comes to mind (troubleshooting errors and finding them first), but there is also the ominous “why would I need that”, which seems to be missing from lots of how-tos; covering it could help.

I understand that a big part of the audience are pros who work with scientific computing, devops… server admins of all kinds who want to discuss kubernetes and IOMMU groups, but take the case you mentioned in the video: A gamer that decided to use Manjaro to play steam games with proton and has a browser with the L1njuws running in the background.
What would their “whys” and “hows” be?

Ok, just finished watching the released video. I guess I get what the plan is for the fireside chat now.

@wendell, I’ll keep this short, so I don’t waste your time. If you haven’t glanced at my big thread with the ARM SBCs adventure, tl;dr the one thing that I am struggling with is getting ZFS to run on ARM on any Linux distro I tried. The dkms module doesn’t want to build. Only tried Alpine, Armbian and Void.

If anything is to be accomplished, the Linux community needs to unite a little bit. First task: consolidating brainpower and resources. For example:

There is no proper Window organizer for Gnome. Something as simple as FancyZones in PowerToys. I have spent 2 hours looking for one and tried 12 different half-made projects that are all bugged to Hell and back and none of them offers a quarter of the functionality of FancyZones. All of them combined maybe a half. At least 20 people worked on 12 versions of the same software and they did nothing.
Imagine those 20 people working on the same project. It would be glorious.

Why are there more than 300 Linux distros? Not even the big 3 are fully stable and featured with proper tools. You can’t even snap 3 windows in Gnome side by side. Imagine all those devs working on 5 proper strong distros. Everyone is typing the same code over and over, and no one has managed to finish the project. Let us unite and write a component each.

I was hoping that Valve’s SteamOS would be the go-to distro that people are focused on. But I don’t see it happening anytime soon.

Wendell, if you could ask the big names in interviews what can be done, that would help. And you, on this forum and the YT channels, can raise awareness of this problem.

1 Like

Here’s a subject,

The extended family/“clan” NAS arrangement. Say I’ve a family member whose computing needs have come to the point that they really need a home server, and I’ve convinced them of it. My NAS has all the bells and whistles, and I want to set up mine as offsite backup for a family/“clan” member. Do they need all the bells and whistles, or can theirs be simple? How do we make sure that sync is behaving if the family member’s computer literacy is frighteningly low? How do I go about making sure I still have space for my needs? Are there services that it makes sense not to duplicate between myself and him?

2 Likes

As mentioned in the video, if you have servers kicking around I would be interested in some sort of “do it yourself” Data Science stack. The data lifecycle includes everything from data engineering, pipelines, notebooks for exploratory analysis, HPC needs, distributed storage, databases, and more. Much of this is available as services in the “cloud”, however, it certainly would be fun to roll-your-own.

2 Likes

@dremcat4 Netgate has already made a replacement for pfSense. They call it TNSR. It is a router on Linux without a firewall. I would like to see a web interface built into TNSR just like they did with pfSense, or maybe take pfSense and replace FreeBSD with Linux.

Proper tree organization of files would help a lot.
People tend to pile everything in one place and spend a lot of time looking for it.

1 Like