Sysadmin Mega Thread

Red Hat is just a symptom. I fear we are headed right back into the 90s, except instead of being locked into hardware platforms, we'll be locked into software ones.

Present value function checks out

1 Like

What does the lz4 package install that ZFS uses? I can't see how a kernel driver can use anything from a userland package. (I've never had the lz4 package installed on any OS/distro that runs on ZFS.)

Gee, I wonder who would be best positioned to sue a company as big as IBM :troll:

Debian and, weirdly, SUSE are standing out to me lately as the least corrupted open source platforms. I don't know much about SUSE given that they are rarely used in the US, but the marketing around things like Harvester is unequivocal about it being open in a way that hasn't been seen from Red Hat or Canonical for as long as I can remember.


It was a false flag and you’re probably right.

2 Likes

Meritocracy

Literally hitler

Literally stalin

Literally big tech

Lol
:troll:

Not sure. You're probably right. In which case it suggests something is wrong with compression in the Debian kernel.

Objectively worse lol

Okay, shitty troll reply aside: I think Arch, despite its elitism, isn't very corrupt, and neither is Debian. But this isn't intentional corruption. It's more unintended tyranny. I might expand on this later when I make a thread on DoH. Tired of its issues being dismissed.

1 Like

Literally chaos /s

1 Like

Look ancom > ancap /s :joy:

Finally, a reference I get!

Red Hat really shat on a beehive with that decision lol.

1 Like

If Oracle saves the RHEL ecosystem from itself, I will eat my shoe (gleefully).

3 Likes
modprobe: FATAL: Module zfs not found in directory /lib/modules/5.10.0-23-amd64
Created symlink /etc/systemd/system/zfs-import.target.wants/zfs-import-cache.service → /lib/systemd/system/zfs-import-cache.service.
Created symlink /etc/systemd/system/zfs.target.wants/zfs-import.target → /lib/systemd/system/zfs-import.target.
Created symlink /etc/systemd/system/zfs-mount.service.wants/zfs-load-module.service → /lib/systemd/system/zfs-load-module.service.
Created symlink /etc/systemd/system/zfs.target.wants/zfs-load-module.service → /lib/systemd/system/zfs-load-module.service.
Created symlink /etc/systemd/system/zfs.target.wants/zfs-mount.service → /lib/systemd/system/zfs-mount.service.
Created symlink /etc/systemd/system/zfs.target.wants/zfs-share.service → /lib/systemd/system/zfs-share.service.
Created symlink /etc/systemd/system/zfs-volumes.target.wants/zfs-volume-wait.service → /lib/systemd/system/zfs-volume-wait.service.
Created symlink /etc/systemd/system/zfs.target.wants/zfs-volumes.target → /lib/systemd/system/zfs-volumes.target.
Created symlink /etc/systemd/system/multi-user.target.wants/zfs.target → /lib/systemd/system/zfs.target.
zfs-import-scan.service is a disabled or a static unit, not starting it.
Job for zfs-load-module.service failed.
See "systemctl status zfs-load-module.service" and "journalctl -xe" for details.
A dependency job for zfs-import-cache.service failed. See 'journalctl -xe' for details.

PAIN

Working on this slowly. Figuring out why the services are failing. Likely because DKMS didn't install the darn module as it should have.
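Rough plan of attack, assuming zfs-dkms is the package that's supposed to build it:

dkms status                        # see what DKMS has built, and for which kernels
dkms autoinstall -k "$(uname -r)"  # rebuild anything missing for the running kernel
modprobe zfs                       # then try loading the module again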

1 Like
root@odin:~# sudo systemctl list-units --failed
  UNIT                         LOAD   ACTIVE SUB    DESCRIPTION
● nvidia-persistenced.service  loaded failed failed NVIDIA Persistence Daemon
● systemd-modules-load.service loaded failed failed Load Kernel Modules

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
2 loaded units listed.
root@odin:~# 


Now that doesn't make very much sense (post reboot).

ugh

● systemd-modules-load.service - Load Kernel Modules
     Loaded: loaded (/lib/systemd/system/systemd-modules-load.service; static)
     Active: failed (Result: exit-code) since Thu 2023-07-06 21:36:14 MDT; 2min 48s ago
       Docs: man:systemd-modules-load.service(8)
             man:modules-load.d(5)
   Main PID: 740 (code=exited, status=1/FAILURE)
        CPU: 7ms

Jul 06 21:36:14 odin systemd-modules-load[744]: modprobe: ERROR: could not insert 'nvidia': Invalid argument
Jul 06 21:36:14 odin systemd-modules-load[747]: modprobe: FATAL: Module nvidia-current-modeset not found in directory /lib/modules/5.10.0-23-amd64
Jul 06 21:36:14 odin systemd-modules-load[742]: modprobe: ERROR: ../libkmod/libkmod-module.c:990 command_do() Error running install command 'modprobe nvidia ; modprobe -i nvidia-current-modeset ' for module nvidia_modeset: retcode 1
Jul 06 21:36:14 odin systemd-modules-load[742]: modprobe: ERROR: could not insert 'nvidia_modeset': Invalid argument
Jul 06 21:36:14 odin systemd-modules-load[748]: modprobe: FATAL: Module nvidia-current-drm not found in directory /lib/modules/5.10.0-23-amd64
Jul 06 21:36:14 odin systemd-modules-load[740]: Error running install command 'modprobe nvidia-modeset ; modprobe -i nvidia-current-drm ' for module nvidia_drm: retcode 1
Jul 06 21:36:14 odin systemd-modules-load[740]: Failed to insert module 'nvidia_drm': Invalid argument
Jul 06 21:36:14 odin systemd[1]: systemd-modules-load.service: Main process exited, code=exited, status=1/FAILURE
Jul 06 21:36:14 odin systemd[1]: systemd-modules-load.service: Failed with result 'exit-code'.
Jul 06 21:36:14 odin systemd[1]: Failed to start Load Kernel Modules.

Seems the apt install nvidia-driver command selected the legacy driver.
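Quick sanity check of what actually got pulled in (just a sketch):

dpkg -l | grep -i nvidia             # which driver packages are installed
modinfo nvidia | grep -i '^version'  # which module version DKMS actually built, if any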

apt purge '*nvidia*' && apt install nvidia-driver

Let's try again.

Ahhh, that's better.

root@odin:~# sudo systemctl list-units --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed.
root@odin:~# sudo systemctl status
● odin
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Thu 2023-07-06 21:44:10 MDT; 39s ago

NOW ZFS

AYYYYY @oO.o no kernel panic on zio write. EXCELLENT

Got a few tweaks to make before I start doing any operations

Now let's hope this takes:

root@odin:~# nvim /etc/modprobe.d/zfs.conf 
root@odin:~# cat /etc/modprobe.d/zfs.conf 
# Set Max ARC size => 24GB == 25769803776 Bytes
options zfs zfs_arc_max=25769803776
 
# Set Min ARC size => 4GB == 4294967296 Bytes
options zfs zfs_arc_min=4294967296
root@odin:~# sudo update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-5.10.0-23-amd64
update-initramfs: Generating /boot/initrd.img-5.10.0-22-amd64
root@odin:~# 


I've realized that, based on the size of my pool without dedup, I shouldn't need more than this, and it gives about 20% headroom.
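For the record, 24 GiB = 24 × 1024³ = 25769803776 bytes and 4 GiB = 4 × 1024³ = 4294967296 bytes, and the live values can be double-checked after a reboot:

cat /sys/module/zfs/parameters/zfs_arc_max   # expect 25769803776
cat /sys/module/zfs/parameters/zfs_arc_min   # expect 4294967296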

------------------------------------------------------------------------
ZFS Subsystem Report                            Thu Jul 06 22:16:56 2023
Linux 5.10.0-23-amd64                                    2.0.3-9+deb11u1
Machine: odin (x86_64)                                   2.0.3-9+deb11u1

ARC status:                                                      HEALTHY
        Memory throttle count:                                         0

ARC size (current):                                     1.3 %  308.0 MiB
        Target size (adaptive):                        16.7 %    4.0 GiB
        Min size (hard limit):                         16.7 %    4.0 GiB
        Max size (high water):                            6:1   24.0 GiB
        Most Frequently Used (MFU) cache size:         13.7 %    4.2 MiB
        Most Recently Used (MRU) cache size:           86.3 %   26.4 MiB
        Metadata cache size (hard limit):              75.0 %   18.0 GiB
        Metadata cache size (current):                  1.7 %  307.6 MiB
        Dnode cache size (hard limit):                 10.0 %    1.8 GiB
        Dnode cache size (current):                     0.1 %    1.7 MiB

Success. Now for the moment of truth: sending a ZFS dataset.
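The send itself is just the usual snapshot-and-pipe pattern; the dataset, snapshot, and target names below are placeholders rather than my real layout:

# Hypothetical names — a minimal local send/receive
zfs snapshot OnePoint21GigaWatts/data@migrate-2023-07-06
zfs send OnePoint21GigaWatts/data@migrate-2023-07-06 | zfs receive -u newpool/data

# Or over SSH to another box
zfs send OnePoint21GigaWatts/data@migrate-2023-07-06 | ssh backuphost zfs receive -u tank/data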

Setting sync=always did slow things down a bit, and the SSD cache is getting thrashed a bit more:

root@odin:/mnt/OnePoint21GigaWatts# zfs get sync OnePoint21GigaWatts
NAME                 PROPERTY  VALUE     SOURCE
OnePoint21GigaWatts  sync      always    local

BUT honestly… I don't care. I'm tired of not having that little extra bit of safety, latency be damned.

I really should invest in an SLC-based SLOG device.

hmmmm

+4x

:thonk:

@Dynamic_Gravity @mutation666 Does ZFS require the SLOG/ZIL to have the same level of redundancy as the pool, or are two mirrored devices enough?

I know Wendell would say use Optane, but to be completely honest it's out of my budget and I also don't have enough PCIe lanes :sob:

Current lane allocation:
x4 NVMe
x16 3050 Ti

I believe the 3900X only has 24, with 4 extra that you don't see going to the SB (chipset).

I wonder if I could run the card at x8 and then do an NVMe-to-PCIe expansion :face_with_monocle::thinking:. Then I could get the Optane in a 2x config and mirror them… It would probably yield ungodly performance benefits if I chose to keep synchronous writes global.
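Checking what the card actually negotiates is easy enough, something like this (the PCI address is whatever lspci reports for the GPU; 01:00.0 here is a guess):

sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'   # capable vs. negotiated link width/speed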

1 Like

No, but you could lose stuff if it fails and there is a power outage, I believe. I also believe it's only sync writes it makes faster, so if you don't need that, it's no big gain. I have never bothered to run the ZIL or SLOG on separate disks, but I also don't use my big storage for active data; I use NVMe for that stuff (high-endurance P4500 / Optane depending on use).

What mobo do you use? You might be able to split the GPU, as I doubt a 3050 Ti needs more than 8 lanes, and if the board supports bifurcation you could maybe add 2 more NVMe drives without needing expensive cards.

2 Likes

A SLOG mirror is the way, and yes, it only affects sync writes. If your pool is all async, I think it does literally nothing. A SLOG needs very little space. I think you can increase the usage with a sysctl, but a 32GB Optane would probably be fine in a stock config, and those are dirt cheap.

2 Likes

Basically, only use a dedicated SLOG if you are running lots of virtual machines or anything that calls sync() a lot.

I forget if iSCSI does that or not for block storage.

Also, if you have a very low dirty ratio for disk flushing (databases).

2 Likes

Related question: I have 2 systems running 24-disk striped mirrors. I have 2 WD SS530s in there that are currently doing nothing. I was planning to have them either run as dedicated metadata or as a separate pool for a database. I have never run dedicated metadata before. Is 400GB stupid overkill for metadata?
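If I go the special vdev route, I assume the add would look something like this (pool and device names made up):

# Hypothetical pool/device names — mirrored special (metadata) vdev
zpool add tank special mirror /dev/disk/by-id/SS530-a /dev/disk/by-id/SS530-b

And if 400GB turns out to be overkill, I assume I could point small file blocks at it too via the special_small_blocks dataset property.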

Well, I've got plenty of those. I went ahead and ordered some stuff.

I've read that if your whole pool uses sync() (as in sync=always), it will slam the SLOG a lot harder, but you'll still have the benefit of SSD speed. You just need to set a few sysctl (or modprobe) tunables for ZFS to make it use a much larger chunk of the SLOG.
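The knobs I've seen referenced for that are the dirty-data module parameters; the values below are purely illustrative, not a recommendation:

# /etc/modprobe.d/zfs.conf additions — illustrative values only
# Raise the dirty-data ceiling so more in-flight sync data can sit in the
# ZIL/SLOG between transaction-group flushes
options zfs zfs_dirty_data_max_max=8589934592   # ceiling for the value below
options zfs zfs_dirty_data_max=8589934592       # 8 GiB of in-flight dirty data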

Setting sync=always would not increase performance (latency nor throughput) under any circumstances, regardless of dedicated SLOG devices. It will increase data resiliency and power-loss protection. Either SATA or NVMe SSDs will yield performance benefits as dedicated SLOG devices in this scenario.

With sync=standard, a certain portion of your write operations (the async writes) will be acknowledged as completed once they're in RAM. The other portion (the sync writes, which also get written to RAM) has to wait far longer to hit the NVMe drive. The latency you experience as an end user would be roughly the average of the really fast writes to RAM and the really slow writes to NVMe.

If you set sync=always, ALL the write operations (async and sync) have to go to the slow NVMe drive (while also being written to RAM), so you would see a huge increase in overall latency. However, all your write data would be safe in the event of a power loss or sudden system crash (assuming your NVMe drive was power safe, obviously).
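If you want to see that gap for yourself under sync=standard, two quick fio runs, one async and one with forced fdatasync, make it obvious (mountpoint taken from the prompt above; the fio/ subdirectory is made up and must exist):

# Async path: writes acknowledged once they're in RAM
fio --name=async --directory=/mnt/OnePoint21GigaWatts/fio --rw=randwrite \
    --bs=4k --size=1G --numjobs=1

# Sync path: fdatasync after every write, so each one must reach stable storage
fio --name=sync --directory=/mnt/OnePoint21GigaWatts/fio --rw=randwrite \
    --bs=4k --size=1G --numjobs=1 --fdatasync=1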

So, to get my feet wet and not blow my budget on a TR system with more PCIe lanes, I got a couple of 200 GB Intel DC S3700s to mirror and a Marvell-based SATA expansion card (PCIe x1). All for 35 bucks :joy:. Gotta love eBay.
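When they arrive, attaching them as a mirrored log vdev should be a one-liner (device names made up):

# Hypothetical device names — add the two S3700s as a mirrored SLOG
zpool add OnePoint21GigaWatts log mirror /dev/disk/by-id/ata-S3700-a /dev/disk/by-id/ata-S3700-b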

For clarification, the write path for sync writes (or all writes if you've set sync=always) goes from the writing application to both RAM and the ZIL (which resides either on the pool or on the SLOG device) simultaneously. The writing application does not move on to its next instruction until the data is on both mediums; obviously the write to the ZIL takes much longer. After the write is done, the application proceeds. Every 5 seconds or so (more often if the transaction group fills up quickly), the data is flushed from RAM to its permanent home on the pool. The ONLY time data gets read from the ZIL is if there is a system crash and ZFS has to recover the last few seconds of sync data.
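(That "every 5 seconds or so" is the transaction group timeout, which is a tunable; checking the current value is harmless:)

# Default txg flush interval in seconds (typically 5)
cat /sys/module/zfs/parameters/zfs_txg_timeout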

One day I will do 3D XPoint, but today is not that day. I may plan on it when I move to an all-SSD array and a dedicated storage server. It will be expensive. I want mirrored vdevs, which means I need an Epyc system… literally. I need to download the internet before the fascists and liberals destroy it, since we seemingly cannot have normal sides to politics. When someone asks what the internet was like in the 00s and 10s, I want to be able to show them, not just say it.

ZIMs are quite useful for archiving pages on the internet that are mostly text and light on media. The compression is really good; Wikipedia, ALL OF IT, can be compressed into 100GB or so.
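If you end up hoarding ZIMs, kiwix-serve will host them on a local web UI (the filename here is just an example):

# Serve a ZIM file for browsing on port 8080 (example filename)
kiwix-serve --port=8080 wikipedia_en_all_maxi.zim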