So Intel Scalable is finally getting old enough that it’s becoming viable for me.
Can someone please explain this?
Is it just UPI? It’s worth that much?!
Even more bizarre, the Gold one is more expensive now in the used market.
I’ll be at the DC most of next week to get my colo into tip-top shape after several hectic and rushed services over the summer. It will be an intense 9 days (week plus both weekends), but I am looking forward to the satisfaction afterwards.
Primary goals are cable labeling and thorough documentation I wasn’t able to do previously, and installing ceph mon and mds nodes which will kick off the transition of general storage from single point of failure (zfs) to ceph cluster (note that backup/archive will remain zfs).
I will also have infrastructure ready for an NVMe HA database, Kubernetes, GPU compute, macOS virtualization, and various other things that I have not been able to do previously.
Looking to the future, hosts will all be 10Gb/25Gb ready (NIC and cabling, but not yet the switch).
I am excited and nervous.
Healthy diet should help. So remember to eat pizza
Ok so finalizing the rack layout, from the top down, I have:
5x 1U compute
2x 1U gateway
2x 1U SFP+ switch
2x 1U gigabit switch
6x 2U ceph osd (only have 3 currently)
5x 1U ceph mon (only have 3 currently)
2x 1U ceph mds (have one currently and will use VM for failover until I have the 2nd)
1x 4U live backup
1x 1U tape autoloader (don’t have this yet, but space is set aside for it)
2x 4U power cycling cold storage (have one currently)
I'm laying all of this out so that shorter devices create cable management space opposite the side where they are mounted. For instance, the switches are mounted in the back of the rack, so the space in front of them serves as cable management for the gateways, which are mounted in the front (and have front-facing I/O). Likewise, the ceph mds and mon nodes are short, so they create cable management space for the osd nodes, which barely fit in the rack.
Here is an idea of how it looks (left is back of rack, right is front of rack).
Note that the switches listed there are wishlist at this point; I am using Mikrotiks until SONiC hardware becomes more affordable.
So I’ve been thinking of how best to back up zfs to ceph. I had assumed that zfs on top of ceph would be a bad experience, but it might actually be serviceable.
I am seeing ~60 MB/s average sending to an Ubuntu VM with a zpool created on a ceph rbd-backed virtual disk. While that’s not amazing, my use case is send/recv over WAN, so in many cases that wouldn’t even be the bottleneck.
I am using a very vanilla test pool in ceph with 6 HDDs across 3 hosts. I have write back cache enabled for the virtual disk. Nothing else was adjusted.
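For reference, the pipeline I'm testing is roughly the following (a sketch; pool, dataset, snapshot, and host names are all placeholders):

# one-time full send of a snapshot into the rbd-backed pool in the VM
zfs snapshot -r tank/general@migrate-1
zfs send -R tank/general@migrate-1 | ssh ceph-vm zfs receive -F backup/general
# afterwards, incrementals between snapshots (this is what crosses the WAN)
zfs snapshot -r tank/general@migrate-2
zfs send -R -i tank/general@migrate-1 tank/general@migrate-2 | ssh ceph-vm zfs receive backup/general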
Now I have to decide what is least cringe:
you using cluster and public net on the ceph?
yeah
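In ceph.conf terms that split is just this (the subnets here are example values):

[global]
# client-facing traffic: mons, mds, and client I/O
public_network = 10.0.10.0/24
# OSD-to-OSD replication and recovery on its own segment
cluster_network = 10.0.20.0/24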
I just saw that you can export NFS shares of Ceph S3, so that makes rclone pretty appealing actually.
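Something like this is what I'd try with rclone pointed straight at the RGW S3 endpoint (a sketch; the endpoint, keys, and bucket name are placeholders):

# ~/.config/rclone/rclone.conf
[cephs3]
type = s3
provider = Ceph
endpoint = http://rgw.lan:7480
access_key_id = REPLACEME
secret_access_key = REPLACEME

# then, e.g., push a backup directory into a bucket
rclone sync /srv/backups cephs3:backup-bucket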
Picked up a Supermicro 1028GR-TRT on eBay for $450. The listing was a little sus: no memory or CPU listed. It says it comes with 2x 1.2TB and 1x 128GB Intel S3510 SSDs. Don’t really care, but who can have too many SSDs, right? I mainly want this model because it has onboard 10Gb, which is much less common than the 1028GR-TR, which is gigabit.
Turns out it came with 4x 8GB memory, 2x E5-2690v3, and a K80. The SSDs have 4+ years of power-on time but basically no wear at all: the 1.2TB drives are at 100% and the 128GB is at 97%.
No one thing is amazing, but overall it was a nice score. I’ll keep the K80 to mess around with until I have additional GPU need. Currently just using a radeon for hackintosh vm.
I ordered this chassis. In a lot of ways, it’s what I’ve been looking for. Kinda wish I had found it before ordering a bunch of CS-515’s earlier this month, but I did already have one, so it kind of made sense to match the others.
This will be the chassis for my ceph mds node. Note that if the parts list is accurate, it does not come with the I/O plate that is pictured but with a more typical one with only 2 onboard NICs (plus IPMI). I am hoping this is the case, but if not, I can always substitute it. Unfortunately, I don’t think they sell I/O shields individually in black.
It took way too long to figure out automatic Debian preseed installation. Still need to add the partman recipe (just using auto right now), but at least I’ve gotten to the point of hands-off OS deployment.
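For anyone curious, the partman side of a preseed file ends up looking roughly like this; a sketch with made-up sizes, not my final recipe:

d-i partman-auto/method string regular
d-i partman-auto/expert_recipe string \
    boot-root :: \
        512 512 512 ext4 \
            $primary{ } $bootable{ } method{ format } \
            format{ } use_filesystem{ } filesystem{ ext4 } \
            mountpoint{ /boot } . \
        4096 8192 -1 ext4 \
            $primary{ } method{ format } \
            format{ } use_filesystem{ } filesystem{ ext4 } \
            mountpoint{ / } .
d-i partman-partitioning/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true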
Why not try something similar to what Hardkernel and the Pi Foundation are doing, and netboot a vmlinuz and initramfs that you made, which will launch a Debian install script if the server hasn’t been commissioned yet?
Their version is that you boot the vmlinuz and initramfs from an HTTP server and then manually install, but if you have netboot in your infrastructure, just make your own image that runs an auto-install script. It’s not much different from normal netbooting concepts; instead of booting the OS, you boot into a live environment which launches the auto-installer…
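As an iPXE script that could look something like this (hostnames and paths are made up for illustration):

#!ipxe
# pull a custom kernel and initramfs over HTTP; the initramfs init
# script checks whether the box is commissioned and either starts
# the auto-installer or boots the installed OS
kernel http://netboot.lan/vmlinuz ip=dhcp
initrd http://netboot.lan/initramfs.img
boot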
The logic of “has this server been commissioned yet” is interesting, but I don’t know if this would avoid the issue I had, which was that preseeding Debian is weird and poorly documented.
If that’s a problem (never dealt with that myself), then why bother with preseed instead of doing a chroot install by just bootstrapping the bare minimum OS files and replacing all the things that vary (hostname, machine-id, ssh host keys etc.) instead? Almost like cloning a minimum system and retroactively changing host-specific files.
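Roughly like this, as a sketch (target mount, device, and hostname are placeholders):

# bootstrap a minimal Debian into the mounted target
debootstrap stable /mnt/target http://deb.debian.org/debian
# regenerate everything that must be unique per host
echo newhost > /mnt/target/etc/hostname
rm -f /mnt/target/etc/machine-id /mnt/target/var/lib/dbus/machine-id
rm -f /mnt/target/etc/ssh/ssh_host_*
for d in dev proc sys; do mount --rbind "/$d" "/mnt/target/$d"; done
chroot /mnt/target systemd-machine-id-setup
# regenerates host keys (assuming openssh-server is installed in the target)
chroot /mnt/target dpkg-reconfigure openssh-server
# finish up with fstab and the bootloader as usual
chroot /mnt/target grub-install /dev/sdX
chroot /mnt/target update-grub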
Now that I do have it working, I’m going to stick with it.
Part of the reason I want to do it this way is that it is the same formula for many other operating systems. You netboot via PXE and then pull some structured text from somewhere to configure the system (cloud-init, kickstart, preseed… OpenBSD has a similar thing too). For me it’s cleaner to approach them all the same way. It lets me write portable Ansible playbooks and roles.
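For Debian specifically, the glue is just kernel parameters telling d-i where to fetch the preseed, e.g. in a pxelinux.cfg entry (the URL is an example):

label debian-auto
    kernel debian-installer/amd64/linux
    initrd debian-installer/amd64/initrd.gz
    append auto=true priority=critical url=http://netboot.lan/preseed.cfg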
If you already fixed it, or if you have a similar config for other OSes, then it makes sense to stick with it, I guess. I’m the jank janitor that will just use whatever is the fastest and improve on it later (if needed). I’m also kinda used to chroot installs in general (from void and nixos), so it’s partly why I’m biased towards this, instead of using normal installers.
Even more so because I used to migrate systems a few years ago by formatting a fresh disk, mounting the partitions, copying the data over, and then doing a chroot reinstall (mostly just repairing GRUB, since it didn’t exist on the new disk, and modifying the fstab; nothing fancy, but I’m somewhat of a chroot aficionado myself).
I will say that the exercise of going through and meticulously scripting a complex Arch install was critical to my understanding of Linux. Even if I don’t go that route for production servers, I understand the appeal. Especially for understanding storage concepts, from basic partitioning to mdraid, LUKS, etc, the chroot installation is really a rite of passage.
I’m going to bed
IT TOOK ALL DAY
But I do finally have an unattended Debian install which can write back over itself, including lvm on top of mdraid with /var/log and other things on separate logical volumes. Most special is that it will install the OS onto drives identified by a substring in /dev/disk/by-id/, so you can use a drive model or manufacturer instead of trying to figure out sda vs sdb, etc, if you have multiple drives.
That said, it will indiscriminately wreck mdraids and possibly things lvm-related, so I would definitely not use this blindly… In my case, mdraid and/or lvm will always mean Linux OS storage, so I’m ok with it.
Here is my absolutely unhinged partman/early_command:
# Storage
d-i partman/early_command string \
echo PARTMAN >> /var/log/early_command; \
os_drive_id='MTFDDAK480TCB'; \
os_parts="$( printf '%s ' /dev/disk/by-id/*"${os_drive_id}"*-part* )"; \
os_drives="$( printf '%s ' $( ls /dev/disk/by-id/*"${os_drive_id}"* | grep -v -- '-part' ) )"; \
real_os_parts="$( printf '%s ' $(realpath $os_parts) )"; \
real_os_drives="$( printf '%s ' $(realpath $os_drives) )"; \
echo "Clearing swap..." >> /var/log/early_command; \
swapoff -a; \
echo "Starting mdraid arrays..." >> /var/log/early_command; \
mdadm --assemble --scan || true; \
echo "Starting LVM volumes..." >> /var/log/early_command; \
pvscan --all || true; \
vgscan --all || true; \
lvscan --all || true; \
echo "Clearing lvm..." >> /var/log/early_command; \
lvchange --yes --activate 'n' $( lvs --noheadings --rows --options 'lv_path' ) || true; \
vgremove --yes --force pve || true; \
pvremove --yes --force --force $real_os_parts || true; \
echo "Clearing mdraid..." >> /var/log/early_command; \
for md in /dev/md?*; do \
echo "Clearing ${md}..." >> /var/log/early_command; \
pvremove --yes --force --force "$md" || true; \
umount -l "$md" || true; \
echo idle > "/sys/block/${md##*/}/md/sync_action" || true; \
echo none > "/sys/block/${md##*/}/md/resync_start" || true; \
mdadm --stop "$md" || true; \
mdadm --remove "$md" || true; \
done; \
for part in $real_os_parts; do \
echo "Clearing ${part}..." >> /var/log/early_command; \
mdadm --misc --force --zero-superblock "$part" || true; \
dd if=/dev/zero of="$part" bs=1M count=1; \
dd if=/dev/zero of="$part" bs=512 count=2048 seek=$(( $( blockdev --getsz "$part" ) - 2048 )); \
done; \
for drive in $real_os_drives; do \
echo "Clearing ${drive}..." >> /var/log/early_command; \
dd if=/dev/zero of="$drive" bs=1M count=1; \
sync; \
while true; do \
blockdev --rereadpt "$drive" || true; \
partmap "$drive" || break; \
done; \
done; \
debconf-set partman-auto/disk "$real_os_drives"; \
boot_raid_parts=$( for drive in $real_os_drives; do \
printf '%s3 ' "$drive"; \
done | sed -e 's/ \(.\)/#\1/g' ); \
swap_raid_parts=$( for drive in $real_os_drives; do \
printf '%s4 ' "$drive"; \
done | sed -e 's/ \(.\)/#\1/g' ); \
root_raid_parts=$( for drive in $real_os_drives; do \
printf '%s5 ' "$drive"; \
done | sed -e 's/ \(.\)/#\1/g' ); \
raid_recipe="1 4 0 ext4 /boot ${boot_raid_parts} . 10 4 0 free - ${swap_raid_parts} . 10 4 0 lvm - ${root_raid_parts} ."; \
debconf-set partman-auto-raid/recipe "$raid_recipe"; \
debconf-set grub-installer/bootdev "$real_os_drives"; \
. /usr/share/debconf/confmodule; \
db_fset grub-installer/bootdev seen true; \
echo END >> /var/log/early_command
Isn’t it time to ditch LVM in favor of ZFS?
If you can show me an unattended Debian network installation with ZFS root, that will remain stable for the lifecycle of the major Debian version, I’ll happily use it.