Probably won’t matter, but for reference, what models of SSDs did you get and how are they connected? It’s known that a particular Samsung EVO drive will sometimes generate a few errors every so often, but nothing like this, and certainly not on both drives at the same time.
What’s strange is that it’s both drives, which points more toward something like a SATA controller going bad or overheating. Bad cables (the most common cause of errors) will only generate errors for the drive they’re connected to.
However it’s also very possible you hit some kind of software bug and ZFS is getting confused.
I’d search the GitHub issues, as well as take this to the ZFS mailing list and the subreddit since you’re more likely to find someone who can help figure out what’s going on in regards to the software internals.
Beyond that, here’s how I’d personally approach this: clear it all and make a completely fresh pool from scratch, then transfer the data back from the backup pool. If that doesn’t work, possibly even use rsync (with byte-for-byte verification) instead of sending/receiving a snapshot, to make sure everything is completely rewritten.
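A sketch of that rsync fallback (the paths are placeholders for the backup pool and fresh pool mountpoints; the commands are echoed rather than executed so this is safe to paste anywhere):

```shell
# Placeholder paths: SRC would be the backup pool's mountpoint, DST the
# freshly created pool's.
SRC=/mnt/backup/data/
DST=/mnt/main/data/
# -a preserves metadata; --checksum compares file contents byte-for-byte
# instead of rsync's default size+mtime shortcut.
echo "rsync -a --checksum $SRC $DST"
# A second --checksum pass doubles as verification: it should list nothing.
echo "rsync -a --checksum --itemize-changes --dry-run $SRC $DST"
```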
I am looking at adding a metadata special device to my pool while I am rebuilding my NAS. Does zdb -Lbbbs *pool name* only provide a block size histogram in the output on certain platforms? Because my NAS is currently on Ubuntu 20.04 and the output of that command does not contain a block size histogram.
EDIT: Nevermind, it looks like this feature was added in a later version of ZFS according to the merge request and was not backported to the 0.8 version that 20.04 is still on. I will need to wait until I have my new hardware installed with a newer OS to run these commands.
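For anyone landing here later, a sketch of the version check and the zdb invocation (the pool name is taken from later in this thread and is a placeholder; the commands are echoed rather than run so the sketch works anywhere):

```shell
# The block size histogram section of `zdb -Lbbbs` only appears on newer
# OpenZFS releases; it's missing from the 0.8.x that Ubuntu 20.04 ships.
POOL=main   # substitute your own pool name
echo "zfs version             # check which OpenZFS release you're on"
echo "sudo zdb -Lbbbs $POOL   # pool stats, incl. block size histogram"
```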
I found this which might explain why you strangely have no read errors: Topicbox
Christian Quest
Mar 25 (9 months ago)
I got something very similar with one of my pool a few months ago, millions of checksum errors, no read errors.
It was caused by a cache SSD that was silently dying. ZFS was not reporting errors from the cache SSD, but from the vdev HDDs. I removed the cache SSD from the pool, did a scrub and all checksum problems disappeared.
Your pool doesn’t use cache, but it has a special vdev. I don’t know if we are in a more or less similar situation, and I have no clue how to get you away from your problem, because data on the special vdev is not duplicated anywhere else, if I’m not wrong.
The SSDs used for the special vdev are two of model WDC WDS120G2G0A-.
They are both connected to the last two SATA ports on the motherboard, an X570 Phantom Gaming 4. The only “off” thing about how I connected the drives is their power connectors: they are fed from a 4-pin Molex connector through an adapter that splits into two SATA power connectors.
I see you added an additional reply. Thanks a lot for helping out so much!
I can look into detaching the cache device from the pool to see if that helps at all. It’s not my expectation that the drive is dying, because it has barely any write cycles on it, but who knows.
The result of catting the zfs_no_scrub_io is indeed 0, as expected.
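For reference, this is roughly how to read that tunable (the sysfs path only exists when the ZFS module is loaded; the fallback value in the sketch is an assumption based on the documented default):

```shell
# zfs_no_scrub_io=1 makes scrubs skip issuing real I/O; 0 (the default)
# means scrub reads actually hit the disks.
PARAM=/sys/module/zfs/parameters/zfs_no_scrub_io
if [ -r "$PARAM" ]; then
    VALUE=$(cat "$PARAM")
else
    VALUE=0   # module not loaded on this machine; 0 is the documented default
fi
echo "zfs_no_scrub_io=$VALUE"
```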
Before detaching the cache drive to see if that solves the problem, I’ll first have to better reproduce the error rates, as they seem to have stabilized since my initial post. I’m not sure whether that’s due to low activity on the pool or to not having run scrubs since, but I’m currently looking for a way to properly test whether detaching the cache device solves my problem.
Given the state of things, I will give it some time for now. Thanks a lot for all the help, you really gave me the courage and motivation to continue. I’ll post back with any updates, good or bad.
The errors seem to increase still. For a bit more insight, I’m sharing the files that are marked as an error:
errors: Permanent errors have been detected in the following files:
/media/anime/Pokémon/Season 3/Pokémon - S03E06 - Flower Power SDTV.mkv
/var/lib/mysql/nextcloud/oc_activity_mq.ibd
/var/lib/mysql/nextcloud/oc_twofactor_providers.ibd
/var/lib/mysql/undo_001
main/applications/docker-compose:<0x0>
/srv/docker-compose/backblaze/wine/drive_c/ProgramData/Backblaze/bzdata/bzlogs/bzbui/bzbui31.log
/srv/docker-compose/bazarr/config/log/bazarr.log.2022-12-29
/srv/docker-compose/plex/config/Library/Application Support/Plex Media Server/Logs/Plex Crash Uploader.log
/srv/docker-compose/backblaze/wine/drive_c/ProgramData/Backblaze/bzdata/bzlogs/bzbui/bzbui02.log
/srv/docker-compose/backblaze/wine/drive_c/ProgramData/Backblaze/bzdata/bzlogs/bzbui/bzbui30.log
/srv/docker-compose/backblaze/wine/drive_c/ProgramData/Backblaze/bzdata/bzlogs/bzbui/bzbui29.log
/srv/docker-compose/backblaze/wine/drive_c/ProgramData/Backblaze/bzdata/bzfilelists/bigfilelist.dat
These mostly seem to be highly active files.
Log files
Database files
A recently transcoded file
One file has remained in the list of errors without a proper name; I had manually deleted it when it showed up in the list earlier because I deemed it unimportant.
I will try removing the cache drive for a couple of days and see if that changes anything.
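For anyone following along, removing an L2ARC (cache) device can be done live. A sketch with placeholder names, echoed rather than executed since the cache device isn’t named in this thread:

```shell
POOL=main
CACHE_DEV=sdX   # hypothetical; check `zpool status` for the real device name
echo "sudo zpool remove $POOL $CACHE_DEV"
# Then reset the error counters and scrub again to see whether errors return:
echo "sudo zpool clear $POOL"
echo "sudo zpool scrub $POOL"
```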
$ sudo zpool status main
  pool: main
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 45K in 00:08:33 with 19 errors on Mon Jan  2 23:10:42 2023
config:

        NAME          STATE     READ WRITE CKSUM
        main          ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            sdc       ONLINE       0     0     0
            sdd       ONLINE       0     0     0
            sdb       ONLINE       0     0     0
            sdf       ONLINE       0     0     0
            sdg       ONLINE       0     0     0
            sda       ONLINE       0     0     0
        special
          mirror-1    ONLINE       0     0     0
            sdi       ONLINE       0     0   355
            sdj       ONLINE       0     0   355
        spares
          sdh         AVAIL
          sde         AVAIL

errors: 12 data errors, use '-v' for a list
Do the drives have the latest firmware? A 4-pin Molex cable can handle two SSDs just fine; the only potential issue is that the molded ones are known to sometimes short out or catch fire.
I’m still leaning toward a software bug being the problem here, given that both drives have identical numbers of checksum errors. If they had differing amounts of errors, then that’d point more to a pure hardware issue.
These mostly seem to be highly active files.
That’s good to know.
I realistically can’t give any further help, but I’m definitely interested in what you find. I’d strongly suggest making a GitHub issue and/or a mailing list post: (1) you’re more likely to reach someone with more in-depth knowledge, and (2) if it is a software bug, it’ll put your issue on the radar so it can get fixed.
Unless I’m looking in the wrong place, I don’t think that is happening currently. I do have an add-in card for additional SATA ports, but I’m not seeing AER errors.
$ sudo dmesg | grep -i AER
[ 0.728018] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME AER PCIeCapability LTR DPC]
[ 1.321877] pcieport 0000:00:01.2: AER: enabled with IRQ 28
[ 1.322065] pcieport 0000:00:01.3: AER: enabled with IRQ 29
[ 1.322221] pcieport 0000:00:03.1: AER: enabled with IRQ 30
[ 1.322439] pcieport 0000:00:07.1: AER: enabled with IRQ 32
[ 1.322598] pcieport 0000:00:08.1: AER: enabled with IRQ 33
I’m especially intrigued by the high CPU/IO loads. I am running into some aberrant behavior, with my NVIDIA GPU getting evicted from the Docker container runtime (resulting in high CPU loads on transcode operations) and with Backblaze consuming lots of CPU even though it hasn’t had anything to back up in a while. Both are suspect.
I ran memtest86 and found no memory errors. I reseated cables to the drives and memory for good measure.
After having the cache detached, the CKSUM errors have not gone away and they are still mirrored across devices.
At this point I am bowing out; thank you all for thinking along. My SO overruled me, so we will simply rebuild the pool, for both our sanity. If I had had a bit longer with this, I would have opened an issue, but since I won’t be reintroducing special devices in the new pool, it seems wrong to open an issue for a pool that can’t reproduce the problem.
I swapped around some duties here, making the previous cache device a log device and the special devices as cache devices. It might still end up showing hardware issues, but at least now I can safely detach devices from the pool.
What happens if my metadata device fills up? Is it possible?
Or does it write to the pool instead?
I’m guessing a bunch of node_modules/ directories filled this up. I excluded them in Robocopy, but /PURGE (or /MIR) doesn’t remove them if they’re excluded, so they’re artificially bloating my metadata drives right now.
I have 1TB metadata on my other NAS, but I thought 200+ gigs would be plenty here. It looks like I might need to add some mirrors.
Once the special vdev is 75% full, ZFS starts writing metadata to the pool as it normally would without a special vdev, keeping a 25% reserve for general performance reasons.
You can change the reserve percentage in /etc/modprobe.d/zfs.conf by setting zfs_special_class_metadata_reserve_pct=10, for example, which should still be perfectly fine. The larger the drives are, the less reserve is needed to keep enough free space for ZFS to allocate quickly.
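For example (a sketch; the 10 is just an illustrative value, and the change takes effect after a module reload or reboot):

```shell
# /etc/modprobe.d/zfs.conf
# Lower the special vdev metadata reserve from the default 25% to 10%:
options zfs zfs_special_class_metadata_reserve_pct=10
```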
Calculating actual space in ZFS is a little bit complicated even before metadata is involved, so I refuse to comment on what reported numbers actually mean and why napkin math doesn’t give entirely expected results. Because fuck if I know.
Yes it is! I can find the exact way to do that in a bit when I’m not on my phone, but essentially you set something on the datasets you don’t want included to “none” or something like that.
I am not finding an obvious way to exclude a dataset from utilizing a special vdev. I take it back; I may have gotten confused with some other things, like setting primarycache=none|metadata|all on a dataset, or excluding dedup data by changing the zfs_ddt_data_is_special module parameter (though that one is system-wide).
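For completeness, this is the per-dataset knob I was probably conflating it with; it controls what the ARC caches for that dataset, not whether it uses the special vdev (the dataset name is a placeholder, and the commands are echoed rather than run):

```shell
DATASET=tank/mydataset   # hypothetical dataset name
# primarycache takes all | metadata | none:
echo "sudo zfs set primarycache=metadata $DATASET"
echo "sudo zfs get primarycache $DATASET"
```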
Hello. I have a basic understanding of what is going on here, but I’m not grasping some of it. My situation is that I don’t even have a ZFS pool yet, but I’m looking to create one with new drives at some point (my RAID array is full). If I create a histogram of the files on this array, is there any way to calculate how large the SSDs would need to be to store the metadata (so I can figure things out before even migrating to ZFS)? I wouldn’t be storing small-block files on them, just metadata. My mind is going all over the place thinking about this.
It might be better to see real data. I don’t know how to get the file stats, but I have a mix of games, long video streams, short videos, pictures, and code projects as well as documents, music compositions, you name it.
All that data is currently 8.22TB. This zpool has 204GB of metadata. Unless you have 8TB of super tiny files, you’re probably fine. The quickest way to test is to copy your files. I’m assuming you have an extra copy of your data to play with.
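As a quick sanity check on those figures (pure napkin math from the numbers above):

```shell
# 204 GB of metadata on 8.22 TB of data works out to roughly 2.5%:
awk 'BEGIN { printf "%.1f%%\n", 204 / (8.22 * 1000) * 100 }'
# prints: 2.5%
```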
I have an all-SSD array in this NAS. I’m only using metadata vdevs because I put four Optane drives in there as two mirrors. Those drives allow very low-latency access; otherwise I wouldn’t have bothered. In my offsite HDD array, I used regular NAND flash SSDs.