Drives lie about their sector size, and so ZFS will faithfully set the ashift incorrectly to 9 instead of the much saner 12. This has caused, and continues to cause, problems for many people, and the only real fix is to destroy the pool and recreate it. ZFS does have a list of known lying drives, but it's very incomplete and will never stop being so.
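A quick way to see both sides of that, if you want to check your own setup (the pool name "tank" is just an example, swap in your own):

```
# What the drives claim (physical vs. logical sector size) -- this is where
# they "lie" by reporting 512 when the physical sectors are really 4096:
lsblk -o NAME,PHY-SEC,LOG-SEC

# What ashift each vdev in the pool actually ended up with:
zdb -C tank | grep ashift
```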
For most use cases, setting ashift too big is a tiny performance problem that might show up in a benchmark; setting it too small is a big performance problem you can actually feel. Set HDDs to 12 unless you know you need something else. SSDs are usually better at 13, but they also usually do fine at 12 because they're optimized for it. There can be issues removing things from the pool if the ashifts differ (like the pool set to 12 and a special vdev set to 13), so the current advice is to keep it all the same; see the sketch below.
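A minimal sketch of pinning that down yourself, assuming a hypothetical pool "tank" and placeholder by-id device names:

```
# Don't trust the drive: force ashift=12 at pool creation.
zpool create -o ashift=12 tank mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2

# Use the same ashift when adding more vdevs later, e.g. a special vdev:
zpool add -o ashift=12 tank special mirror /dev/disk/by-id/nvme-SSD1 /dev/disk/by-id/nvme-SSD2
```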
His real issue is sync=always. That forces ZFS to commit every write immediately, which takes time to complete. Non-sync writes are cached together and committed to disk every 5 seconds by default. Normally an SLOG is there to dump the sync writes onto for safekeeping, which lets ZFS keep caching writes in RAM as usual for performance (though frankly most SSDs will give disappointing performance in that role). It looks like his SLOG is somehow managing to choke on writes. ZFS does currently have ongoing performance issues with NVMe drives and needs a big refactoring, but I'm positive it's something else here. Frankly, that "-part4" on the NVMe seems suspicious; it looks like he didn't give ZFS the entire disk.
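For reference, this is where those knobs live (the dataset name is a placeholder, not his actual layout):

```
# See whether the dataset is forcing every write to be synchronous:
zfs get sync tank/dataset

# Let applications decide, instead of forcing sync on everything:
zfs set sync=standard tank/dataset

# The batching interval for async writes (defaults to 5 seconds on Linux):
cat /sys/module/zfs/parameters/zfs_txg_timeout
```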
Consider posting here; there are more experienced people who work with ZFS professionally and may be able to see what's going on.
You are moving large files, so an ashift of 12 is NOT your issue.
Try giving ZFS only the base NVMe device, not the "-part" partitions; a rough example of how to swap it is below.
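Something along these lines, assuming that partition really is the SLOG; the pool and device names are placeholders for whatever `zpool status` actually shows:

```
# Drop the partition-backed log device, then hand ZFS the whole disk:
zpool remove tank nvme-Example_SSD_1TB-part4
zpool add tank log /dev/disk/by-id/nvme-Example_SSD_1TB
```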
Also, are you using Samba to move the files? If so, that's another wrench in the works and can cause all manner of bizarre interactions; you may want to test having Samba do the sync writes instead of forcing them in ZFS.
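If you want to try that, the relevant smb.conf knobs look roughly like this (the share name and path are made up); pair it with setting the dataset back to sync=standard:

```
# /etc/samba/smb.conf -- per-share settings
[share]
    path = /tank/dataset
    # honor flushes the client explicitly asks for:
    strict sync = yes
    # or go further and have Samba flush every write itself:
    sync always = yes
```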