How much do parity calculations slow down the speed of a storage array?

I am using an LSI MegaRAID SAS 12 Gbps HBA, and I have a RAID5 array with four HGST 6 TB SATA 6 Gbps 7200 rpm HDDs on an Intel Core i7-4930K on an Asus P9X79WS-E motherboard with 64 GB of DDR3-1600 unbuffered, non-ECC RAM.

When I am copying data to the RAID5 array now (from a TrueNAS server), I am only able to write to it at around 30-40 MB/s or so.

How much does the parity calculation impact the write performance of the array, if at all?

Small addition
My “client” system runs CentOS 7.7.1908.


RAID (traditional hardware striping/parity) and RAIDZ (ZFS’s block parity scheme) are very different and should never be mixed. If you are using TrueNAS, you should be using an HBA card and letting ZFS manage the disks directly. ZFS’s mirrors and RAIDZ offer error correction, the ability to instantly produce snapshots, and the ability to back up to other machines by sending/receiving datasets, which acts like a sequential transfer and is MUCH faster than throwing around individual files.

Your LSI card can likely be flashed to “IT” firmware, converting it into an HBA.

RAID5 and RAIDZ1 don’t have much in the way of performance loss when it comes to sequential transfers, though. What you are likely seeing is the effect of moving lots of small files, which is severely slowed down by what are basically random reads, and can easily drop into low double-digit MB/s.


Yup. This is the case on the TrueNAS “server” side of things.

What I was asking about was on the “client” side of the data transfer.

Thanks, but not going to do that. Not for my client system.

I’m currently copying two files: one is a 7-zip archive that is 7,714,747,458,892 bytes (~7.7 TB), and the other is another 7-zip archive that’s 1,072,897,014,242 bytes (~1.07 TB).

The second file is getting copied at around 17.6 MB/s whilst the first file is getting copied (technically rsync’d) at around 40 MB/s over two SEPARATE GbE NIC interfaces from said TrueNAS server.

(The “client” also has dual onboard GbE NICs as well.)

I was really hoping and expecting that it would be able to send both of those files at close to the 100 MB/s line speed of the GbE NICs.

(The TrueNAS server also has four HGST 6 TB SATA 6 Gbps 7200 rpm HDDs in a raidz1 pool/array in a single vdev.)

Thanks.


Alright, good, as long as you don’t have ZFS on top of hardware RAID, which I’ve seen happen occasionally.

Are you doing both transfers at the same time? If so, that could potentially hurt performance, as the read side will be forced to switch back and forth between the two files, with the same issue on the write side.

But other than that I don’t have anything else to strongly suspect.


Yeah, I’ve seen some people get confused about that, but I’ve been using ZFS since the Solaris 10 6/06 days…so…

Originally yes, but even so, I would have expected both of the transfers to move the data faster than they did.

The one that was about 1 TB completed, so now it’s just transferring the 7 TB file, and even that is only going at anywhere between 55-90 MB/s.

The low end of the transfer is what’s somewhat surprising to me. In either case, it’s still not hitting anywhere close to the 100 MB/s line speed that GbE should be capable of, and I don’t really know where, why, or how I would begin to figure out why it isn’t meeting performance expectations out of TrueNAS Core 12.0 U1.1.

Thanks.


@Log can help you troubleshoot the storage side of things.

I’ve had luck using “mbuffer” for my network copying before, to work around the bursty nature of both systems.

Basically it gained me some performance because it helped ensure that the system reading the file always had enough space in RAM for the data it read, the network always had enough data to send, the other system always had somewhere to put the data it got from the network, and the disks always had something to do when writing.

You could increase TCP buffers too, but with mbuffer I can easily carve out a gigabyte of RAM for a one-off large transfer.
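For a one-shot file copy, a rough sketch of that looks something like this (the host name, port, and file names are placeholders, not anything from this thread):

# On the receiving box: listen on a port, buffer up to 1 GB in RAM, write to a file
mbuffer -I 9090 -m 1G -o bigfile.7z

# On the sending box: read the file into the buffer and stream it to the receiver
mbuffer -i /path/to/bigfile.7z -m 1G -O receiver:9090

The -m flag is the RAM buffer that smooths out the bursts on both ends.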


Are you using rsync over ssh for the transfer?


What command are you using to send things over the network?

One potential thing is if there is a buffering issue. For my Linux send/recvs to my backup, I make use of mbuffer.

edit: risk beat me to it by seconds lol.


@Log

So, the CentOS client has mounted the TrueNAS share using NFS.

On the TrueNAS server side, the folder is set as a NFS export/shared folder.

On the CentOS client side, a mount point is created on the system, and the share is mounted (effectively) with the command:
sudo mount -t nfs truenas:/mnt/share/share /truenas/share

(It’s defined in /etc/fstab, but it’s the same basic gist.)
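For reference, the equivalent /etc/fstab line (using the same paths as the mount command above) would look something like:

truenas:/mnt/share/share  /truenas/share  nfs  defaults  0  0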

So, no, it’s not over ssh, or at least it wasn’t explicitly defined as such, if rsync over ssh is even the default behaviour between said TrueNAS server and said CentOS client. (Which I don’t think it is, but I also haven’t personally checked or verified, because I wouldn’t even know where to begin to look.)

user@centos 18tbraid5array$ rsync -avrsh --info=progress2 /truenas/share/file .

The transfer finally completed.

7.71T 100% 47.00MB/s 43:28:58 (xfr#1, to-chk=0/2) as reported by rsync when the job/task/transfer is done.

So almost two days just to transfer the file. yikes!


I don’t have much experience with NFS, but I believe it’s set by default to force all writes into becoming sync writes, which kills performance. In fact, here’s a TrueNAS forum thread from today about a guy who seems to have gotten similar speeds with a large sequential transfer.

One thing some people do is use tar for transfers instead of rsync. Here’s a thread full of interesting command line examples
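For example, the classic tar-pipe pattern looks roughly like this (a hedged sketch; the host and paths are placeholders):

# Stream a whole directory as a single tar stream, avoiding per-file rsync overhead
tar -cf - -C /mnt/share/share . | ssh user@client 'tar -xf - -C /18tbraid5array'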


Thank you.

So I took a look at the TrueNAS forum thread, and it looks like they were trying to write TO their TrueNAS system (as opposed to reading FROM my TrueNAS server, as in my case).

(They mention SLOG/ZIL.)

The usage of tar is an interesting one.

I might play around with that to see if I get any better performance, but maybe with a smaller file than my 7.something TB file (as I really don’t want to spend another two days copying the file using tar).

Thank you.


You could try avoiding NFS or ssh and just sending the file using socat.


This avoids any encryption and unnecessary syncing, mounting, and whatnot, but it’s a bit more work.
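As a rough sketch (the port, host name, and file names are assumptions, and there’s no encryption or authentication, so trusted LAN only):

# On the CentOS client: listen and write the incoming stream to a file
socat -u TCP-LISTEN:9999,reuseaddr CREATE:bigfile.7z

# On the TrueNAS server: stream the file out to the client
socat -u OPEN:/mnt/share/share/bigfile.7z,rdonly TCP:client:9999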


I think that this is always the balance in IT and network development operations, right?

Where you are constantly trying to balance the time you are going to spend “perfecting” the solution against the time you spend paying the “penalty” of living with an “imperfect” solution.

It has been my general experience that if the file-size distribution of the workload is about even across the board, then tuning NFS parameters and/or network parameters doesn’t really make a whole lot of sense, precisely because of that even distribution.

But if, say, you know that the distribution skews one way or another, then you can tune said NFS/network parameters specifically for that type of workload.

And I get your point about not using encryption (ssh) and/or mounting (NFS), but getting back to the original point of my question: I still don’t really fully understand why I wasn’t able to send the two files, over two separate GbE NICs, at the full line speed they should have been capable of.

That’s the part that I still don’t quite understand.

Thanks.


Sadly, it’s usually a lot more complicated to “properly” debug the issue from first principles than it is to do “shotgun” debugging, or to apply some differential debugging or divide-and-conquer strategy.

Doing “proper debugging” requires skill, time and motivation.

Differential debugging would be, e.g., you trying rsync over ssh, or socat, or Samba mounts, or SFTP, FTPS, or FTP, and analyzing different aspects of the copying in hopes of finding anomalous behaviour.

Divide and conquer for performance debugging sounds strange, but it could be something like deciding to split the stack and putting performance budgets on the various components: test the network using iperf3, test disk performance using fio, and test fio over NFS (as opposed to a copy over NFS); see the sketch below. Sun RPC performance somehow… not very fruitful in this case.
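Hedged examples of what splitting the stack like that might look like (the mount points, sizes, and host names are assumptions):

# 1) Raw network throughput between client and server
iperf3 -s                 # on the TrueNAS server
iperf3 -c truenas -t 30   # on the CentOS client

# 2) Sequential write straight to the client's RAID5 array
fio --name=seqwrite --rw=write --bs=1M --size=10G --directory=/18tbraid5array

# 3) Sequential read over the NFS mount
fio --name=nfsread --rw=read --bs=1M --size=10G --directory=/truenas/share

If iperf3 shows ~940 Mbit/s but the fio read over NFS is much slower, the network is off the hook and the NFS layer or the disks become the suspects.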


When it comes to random stuff to try, you could try mounting NFS async. It would mean that other hosts on the network might not have the same view of the file system as the client, and state on the server would only be eventually consistent with what you see on the client. (edit: you could also increase the read size + write size to a max of 64k if you haven’t already… I wish TrueNAS offered an fstab generator.)
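A hedged example of such a mount, using the paths from earlier in the thread and the 64k sizes from the edit above:

sudo mount -t nfs -o async,rsize=65536,wsize=65536 truenas:/mnt/share/share /truenas/share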

NFS is not really well maintained as a protocol outside of some large companies that use it for specific things and run custom versions of it for niche stuff.


Thank you.

Yeah, I mean, it’s not very often that I would move such large files in a “single shot” so to speak.

But when it does happen, being able to transfer said large file at speeds > 40-50 MB/s IS appreciated with the current and given hardware, vs. pouring more money in for a once-in-a-while use case that seldom occurs (but still has a non-zero probability of occurring).


I may have missed it, but are you only copying from the server to the client when you see this performance dip? As was mentioned, NFS uses the sync option by default. This means that while your data is in flight, if something is writing to the server, the reads and writes have to sync up to ensure that you are not receiving stale or invalid data. You can set async, which will give you a big performance boost, but the risk of data corruption is much higher, as everyone can rewrite the same file at the same time, and that is bad.

There is also another option that essentially functions as a delayed write. It will keep changes in memory on the client side and only sync at the default interval or when forced to. Same implications as async, though.

You can also set the buffer size on the server side to cap the maximum bytes that it will read and write per transfer, which can help with performance.

NFS is really simple but has a lot of customization. To get a really performant server, you have to do some tuning not only on the server but also on the client side. Either way, it is going to take a bit of effort to get the right balance.
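On a generic Linux NFS server, that server-side async behaviour lives in /etc/exports; a hedged sketch (TrueNAS manages its exports through the UI instead, and the subnet here is an assumption):

/mnt/share/share  192.168.1.0/24(rw,async,no_subtree_check)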


Yes, I am only copying from server to client when the performance dips occur.

(Sidebar/background: the client originally had four HGST 6 TB drives in a RAID0 array. I transferred the data onto the server so that I could reconfigure the HBA virtual drive from a RAID0 array to a RAID5 array. I am only doing that so that I can write the data back to LTO-8 tapes, after which I would be OK to purge the data. So this transfer from server to client came after the array had been reconfigured as a RAID5 array.)

This is why I asked my OP question: I didn’t know if the parity calculation on a four-member RAID5 array could be enough to cause these performance dips while transferring and writing the data to said RAID5 array.

Thank you.

Yeah, I’m not looking to spend a lot of time on this, because for the time being it really only needs to facilitate the data transfer from said server back to the client for this specific purpose/sequence of events that’s coming up.

(Since the data has been sent back to the client from the server, the client system has calculated the parity data using par2 (in preparation for writing said parity data to tape as well), along with the SHA256 digests/checksums that I also use to make sure that the data has been written to tape properly and successfully.

The client is currently working on the second parity calculation, and after that it will calculate the SHA256 checksums for the second parity data. Once all of that is completed, I will write all of that data onto tape.)
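For concreteness, that prep looks roughly like this (the file name and the 10% redundancy level are stand-ins, not my actual values):

par2 create -r10 bigfile.7z.par2 bigfile.7z       # create ~10% par2 recovery data
sha256sum bigfile.7z bigfile.7z*.par2 > checksums.sha256
sha256sum -c checksums.sha256                     # re-verify after writing to tape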


How is the array set up on the client side? Did you set a sector size and/or stripe size on the LSI card? And what file system did you format the array with?
In case there is sector misalignment that might cause write amplification?

Is the read speed from the server fast, but the write speed to the client slow?
Like when copying a file from the server’s array to the client’s /dev/null?

Perhaps there is some tweaking to the client array that can help?


I don’t know if I can set the sector size on the LSI card, but the stripe size is either 64k or 128k. (I forget at the moment.)

XFS

Not sure, but unlikely as I probably used the defaults.

Varies.

zpool iostat shows that it can hit 100 MB/s whilst the data is being pulled over the network.
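(For reference, watching it live looks something like this, where the pool name is a stand-in:

zpool iostat -v tank 1

which prints per-vdev throughput every second during the transfer.)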

Never really benchmarked the system otherwise.

*edit
ran time -p dd if=/dev/urandom of=10Gfile bs=1024k count=10240 and was getting about 51 MB/s write speeds with that.

With time -p dd if=/dev/zero of=10Gfile2 bs=1024k count=10240, the system was getting 639 MB/s write speeds.

Read speed for 10Gfile to /dev/null (from server to server) was 2168.4 MB/s.

Read speed for 10Gfile2 to /dev/null (from server to server) was 670.6 MB/s.
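(One caveat with these numbers: the /dev/zero results can be inflated by the page cache and by ZFS compression, since zeros compress to almost nothing. Variants like these, assuming the same file names, would give more honest figures:

time -p dd if=/dev/zero of=10Gfile3 bs=1024k count=10240 conv=fdatasync   # flush to disk before reporting
time -p dd if=10Gfile of=/dev/null bs=1024k                               # urandom data defeats compression

though the read test can still be served from ARC if the file was written recently.)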

No, the client, I know, can write at a maximum of 800 MB/s (tested), which makes sense for four HDDs at about 200 MB/s write speed each in a RAID0 array.

I haven’t thought about trying that.

It didn’t even dawn on me to try that.

Maybe, but my current hypothesis points to the server side being the issue, because I have other QNAP NAS units (which also run some variant of Linux), the client runs CentOS (also Linux), and those systems have no problems hitting GbE line speeds.

TrueNAS, on the other hand, is the only “odd ball out” at the moment.


I’d try a couple of tests if possible. Try plugging in another machine, such as a notebook with an NVMe drive, and do a couple of transfer tests. That will give you clues right away: if you’re getting fast transfers, you know it’s not the network or TrueNAS but the client system.

Alternately, reverse the test by using a computer other than the TrueNAS server to send test files to the client and see what speed you get.

You could have highly fragmented disks, which would benefit from a dump and reload. I don’t think you mentioned free space, but if the storage is getting close to full, it’s going to write slowly with mechanical disks due to the physics of how disks work.


Yeah… I wish there were a way to overlay ZFS on top of tmpfs, so that I could create something like an 8 GB ramdrive, put ZFS on it, and then test against it.
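(Come to think of it, a file-backed vdev on tmpfs might approximate that; a rough sketch for a throwaway test pool on Linux, with all names and sizes assumed:

sudo mount -t tmpfs -o size=9G tmpfs /mnt/ramdisk
truncate -s 8G /mnt/ramdisk/zdisk.img
sudo zpool create testpool /mnt/ramdisk/zdisk.img

and zpool destroy testpool plus unmounting the tmpfs tears it back down.)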

Read/writes to my QNAP NAS units between client and QNAP can run at line speeds.

I don’t remember exactly now when this all started, but if I had to guess, maybe 70% full? Maybe? I don’t remember anymore.

Thanks though.

(Since then, the data has been evacuated and the TrueNAS server has been rebuilt and redeployed already with eleven (11) fresh, new HGST 6 TB SATA 6 Gbps 7200 rpm HDDs in a raidz1 vdev/pool.)

I thought that with ZFS, you didn’t have to worry about fragmentation as much?
