How to improve ZFS file server write performance?

So I’ve recently set up a ZFS file server on Ubuntu, serving clients through Samba and iSCSI.

Everything is going well except that it seems like the default ZFS settings only allow a relatively small amount of data to be written to RAM before flushing to disk.

I THINK this is what’s causing my transfers to start out saturating the 10Gbit link I have to it, then drop off immediately after about the 5-second mark to around 200MB/s, and sometimes choke down completely to 0.

I’ve tried adding the following lines to /etc/modprobe.d/zfs.conf

options zfs zfs_commit_timeout_pct=30
options zfs zfs_txg_timeout=30
options zfs zfs_dirty_data_max=50000000000
options zfs zfs_dirty_data_max_max=50000000000
options zfs zfs_dirty_data_sync=10000000000

But nothing seems to be working… looking at the system monitor, my RAM usage is barely moving.
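One thing I’m not sure about is whether the modprobe.d options even take effect without a module reload or reboot. I’m assuming the live values can be checked (and changed on the fly) through sysfs, with something like:

cat /sys/module/zfs/parameters/zfs_dirty_data_max
echo 30 > /sys/module/zfs/parameters/zfs_txg_timeout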

I can’t quite remember where, but I read somewhere that ZFS treats random and sequential writes very differently. Something about sequential transfers skipping the ZIL and going straight to the main pool?

The system has 32GB RAM for now (soon to be more), though I only see it hovering around 5-8GB most of the time.
10x 2TB WD RE4 drives in RAIDZ3
4x Samsung 970 Evo Plus 250GB for cache

Transferring to/from a PC with more 970 Evo Pluses and Intel 750 Series SSDs.

What are the correct settings I can use to get ZFS to sustain a constant write speed for about 50GB?


Definitely sounds like a configuration issue.

Okay, so let me get some info here.

Is this business or personal use?

How many concurrent users?

Are you using a SLOG?

What are the full specs of the server?

What version of ZFS and distro/kernel are you running?


Now for a bit of initial advice:

Those Evo Plus drives are not good drives for this, and

RAIDZ3 seems a bit overkill for 2TB disks. That will reduce your effective read/write speed significantly.


Thanks for the quick response! I do apologize for the scattered first post. I’ve been at this for hours and it’s been driving me nuts.

The server is for personal use. I have created ZVOLs for running virtual machines as well as for use as iSCSI targets to extend the storage of other PCs on the network. Most of the time, though, it’s only me and my GF using it, so concurrent users would be 2.

Specs are:
2x E5-2687W v3 on an ASUS Z10PE-D8 WS mobo with 32GB of SK Hynix ECC RAM

It’s actually 8x 2TB drives and 2x 6TB drives; I’m currently upgrading but don’t have the cash for all 10 drives at the same time.

The 4x 250GB 970 Evo Pluses are set up so that every drive has a 50GB partition for SLOG and a 150GB partition for cache, with 50GB overprovisioned. This is temporary while I’m waiting for another 750 Series. Please tell me the 750 Series is at least decent for this purpose?? :frowning:
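In case it matters, the log and cache partitions were added to the pool with something along these lines (tank and the nvme device paths here are just placeholders):

zpool add tank log /dev/nvme0n1p1 /dev/nvme1n1p1 /dev/nvme2n1p1 /dev/nvme3n1p1
zpool add tank cache /dev/nvme0n1p2 /dev/nvme1n1p2 /dev/nvme2n1p2 /dev/nvme3n1p2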

And finally, I’m on Ubuntu 18.04 LTS / 4.18.0-18-generic, with the ZFS version being 0.7.9-3ubuntu6.


RAIDZ3 seems a bit overkill for 2TB disks. That will reduce your effective read/write speed significantly.

I’m looking at zpool iostat -v and I can see that the pool absorbs data at 600-800MB/s, so I’m not sure that’s entirely true in my setup.

As mentioned before, I’m more interested in what I’m doing wrong when trying to configure ZFS to use more RAM and flush a little less often during heavy writes.

Specifically, my workload requires that I offload camera footage, somewhere in the neighborhood of 50-100GB each time. I’m looking to speed this initial transfer up and let ZFS churn away while it writes the data to the main pool. I have a UPS and everything, so I’m not too worried about data loss from an outage.


So this is basically what happens if I delete everything from zfs.conf.
I’m transferring a 40GB file from my PC’s 500GB Samsung 970 Evo Plus, and for the sake of avoiding Samba overhead I’m doing this over iSCSI.

It starts out completely pinned at 1GB/s, then drops at exactly the 5-second mark to about 300MB/s.

Sorry, I get a bit lost in the weeds sometimes. :stuck_out_tongue:

Okay, let me qualify some of my earlier comments then.

Okay, so you’re going to upgrade all of them to 6TB? That’s a much better case for Z3, then.

Well, that’s quite the system.

Ah, good.

The Intel 750 series is excellent for this purpose.

I was more worried about wear and IOPS/throughput on the original drives. ZFS is extremely write-heavy on SLOG and L2ARC devices. I burned through some 128GB 850 Pros in 18 months a few years back. After that, it was always good flash for the caching devices.

Hmm, okay, so we’re not looking at a contention issue.

I’m not super familiar with 10G networking, but if your CPU can handle Samba, I’d switch back to that. The overhead is very minimal.

I see that. Looks to me like you’re blowing out your cache.

iSCSI does what’s called synchronous writes. This means the writes have to be committed to persistent storage before they’re acknowledged. Basically, the writes need to go to the primary array and can’t just be cached in the SLOG. That’s why you’re seeing the performance issues. I think you’ll see better performance immediately by moving to Samba. Could you do a Samba test, just to rule out the possibility of iSCSI writes being the culprit?

Don’t sync writes use the SLOG anyway? 200GB of SLOG across 4x 970 Evos can’t handle 5 seconds of 1GB/s over iSCSI? That’s only about 5GB; the math doesn’t seem to check out.


Roller coaster ride when going through Samba, anywhere between 200-700MB/s.

My RAM usage seems to max out at 12GB.

Clearly. Better overall speed, I’d say.

What distro are you running?

Writes get flushed to disk as fast as possible. The ARC only caches data as it’s read, so if you aren’t reading more than 12GB, you won’t see it go higher than that.

There are also settings that can be configured to change the ARC size limits.
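For example, the ARC can be capped with module options in /etc/modprobe.d/zfs.conf. The values here are just an illustration (16GB max, 4GB min, in bytes), and they need a module reload or reboot to apply:

options zfs zfs_arc_max=17179869184
options zfs zfs_arc_min=4294967296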

I’m on Ubuntu 18.04 LTS / 4.18.0-18-generic with zfs version being 0.7.9-3ubuntu6

I was actually referring to ZFS writing to RAM. It seems to go from 2GB at system idle/boot up to 12GB while I’m writing the file, but it stops there and my transfer speeds drop.

I should clarify that my main issue is with write operations only.

What about settings to force ZFS to use more ZIL/SLOG?

A lot of articles mention the original zfs.conf options I had in place; it’s just that I can’t figure out anything that works consistently.

Yeah, there are settings; they’d be zvol/dataset properties. Off the top of my head, I can’t remember them. Let me look it up.

Yeah, I figured as much. :stuck_out_tongue:

Have you done local write tests to the datasets/zvols in question?
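A quick and dirty local test would be something like the following, with /tank/test standing in for wherever the dataset is mounted. Keep in mind /dev/zero compresses to almost nothing if compression is on, so the number can look better than it really is:

dd if=/dev/zero of=/tank/test/bigfile bs=1M count=20480 status=progress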

Those options aren’t doing the right thing.

Try the following:

zfs set sync=always pool/path/to/dataset

Do this on the Samba share first, then let’s try the iSCSI.
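To see what a dataset is currently using, and to roll it back later if it ends up hurting:

zfs get sync pool/path/to/dataset
zfs set sync=standard pool/path/to/dataset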


I’m starting to wonder if this is a network reliability issue.

Thanks

Yeah. I copied a file from the 800GB 750 Series SSD the OS is using to the ZFS array, with similar results: great write speeds initially, and then it flops.

For testing purposes I also connected my PC directly to the server via a Mellanox 10Gbit card.

Here are the results for sync=always. I just set this globally because my datasets and zvols are all under the same ZFS directory tree.
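(Concretely, something along the lines of zfs set sync=always tank, with tank standing in for my pool’s root dataset; everything underneath inherits it.)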

Samba is extremely stable now. This actually makes more sense, and I’m seeing activity on all four 970 Evos, and it carries on through the rest of the transfer.

Here’s iSCSI… and for some reason it just absolutely chokes. The flat part at the start is 1GB/s, then… sigh.


Okay, how old are those 2TB disks? Are they pre-fail?

I’m not really experienced with iSCSI, so I can’t really speak to it.

Well, this is progress, at least… not that 300MB/s is ideal, though.

I wonder, you’re seeing ~80MB/s sustained writes on each Evo, right? This might be good news.

They’re about 3-4 years old now :confused: which is what prompted the upgrade.

BTW, in this video at that timestamp, Wendell talks about some ZFS tunables he uses to change the default 5-second txg time to 30 seconds. Would that not help my workflow in this case?

I don’t think that will make a huge difference.

But didn’t you already do that?

Oops, forgot to reply to this part. And yes, 80MB/s. I just wonder why it doesn’t do that for iSCSI. Maybe it does some weird thing when sync=always and iSCSI requests another sync write?

I was actually hoping someone with better knowledge could confirm first whether or not I was even using the correct tunables.

Wendell never mentions exactly which tunables he used to change the settings, and it’s not in a post anywhere…

Oh, derp

I’m not that experienced with ZFS, tbh. I’d give it a go and see what happens.

I’d also like to mention that setting sync=always absolutely kills iSCSI performance, to the point where it’s sometimes unusable. I can click into folders and Explorer will freeze.

Jumping on the server and reverting to sync=standard immediately unfreezes it.

Thanks a lot for your fast responses anyway. I might try the ZFS subreddit and see if someone can help me out there…

Hmmm, interesting. (Again, I don’t know about or use iSCSI, so I can’t help that much.) :thinking:

Let me drag @freqlabs in here so he can make fun of my terrible attempts. He’s probably better than I at this stuff.