Your method to copy 3TB between servers?

Decommissioning my old Xeon file server in a few days and switching to a Ryzen based machine.

The server has 3TB of data: 1.8TB is large non-compressible data files, and another 1TB is tiny non-compressible data files (mp3 and jpg), hundreds of thousands of them.

The large files will copy OK, speeding up and slowing down as caches fill/empty on either end. The tiny files copy terribly; each file seems to start a new stream with rsync or Samba. I’ve also tried creating a TAR of these files, and that takes forever too, both creating it and expanding it on the other end.

What other options do I have? How would you do it?

edit: both machines have a 10Gbit fiber connection to a switch.

3TB SATA HDD

If you have a good 10Gb link between the two and can have them both running at the same time, I’d say that might be your best bet.

Even if you get a large USB3 external hard drive your transfer speeds won’t be as fast as the 10Gb network.

A large internal hard drive might be an option as well. Speeds shouldn’t be that bad.

What’s your backup solution? Could this be a test of that solution?

I use CrashPlan Small Business. They really hamstring the restore rates; it took two days to restore a 60GB set I tested a week ago.

Yikes. Well, internal large capacity hard drive or transfer over 10Gb network would be my suggestions then.

Personally, I’d do:

source$ tar -cpf - <sourcedir> | pv -B 200M | nc -N <destination_host> 9999
destination$ nc -l 9999 | pv -B 200M | tar -xpf -

Transfer should go as fast as your disks can seek those files.

If that’s still not good enough because you have millions of tiny files, it might be quicker to transfer the whole filesystem. Be careful not to wipe existing data, of course, and make sure the destination partition is at least as large.

source$ pv /dev/<source_partition> | nc -N <destination_host> 9999
destination$ nc -l 9999 > /dev/<destination_partition>

If the old and new machines have a spare SATA port, it’s hard to beat a straight HD copy. I tend to be lazy and just rsync the files and wait.
I hate the SMB small-file tax over the network, but 10G is nice.
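A minimal sketch of that mount-and-rsync approach, assuming the old data disk shows up on the new machine as /dev/sdb1 and the data should land in /srv/data (both paths are hypothetical; check with lsblk first):

new$ mkdir -p /mnt/olddata
new$ mount -o ro /dev/sdb1 /mnt/olddata     # mount the old disk read-only
new$ rsync -aH /mnt/olddata/ /srv/data/     # preserve permissions, times and hard links
new$ umount /mnt/olddata

A local copy like this skips the per-file protocol overhead entirely; the drives’ seek time is still the limit for the tiny files, though.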

Great answer; nc and tar have a very high throughput ceiling.

Just one note: neither of these covers changes to the files. Doing it on a live system might produce different versions on the target machine.

rsync is not designed to transfer a lot of data; it is in fact designed to transfer as little data as possible while maintaining parity.

The other options are mounting NFS or mounting the block device over Ethernet. For a one-time transfer, neither is even remotely as good as using nc.
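That suggests combining the two: do the bulk move with the tar/nc pipeline above while the old server is still live, then stop whatever writes to the data and run one short rsync pass so only the files that changed in the meantime get re-sent. A sketch, with hypothetical paths:

source$ rsync -aH --delete /srv/data/ user@newserver:/srv/data/

The --delete flag removes files on the target that were deleted on the source since the bulk copy; drop it if that makes you nervous.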

Would BitTorrent Sync be inappropriate? Set it up on both machines and let them run overnight or over the weekend, given that we are talking about 3TB.

Personally, if my disk capacity allows it, I usually make one large tar/rar archive without compression, add a checksum, and pull the single big file over with wget, possibly over FTPS. But if that takes too much time for you, then…

You may be interested in aria2/uGet. You can also experiment with a VeraCrypt container to avoid multiple sessions on the protocol, if that is a problem…
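A sketch of that archive-and-pull idea, using sha256 for the checksum and assuming there is enough scratch space for a second copy plus an FTP(S) server already exporting /srv/export on the old box (paths and hostname are hypothetical):

source$ cd /srv/export && tar -cpf data.tar /srv/data     # one big archive, no compression
source$ sha256sum data.tar > data.tar.sha256              # checksum to verify the transfer
destination$ wget ftp://oldserver/export/data.tar ftp://oldserver/export/data.tar.sha256
destination$ sha256sum -c data.tar.sha256                 # confirm the copy is intact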

Jesus Christ… two days is probably not acceptable for production data :slight_smile:
Think about something like an Odroid HC2 + a 12/14TB HDD as an additional backup. The only limit then will be the 1Gb/s link.

If it can’t be in the same location, put it somewhere at home or at work where the link allows a sensible upload; in the event of a failure, even if the link doesn’t allow a quick download, you can just carry the HC2 over by hand and restore your backup quickly.

All these cloud solutions have the same problem: as soon as you need to recover a very large amount of data, the problem of “time” appears. The problem is not making regular small backups to the cloud; the problem is when you have to recover everything from the cloud. Personally, I’m not a fan of cloud backup.

Depends on the data, and also on what hardware is available. Everyone seems to have covered the basics pretty well. I’d probably do it over the network: since you have a 10Gbit network, that’s probably not your bottleneck, so there’s no major gain to be had any other way; it’s likely limited by drive reads or writes, depending on what the drives are on each side.

Thanks for the suggestions, everyone. It’s going to be some network method; I don’t have any spare drives, and buying one isn’t in the budget.

Probably yes. What would the advantage be?

Unless you’re on a 10-gig network (and even then), compression will likely be a net win. And even if it isn’t… you’re more than likely network-bandwidth limited in this case even with compression. Even if the data is only mildly compressible, the win in reduced bandwidth will probably be worth it, and I doubt the CPU will be a bottleneck anyway (so a possible win vs. an unlikely bottleneck on the CPU).

But… test with a small subset first, I guess. Your CPU performance may differ, etc. But you only need to out-run the ~100 megabytes per second the network is capable of (best case), and lz4 can get 7-8x that on an i7 (not clear if the below link is per-core or not; it does mention > 500 MB/sec per core above the table)… presumably bzip/gzip isn’t THAT much slower.

source:
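If you want to test that, a fast compressor like lz4 can be spliced into the tar/nc pipeline from earlier. A sketch, assuming lz4 is installed on both ends; try it on a small subset first, since mp3/jpg data won’t shrink much:

source$ tar -cpf - <sourcedir> | lz4 -c | pv -B 200M | nc -N <destination_host> 9999
destination$ nc -l 9999 | pv -B 200M | lz4 -d -c | tar -xpf -

zstd at a low level (zstd -1 to compress, zstd -d to decompress) drops into the same spot if you’d rather trade a little speed for a better ratio.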

Hmmm, what compression? I don’t remember that… Of course, compression is totally pointless in this case!

The OP mainly has an I/O performance problem with a large number of small files. There will be no miracles here…
Personally, in the OP’s place I would bite the bullet and somehow endure the time. If I were in the OP’s shoes, I would create a large VeraCrypt container, move these 3TB of data into it, and then SMB/FTP it to the new server. Yes, it will not be a quick solution.
But the point is that, first, all the data will be encrypted, and second, if you ever need to transfer the data again, you’re dealing with only one large file, and at 10Gb/s that will be beautiful. :wink:

You can also run some tests with HELIOS LanTest in your free time.

https://www.helios.de/web/EN/products/LanTest.html

…Honestly? With that much data I wouldn’t even bother with a network move; I’d do it by hand. dd it to the new drives or dd it out.
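A minimal sketch of the dd route, assuming the old 3TB disk is attached to the new machine as /dev/sdb and an empty disk of at least the same size sits at /dev/sdc (both names hypothetical; double-check with lsblk, because swapping if= and of= destroys the source):

dd if=/dev/sdb of=/dev/sdc bs=64M status=progress conv=fsync

This clones the partition table and filesystem as-is, so it sidesteps the per-file overhead entirely, but the target disk has to be at least as large as the source.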

Are there just too many files for rsync to know what to do with, then? IDK much about it yet.

At 10Gbit it’s a very quick job, maybe an hour or two (for the large files).
Small files are always a pain to move and take a long time. It’s not really the network communication; it’s more that the physical hard drive has to reposition itself every time it writes a new file, and that’s just the name of the game.
I’d suggest starting the copy from computer X to computer Y around 10pm, going to bed, and letting your computers do their thing overnight.
Or start the process at 8pm, go watch a movie, go to bed, and it should be done in the morning.
Normally I use sshfs to move data internally (if I’m moving between Linux machines):
sshfs user@addr:path localpath
works wonders, but it’s the same story as Samba: a sloooooow process for writing small files. Again, it’s the hardware, not so much the software or sockets or anything else.
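For completeness, the usual sshfs round trip looks something like this (hypothetical paths):

local$ mkdir -p /mnt/newserver
local$ sshfs user@newserver:/srv/data /mnt/newserver    # mount the remote path over SSH
local$ rsync -a /srv/data/ /mnt/newserver/              # or cp -a
local$ fusermount -u /mnt/newserver                     # unmount when done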

Can’t beat Sneakernet… I got 60TB/hour using my car…

nice
