Using LTO Tapes to back up your TrueNAS

Originally posted here:

Preface:

I typically start these types of articles with a definition of the title and some explanation.

Today I am going to break that trend, and jump into an anecdote from my professional life. Just about 5 years ago, I started as the Datacenter Operations Manager for a large public school district. As part of my “orientation” my director went through several key systems, and one of them was our backup strategy. At that time, we had two datacenters, each with warm backups of the other’s virtual machines. A sane strategy. When I asked about offsite, he showed me a tape library.

Now I want you to picture this in no uncertain terms. I was 27 years old at the time, and the only tapes I was familiar with up until that point were cassette tapes and VHS tapes. I literally had not even heard the term LTO Tapes. For years, I badgered my director about my frustrations with this strategy. I had to make sure one of my guys was there every Friday to hand the tapes off to a man with a case. I had to make sure we called back tapes when they expired, or else risk running out of tapes and not being able to write out weekly rotations. We even occasionally had weird problems I didn’t understand and couldn’t fix. It drove me nuts!

But one day I had an epiphany. Something my director had said caused an explosion of neurons to fire. Just because it was an electromechanical device with roots going back before the dawn of the millennium didn’t mean it was invalid. We in the tech field are obsessed with rapid growth and change. What I realized that day was that sometimes, what’s old is new again. Tried-and-true technology, stability, predictability: these are all characteristics of tape backup strategies.

What is LTO?

Wikipedia’s definition: Linear Tape-Open - Wikipedia

Linear Tape-Open (LTO) is a magnetic tape data storage technology originally developed in the late 1990s as an open standards alternative to the proprietary magnetic tape formats that were available at the time. Hewlett Packard Enterprise, IBM, and Quantum control the LTO Consortium, which directs development and manages licensing and certification of media and mechanism manufacturers.

So it’s an old, dead format right? Like VHS or Betamax before it? Well, no. It’s actually a fairly regularly updated standard, with new releases every few years, and the most recent one being from 2021. Not to mention they have a pretty firm roadmap going into the future:

What is a Tape Drive?

A tape drive is a device that can read or write, in this context, LTO tapes. They have come in several varieties over the years. They can be internal to a system in a 5.25” bay like a CD drive:

1689436061164.png

They can sit on your desk:

1689436079196.png

Think about a tape drive like you would think about a VCR, except the tapes hold data instead of movies.

Or they can live in a library:

What is a Library you might ask? If you are a 90s kid like me, you can think about it like one of those cool CD changers from back in the day, where you could put all of your CDs in one unit in a console:

1689436094279.png

How might I use it?

Ahh! Now that’s the question. For my purposes I am using an HP 1/8 G2 Tape Autoloader. It’s the same as the library pictured above. The 1 in the name means 1 drive lives in the library (some can have two or more), and the 8 means it holds 8 tapes. Autoloader means, much like the name implies, it can automatically load tapes into the drive for you!

HP helpfully put a little LCD screen on the front of the library that lets you set up the basic networking configuration, and now we can access the library over the network.

You can see I have 7 slots free out of 7 slots. Huh? I thought it was 8! That’s because there is one slot which is called a mail slot.

By default, you have to take the entire magazine out of the thing in order to remove a tape. When you are swapping out multiple tapes, this is fine. But if you only need 1 tape, it’s kind of a lot of work. With a mail slot, things are simpler and you can just remove a single tape.

The library will automatically, with a little robot, move your tapes around for you at your request. Here, I just commanded it to move the tape currently in the drive into the mailslot, as an example:

Getting ready:

On my production TrueNAS server, I have already shipped some of my data over to my friend “Little Lenny”, which is a backup server instance I have. I now effectively have 2 copies of my data, but while it is on a different system, it is still in the same house as my primary production. The tape will help me solve that problem.

In this case, Little Lenny has a SCSI connection directly into my tape library using an 8087 SAS cable. This is an example picture of what the drive inside of my library looks like:

1689436161757.png

The 8087 cable plugs into a SAS HBA, like an LSI SAS9207-8e, just like if I were to connect a disk shelf.

1689436172328.png

Now in Little Lenny, I have my dataset called pictures which has precious family moments I would rather not lose.

So, I have my data, and I have the library connected. How do I get one onto the other?

Well, we can use a whole slew of backup software utilities, some are free, some cost a lot of money, some come with support, and some don’t. But it isn’t even that complicated. We can use standard Linux commands that natively exist without having to do anything at all. The same is true for TrueNAS Core, though the commands are a bit different.

Let’s go over to the shell. Type ‘lsscsi’

Nice! My tape drive is detected, and it’s at /dev/st0
Now we can use the Linux command ‘mt’ to interact with it.
Let’s type:

mt -f /dev/st0 status
And for me it looks like it can see the tape and interact with it, we are ready to move on:

1689436219463.png

Backing up my data

Let’s talk a little bit about .tar. I’m sure if you are reading this guide, at one point or another you have encountered a .tar file, maybe the phrase ‘tarball’, or perhaps a .tar.gz file. To quickly get this out of the way, .tar.gz is simply a tar file that is compressed. But tar itself was created SPECIFICALLY for the purpose of doing what we are doing here: taking files and creating an archive on tape. TAR, the letters themselves, is suspiciously similar to Tape ARchive, is it not?

With that out of the way, let’s get some baseline performance numbers in an ideal sequential workload of 0s.

Code:

root@littlelenny[~]# dd if=/dev/zero of=/dev/st0 bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 36.4856 s, 287 MB/s

That’s about 2.3 Gbit/s, close to what a 2.5 Gigabit network adapter can deliver, so that’s pretty good!

I am going to make a tar ball of my dataset /mnt/backup/pictures and push it over to the tape drive, like this:

Code:

tar -cvf /dev/st0 -C /mnt/backup pictures

Personal preference here, but I prefer to use tar in conjunction with dd as a separate stage to perform this task:

Code:

tar -cvf /mnt/backup/pictures.tar -C /mnt/backup pictures && dd if=/mnt/backup/pictures.tar of=/dev/st0 bs=1M

How do I restore my data, then?

Code:

dd if=/dev/st0 of=/mnt/backup/pictures_backup.tar bs=1M

And then extract it:

Code:

tar -C /mnt/backup/picture_backup -xvf /mnt/backup/pictures_backup.tar

Or do both as a single combined statement:

Code:

dd if=/dev/st0 bs=1M | tar -C /mnt/backup/picture_backup -xvf -

Theoretically you can pipe tar and dd together during the backup as well, like we just did for the restore.
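
For the backup direction, the pipe could look something like the sketch below. This is only an illustration, assuming GNU tar and GNU dd as shipped with SCALE. The function name backup_to_tape is made up for this example, and the output device is passed in as an argument so the function can be tried against a regular file before it is pointed at /dev/st0.

```shell
# Hypothetical helper (not from the original post): stream a tar archive
# straight to the tape device without staging a .tar file on disk first.
#   $1 = parent directory, $2 = directory name inside it, $3 = output device
backup_to_tape() {
    local parent="$1" name="$2" device="$3"
    # iflag=fullblock makes dd collect full 1 MiB blocks from the pipe
    # instead of writing whatever short read the pipe happens to return.
    tar -cf - -C "$parent" "$name" | dd of="$device" bs=1M iflag=fullblock
}

# Example (commented out so nothing touches the tape by accident):
# backup_to_tape /mnt/backup pictures /dev/st0
```

The trade-off is that nothing is staged on disk, so a slow source can starve the drive, and the exit status you get back is the one from dd rather than from tar.
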
Here’s a script I wrote to automate this process via CRON:

#!/bin/bash

# Define paths and filenames
BACKUP_SOURCE="/mnt/backup/pictures"
DATE=$(date +%Y%m%d%H%M%S)
BACKUP_DEST="/mnt/backup/pictures_backup_${DATE}.tar"
LOG_FILE="/root/pictures_${DATE}_tar_output.log"
TAPE_DEVICE="/dev/st0"

# Create tar file and log the output
if ! tar -cvf "${BACKUP_DEST}" -C "${BACKUP_SOURCE}" . > "${LOG_FILE}" 2>&1; then
    echo "Error: Failed to create tar file." >&2
    exit 1
fi

# Write the tar file to the tape drive
if ! dd if="${BACKUP_DEST}" of="${TAPE_DEVICE}" bs=1M status=progress; then
    echo "Error: Failed to write the tar file to the tape drive." >&2
    exit 1
fi

# The st driver writes a filemark when dd closes the device, and /dev/st0
# rewinds on close, so no explicit "mt eof" is needed here ("eof" is an alias
# for "weof" and would write a filemark at the start of the rewound tape).
# Rewind explicitly so the tape is left in a known position.
if ! mt -f "${TAPE_DEVICE}" rewind; then
    echo "Error: Failed to rewind the tape." >&2
    exit 1
fi

The above commands were written for SCALE; modifications would likely have to be made for CORE.

Scaling Past Single tapes

Obviously, this methodology works fine as long as you are not backing up more than fits on a single tape. For me, that’s under 2.5TiB on my LTO6 tapes. Once you get past that point, it becomes impractical to back up your data in this way. That’s where backup software comes in.

Bacula is a great open source tool you can run in a VM, and then use NFS to share your datasets (preserving xattrs!) to be backed up.
The best open source backup software for Linux. (bacula.org)

If you prefer more commonly used Enterprise software, Veeam is free for home users, and you can back up your data to a VM using SMB.
Veeam Backup & Replication Community Edition:Our latest gift to the community

Why Not the Cloud?

49nocloud.jpg

I believe in owning my data. If I have an emergency and I need to recover my data, I do not want to be ransomed by the people in the cloud with whom I have entrusted my data. I have a moral objection to egress fees because if things are bad enough that I have to recover from 3rd tier backups, then adding fuel to the fire and having an additional cost makes me mad.

Backblaze is well respected and is generally cheaper than its bigger brand competition.

Let’s use 2.5TiB as our number, which is the maximum I can store on a single tape. Let’s scale out the timeline, keeping in mind that there is no guarantee pricing won’t increase over time.

Backblaze will cost about $12.50 for the month or about $150 for a year.
That’s $300 at year 2.
That’s $750 at year 5.

Now let’s assume I have to retrieve my data at least once. Add another $25.
We’re at $775.

Now let’s assume I have 8 times that amount of data I would like to back up, which corresponds to the number of tapes I can store in my library.
With about 20 TiB of data, Backblaze will cost me about $6,000 over 5 years, and about $200 to retrieve my data only once.

How much did I pay for my tape library, drive, and tapes? $300 for the library, $350 for the drive and $160 for the tapes or about $810. If we assume 20TiB of data, I break even at 8 months of Backblaze, and I should get about 5 years out of this drive. The tapes themselves will last for 30 years if stored properly.
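
The arithmetic above can be double-checked with a few lines of shell. This is just a sketch: the per-TiB rates are not quoted from any price list, they are derived from the numbers in this post ($12.50 a month and $25 of egress for 2.5TiB), and the function name cloud_cost is made up for the example. Everything is done in integer cents to avoid floating point.

```shell
# Rates derived from the figures in the post (assumptions, not a price list)
STORAGE_CENTS_PER_TIB_MONTH=500            # $12.50 / 2.5 TiB = $5.00
EGRESS_CENTS_PER_TIB=1000                  # $25 / 2.5 TiB = $10.00
TAPE_SETUP_DOLLARS=$((300 + 350 + 160))    # library + drive + tapes

# cloud_cost <tenths of a TiB> <months> <restores>  -> whole dollars
cloud_cost() {
    local tib_tenths="$1" months="$2" restores="$3" storage egress
    storage=$(( tib_tenths * STORAGE_CENTS_PER_TIB_MONTH * months / 10 ))
    egress=$(( tib_tenths * EGRESS_CENTS_PER_TIB * restores / 10 ))
    echo $(( (storage + egress) / 100 ))
}

echo "2.5 TiB, 5 years, 1 restore: \$$(cloud_cost 25 60 1)"
echo "20 TiB, 5 years, 1 restore:  \$$(cloud_cost 200 60 1)"
echo "Tape setup:                  \$${TAPE_SETUP_DOLLARS}"
# Whole months of 20 TiB cloud storage that cost as much as the tape setup
echo "Break-even months at 20 TiB: $(( TAPE_SETUP_DOLLARS * 100 / (20 * STORAGE_CENTS_PER_TIB_MONTH) ))"
```

This prints $775 for the single-tape case, $6200 for the full library ($6,000 of storage plus $200 of egress), $810 for the tape setup, and a break-even of 8 months.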

iXsystems uses the tagline “True Data Freedom”. I think that tapes have a part to play in data freedom, but that’s just my opinion.

The big difference between the 2 that I can see at a glance is data security. I.E. that your data with backblaze is subject to someone else having access to it or them getting hacked and it getting released out in the wider net / world. The tape backup doesn’t suffer from that.

Just my observation.

Curious though, is there a better option for long term offline cold storage?

IMO Tape is the answer, which is why I did this.

Also to be clear, Backblaze cannot see your data if you back it up there.
Encrypting Files in B2 – Backblaze Help

I wasn’t aware of that. Good to know!

I suggest installing mt from the mt-st package. Its output is much more user-friendly.

# mt status
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 512 bytes. Density code 0x58 (no translation).
Soft error count since last status=0
General status bits on (41000000):
 BOT ONLINE

BOT meaning beginning-of-tape. A little more readable than a bitmask.

Writing /dev/zero to the drive is a terrible way to test. LTO drives have built-in compression which will massively skew the numbers with something incredibly compressible like /dev/zero. Instead, you can use /dev/urandom if it’s fast enough on your system, or use openssl to generate a random stream.
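
The command that originally accompanied this reply did not survive, so the following is only a sketch of the general idea and not the poster's exact command. It assumes GNU dd and, optionally, the openssl CLI; the function names random_stream and write_test are made up for the example. Encrypting zeros with AES in CTR mode under a throwaway key gives a stream that looks random to the drive's compressor and is usually faster to produce than reading /dev/urandom.

```shell
# Emit an endless stream of effectively incompressible bytes on stdout.
random_stream() {
    if command -v openssl >/dev/null 2>&1; then
        # AES-CTR keystream over zeros, keyed with a throwaway random password
        openssl enc -aes-256-ctr -nosalt \
            -pass pass:"$(openssl rand -hex 32)" < /dev/zero 2>/dev/null
    else
        cat /dev/urandom
    fi
}

# write_test <device> <MiB>: write that many MiB of random data.
# dd prints the elapsed time and throughput on stderr when it finishes.
write_test() {
    local device="$1" mib="$2"
    random_stream | dd of="$device" bs=1M count="$mib" iflag=fullblock
}

# Example (commented out, this overwrites whatever is on the tape):
# write_test /dev/nst0 10000
```

If the number dd reports is limited by how fast the random stream can be generated rather than by the drive, write the stream to a file on fast storage first and time a plain dd from that file instead.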

You should always use the /dev/nst0 device, unless you have a very good reason to do otherwise.
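
The reason is that /dev/st0 rewinds the tape every time the device is closed, so a second write starts again at the beginning of the tape, while /dev/nst0 leaves the tape where it is. A rough sketch of how that gets used, assuming mt from the mt-st package and GNU tar; the helper names and the DRY_RUN switch are made up for this example, and by default the commands are only printed, not executed.

```shell
TAPE="${TAPE:-/dev/nst0}"    # non-rewinding device node
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the commands (safe default)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# append_archive <dir>: add one more tar archive after the existing ones
append_archive() {
    run mt -f "$TAPE" eod             # space to the end of recorded data
    run tar -cf "$TAPE" -C "$1" .     # a filemark is written when tar closes the device
}

# list_archive <n>: list the n-th archive on the tape (0 = first)
list_archive() {
    run mt -f "$TAPE" rewind
    if [ "$1" -gt 0 ]; then
        run mt -f "$TAPE" fsf "$1"    # skip forward over n filemarks
    fi
    run tar -tf "$TAPE"
}
```

For example, `list_archive 2` prints the rewind, `fsf 2` and `tar -tf` commands it would run; setting DRY_RUN=0 runs them for real.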

Instead of dd, use mbuffer, as discussed recently in this lengthy thread:
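
The linked thread is not reproduced here, so this is only a minimal sketch of what a tar-to-mbuffer pipeline can look like, not the exact invocation discussed there. The buffer size, the 80% fill threshold and the 1M block size are arbitrary example values, and the function name stream_to_tape is made up.

```shell
# stream_to_tape <parent dir> <name> <device> [buffer size]
stream_to_tape() {
    local parent="$1" name="$2" device="$3" mem="${4:-1G}"
    if ! command -v mbuffer >/dev/null 2>&1; then
        echo "mbuffer is not installed" >&2
        return 127
    fi
    # -m: total buffer size       -P 80: start writing once the buffer is 80% full
    # -s 1M: block size           -f: allow writing to an output path that already exists
    # -q: no status display       -o: output file or device
    tar -cf - -C "$parent" "$name" |
        mbuffer -q -m "$mem" -P 80 -s 1M -f -o "$device"
}

# Example (commented out so nothing touches the tape by accident):
# stream_to_tape /mnt/backup pictures /dev/nst0 4G
```

The point of the buffer is to keep the drive streaming: tar can stall on small files, and the buffer absorbs those stalls so the drive does not have to stop and reposition.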

Not really. Going past a single tape is still practical this way: tar has the -M or --multi-volume option, and mbuffer has the -A option to point to an autoloader command. The autoloader command can be as simple as a prompt on the screen or an e-mail telling you it’s time to swap tapes in your drive, if you don’t have a robot.
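
To get a feel for multi-volume archives without burning tape, GNU tar can be pointed at several small files that stand in for tapes. The sketch below assumes GNU tar; the function names are made up, and the -L volume size is only there to force a split for the demonstration.

```shell
# multivol_backup <source dir> <KiB per volume> <volume>...
multivol_backup() {
    local src="$1" kib="$2" vol
    shift 2
    # Turn "vol1 vol2 ..." into "-f vol1 -f vol2 ..."
    for vol in "$@"; do
        set -- "$@" -f "$vol"
        shift
    done
    # -M: multi-volume archive, -L: volume length in units of 1024 bytes
    tar -c -M -L "$kib" "$@" -C "$src" .
}

# multivol_restore <destination dir> <volume>...
multivol_restore() {
    local dest="$1" vol
    shift
    for vol in "$@"; do
        set -- "$@" -f "$vol"
        shift
    done
    tar -x -M "$@" -C "$dest"
}

# Example with three 100 KiB "tapes" (hypothetical paths):
# multivol_backup /mnt/backup/pictures 100 /tmp/v1.tar /tmp/v2.tar /tmp/v3.tar
```

On a real drive you would normally give a single -f /dev/nst0 and let tar prompt for the next tape when it hits the end of the current one, or hand it a script with -F (--new-volume-script) that tells the library to load the next tape.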

You’ve got a decent introduction there, but things start getting interesting with multi-tape archives and incremental / differential backups.

You don’t have to trust Backblaze; you can encrypt the files yourself prior to uploading.

TrueNAS has this function.

I am happy to see youngsters see the benefits of tape ;). It’s still quite valid in my field of work that things go to disk, then go to tape, and then get recycled - or go to disk and tape at the same time to keep redundancy even at the first stage :).
It would be awesome to see you start looking at full and incremental backups here as well, and think about how to structure them into a sane backup solution. When you move to tape backups it’s still quite common to keep valid masters (full backups), just to have a point in time where things are intact, and then, depending on the balance and the volume of data, go incremental until a given point.

Backups can be an art if done correctly - and never, ever forget to include restore tests as part of the backup strategy. It’s quite an awesome thing to have an answer ready when someone comes running asking how long it will take to restore system X or dataset Y. Having a structure with redundancy, recovery and restore in place helps everyone…
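
A restore test does not have to be elaborate. As a minimal sketch, assuming GNU coreutils (sha256sum) and with made-up function names: restore the tape into a scratch directory, then compare checksums of every file against the live dataset.

```shell
# Print "checksum  ./relative/path" for every file under a directory, sorted by path.
checksum_tree() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k2 )
}

# verify_restore <original dir> <restored dir>
# Returns 0 when both trees hold the same files with the same contents.
verify_restore() {
    local want got
    want=$(checksum_tree "$1") || return 2
    got=$(checksum_tree "$2") || return 2
    [ "$want" = "$got" ]
}

# Example (hypothetical paths):
# verify_restore /mnt/backup/pictures /mnt/backup/restore_test/pictures && echo "restore OK"
```

Putting `time` in front of the restore command itself gives you the other half of the answer, namely how long it takes to get dataset Y back.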

@NicKF do you have a drive recommendation? The newer LTO-9 ones seem absurdly priced for homelab use…

The tapes themselves seem reasonably priced per Terabyte, but they require cool, dry and pretty much sterile storage, which is the opposite of the humid and dusty tropical climate I am in…

SUPER HELPFUL FEEDBACK! I appreciate you very much.

LTO6 is probably as high as I could afford. A lot of guys on reddit were talking about LTO4 deals, but they just seem so damn small now.

You sound like my old boss. You even type like him. LOL

An air conditioner or dehumidifier can take care of all those issues.

Spouse won’t be happy with me permanently airconditioning a room with no one inside. :rofl:

It increases the operating cost when spinning rust will do the job the same without needing dehumidifier and cooling.

Vacuum pack the tape + silica bag inside.

I mean…the tapes are designed to be stored at ROOM TEMPERATURE.
Environmental and shipping specifications for LTO tape cartridges - IBM Documentation

From that standpoint, my house has rooms that are “unsuitable” for storing tape by IBM’s definition, sure. The RH in my house sometimes gets as high as 80% even with the AC running (window units), and I set the thermostat on my ACs to 77 or 78 when no one is in the room, since it’s actually more likely to save energy and prolong the ACs’ life by NOT running them at full tilt for several hours at a time trying to get to the set point (which I keep around 72 when I am in a room).

In any case, those numbers are based on a 30 year estimated life time. If I shorten that life to 15 years by storing them in slightly out of spec conditions I can accept that…

I’ve got VHS tapes that were stored outside of cases in an attic at over 120 degrees and RH at 100% for 20 years that didn’t degrade enough to make them unrecoverable…

ALSO FWIW the OPERATING temperature of tape is not significantly different than HDDs, and we can see that the tapes are rated to be transported at wildly out of spec conditions.

Methinks you are trying to validate your decision to not explore the potential of tape because you just don’t want to :slight_smile:

re: @rcxb 's mention of the use of mbuffer

If you read that rather lengthy thread of mine (which actually went COMPLETELY off topic, but mehhh…), please be aware that there are caveats to using mbuffer.

  1. You can’t “list” the contents of your tapes because in essence, it will be handled more at the “block device” level rather than the file “device” level.

i.e.
If you format the tapes using LTFS, you can mount the tapes to a conventional mountpoint (i.e. I create a directory /tmp/lto8 and mount my tapes to that mount point, where from there, I can use commands like ls to list the contents of the folder.)

With mbuffer, I haven’t really found a good way of doing that.

  2. If you keep an index of all of the files that you’ve written to that tape as a text file, either stored on the tape or stored elsewhere, that will be useful if you want or need to pull a specific file or select files from said tape. Again, since you won’t be able to “view” the contents in a more conventional way, you’ll either need a really good memory of the filenames written to said tape, or you’ll need to keep and maintain an index somewhere.

If you are backing up your data relatively “randomly” (i.e. you back up your main stuff and then find other files to fill the rest of the tape to your desired quota, and you didn’t specifically prepare what you are going to send to each tape beforehand), then you might have to generate the index file somewhat after the fact.

I mention this because, in contrast to this method, what I tend to do is have a separate folder on my system where I prepare the files to be written to tape. That way I can check that they will fit on the tape (in terms of total space used), and I can also create an index of all of the files that are going to be sent to tape.
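
As an illustration of that staging-folder workflow, something along these lines would build the index and check the size in one go. This is a sketch assuming GNU find; the function name stage_report, the paths and the capacity figure are all made up for the example, and the index file should live outside the staging folder so it does not list itself.

```shell
# stage_report <staging dir> <index file> <tape capacity in KiB>
# Writes one line per staged file and returns 0 only if everything fits.
stage_report() {
    local dir="$1" index="$2" capacity_kib="$3" used_kib
    # One line per file: size in bytes, then the path relative to the staging dir
    ( cd "$dir" && find . -type f -printf '%s %p\n' | sort -k2 ) > "$index" || return 2
    used_kib=$(du -sk "$dir" | cut -f1)
    echo "staged: ${used_kib} KiB of ${capacity_kib} KiB"
    [ "$used_kib" -le "$capacity_kib" ]
}

# Example (hypothetical paths; 11718750000 KiB is 12 TB, the native LTO-8 capacity):
# stage_report /mnt/staging /root/tape_0001_index.txt 11718750000 || echo "too big for one tape"
```

Keeping a copy of the index somewhere other than the tape means you can grep for a filename later without loading anything into the drive.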

Just be aware of that as you’re stepping into this.

re: your cost analysis

This is a part of the reason why I ended up “investing” in a LTO-8 tape solution.

Yes, the drives and the tapes themselves are more expensive, but it’s only recently that 10 TB or 12 TB drives have finally hit the used market at a price comparable to that of an LTO-8 tape.

And I can tell you that I’m not particularly smart, as the tapes have helped save my butt a few times already because of stupid little mistakes when typing a sequence of commands into the shell, which resulted in accidentally deleting some of my data.

Tapes are AWESOME!

I use IBM’s (Spectrum?) LTFS Single Drive Edition.

I downloaded the source code and compiled it on my system, which runs CAE Linux 2018 (based on the now quite old Xubuntu 16.04, but it works, so I’m not going to try to break it by upgrading it). I have a dedicated system just to run the tape backup system.

Since time is less of a factor than money, I spend my time compressing the files using pixz (which results in files that are 2-4% smaller, but takes longer than, say, pigz), and then I use par2 to calculate double parity: first over the data, and then over the data plus the first set of parity data. And then I write all of that to tape.

Yes, the tapes are supposed to have compression. No, I have never seen the tape drive do that. (But then again, because I just mount the LTFS formatted tape to a more conventional mount point, I just use rsync to send the data to tape. That way, rsync handles the incremental backup to the tape.)

The tapes are also supposed to have a checksum as well (or so I’ve read, somewhere before), but I have had the data on the tape be corrupt before, which is where the parity data came in handy, because I was able to use that to repair the file.

Both the compression and the calculation of the parity data (and the double parity data) takes time, but it has saved my butt a few times already.

YMMV.

Getting back the data from a tape backup can take a long time. (Been there, done that.) Frequent automatic snapshots on a COW file system (btrfs, ZFS) can be a better defence against accidental deletions, especially if integrated with incremental backups.

Snapshots are a good defense, and I use them extensively, but they can fail miserably if the pool becomes corrupted in various ways, and they can propagate the issue if they are replicated to the next backup pool. A non-ZFS replication in the middle can provide some decoupling protection though, which is possibly what you mean by incremental, and which could be done many ways. Defense in depth is almost always a good idea: multiple backups, multiple backup technologies, multiple data storage sites. Tape is good at bulk data, and much less fragile to transport than HDDs. That advantage may go away when SSDs come down significantly in price, but it will be a long time before SSDs become cheap enough and big enough to be a viable “tape cartridge” replacement. ECC like par2 and its ilk are also a good idea, for more recoverability.

Well put, an attitude I aspire to.