Originally posted here:
Preface:
I typically start these types of articles with a definition of the title and some explanation.
Today I am going to break that trend and jump into an anecdote from my professional life. About five years ago, I started as the Datacenter Operations Manager for a large public school district. As part of my “orientation,” my director walked me through several key systems, one of which was our backup strategy. At that time, we had two datacenters, each holding warm backups of the other’s virtual machines. A sane strategy. When I asked about offsite backups, he showed me a tape library.
Now, picture this clearly. I was 27 years old at the time, and the only tapes I had been familiar with up until that point were cassette tapes and VHS tapes. I had literally never even heard the term LTO tape. For years, I badgered my director about my frustrations with this strategy. I had to make sure one of my guys was there every Friday to hand the tapes off to a man with a case. I had to make sure we called back tapes when they expired, or risk running out of tapes and not being able to write out our weekly rotations. We even occasionally had weird problems I didn’t understand and couldn’t fix. It drove me nuts!
But one day I had an epiphany: something my director said caused an explosion of neurons to fire. The fact that tape was an electromechanical technology with roots going back before the dawn of the millennium didn’t make it invalid. We in the tech field are obsessed with rapid growth and change. What I realized that day was that sometimes, what’s old is new again. Tried-and-true technology, stability, predictability: these are all characteristics of tape backup strategies.
What is LTO?
Wikipedia’s definition (from the “Linear Tape-Open” article):
Linear Tape-Open (LTO) is a magnetic tape data storage technology originally developed in the late 1990s as an open standards alternative to the proprietary magnetic tape formats that were available at the time. Hewlett Packard Enterprise, IBM, and Quantum control the LTO Consortium, which directs development and manages licensing and certification of media and mechanism manufacturers.
So it’s an old, dead format, right? Like VHS or Betamax before it? Well, no. It’s actually a regularly updated standard, with a new generation every few years (the most recent one released in 2021), and the consortium publishes a fairly firm roadmap for future generations.
What is a Tape Drive?
A tape drive is a device that can read and write, in this context, LTO tapes. Drives have come in several varieties over the years. They can be internal to a system, in a 5.25” bay like a CD drive:
They can sit on your desk:
Think of a tape drive like a VCR, except it records and plays back data instead of movies.
Or they can live in a library:
What is a library, you might ask? If you are a 90s kid like me, think of one of those cool CD changers from back in the day, where you could load all of your CDs into a single unit in a console:
How might I use it?
Ahh! Now that’s the question. For my purposes I am using an HP 1/8 G2 Tape Autoloader. It’s the same as the library pictured above. The 1 in the name means 1 drive lives in the library (some can have two or more), and the 8 means it holds 8 tapes. Autoloader means, much like the name implies, it can automatically load tapes into the drive for you!
HP helpfully includes a little LCD screen on the front of the library that lets you set up basic network configuration, and after that we can access the library from the interwebs.
You can see I have 7 slots free out of 7 slots. Huh? I thought it was 8! That’s because one slot is reserved as a mail slot.
Without a mail slot, you have to take the entire magazine out of the unit to remove a tape:
When you are swapping out multiple tapes, this is fine. But if you only need one tape, it’s a lot of work. With a mail slot, things are simpler: you can remove a single tape on its own.
Using a little robot, the library will move your tapes around for you on request. As an example, here I commanded it to move the tape currently in the drive into the mail slot:
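The web interface is not the only way to drive the robot. On Linux, the separate `mtx` utility (not used elsewhere in this guide) can load, unload, and move tapes from the shell. The sketch below assumes `mtx` is installed, that the changer's SCSI generic node is something like `/dev/sg1` (check with `lsscsi -g`), and that `mtx status` prints its usual "Storage Element N:Empty" lines; `count_empty_slots` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count empty storage slots in `mtx status` output read from stdin.
# Assumes mtx's usual layout, e.g. "      Storage Element 3:Empty".
# Mail (import/export) slots print as "Storage Element 8 IMPORT/EXPORT:Empty",
# so the "<number>:" pattern below skips them, much like the web UI's slot count.
count_empty_slots() {
  grep -cE '^[[:space:]]*Storage Element [0-9]+:Empty'
}

# Hypothetical usage (replace /dev/sg1 with your changer's device):
#   mtx -f /dev/sg1 status | count_empty_slots
#   mtx -f /dev/sg1 load 1 0      # move the tape in slot 1 into drive 0
#   mtx -f /dev/sg1 unload 1 0    # put it back in slot 1
```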
Getting ready:
On my production TrueNAS server, I have already shipped some of my data over to my friend “Little Lenny”, which is one of my backup servers. I now effectively have two copies of my data, but while the second copy is on a different system, it is still in the same house as my primary production system. Tape will help me solve that problem.
In this case, Little Lenny connects directly to the drive in my tape library over SAS, using an SFF-8088 (external Mini-SAS) cable. This is an example picture of what the drive inside my library looks like:
The cable plugs into a SAS HBA such as an LSI SAS9207-8e, just as it would if I were connecting a disk shelf.
Now, on Little Lenny, I have a dataset called pictures, which holds precious family moments I would rather not lose.
So, I have my data, and I have the library connected. How do I actually get the data onto tape?
Well, there is a whole slew of backup software utilities: some are free, some cost a lot of money, some come with support, and some don’t. But it isn’t even that complicated. We can use standard Linux commands that already ship with the system, without installing anything. The same is true for TrueNAS Core, though the commands are a bit different.
Let’s go over to the shell and type ‘lsscsi’:
Nice! The drive in my tape library is detected, and it’s at /dev/st0.
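On a machine with several SCSI devices, picking the tape drive out of that list can be scripted. A minimal sketch, assuming lsscsi's default column layout (device type in the second column, device node in the last); `find_tape_devices` is a hypothetical helper name. In a library, the robotic changer typically shows up as a separate line of type `mediumx`, which this filter skips.

```shell
#!/usr/bin/env bash
# Print the device node of every SCSI tape drive listed by lsscsi.
# Assumes lsscsi's default layout: [H:C:T:L]  type  vendor  model  rev  device
# Reads from stdin so it can be fed live output or saved text.
find_tape_devices() {
  awk '$2 == "tape" { print $NF }'
}

# Usage on a real system:
#   lsscsi | find_tape_devices
```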
Now we can use the Linux command ‘mt’ to interact with it.
Let’s type:
mt -f /dev/st0 status
In my case, the drive can see the loaded tape and respond to commands, so we are ready to move on:
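In an unattended job, it can be worth checking that a tape is actually loaded before writing. A sketch, assuming the mt-st flavour of `mt`, whose status output lists flags such as `BOT`, `ONLINE`, or `DR_OPEN` under a "General status bits on" line; `tape_is_ready` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Succeed only if `mt status` output (read from stdin) reports the drive ONLINE,
# which mt-st prints when a tape is loaded and ready.
# An empty drive reports DR_OPEN instead, so this returns non-zero.
tape_is_ready() {
  grep -qw 'ONLINE'
}

# Hypothetical usage before writing a backup:
#   if mt -f /dev/st0 status | tape_is_ready; then echo "tape loaded"; fi
```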
Backing up my data
Let’s talk a little bit about tar. If you are reading this guide, I’m sure you have encountered a .tar file at some point, maybe under the name ‘tarball’, or perhaps as a .tar.gz file. To quickly get this out of the way, a .tar.gz is simply a tar file that has been compressed with gzip. But tar itself was created SPECIFICALLY for the purpose of doing what we are doing here: taking files and creating an archive on tape. The name tar is suspiciously similar to Tape ARchive, is it not?
With that out of the way, let’s get some baseline performance numbers using an ideal sequential workload of zeros. Note that writing to /dev/st0 like this overwrites whatever is on the loaded tape.
Code:
root@littlelenny[~]# dd if=/dev/zero of=/dev/st0 bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 36.4856 s, 287 MB/s
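That works out to about 2.3 Gbit/s: more than double what gigabit Ethernet can carry and close to a 2.5 Gigabit link, so that’s pretty good! Keep in mind that zeros compress extremely well and LTO drives compress data in hardware, so real-world data will usually write slower than this best case.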
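The conversion, and an estimate of how long a full tape takes, are simple enough to script. A sketch, assuming dd's decimal units (1 MB = 10^6 bytes) and the LTO-6 native capacity of 2.5 TB; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Convert a dd rate in MB/s (decimal megabytes) to gigabits per second.
mbps_to_gbits() {
  awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 8 / 1000 }'
}

# Hours needed to write CAPACITY_TB terabytes at RATE_MBPS megabytes per second.
hours_to_fill() {
  awk -v tb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", tb * 1000000 / rate / 3600 }'
}

# Usage:
#   mbps_to_gbits 287        # 2.30 (the dd result above)
#   hours_to_fill 2.5 287    # 2.4  (a 2.5 TB tape at that best-case rate)
```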
I am going to make a tarball of my dataset /mnt/backup/pictures and push it over to the tape drive, like this:
Code:
tar -cvf /dev/st0 -C /mnt/backup pictures
This is personal preference, but I like to run tar and dd as separate stages: tar builds the archive on disk first, then dd streams it to tape. This needs enough free space to hold a full copy of the archive:
Code:
tar -cvf /mnt/backup/pictures.tar -C /mnt/backup pictures && dd if=/mnt/backup/pictures.tar of=/dev/st0 bs=1M
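Because the staged archive is still on disk after dd finishes, the tape copy can be checked by reading it back and comparing checksums. A minimal sketch, assuming the drive is in variable-block mode (`mt status` reports a block size of 0) and that the archive is the first file on the tape; `verify_tape_copy` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Compare the SHA-256 of a staged tar file with the data read back from tape.
# Reads with the same 1 MiB block size used for writing; dd stops at the filemark.
verify_tape_copy() {
  local archive="$1" tape="${2:-/dev/st0}"
  local want got
  want=$(sha256sum < "$archive" | awk '{ print $1 }')
  got=$(dd if="$tape" bs=1M status=none | sha256sum | awk '{ print $1 }')
  [[ -n "$want" && "$want" == "$got" ]]
}

# Usage:
#   verify_tape_copy /mnt/backup/pictures.tar /dev/st0 && echo "tape copy matches"
```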
How do I restore my data, then?
Code:
dd if=/dev/st0 of=/mnt/backup/pictures_backup.tar bs=1M
And then extract it:
Code:
mkdir -p /mnt/backup/picture_backup && tar -C /mnt/backup/picture_backup -xvf /mnt/backup/pictures_backup.tar
Or do both in one step by piping dd into tar:
Code:
dd if=/dev/st0 bs=1M | tar -C /mnt/backup/picture_backup -xvf -
You can also pipe tar and dd together the same way during the backup, which avoids needing disk space for a staged copy of the archive.
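Piped in both directions, the whole round trip needs no scratch space. A sketch, assuming GNU dd (for `iflag=fullblock`, which makes dd collect full 1 MiB blocks from the pipe instead of writing whatever partial chunk tar happened to emit) and the same paths used above; the function names are hypothetical.

```shell
#!/usr/bin/env bash
set -o pipefail  # let a failure on either side of a pipe fail the function

# Stream PARENT/NAME as a tar archive straight to the tape device, no staging file.
backup_to_tape() {
  local parent="$1" name="$2" tape="${3:-/dev/st0}"
  tar -cf - -C "$parent" "$name" | dd of="$tape" bs=1M iflag=fullblock status=none
}

# Stream the archive back off the tape and extract it into DEST.
restore_from_tape() {
  local dest="$1" tape="${2:-/dev/st0}"
  mkdir -p "$dest" &&
    dd if="$tape" bs=1M status=none | tar -xf - -C "$dest"
}

# Usage:
#   backup_to_tape /mnt/backup pictures /dev/st0
#   restore_from_tape /mnt/backup/picture_backup /dev/st0
```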
Here’s a script I wrote to automate this process via CRON:
#!/bin/bash
# Define paths and filenames
BACKUP_SOURCE="/mnt/backup/pictures"
DATE=$(date +%Y%m%d%H%M%S)
BACKUP_DEST="/mnt/backup/pictures_backup_${DATE}.tar"
LOG_FILE="/root/pictures_${DATE}_tar_output.log"
# /dev/st0 is the rewinding device: it rewinds when closed, so each run writes from the start of the tape
TAPE_DEVICE="/dev/st0"
# Create the tar file and log tar's output
if ! tar -cvf "${BACKUP_DEST}" -C "${BACKUP_SOURCE}" . > "${LOG_FILE}" 2>&1; then
    echo "Error: Failed to create tar file." >&2
    exit 1
fi
# Write the tar file to the tape drive
if ! dd if="${BACKUP_DEST}" of="${TAPE_DEVICE}" bs=1M status=progress; then
    echo "Error: Failed to write the tar file to the tape drive." >&2
    exit 1
fi
# No separate end-of-file step is needed: when dd closes /dev/st0 after writing, the st driver
# writes a filemark and rewinds the tape. Running `mt -f /dev/st0 eof` at this point would write
# a filemark at the start of the rewound tape and make the archive just written unreadable.
The above commands were written for SCALE. CORE is based on FreeBSD, so modifications would likely be needed there (for example, FreeBSD names the tape device /dev/sa0 rather than /dev/st0).
Scaling Past Single tapes
Obviously, this methodology only works if your backup fits on a single tape. For me, that’s under 2.5 TB (about 2.27 TiB) on my LTO-6 tapes, before compression. Once you get past that point, it becomes impractical to back up your data this way. That’s where backup software comes in.
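Before relying on the single-tape method, a quick pre-flight check can tell you whether a dataset still fits. A rough sketch, assuming GNU du (for `-sb`, apparent size in bytes) and a native LTO-6 capacity of 2.5 TB; tar adds a little header and padding overhead and hardware compression may let more fit, so treat the answer as an estimate. `fits_on_tape` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Succeed if the apparent size of DIR is at most CAPACITY_BYTES.
# Defaults to 2.5 TB (decimal), the native capacity of an LTO-6 cartridge.
fits_on_tape() {
  local dir="$1" capacity="${2:-2500000000000}"
  local size
  size=$(du -sb "$dir" | awk '{ print $1 }')
  [[ -n "$size" ]] || return 2          # du failed (missing directory, etc.)
  echo "${dir}: ${size} bytes of ${capacity}"
  (( size <= capacity ))
}

# Usage:
#   fits_on_tape /mnt/backup/pictures || echo "Too big for one tape; time for backup software."
```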
Bacula is a great open source tool you can run in a VM, using NFS to share your datasets (preserving xattrs!) so they can be backed up.
Bacula: The best open source backup software for Linux (bacula.org)
If you prefer more commonly used enterprise software, Veeam is free for home users, and you can back up your data to a VM using SMB.
Veeam Backup & Replication Community Edition: Our latest gift to the community
Why Not the Cloud?
I believe in owning my data. If I have an emergency and I need to recover my data, I do not want to be ransomed by the people in the cloud with whom I have entrusted my data. I have a moral objection to egress fees because if things are bad enough that I have to recover from 3rd tier backups, then adding fuel to the fire and having an additional cost makes me mad.
Backblaze is well respected and is generally cheaper than its bigger-brand competition.
Let’s use 2.5 TB, the most I can store on a single tape, as our number, and scale out the timeline. The figures below assume $5 per TB per month for storage and $10 per TB to download, and there is no guarantee that pricing won’t increase over time.
Backblaze will cost about $12.50 for the month or about $150 for a year.
That’s $300 at year 2.
That’s $750 at year 5.
Now let’s assume I have to retrieve my data at least once. Add another $25.
We’re at $775.
Now let’s assume I have 8 times that amount of data I would like to back up, which corresponds to the number of tapes I can store in my library.
With about 20 TB of data, Backblaze will cost me about $6,000 over 5 years, and about $200 to retrieve my data only once.
How much did I pay for my tape library, drive, and tapes? $300 for the library, $350 for the drive, and $160 for the tapes, or about $810 in total. With 20 TB of data costing about $100 a month on Backblaze, I break even after roughly 8 months, and I should get about 5 years out of this drive. The tapes themselves are rated to last 30 years if stored properly.
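The cost comparison above boils down to a few multiplications. A sketch, assuming the rates implied by the figures in this section ($5 per TB per month for storage, $10 per TB to download); plug in current prices before drawing conclusions. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Storage cost in dollars: TB stored * months * price per TB per month.
cloud_storage_cost() {
  awk -v tb="$1" -v months="$2" -v price="$3" 'BEGIN { printf "%.2f\n", tb * months * price }'
}

# Cost in dollars of downloading everything once: TB * egress price per TB.
cloud_egress_cost() {
  awk -v tb="$1" -v price="$2" 'BEGIN { printf "%.2f\n", tb * price }'
}

# Months of cloud storage that add up to a one-time hardware cost.
breakeven_months() {
  awk -v hw="$1" -v tb="$2" -v price="$3" 'BEGIN { printf "%.1f\n", hw / (tb * price) }'
}

# Usage, with the numbers from this section:
#   cloud_storage_cost 20 60 5    # 6000.00
#   cloud_egress_cost 20 10       # 200.00
#   breakeven_months 810 20 5     # 8.1
```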
iXsystems uses the tagline “True Data Freedom”. I think that tapes have a part to play in data freedom, but that’s just my opinion.