Manage Illegal Characters when Copying to LTFS

I’ve run into an issue while archiving data onto LTFS-formatted tapes. LTFS can’t handle some pretty common characters in file names, and HPE’s ltfscopy command fails immediately when it encounters one during a recursive copy.

I can think of a few ways to deal with this, but I’m wondering if anyone has any input. The way I see it, I can:

  1. Recursively replace illegal characters on the source material. Making changes to the source during an archive operation is never ideal, so I’d prefer not to do this.

  2. Copy the archive material to a temporary location, change the characters and then copy to tape. Unfortunately, I don’t have a lot of spare live storage space to do this, so it would probably require spending money. It will also add considerable time to the process.

  3. Manually recurse instead of using the -r option, piping each item name through sed to replace illegal characters during the copy. I’ve never tried to do a recursion at this scale in a bash script. A nested loop is already a lot for a shell script. Can it handle recursing through thousands of folders like that? Should I use Python or something?

I’d recommend option 3. You can handle the manual recursion in a shell script. Bash is powerful and stable.

You might find it’s easier with Python, but you don’t absolutely have to use it.

With Python, you can do something like:

import os

for root, dirs, files in os.walk(path):
    ...  # doyourthang

Each tuple is (root, dirs, files), in that order.
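For what it’s worth, here is a slightly fuller sketch of the os.walk idea: it walks the source tree and works out a sanitized destination path for every file, without touching the source. The character set in ILLEGAL is an assumption (a typical cross-platform "unsafe" list), so check what your LTFS implementation actually rejects. The names sanitize_name and plan_copies are made up for illustration, and the actual copy step (ltfscopy, cp, shutil, whatever works) is left out on purpose.

```python
import os
import re

# ASSUMPTION: this character set is a guess at what LTFS rejects.
# Check the documentation for your LTFS implementation and adjust.
ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def sanitize_name(name, replacement="_"):
    """Replace every assumed-illegal character in a single path component."""
    return ILLEGAL.sub(replacement, name)


def plan_copies(src_root, dst_root):
    """Yield (source_path, sanitized_destination_path) for every file.

    Nothing is copied or renamed here; the caller decides how to copy.
    """
    for root, dirs, files in os.walk(src_root):
        rel = os.path.relpath(root, src_root)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        clean_dir = os.path.join(dst_root, *[sanitize_name(p) for p in parts])
        for name in files:
            yield os.path.join(root, name), os.path.join(clean_dir, sanitize_name(name))
```

You would loop over plan_copies("/data/archive", "/mnt/ltfs") (both paths are placeholders) and copy each pair one at a time. One thing to watch for: two different source names can sanitize to the same destination name, so it is worth checking for duplicates before writing anything to tape.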

Couldn’t you do the same with symlinks (or hardlinks if need be)? It’s considerably faster than copying the actual file, and when copying to the tape it should still retain the symlink’s name?

I guess that would work? I have a gut feeling that there’s something wrong with doing it that way but I can’t really think of anything concrete…

Tbh, idk if ltfscopy will follow symlinks. Is it possible to hardlink across file systems? Never really played with hardlinks much.

I’m going to sound like an idiot, but is there a way that you could encode the files to get rid of the illegal characters to store on the tape? Maybe zip things up and compress before storing? That way you can retain the actual character value without having to change it.

Thoughts?
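A rough sketch of the archive idea, assuming Python’s standard tarfile module is acceptable: the original names live inside the tar, so only the archive’s own file name has to be legal on the tape. archive_tree is a made-up helper name, and the tar is written uncompressed here; whether to compress is a separate decision.

```python
import os
import tarfile


def archive_tree(src_dir, tar_path):
    """Pack src_dir into an uncompressed tar at tar_path.

    Member names inside the archive keep their original characters.
    """
    top = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src_dir, arcname=top)
    return tar_path
```

The trade-off is that restoring a single file later means reading through the archive rather than picking the file straight off the LTFS index, and you need somewhere to write the tar (possibly the tape mount itself, if your setup allows writing it there directly).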

Hardlinks don’t work across filesystems. What you’re doing with a hardlink is creating a new directory entry that points to the same inode, and so the same blocks on the drive (as opposed to symlinks, which point to another path).
But as long as the hardlink is created on the same filesystem as the source, it doesn’t take up much additional space, since the data blocks are shared.
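If you want to try the link idea, something like this might work as a starting point: it builds a "shadow" tree of real directories with sanitized names, and fills it with symlinks that point back at the original files. Symlinks are used instead of hardlinks so the shadow tree can sit on a different filesystem. build_shadow_tree is a made-up name, the ILLEGAL character set is an assumption, and whether ltfscopy follows symlinks is still an open question in this thread, so test on a small folder first.

```python
import os
import re

# ASSUMPTION: guessed set of characters LTFS rejects; adjust for your setup.
ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def build_shadow_tree(src_root, shadow_root):
    """Mirror src_root under shadow_root with sanitized names.

    Directories are created for real; files become symlinks to the
    originals. Returns the number of symlinks created.
    """
    src_root = os.path.abspath(src_root)
    count = 0
    for root, dirs, files in os.walk(src_root):
        rel = os.path.relpath(root, src_root)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        clean_dir = os.path.join(shadow_root, *[ILLEGAL.sub("_", p) for p in parts])
        os.makedirs(clean_dir, exist_ok=True)
        for name in files:
            link = os.path.join(clean_dir, ILLEGAL.sub("_", name))
            # Raises FileExistsError if two names sanitize to the same thing.
            os.symlink(os.path.join(root, name), link)
            count += 1
    return count
```

Keep shadow_root outside src_root so the walk doesn’t wander into its own output. The source is never modified, and the shadow tree costs almost no space, since it only holds directory entries and links.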