Video Editing Server/Data Storage Assistance | Noobie

Don’t waste SAS lanes on NVMe drives unless you need high availability with multiple head units. (You don’t.)

Modern NVMe devices are usually limited by the fact that they only get 4 PCIe lanes to the host. You have enough slots, so just give each card its own connection to the host, e.g.

https://www.newegg.com/p/17Z-00SW-00041

or

Each adapter is under $50 because you can use the cards that require the motherboard and CPU to support bifurcation, i.e. cheaper than a SAS cable or backplane, and faster all the time.

How does one do this on Windows 10? Would this also work on Windows 10? Our only “Linux”-based machines are our current Synologys. And I don’t think I have EVER run a terminal or command through that thing EVER.

Please explain. I’d like to learn more.

From the whole backplane or per drive? Because per drive (Gen 4 x4) would be a theoretical speed of about 8 GB/s. Realistically, having 2 GB/s would be overkill, I would think. So it stands to reason that if I could saturate a full 20 Gb connection (roughly), there would never be a project I’d have to worry about editing slowdown on again, right? I’m also suggesting this for the future, if the HDDs become a problem. If I did this, I’d want them hot-swappable to easily replace in case of a failure. So wouldn’t the Icy Dock adapters to NVMe be better for this use case? Said Icy Docks are ~$30-50 as well, so I’d mostly be paying for the NVMe drives themselves (instead of NVMe to PCIe, it goes NVMe to SAS/U.2). I think there is also a SATA interface, but I’d prefer not to use it because there’s a limiting factor there.

Again, an “HDDs are too slow for us now, we need more than just our SSD cache and SSD server to edit off of” issue I’ll *hopefully* encounter in another 10-20 years.

I really don’t know enough about NVMe over SAS to comment one way or another.

U.2 can consume up to 25 W while using case cooling. Higher-wattage U.2 drives often have aluminum enclosures that are shaped like a heat sink to dissipate heat better. U.2 drives also often have several circuit boards, which gives them much more surface area internally and lets them reach much higher capacity/speed.

M.2 can consume up to 7 W; if it is drawing 7 W it should have some sort of cooling solution or it will thermal throttle. There just isn’t enough surface area.

If you can’t spare the CPU lanes, i.e. you are getting low on lanes, PCIe switches are usually the answer.

For M.2, this drive hit a bunch of review sites last year:
https://www.apexstoragedesign.com/apexstoragex21

This guy also has a bunch of interesting solutions.

Something you may want to do is make 3 disk pools:
1. A several-hundred-TB archive
2. 30 TB (or whatever) of SSD
3. A backup for the SSD

Making an independent pool for your SSD backup will allow you to use zfs send and zfs receive to perform frequent incremental backups of your SSD pool, and it will also allow you to isolate concerns for your staff. When you receive media from a shoot, you can place it directly on the archive pool. Then you can make your project folder on the SSD array and transform the media into whatever format you use for working projects. The original media stays in its safe place, and there is sufficient throughput to saturate the network in either direction. The working files can live on the SSD pool, where the staff know they will be unhindered when working on stuff.
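A minimal sketch of what that send/receive flow could look like; the pool names ssdpool and ssdbackup are just placeholders:

# take a snapshot of the working SSD pool (all names here are hypothetical)
zfs snapshot -r ssdpool/projects@2024-03-01
# first run: replicate the full snapshot to the independent backup pool
zfs send -R ssdpool/projects@2024-03-01 | zfs receive -u ssdbackup/projects
# later runs: send only the delta since the previous snapshot
zfs snapshot -r ssdpool/projects@2024-03-02
zfs send -R -i @2024-03-01 ssdpool/projects@2024-03-02 | zfs receive -u ssdbackup/projects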

You might take 10-minute snapshots on the SSD pool and keep them for 2 weeks, backing up daily snapshot diffs to its backup pool.

The archive pool may just take daily snapshots forever.

From my experience on a system used by several thousand users in an academic setting, 7 years of daily diffs was about 40% larger than the current data set.

LTT did a video I’ll replicate if this setup becomes an issue. That’s all I meant. What you suggest here also makes sense and I appreciate the specs, thank you.

Here are the pools I’m intending to make:

The NVMe drives on a PCIe board are what we edit on. Only PPro and AE files/other project files. If these die it’s not essential; I can take the time to replace them instead of worrying how fast I need to rebuild an array. Because…

The “main pool” will have all the HDD bays, the entire 216 TB (~168 TB usable), as a single pool (to be expanded with another 12 x 18 TB drives later) where all our footage and assets are stored. Every night, our SSD PCIe project files get backed up to the “main pool.” We often import past project files and past assets to reuse (we repurpose A LOT, a CRAP ton of projects). So everything needs to be held in the same location upon ingest, otherwise we later pay with delays relinking footage. We also have some clients that work with or for each other (construction and plumbing for example, or a non-profit that requested the help of our construction client), so we sometimes reuse footage cross-client (with permission). So maintaining our asset library is a must at all times.

That is why I will be buying two of everything, since an exact duplicate will be our offsite as well (similar to how our current onsite and offsite are the same).

Basically, our file storage has to stay the same. We have 11 years of assets (whether footage, audio, images, or graphics, etc.) we access regularly. So I’m basically just building a server that has more storage and is expandable. We have enough throughput on our little Synology 8-bays with a 10 Gb connection and two expansions. So with 12 drives and an SSD cache, plus 20 Gb of a pipeline, I think we’d be fine with throughput, no?

My intention: No snapshots on the SSD. Snapshots on the HDD daily.

We have no more than 2-5 people hitting it at a time. 12 total possible at a time, but unlikely. Also, most are remote editors. So as long as, after I build the server, I can hook their local Synology units up to auto-sync every night like they do with our current setup, I’m not worried at all about throughput for the remote editors, and the 2-5 editing locally should not be too tough since we film in 4K but edit in 1080p sRGB. All media cache is on a local M.2 drive as well. Proxies are also on a local M.2 drive. Each workstation has 3 M.2s: an OS drive, a media cache, and a proxy drive that gets auto-wiped, since after a project’s done we don’t usually need them again.

Your thoughts on this? Good, bad, ugly? I’d assume that if this works with good performance on the little Celeron chips our Synology has, EPYCs would be overkill for our workflow, no?

You really should do more frequent snapshots on your production data set. This may save members of your team a week of work every few months if it is accessible to them.

Look into how ZFS does snapshots, and backups of those snapshots, to understand this a bit better.

In most cases, snapshots take zero or nearly zero space.

ZFS only writes changes to disk. If you have an older snapshot and a newer snapshot, the new snapshot will only contain the disk blocks that have been written since the older snapshot. It is much smaller, especially for larger project files that gradually grow. Larger project files are usually a combination of an internal database and a list of assets. The database is usually tiny, so the snapshot of that will be tiny. The asset list will grow, but the ZFS snapshot only contains the changed blocks. The file system guarantees consistency and the ability to roll back, without loss, to any point in time at which you have a snapshot.
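Once the pool exists you can see this for yourself; something like the following (the dataset name is made up) shows how little space each snapshot actually holds:

# USED is the space unique to each snapshot; for slowly growing projects it is typically tiny
zfs list -t snapshot -o name,used,referenced ssdpool/projects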

If you did hourly snapshots of a project folder that gradually grew for a year and was a collaboration of 3 people, I would expect the cumulative size of the final result plus all of the snapshots to be less than 20% larger than the final project folder alone.

After the project is complete you can either discard all of the snapshots when you archive it, or keep them all.

There is also a feature called datasets that gives you more fine-grained control if you want it.

Often people try to justify fewer snapshots with system efficiency, but it is a penny-wise, pound-foolish thing. You can spend money on some more hardware and save several to dozens of hours of employee time.

ZFS snapshots are part of the file system, unlike rsync. You can reorganize your folders on the same volume and it will be a tiny change on the backup instead of a doubling.

Take a look at:
https://diskprices.com

Set it to U.2 drives.
Instead of getting 8 M.2 drives, look at some of the 12 TB to 30 TB U.2 drives.

Modern enterprise U.2 drives are much more reliable than spinning disks.

You can get an x16 PCIe card (make sure it has proper cooling) that holds 4 U.2 disks, and start with a single-disk U.2 vdev that backs up to an 8-disk RAIDZ2. If that works out, you can later attach a second disk to truly mirror it, or add additional mirrors to increase capacity.
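A rough sketch of that layout, with made-up pool and device names:

# single U.2 vdev to start (device names are hypothetical)
zpool create fastpool /dev/nvme0n1
# 8-disk RAIDZ2 pool as the backup target
zpool create tank raidz2 sda sdb sdc sdd sde sdf sdg sdh
# later, attach a second U.2 drive to turn the single disk into a mirror
zpool attach fastpool /dev/nvme0n1 /dev/nvme1n1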

NOT spending $1350/drive… We’re writing ~10 TB a year on a really good year. With SSDs at ~500 TBW ratings, I think we’ll be fine.

Interesting. Thank you, but it’s not about file size. It’s about unnecessary tasks and necessary ones.

I mean, with 30-day or shorter delivery, snapshotting our projects every night is fine, not every hour. Plus, all our footage and former project files will be ingested to the pool with all the HDDs (what matters most), and all current edits/project files sit on NVMe and get backed up every week. Knowing our workflow, how long we wait on changes from all decision makers, etc., I think we’ll be okay…

I’d probably snapshot the HDDs, not the SSDs. Again, if the SSDs are set to back up to the HDDs consistently, why snapshot them twice?

Just curious why I should snapshot the same thing twice?

Thank you for all your advice, I’m not trying to be stubborn, just to learn. By the way Mike, you had mentioned…

How does one do this on Windows 10? Would this also work on Windows 10? Our only “Linux”-based machines are our current Synologys. And I don’t think I have EVER run a terminal or command through that thing EVER.

For the SSD, you know your business.

For the snapshots, that is how the incremental backups are performed.
You make a snapshot, then transfer the snapshot to the backup disk.

I think you would need to use Windows Subsystem for Linux (WSL). I may be able to tell you more in a few days when I fire mine up, but I am in the middle of building some storage and everything is a bit tight. The above line works in the “bash” shell. I think you launch the shell after installing the service, then type
cd
press the space bar,
then drag the folder you are concerned about into the window,
press Enter,
then run the above command.

Oh! So something similar to, let’s say, Synology Hyper Backup is not a thing on TrueNAS? I can’t, say, schedule a folder to basically be copied and pasted, only replacing files with a newer date or different size, from the SSDs to the HDDs on an automated schedule? They have to be snapshots??? Because, again, the actual files will need to be accessed again a year, 3-6 months, 4 years, etc. after they are finished. When I think of snapshots, I think of Synology’s snapshots used to restore, not actual .MOV, .MP4, .aep, .prproj, etc. files we’d import into a new Premiere project file. (Think LTT’s vault, but we ACTUALLY end up needing and using those assets over and over again frequently.) So if snapshots really are fine for that, how the heck do I learn that now (I am currently watching tutorials on TrueNAS to learn it, as it’s my first deployment)?

So there is no CMD command I can run instead? This would only be doable in Linux (which I don’t have).

And hey Mike, I appreciate any and all the assistance you can offer. Do what you’ve gotta do and get paid!! All I ask is you get back to me before the topic closes, haha.

The app “Everything” will get you a list of all of the files on your computer. I don’t know about its export configuration options.

If you can get it as a CSV (comma-separated values) file, you can import it into Postgres and have Postgres calculate the number of characters in each path string.

ie:

SELECT CHAR_LENGTH(fullFilePath), fullFilePath FROM everything ORDER BY CHAR_LENGTH(fullFilePath) DESC;
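For the import step, a minimal sketch using psql; the table name everything, the column fullFilePath, and the CSV file name are all assumptions chosen to match the query above:

# hypothetical: create a one-column table and load the Everything CSV export into it
psql -c "CREATE TABLE everything (fullFilePath text);"
psql -c "\copy everything (fullFilePath) FROM 'everything_export.csv' WITH (FORMAT csv, HEADER true)"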

I composed this a few days ago and just realized I had not sent it.

Snapshots on the source side are basically free. It is like a lightweight shadow copy: it basically marks the blocks currently in use so they are not deleted. Also, the file system is all copy-on-write, so it would not normally have deleted the previous contents of a file when updating it anyway; it would just have told the file system that the middle block of the file should now be read from a different section on disk.

Also, say you have daily snapshots for a February, 28 days:
Week 1: M T W Th F S Su
Week 2: M T W Th F S Su
Week 3: M T W Th F S Su
Week 4: M T W Th F S Su

Then a few years later you realize you are getting low on disk space. You can just remove everything except the Monday snapshots, or only keep the first snapshot of the month. The file system will take care of merging the changes into the remaining snapshots and freeing up the disk space.
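Pruning like that is just a matter of destroying the snapshots you no longer want; a sketch with made-up dataset and snapshot names:

# drop a single daily snapshot; its unique blocks get merged into the neighbours
zfs destroy ssdpool/projects@2023-02-07
# drop a whole range of snapshots at once (oldest%newest)
zfs destroy ssdpool/projects@2023-02-01%2023-02-06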

If you have hourly snapshots on the source disk and want daily snapshots on the backup disk, you can’t just back up one snapshot a day; you have to either transfer all of the incremental stages, or compute the delta between a given snapshot on the source and the most recent snapshot that exists on the destination, then send that.

Also, if you are replicating to a backup disk, don’t delete the newest snapshot just because you have backed it up, or else you won’t have a snapshot to build the delta from the next time you want to perform an incremental backup.
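Those two options map to zfs send -I (send every intermediate snapshot) versus -i (send one collapsed delta); a sketch with hypothetical names:

# send every hourly snapshot between the last backed-up one and now
zfs send -I ssdpool/projects@mon-0900 ssdpool/projects@tue-0900 | zfs receive ssdbackup/projects
# or send a single delta between just those two endpoints
zfs send -i ssdpool/projects@mon-0900 ssdpool/projects@tue-0900 | zfs receive ssdbackup/projects
# keep the newest snapshot on both sides; it is the base for the next incremental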

Hi Mike,

Thank you so much! I’m still having a difficult time grasping the importance of these snapshots when regular backups are scheduled, other than for the same purpose I use them for on our current Synology systems, which is recovery in the event of something catastrophic, like a ransomware attack.

Could you explain it to me as if I were 5? As simple as possible?

The benefit of snapshots is best explained in comparison to other established ways of doing similar things.

If you want to implement an incremental backup (save only what changed on disk to another location) with a common journaling file system (ext4, xfs, ntfs), the computer needs to check every file to see what changed. This is typically done by comparing file metadata (modification timestamp); rsync/rsnapshot are commonly used tools.
While better than simply copying everything (a full backup), it’s still a relatively expensive operation on large (= many files) and busy (active concurrent access) file systems.
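For comparison, a minimal sketch of that metadata-scanning style of backup with rsync (the paths are made up):

# rsync walks the whole tree and compares size + mtime for every file
rsync -a --delete /mnt/ssdpool/projects/ /mnt/backup/projects/
# rsnapshot-style variant: hard-link unchanged files against the previous run
rsync -a --delete --link-dest=/mnt/backup/daily.1/ /mnt/ssdpool/projects/ /mnt/backup/daily.0/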

Copy-on-write (CoW) file systems (e.g. zfs, btrfs) are designed to never modify data (= blocks written to disk) but instead invalidate old blocks and add new blocks “at the end”. So the operation of creating a snapshot is only as expensive as setting a pointer to the last changed block on disk (= very fast, almost free in terms of effort).
Performing an incremental backup on a CoW file system then doesn’t have to search for changes; it’s simply the list of blocks added between two snapshots (from one pointer to the next).

Also, since existing data blocks don’t change on disk in the process of modifying files, there is the added benefit of “protection from ransomware”, because ransomware basically just modifies files (encryption), as long as snapshots are created frequently enough that you don’t lose productive work when rolling back to a known good point in time.
Another good use case for snapshots is an implementation of a time machine: create frequent snapshots (say every hour or even every 15 minutes) and allow users to “recover” earlier file versions. A similar feature is baked into Windows (Previous Versions).

Both are meaningful and good use cases that are enabled because snapshots on CoW file systems are cheap (mostly free). Well, nothing’s free in life. So, in addition to creating many snapshots, you probably want some software cleaning up unnecessary snapshots (e.g. keep snapshots every 15 minutes for a couple of days, daily snapshots for a month, and weekly or monthly snapshots for regulatory compliance for years).


Hi @jode

I am sorry, but I am still not grasping this. When I hear snapshots, I think of Synology’s snapshots, which are more like a restoration point than a copy of the data.

I’ll simplify the breadth of what the intended action is from throughout the topic:

The server will have two storage pools: one made of HDDs with an SSD cache, and a second made of an array of NVMe drives.

Diagram:

The HDD/SSD-cache combo will be where all our assets live. Not only final assets, but all assets. When we ingest footage, it goes here, not the NVMe SSDs. PDF, MOV, MP4, WAV, AEP, PRPROJ, etc.: every file goes here.

Example: “1. Client Asset Library” is where all footage, audio, everything goes. Every .MOV, .WAV, .AI we’ve created or been sent by a client goes there, in a subfolder for that specific client.

Whereas “2. Client Project Files” is where all the documentation we make, plus .PRPROJ, .AEP, etc. files, goes.

Folder structure:


Diagram:

The NVMe drives will be where all current projects are edited from. No footage is stored on these NVMe drives, only the project files themselves and deliverable files.

Folder structure:


Diagram:

All I am asking is, why can I not set up automated copy/pastes of the files on the NVMe drives to go to the HDDs every day/Sunday/whatever? How specifically are snapshots different (as simple as possible please)? Replacing outdated files with updated ones, and since we’re talking about ONLY PROJECT FILES, KBs or MBs per file max. I need the actual data to be usable, openable, and reusable and reopenable at a later date. If it’s anything like Synology Btrfs snapshots, I don’t see that as an alternative to this solution. Please advise.

Diagram:

Again, I am coming from only knowing Synology, Asustor, etc., and I don’t really understand what you laid out at this time. From my understanding, snapshots create a “restore point”/pseudo-copy of the data that is not actually individually accessible unless the snapshot is re-deployed, which is counterintuitive, since that would bring any updated data back to an uncompleted state. Maybe explain this to me as if I were a preschooler…

I was led to believe that the entire diagram already shared (and which I created to confirm this solution) was a good way to go about this:

Characterizing snapshots as “restoration points” is pretty accurate, and thinking about snapshots as “copies of data” is what leads to confusion.

Yes, zfs snapshots and btrfs snapshots are very similar.

Your requirements are quite specific, but you could use snapshots. For this to work, you need to make use of the fact that both btrfs and zfs allow making snapshots not only for the complete pool, but for individual “datasets”. A dataset is simply a subset of data on a CoW file system that is managed independently from the other subsets of a pool. A dataset looks like a folder to the OS, but offers additional capabilities provided by the file system.

From your description it sounds like you start every new project by creating a new folder on the NVMe pool (or plan to start doing so once you have this setup in place).
To be able to use snapshots this way, you would instead create a new dataset for each project. This allows individual snapshots per project, and you can migrate projects individually to the HDD pool and offload/remove projects from the NVMe pool when they’re finished without losing the data (because the projects still exist on the HDD pool).
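A minimal sketch of that per-project-dataset workflow, assuming hypothetical pool, dataset, and snapshot names:

# create a dataset per project instead of a plain folder (-p creates parents as needed)
zfs create -p nvmepool/projects/client-a-spring-campaign
# snapshot just that project as often as you like
zfs snapshot nvmepool/projects/client-a-spring-campaign@2024-03-01
# replicate the finished project to the HDD pool (assumes hddpool/archive already exists)
zfs send nvmepool/projects/client-a-spring-campaign@2024-03-01 | zfs receive hddpool/archive/client-a-spring-campaign
# then free the NVMe space
zfs destroy -r nvmepool/projects/client-a-spring-campaign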

If you feel that’s quite a change from what you could achieve with simple file copies, and you’re maybe confused about datasets and snapshots, then perhaps a simpler solution is better in your environment (KISS principle).
