Versions Backup Client | Devember log

devember2k18

#1

So at this point, I’m not sure if I’m going to be using my personal blog or this thread, but I figured that if the community wants to get involved, this will be a good place to have discussion, since I don’t have any comment features on my blog yet.


Without further ado, I present my pledge:

I, @sgtawesomesauce, will participate to the next Devember. My Devember will be Writing backup software in Go. I promise I will program for my Devember for at least an hour, every day of the next December. I will also write a daily public devlog and will make the produced code publicly available on the internet. No matter what, I will keep my promise.

With that, it’s probably good to describe the purpose of this project at a high level. The project has a number of goals, outlined below. I’m going to expound upon them as the month goes on, but frankly, I’m not a great writer, so this will probably look a bit more like ABI docs and frustration rants than anything else.

Versions will be:

Deduplicated

Versions will be, first and foremost, deduplicated. We’re not looking for anything all that special here, just want to do some basic deduplication so the compression algorithms don’t have to work so hard in the secondary stage.

Compressed

Versions needs to be compressed. There are plenty of file types that cannot be deduplicated, but can be compressed. We’re going to do that. At the end of the day, Versions is all about efficiency in storage and retrieval. We want to be able to store data and retrieve it very efficiently.

Checksummed

Data is no good if it’s not accurate. This goes for anything; backups are no different. We will checksum the data, so we can always know the data is good without even testing the restore!

Mirrored

Eventually, once we have the network aspect of Versions working (not a priority for Devember), we’re going to have a solution to mirror data to multiple targets.

Self-Healing

When backing up to multiple sources, Versions will have the ability to self-heal a damaged archive during archive verification by finding a good copy of the damaged data (or Chunk) in a sister source and copying it over.

Open Source

Versions is to be released under the GPL v2, to better protect both the copyright holders’ and the users’ freedoms. GitHub repo to be announced.
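As a rough illustration of the deduplication and checksumming goals above, here is a minimal sketch, not the actual Versions code. It assumes chunks are keyed by their SHA-256 digest (the thread itself talks about Chunk UUIDs, so content addressing is my assumption), and `ChunkStore`, `Put` and `Verify` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
)

// Errors returned by Verify.
var (
	ErrNotFound = errors.New("chunk not found")
	ErrCorrupt  = errors.New("chunk checksum mismatch")
)

// ChunkStore is a hypothetical in-memory store that keys every chunk
// by the SHA-256 digest of its contents (an assumption for this sketch).
type ChunkStore struct {
	chunks map[string][]byte
}

// NewChunkStore returns an empty store.
func NewChunkStore() *ChunkStore {
	return &ChunkStore{chunks: make(map[string][]byte)}
}

// checksum returns the hex-encoded SHA-256 digest of data.
func checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// Put stores data under its checksum. Identical data is stored only once;
// added reports whether this call actually wrote a new chunk.
func (s *ChunkStore) Put(data []byte) (id string, added bool) {
	id = checksum(data)
	if _, ok := s.chunks[id]; ok {
		return id, false
	}
	s.chunks[id] = append([]byte(nil), data...) // copy so callers can reuse their buffer
	return id, true
}

// Verify recomputes the checksum of a stored chunk and compares it to its key,
// so corruption can be detected without doing a full restore.
func (s *ChunkStore) Verify(id string) error {
	data, ok := s.chunks[id]
	if !ok {
		return ErrNotFound
	}
	if checksum(data) != id {
		return ErrCorrupt
	}
	return nil
}
```

Calling `Put` twice with the same bytes returns the same ID and stores the data once. A self-healing pass could call `Verify` on every chunk and fetch a replacement from a mirror whenever it returns `ErrCorrupt`.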


Link to Versions github repo:

Link to Versions blog tag, for those interested:

https://blog.kebrx.tech/tag/versions/


#2

You forgot to insert objective. :wink:


#3

Updated OP, check it!


#4


#5

How about that backup my man? Any progress as of late?


#6

I’ve been quite busy at work (and feeling under the weather), but at the moment I’m working on sorting out the basic storage and checksumming of chunks.

Once that’s done, I’ll be writing base logic for storing files and versions of the file. Essentially my idea is this: Each File gets a UUID. The file holds path, name and Version references. Each Version gets a UUID. File references Version by UUID, Version holds file permissions (possibly, not sure if this is needed) and an ordered list of the Chunk UUIDs.
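To make that layout concrete, here is a minimal sketch of the three records as Go structs. The field names, the use of `os.FileMode` for permissions, and the random version-4 `NewUUID` helper are all assumptions for illustration; nothing here is taken from the real repo.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"os"
)

// UUID is a 16-byte identifier. The standard library has no UUID type,
// so this sketch defines a small one.
type UUID [16]byte

// NewUUID returns a random (version 4) UUID.
func NewUUID() (UUID, error) {
	var u UUID
	if _, err := rand.Read(u[:]); err != nil {
		return UUID{}, err
	}
	u[6] = (u[6] & 0x0f) | 0x40 // set version 4
	u[8] = (u[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return u, nil
}

// String formats the UUID in the usual 8-4-4-4-12 hex layout.
func (u UUID) String() string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

// File holds the path, name and references to its Versions.
type File struct {
	ID       UUID
	Path     string
	Name     string
	Versions []UUID // Version IDs, oldest first (ordering is an assumption)
}

// Version holds permissions and the ordered list of Chunk IDs
// needed to reassemble the file contents.
type Version struct {
	ID     UUID
	Mode   os.FileMode // file permissions; the post is unsure these are needed
	Chunks []UUID      // ordered Chunk IDs
}
```

Restoring a file would then mean looking up the wanted `Version` by ID and concatenating its chunks in order.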

Once that’s done, I can start on the base indexer and executor logic. I’m hoping to make this all as asynchronous as possible and make extensive use of goroutines.

Oh, I’m also trying to design a Journaling database to help reduce write limitations. Plan is to use a journal to cache writes, then write a bunch of entries to postgres or something similar in one transaction, to make it faster. Once the transaction closes, we clear that segment of the journal out. Thoughts?
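A minimal sketch of that batching idea, under some loud assumptions: the journal here lives only in memory (a real one would have to persist entries to disk before acknowledging them), and `Sink` is a hypothetical interface standing in for "write these entries to the database in one transaction". `memSink` exists only so the sketch can be exercised without a database.

```go
package main

import "sync"

// Entry is one journaled metadata write.
type Entry struct {
	Key   string
	Value []byte
}

// Sink is a hypothetical stand-in for the backing database. WriteBatch is
// expected to commit all entries in a single transaction.
type Sink interface {
	WriteBatch(entries []Entry) error
}

// Journal buffers writes and flushes them to the Sink in batches.
type Journal struct {
	mu      sync.Mutex
	pending []Entry
	limit   int
	sink    Sink
}

// NewJournal returns a Journal that flushes once limit entries are pending.
func NewJournal(sink Sink, limit int) *Journal {
	return &Journal{sink: sink, limit: limit}
}

// Append records an entry and flushes when the batch is full.
func (j *Journal) Append(e Entry) error {
	j.mu.Lock()
	defer j.mu.Unlock()
	j.pending = append(j.pending, e)
	if len(j.pending) >= j.limit {
		return j.flushLocked()
	}
	return nil
}

// Flush writes any pending entries immediately.
func (j *Journal) Flush() error {
	j.mu.Lock()
	defer j.mu.Unlock()
	return j.flushLocked()
}

// flushLocked commits the pending batch. The segment is cleared only after
// the Sink reports success, so a failed write can be retried later.
func (j *Journal) flushLocked() error {
	if len(j.pending) == 0 {
		return nil
	}
	if err := j.sink.WriteBatch(j.pending); err != nil {
		return err
	}
	j.pending = nil
	return nil
}

// memSink is an in-memory Sink used only for illustration.
type memSink struct {
	batches [][]Entry
}

func (m *memSink) WriteBatch(entries []Entry) error {
	m.batches = append(m.batches, append([]Entry(nil), entries...))
	return nil
}
```

With a limit of 2, the first `Append` only buffers and the second triggers one `WriteBatch` call carrying both entries.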


#7

Work has started laying out the structure of the journal.


#8

Okay, I’ve gone down the rabbit hole on database design. Eventually, I’ll want to implement a log-structured merge-tree (LSM tree) design in my database, but at the moment, I’m just going to be doing journaling that flushes into SQLite… Just wanted to do a quick update here, I guess.

The scope of this project is quickly expanding into something of epic proportions.


#9

Day -1

I’ve made some progress on the Journal today. I’ve created the cache writer logic and I’ll be working on the logic to flush that cache into a SQLite database today. I hope this will be quick.

The good thing is that this journal can be abstracted to use for any function I want! I’ll be using this for Chunk, Version and File metadata at the moment. I have no need for any other metadata right now, so yeah.

Just thought I’d dump an update.


#10

Day 0

Made progress. I’d say the journal is 99% done. I’m sure there’s an edge case somewhere I’m not checking for, but at the moment, it’s all good. I’m in the process of connecting the journal to sqlite, but it’s unlikely I’ll finish that in time tonight. I’ll definitely be trying to get that done tomorrow.

Once that’s done, it’s going to be time to build up the chunk storage logic.


I realize I’m not pushing to github. I’m having some issues with my git client config, and rather than fixing it, I’m just going to keep coding until I have time to figure out a solution.


#11

So how do you plan to test your code?


#12

go test I’m assuming :sunglasses:


#13
AWS_PROFILE=production terraform apply

I only test in production. :stuck_out_tongue:

But in all seriousness, Yes, I plan on using go test, once I figure out unit testing. I’ve never done it before, and it seems like a lot of work for very little gain at this point so I’m going to hold off until I have a larger codebase.
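For reference, a first test can be quite small. This is only a sketch: `chunkID` is a hypothetical helper, not something from the Versions codebase. `go test` picks up files whose names end in `_test.go` and runs functions named `TestXxx`, so the test function below would normally live in such a file next to the code it checks.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"testing"
)

// chunkID is a hypothetical helper: the hex SHA-256 digest of a chunk.
func chunkID(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// TestChunkID is a table-driven test: each case is run as a named subtest.
func TestChunkID(t *testing.T) {
	cases := []struct {
		name string
		a, b []byte
		same bool // whether a and b should produce the same ID
	}{
		{"identical data", []byte("hello"), []byte("hello"), true},
		{"different data", []byte("hello"), []byte("world"), false},
		{"empty vs non-empty", nil, []byte("x"), false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			got := chunkID(tc.a) == chunkID(tc.b)
			if got != tc.same {
				t.Errorf("IDs equal = %v, want %v", got, tc.same)
			}
		})
	}
}
```

Running `go test -v` lists each test and subtest by name with its PASS or FAIL result.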


#14

Legit pro.

Yeah, it took me a while to see the value. Big or small project, it didn’t matter; I just skipped it.

Once you figure it out and automate it, it completely pays off.

It looked like you were using VS Code earlier. If that’s the case, they have a nifty feature with the Go Analysis Tools that allows you to run individual tests.

Go is weird in that when you run your test file, it just says “success” or whatever. It doesn’t tell you what passed or what failed, like with JavaScript or PHP. It was helpful using VS Code.


#15

the one and only!

That’s very nice. I really want to get this project to the point of “working, ish” then I’ll start building tests for it.

I can’t imagine that’s the case. That seems very counter to the mission of go.


#16

Day 1

Spending a lot of time rethinking database design. Data Structures are addicting.

I’m thiiis close to implementing an LSM database, but know I need to just get this shit underway.


#17

Day 2

I know this blog is late. I had a busy day though and while I didn’t get as much time as I’d have liked, I still got my 1H in. I mostly did research and experimentation with channels and goroutines for business logic of backing data up. It’s looking quite cool and efficient, so I’m going to try implementing it.

It will also allow for easier adaptation to a clustered, network-based backup solution later, since you can just bolt on some network logic in between the channels!
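A minimal sketch of what such a channel pipeline could look like, assuming fixed-size chunks and SHA-256 hashing. The stage names `split`, `hashAll` and `collect` are hypothetical and not taken from the project.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
	"sync"
)

// Chunk is one piece of a file moving through the pipeline.
type Chunk struct {
	Index int    // position of the chunk within the file
	Data  []byte // raw bytes
	Sum   string // hex SHA-256, filled in by the hashing stage
}

// split emits fixed-size chunks of data on the returned channel.
// size must be greater than zero.
func split(data []byte, size int) <-chan Chunk {
	out := make(chan Chunk)
	go func() {
		defer close(out)
		for i, off := 0, 0; off < len(data); i, off = i+1, off+size {
			end := off + size
			if end > len(data) {
				end = len(data)
			}
			out <- Chunk{Index: i, Data: data[off:end]}
		}
	}()
	return out
}

// hashAll starts the given number of worker goroutines that checksum
// chunks concurrently. Output order is not guaranteed.
func hashAll(in <-chan Chunk, workers int) <-chan Chunk {
	out := make(chan Chunk)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range in {
				sum := sha256.Sum256(c.Data)
				c.Sum = hex.EncodeToString(sum[:])
				out <- c
			}
		}()
	}
	// Close the output once every worker has drained the input.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// collect gathers hashed chunks and restores their original order.
func collect(in <-chan Chunk) []Chunk {
	var chunks []Chunk
	for c := range in {
		chunks = append(chunks, c)
	}
	sort.Slice(chunks, func(i, j int) bool { return chunks[i].Index < chunks[j].Index })
	return chunks
}
```

The stages compose as `collect(hashAll(split(data, 4), 3))`. Because each stage only sees a channel, a stage that ships chunks over the network could be slotted in between two others without changing them.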


#18

NOT THE LEADER!! NOOOOOO!!!


#19

I got dragged into a bunch of bullshit at work, so I didn’t have time most of the week. I did work on it today (yesterday) and I’ll post an update tomorrow about that.

Suffice it to say, I’ve failed this Devember, but that doesn’t mean my project is over.


#20

Does this mean you don’t get the badge?!