I’ve tried both Thunar and GNOME Files. They both essentially do the same thing.
I thought the creation date would show me when a file was created on my machine, but I was mistaken.
With reposync, it synchronizes ctime and mtime (I think the underlying command structure is something like rsync -a). It’s possible you could sort by access time (assuming you haven’t mounted your filesystem with the noatime option). I’m not sure if your GUI programs support that, though. I’d try ls -tu and see if that gives any results. Pipe it to more or less to make the results more manageable.
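As a concrete sketch of the ls approach (the directory and file names below are invented for the demo; -u makes -t sort by access time):

```shell
# Demo: sort files by access time (atime), newest first.
demo=$(mktemp -d)
touch "$demo/older.rpm" "$demo/newer.rpm"
touch -a -d '2020-01-01' "$demo/older.rpm"   # set an old access time
touch -a -d '2021-01-01' "$demo/newer.rpm"   # set a newer access time
ls -tu "$demo"          # newer.rpm lists first
ls -ltu "$demo" | less  # long listing, paged, for big directories
```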
The GUI program lets me sort by “Accessed,” but for some reason ALL of the files in my repo show the same date in the Accessed column.
I was looking for a GUI option to solve my issue, since I copy the newest files to another folder to burn them to CD and move them to another machine.
Oh, probably because rsync reads every file to get its checksum/mtime, which updates the access time on all of them at once.
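You can confirm that by looking at all three timestamps directly with GNU stat (the file here is just a temporary stand-in for a package file):

```shell
# Show atime / mtime / ctime for a file. On a freshly synced repo,
# the atimes will all match the last time the sync read the files.
f=$(mktemp)   # stand-in for a package file
stat -c 'atime: %x%nmtime: %y%nctime: %z' "$f"
```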
We may not be able to get the view you want by looking at the directory and file metadata.
Let me dig through reposync’s man page to see if we can enable logging of new files and set up a script to help with this.
man, thanks.
I really should learn Python, since I’m sure a few lines would do exactly what I want.
Yeah, thinking about it, a good way to do it would be to use a python wrapper.
Here’s something I came up with in about 20 minutes:
Python 2 because CentOS doesn’t like python 3.
#!/usr/bin/env python2
import json
import hashlib
import sys
import os
import shutil
from subprocess import call

def loadjson():
    try:
        with open('/srv/wrapper.json', 'r') as wrapperjson:
            fileconfig = json.loads(wrapperjson.read())
    except (IOError, ValueError):
        print("Error loading file, defaulting to empty")
        fileconfig = {}
    return fileconfig

def savejson(fileconfig):
    try:
        with open('/srv/wrapper.json', 'w') as wrapperjson:
            wrapperjson.write(json.dumps(fileconfig))
    except IOError:
        print("error writing file...")

def getcsum(filepath):
    # Read in binary mode so the checksum is taken over the raw bytes
    with open(filepath, 'rb') as fileobj:
        return hashlib.sha256(fileobj.read()).hexdigest()

def validatejson(fileconfig):
    # Walks through all the dirs:
    for root, subdirs, files in os.walk(repodir):
        for item in files:
            # Can't use "file" here because it shadows a built-in
            fpath = root + '/' + item
            csumval = getcsum(fpath)
            if fpath not in fileconfig:
                print("file not in array, adding to list")
                fileconfig[fpath] = csumval
            elif fileconfig[fpath] == csumval:
                print("shasum matched")
            else:
                fileconfig[fpath] = csumval
    savejson(fileconfig)

def postcompare(fileconfig):
    movelist = []
    for root, subdirs, files in os.walk(repodir):
        for item in files:
            fpath = root + '/' + item
            csumval = getcsum(fpath)
            if fpath not in fileconfig:
                print("file not in array, adding to list")
                fileconfig[fpath] = csumval
                movelist.append(fpath)
            elif fileconfig[fpath] == csumval:
                print("shasum matched, skipping")
            else:
                fileconfig[fpath] = csumval
                print("shasum didn't match, moving to target dir")
                movelist.append(fpath)
    return movelist

def movefiles(movelist, destdir):
    if not os.path.exists(destdir):
        os.makedirs(destdir)
    for fpath in movelist:
        print("Copying " + fpath + " To " + destdir)
        shutil.copy(fpath, destdir)

# Main entrypoint:
if __name__ == "__main__":
    print("Starting script")
    # Variables here, change me:
    destdir = '/root/test'
    repodir = '/root'
    reposync_command = 'reposync -r base -p ' + repodir
    fc = loadjson()
    validatejson(fc)
    print(reposync_command)
    call(reposync_command.split(' '))
    movelist = postcompare(fc)
    movefiles(movelist, destdir)
    savejson(fc)
Running this script with 50 updated packages and only the base repo took about 5 minutes in a container with 1 CPU and 512 MB RAM on a spinning-disk ZFS array.
I just finished testing a mostly complete repo sync; all but the 500 or so packages that I had previously synced in earlier tests were copied to /root/test, which is the dir I’d configured it to go to.
A few things to note. If you want to change where it stores the cache file, change the second line in both savejson and loadjson. I didn’t create a variable for this because I was lazy.
You’re going to want to look at the bottom of the script and change the destdir, repodir, and reposync_command variables to match your config, at the very least.
So destdir would be where I want the new downloaded files to go to?
destdir is the directory that you’ll burn to a CD/DVD. repodir is the directory where the entire copy of the repo is located.
Does it also put the newest files in the existing repo, or do I have to copy them from destdir over to the repo?
This is awesome, thank you so much for this. I literally couldn’t find any other solution.
It scans repodir and takes a checksum of all the files, runs the reposync command to synchronize the repo into repodir, then scans repodir again and copies any new or changed files to destdir.
Should there be indication it’s working?
Also, it starts with
Starting script
Error loading file, defaulting to empty
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list
etc, etc
then just a blinking cursor after that message has printed itself hundreds of times
normal behavior?
Yeah, it’s working. If you look at the script, validatejson is the first section, where it syncs the filesystem’s state with the stored JSON information.
for root, subdirs, files in os.walk(repodir):
    for item in files:
        # Can't use "file" here because it shadows a built-in
        fpath = root + '/' + item
        csumval = getcsum(fpath)
        if fpath not in fileconfig:
            print("file not in array, adding to list")
            fileconfig[fpath] = csumval
        elif fileconfig[fpath] == csumval:
            print("shasum matched")
        else:
            fileconfig[fpath] = csumval
savejson(fileconfig)
depending on disk speed, it could take a while.
once that’s done, it will call the reposync command and the script will have much more output from there on out.
ok, I thought the cursor just meant it stopped.
I let it sit, then after the errors, says
error writing to file...
then starts running reposync. Should I ignore the file error?
That just means it wasn’t able to save the json file. It’s not a problem, it would just speed up the first bit, if it were able to save.
It also keeps a copy of the data in memory, so you’ll be okay without it.
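For reference, the cache is nothing exotic: /srv/wrapper.json is just a JSON object mapping each file path to its sha256 hex digest. A sketch (the entry below is invented):

```python
import json

# Hypothetical cache contents: file path -> sha256 hex digest
fileconfig = {
    "/root/base/Packages/example-1.0-1.el7.x86_64.rpm":
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

text = json.dumps(fileconfig)          # what savejson writes out
assert json.loads(text) == fileconfig  # what loadjson reads back
```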
It ended up just copying my entire repo into my destdir, which is some 9k items big lol
Oh, that’s odd, shouldn’t have happened, didn’t happen to me.
After it finished syncing the repo, it spit out a bunch of text, one of these three:
file not in array, adding to list
---
shasum matched, skipping
---
shasum didn't match, moving to target dir
depending on which was spit out, we’ve got a different situation.
Maybe change the json file path to somewhere that the script user has permission to write to. I’m going to make a couple changes to the script because I’m an idiot and didn’t pass variables properly…
#!/usr/bin/env python2
import json
import hashlib
import sys
import os
import shutil
from subprocess import call

def loadjson():
    try:
        with open('/srv/wrapper.json', 'r') as wrapperjson:
            fileconfig = json.loads(wrapperjson.read())
    except (IOError, ValueError):
        print("Error loading file, defaulting to empty")
        fileconfig = {}
    return fileconfig

def savejson(fileconfig):
    try:
        with open('/srv/wrapper.json', 'w') as wrapperjson:
            wrapperjson.write(json.dumps(fileconfig))
    except IOError:
        print("error writing file...")

def getcsum(filepath):
    # Read in binary mode so the checksum is taken over the raw bytes
    with open(filepath, 'rb') as fileobj:
        return hashlib.sha256(fileobj.read()).hexdigest()

def validatejson(fileconfig):
    # Walks through all the dirs:
    for root, subdirs, files in os.walk(repodir):
        for item in files:
            # Can't use "file" here because it shadows a built-in
            fpath = root + '/' + item
            csumval = getcsum(fpath)
            if fpath not in fileconfig:
                print("file not in array, adding to list")
                fileconfig[fpath] = csumval
            elif fileconfig[fpath] == csumval:
                print("shasum matched")
            else:
                fileconfig[fpath] = csumval
    savejson(fileconfig)
    return fileconfig

def postcompare(fileconfig):
    movelist = []
    for root, subdirs, files in os.walk(repodir):
        for item in files:
            fpath = root + '/' + item
            csumval = getcsum(fpath)
            if fpath not in fileconfig:
                print("file not in array, adding to list")
                fileconfig[fpath] = csumval
                movelist.append(fpath)
            elif fileconfig[fpath] == csumval:
                print("shasum matched, skipping")
            else:
                fileconfig[fpath] = csumval
                print("shasum didn't match, moving to target dir")
                movelist.append(fpath)
    return movelist, fileconfig

def movefiles(movelist, destdir):
    if not os.path.exists(destdir):
        os.makedirs(destdir)
    for fpath in movelist:
        print("Copying " + fpath + " To " + destdir)
        shutil.copy(fpath, destdir)

def buildreposynccommand(repodir, repos):
    com = "reposync"
    for item in repos:
        com += " -r " + item
    com += " -p " + repodir
    return com

# Main entrypoint:
if __name__ == "__main__":
    print("Starting script")
    # Variables here, change me:
    destdir = '/root/test'
    repodir = '/var/ftp/pub/centos-7-rpms'
    repos = [
        'base', 'updates', 'extras', 'centosplus', 'epel', 'epel-testing',
        'fasttrack', 'C7.3.1611-base', 'C7.3.1611-updates', 'C7.3.1611-extras'
    ]
    reposync_command = buildreposynccommand(repodir, repos)
    fc = loadjson()
    fc = validatejson(fc)
    print(reposync_command)
    call(reposync_command.split(' '))
    movelist, fc = postcompare(fc)
    movefiles(movelist, destdir)
    savejson(fc)
Try this.
after reposyncing, got more lines of
file not in array, adding to list
then some lines of
shasum matched, skipping
then copied over almost my entire repo (I say almost because it was already 4k items copied, and I KNOW the BASE repo hasn’t updated that much since my last sync earlier this month).
I’ll try your v0.0.2 lol
yeah, I’m sure about it as well.
Yeah, sorry… I should really pay more attention to my code.
Problem now is that since it’s been sync’d, it’s not going to catch any changes in the files…
Do you have a copy of the repo from before you ran it?
EDIT: Oh, just to let you know, I’ve updated the method of listing the repos. Now you just list the repos in a Python list and it builds the command dynamically.
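For example, the builder from the updated script turns a repo list into a single command line (the repo names below are just a sample):

```python
# Same builder as in the script above: one -r flag per repo, -p for the target dir
def buildreposynccommand(repodir, repos):
    com = "reposync"
    for item in repos:
        com += " -r " + item
    com += " -p " + repodir
    return com

print(buildreposynccommand('/var/ftp/pub/centos-7-rpms', ['base', 'updates']))
# -> reposync -r base -r updates -p /var/ftp/pub/centos-7-rpms
```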
No, I don’t have a copy, but I have the repos separated, so Base in one folder, updates in another, etc.
Is your revised code supposed to do all of the repos?
The way I have things set up, all of the mirrored repos are in separate folders on the desktop, and another folder just labeled “to_burn” has folders for the repos that are updated.
I’m doing base, extras, updates, spacewalk, jpackage-generic, epel.