The small Linux problem thread

I’ve tried both Thunar and GNOME Files. They both essentially do the same thing.

I thought creation date would show me when it was created on my machine, but I was mistaken.

With reposync, it synchronizes ctime and mtime (I think the underlying behavior is something like rsync -a). It’s possible you could sort by access time (assuming you haven’t mounted your FS with the noatime option). I’m not sure whether your GUI programs support that, though. I’d try ls -tu and see if that gives any results; pipe it to more or less to make the output more manageable.
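
If the GUI programs don’t support it, a rough Python equivalent of ls -tu (recursive, newest access first) might look like this; just a sketch, with /path/to/repo as a placeholder:

#!/usr/bin/env python2
# rough equivalent of `ls -tu`: list files newest-accessed first
# '/path/to/repo' is a placeholder; point it at your repo directory
import os

def files_by_atime(topdir):
	entries = []
	for root, subdirs, files in os.walk(topdir):
		for name in files:
			fpath = os.path.join(root, name)
			entries.append((os.stat(fpath).st_atime, fpath))
	# sort newest access time first, like `ls -tu`
	entries.sort(reverse=True)
	return [fpath for atime, fpath in entries]

for fpath in files_by_atime('/path/to/repo'):
	print(fpath)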

The GUI program allows me to sort by “accessed”, but for some reason ALL of the files in my repo show the same date in the Accessed column.

I was looking for a GUI option to solve my issue, since I copy the newest files to another folder to burn them to CD to move to another machine.

Oh, probably because rsync reads them to get checksums/mtimes, and reading every file updates all the access times at once.

We may not be able to get the view you want by looking at the directory and file metadata.

Let me dig through reposync’s man page to see if we can enable logging of new files and set up a script to help with this.

man, thanks.

I really should learn Python, since I’m sure a few lines would do exactly what I want.

Yeah, thinking about it, a good way to do it would be a Python wrapper.

Here’s something I came up with in about 20 minutes:

Python 2, because CentOS doesn’t like Python 3.

#!/usr/bin/env python2
import json
import hashlib
import sys
import os
import shutil
from subprocess import call


def loadjson():
	try:
		with open('/srv/wrapper.json', 'r') as wrapperjson:
			fileconfig = json.loads(wrapperjson.read())
	except (IOError, ValueError):
		# cache file missing or unreadable; start fresh
		print("Error loading file, defaulting to empty")
		fileconfig = {}
	return fileconfig

def savejson(fileconfig):
	try:
		with open('/srv/wrapper.json', 'w') as wrapperjson:
			wrapperjson.write(json.dumps(fileconfig))
	except IOError:
		print("error writing file...")
		

def getcsum(filepath):
	# open in binary mode; RPMs aren't text
	with open(filepath, 'rb') as fileobj:
		return hashlib.sha256(fileobj.read()).hexdigest()

def validatejson(fileconfig):
	# walks through all the dirs (repodir is set in the main block below):
	for root, subdirs, files in os.walk(repodir):
		for item in files:
			# don't use "file" here; it would shadow the Python 2 built-in
			fpath = os.path.join(root, item)
			csumval = getcsum(fpath)
			if fpath not in fileconfig:
				print("file not in array, adding to list")
				fileconfig[fpath] = csumval
			elif fileconfig[fpath] == csumval:
				print("shasum matched")
			else:
				fileconfig[fpath] = csumval
	savejson(fileconfig)

def postcompare(fileconfig):
	movelist = []
	for root, subdirs, files in os.walk(repodir):
		for item in files:
			fpath = os.path.join(root, item)
			csumval = getcsum(fpath)
			if fpath not in fileconfig:
				print("file not in array, adding to list")
				fileconfig[fpath] = csumval
				movelist.append(fpath)
			elif fileconfig[fpath] == csumval:
				print("shasum matched, skipping")
			else:
				fileconfig[fpath] = csumval
				print("shasum didn't match, moving to target dir")
				movelist.append(fpath)
	return movelist

def movefiles(movelist,destdir):
	if not os.path.exists(destdir):
		os.makedirs(destdir)
	for fpath in movelist:
		print("Copying " + fpath + " To " + destdir)
		shutil.copy(fpath, destdir)

# Main entrypoint:
if __name__ == "__main__":
	print("Starting script")
	# Variables here, change me:
	destdir = '/root/test'
	repodir = '/root'
	reposync_command = 'reposync -r base -p ' + repodir
	fc = loadjson()
	validatejson(fc)
	print(reposync_command)
	call(reposync_command.split(' '))
	movelist = postcompare(fc)
	movefiles(movelist, destdir)
	savejson(fc)

Running this script with 50 updated packages and only the base repo took about 5 minutes in a container with 1 CPU and 512 MB RAM on a spinning-disk ZFS array.
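
If memory in a small container like that ever gets tight, getcsum could be swapped for a chunked variant so it doesn’t read whole RPMs into RAM at once; a sketch, same result otherwise:

import hashlib

def getcsum(filepath):
	# hash in 1 MiB chunks instead of reading the whole file at once,
	# so big RPMs don't have to fit in memory
	csum = hashlib.sha256()
	with open(filepath, 'rb') as fileobj:
		chunk = fileobj.read(1024 * 1024)
		while chunk:
			csum.update(chunk)
			chunk = fileobj.read(1024 * 1024)
	return csum.hexdigest()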

I just finished testing a mostly complete repo sync: all but the 500 or so packages that I had previously sync’d in earlier tests were copied to /root/test, which is the dir I’d configured it to go to.

A few things to note. If you want to change where it stores the cache file, change the second line in both savejson and loadjson. I didn’t create a variable for this because I was lazy.
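
If you do want the variable version, it would just be a module-level constant; something like this (CACHE_PATH is my name for it, not something in the script above):

import json

# hypothetical refactor: one constant instead of two hard-coded paths
CACHE_PATH = '/srv/wrapper.json'

def loadjson():
	try:
		with open(CACHE_PATH, 'r') as wrapperjson:
			fileconfig = json.loads(wrapperjson.read())
	except (IOError, ValueError):
		print("Error loading file, defaulting to empty")
		fileconfig = {}
	return fileconfig

# savejson would use the same CACHE_PATH in its open() call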

You’re going to want to look at the bottom of the script and change the destdir, repodir and reposync_command variables to match your config, at the very least.

So destdir would be where I want the new downloaded files to go to?

destdir is the directory that you’ll burn to a CD/DVD.

repodir is the directory where the entire copy of the repo is located.

Does it also put the newest files in the existing repo, or do I have to copy them from destdir over to the repo?

This is awesome, thank you so much for this. I literally couldn’t find any other solution.

It scans repodir and takes a checksum of all the files, runs the reposync command to synchronize the repo into repodir, then scans repodir again and copies any new or changed files to destdir. So the repo itself gets its new files from the sync; you don’t need to copy anything back from destdir.
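
In terms of the script’s main block, that’s this sequence (same calls as above, just annotated):

fc = loadjson()                    # load the cached checksums, if any
validatejson(fc)                   # pass 1: checksum everything already in repodir
call(reposync_command.split(' '))  # sync the repo in place
movelist = postcompare(fc)         # pass 2: list anything new or changed
movefiles(movelist, destdir)       # copy those files out for burning
savejson(fc)                       # persist the checksums for next time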

Should there be some indication that it’s working?

Also, it starts with

Starting script
Error loading file, defaulting to empty
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list
file not in array, adding to list

etc, etc
then just a blinking cursor after that message prints itself into hundreds of lines

normal behavior?

Yeah, it’s working. If you look at the script, validatejson is the first section, where it syncs the filesystem’s state with the stored JSON information.

for root, subdirs, files in os.walk(repodir):
	for item in files:
		# don't use "file" here; it would shadow the Python 2 built-in
		fpath = os.path.join(root, item)
		csumval = getcsum(fpath)
		if fpath not in fileconfig:
			print("file not in array, adding to list")
			fileconfig[fpath] = csumval
		elif fileconfig[fpath] == csumval:
			print("shasum matched")
		else:
			fileconfig[fpath] = csumval
savejson(fileconfig)

Depending on disk speed, it could take a while.

Once that’s done, it will call the reposync command, and the script will have much more output from there on out.

OK, I thought the cursor just meant it had stopped. 🙂

I let it sit, then after those messages it says

error writing file...

then starts running reposync. Should I ignore the file error?

That just means it wasn’t able to save the JSON file. It’s not a problem; it would just speed up the first bit on later runs if it were able to save.

It also keeps a copy of the data in memory, so you’ll be okay without it.
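
If you do want the cache to persist between runs, pointing it at a path your user can write would do it; for example (hypothetical location):

import os

# e.g. keep the cache in the home directory instead of /srv
CACHE_PATH = os.path.expanduser('~/wrapper.json')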

It ended up just copying my entire repo into my destdir, which is some 9k items big lol

Oh, that’s odd, shouldn’t have happened, didn’t happen to me.

After it finished syncing the repo, it spit out a bunch of text, one of these three messages per file:

file not in array, adding to list
---
shasum matched, skipping
---
shasum didn't match, moving to target dir

Depending on what was spit out, we’ve got a different situation.

Maybe change the JSON file path to somewhere that the script’s user has permission to write to. I’m going to make a couple of changes to the script because I’m an idiot and didn’t pass variables properly…

#!/usr/bin/env python2
import json
import hashlib
import sys
import os
import shutil
from subprocess import call


def loadjson():
	try:
		with open('/srv/wrapper.json', 'r') as wrapperjson:
			fileconfig = json.loads(wrapperjson.read())
	except (IOError, ValueError):
		# cache file missing or unreadable; start fresh
		print("Error loading file, defaulting to empty")
		fileconfig = {}
	return fileconfig

def savejson(fileconfig):
	try:
		with open('/srv/wrapper.json', 'w') as wrapperjson:
			wrapperjson.write(json.dumps(fileconfig))
	except IOError:
		print("error writing file...")
		

def getcsum(filepath):
	# open in binary mode; RPMs aren't text
	with open(filepath, 'rb') as fileobj:
		return hashlib.sha256(fileobj.read()).hexdigest()

def validatejson(fileconfig):
	# walks through all the dirs (repodir is set in the main block below):
	for root, subdirs, files in os.walk(repodir):
		for item in files:
			# don't use "file" here; it would shadow the Python 2 built-in
			fpath = os.path.join(root, item)
			csumval = getcsum(fpath)
			if fpath not in fileconfig:
				print("file not in array, adding to list")
				fileconfig[fpath] = csumval
			elif fileconfig[fpath] == csumval:
				print("shasum matched")
			else:
				fileconfig[fpath] = csumval
	savejson(fileconfig)
	return fileconfig

def postcompare(fileconfig):
	movelist = []
	for root, subdirs, files in os.walk(repodir):
		for item in files:
			fpath = os.path.join(root, item)
			csumval = getcsum(fpath)
			if fpath not in fileconfig:
				print("file not in array, adding to list")
				fileconfig[fpath] = csumval
				movelist.append(fpath)
			elif fileconfig[fpath] == csumval:
				print("shasum matched, skipping")
			else:
				fileconfig[fpath] = csumval
				print("shasum didn't match, moving to target dir")
				movelist.append(fpath)
	return movelist, fileconfig

def movefiles(movelist,destdir):
	if not os.path.exists(destdir):
		os.makedirs(destdir)
	for fpath in movelist:
		print("Copying " + fpath + " To " + destdir)
		shutil.copy(fpath, destdir)

def buildreposynccommand(repodir, repos):
	# builds e.g. "reposync -r base -r updates -p /path/to/repodir"
	com = "reposync"
	for item in repos:
		com += " -r " + item
	com += " -p " + repodir
	return com

# Main entrypoint:
if __name__ == "__main__":
	print("Starting script")
	# Variables here, change me:
	destdir = '/root/test'
	repodir = '/var/ftp/pub/centos-7-rpms'
	repos = [
		'base', 'updates', 'extras', 'centosplus', 'epel', 'epel-testing',
		'fasttrack', 'C7.3.1611-base', 'C7.3.1611-updates', 'C7.3.1611-extras'
	]
	reposync_command = buildreposynccommand(repodir, repos)
	fc = loadjson()
	fc = validatejson(fc)
	print(reposync_command)
	call(reposync_command.split(' '))
	movelist, fc = postcompare(fc)
	movefiles(movelist, destdir)
	savejson(fc)

Try this.

After reposyncing, I got more lines of

file not in array, adding to list

then some lines of

shasum matched, skipping

then copied over almost my entire repo (I say almost because it had already copied 4k items, and I KNOW the BASE repo hasn’t updated that much since my last sync earlier this month).

I’ll try your v0.0.2 lol 🙂

yeah, I’m sure about it as well.

Yeah, sorry… I should really pay more attention to my code.

Problem now is that since it’s been sync’d, it’s not going to catch any changes in the files… 🙁

Do you have a copy of the repo from before you ran it?

EDIT: Oh, just to let you know, I’ve updated the method of listing the repos. Now you just list the repos in a Python list and it builds the command dynamically.
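
For example, with just two repos in the list, buildreposynccommand produces:

>>> buildreposynccommand('/var/ftp/pub/centos-7-rpms', ['base', 'updates'])
'reposync -r base -r updates -p /var/ftp/pub/centos-7-rpms'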

No, I don’t have a copy, but I have the repos separated, so base in one folder, updates in another, etc.

Is your revised code supposed to do all of the repos?

The way I have things set up is that I have all of the mirrored repos in separate folders on the desktop, and another folder just labeled “to_burn” with folders for the repos that are updated.

I’m doing base,extras,updates,spacewalk,jpackage-generic,epel