Disclaimer: I am not a programmer (obviously). My programming skills so far mostly extend to some hastily thrown-together Python scripts and some code in Mathematica. (I also took a course in C and did a bit of Java in school, but we didn’t do much there.) Most of the “programming” I currently do is implementing some maths for data analysis, so don’t expect too much from me.
I cannot guarantee that I will be able to complete the challenge, since other obligations in my life take priority, but I will do my best to make it. For this reason I have slightly modified the rules I promise to adhere to, because I don’t like to make promises that I am not sure I can keep.
I, anotherriddle, will participate in the next Devember. My Devember will be “Learning the skills required to code my project org-software*”. I promise I will do my best to program for my Devember for at least an hour every day of the next December. I will also write a public devlog and will make the produced code publicly available on the internet. No matter what, I will keep my promise.
My challenge*:
Improving my programming skills to get closer to realizing my pet project. The current, very unimaginative title in my project folder is “org-software”, short for “organizing software”. At some point in the future I want to be able to write a proof-of-concept program with a user interface and the basic functionality in place.
So, my Devember challenge will be:
improve Python programming skills
learn how to make simple GUIs with Qt 5 (PySide2)
try to build a few of the features of my “org-software” if I get that far (probably not)
side quest: improving my work environment and setup for programming
Since watching me learn serious programming is probably not very exciting for you guys, girls and Apache helicopters, I will also document my outline for the “org-software”.
Project org-software:
The problems I want to solve with my program are manifold, let me explain:
I regularly work with a huge number of files and data sets, often together with a group of other researchers. Examples include research papers, binary files, FITS files, CSVs, time series data in various forms, Python notebooks, Mathematica notebooks, … . Some of these files are part of multiple projects, which makes it hard to keep them in sync in a hierarchical file system; duplicate files in various places that are at different stages of progress are also a regular problem. For this reason I want to adopt a file system that works with tags, which I can use to “create” directories. I also want to be able to add comments and tags to any kind of file without touching the file itself, to avoid problems with the programs that use it. Going further, I want to be able to mark “positions” in any kind of file (like timestamps in videos, coordinates in photos or FITS files, sections in PDFs, locations in websites, …) and attach notes to them. Furthermore, the UI (a bit of a mix of markdown editor and file manager) should aggregate files and comments by tags, provide search functionality and ideally version control. All these features should be cross-platform and have to work on Windows 10, macOS and Linux.
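Keeping comments, tags and “positions” outside the annotated file could look something like the sketch below: each note records which file it belongs to plus a free-form locator (page, timestamp, pixel coordinates), and everything lives in a separate JSON store, so the original file is never written to. The `Annotation` class, its fields and the JSON layout are hypothetical choices for illustration, not a finished design.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Annotation:
    """A note attached to a position inside a file (hypothetical design)."""
    target: str    # path or ID of the annotated file; the file itself is never modified
    locator: dict  # where inside the file, e.g. {"page": 12} or {"time_s": 93.5, "x": 640, "y": 360}
    note: str
    tags: list = field(default_factory=list)


def save_annotations(annotations, store):
    """Write all annotations to a separate JSON 'sidecar' store."""
    store.write_text(json.dumps([asdict(a) for a in annotations], indent=2))


def load_annotations(store):
    """Read annotations back; a store that does not exist yet means no annotations."""
    if not store.exists():
        return []
    return [Annotation(**item) for item in json.loads(store.read_text())]


# Usage: round-trip two annotations through a temporary store.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "annotations.json"
    notes = [
        Annotation("papers/survey.pdf", {"page": 12}, "key result", ["project-a"]),
        Annotation("talks/intro.mp4", {"time_s": 93.5, "x": 640, "y": 360}, "slide with the plot"),
    ]
    save_annotations(notes, store)
    reloaded = load_annotations(store)
    missing = load_annotations(Path(tmp) / "does_not_exist.json")
```

A single JSON file is only meant to show the principle; with many files and users, the same records would more naturally go into the database that also holds the tags.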
This is a rough outline of what “org-software” should be able to do; I have written down a lot more notes than that. I will continue to work out the features I want in a minimum viable product and add them to this thread.
It is really hard to describe what I want this program to look like in the end. I hope it is at least somewhat understandable what I am trying to do.
Dev log updates
Day -1
Day 0
Day 1
Day 2
Day 2 | Devember dev log 3
This is only going to be a short update today. Nothing much exciting happened.
I ran into dependency issues again and wasted a lot of time.
IDEs are not helpful when you need additional dependencies and the IDE can’t handle environments properly (or when the user is just stupid :eyes:)
Made some progress with understanding SQL.
Went through some of the SQLAlchemy tutorial https://docs.sqlalchemy.org/en/latest/orm/tutorial.html#create-a-schema
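When the IDE and the terminal disagree about which packages are installed, a useful first check is which interpreter is actually running the code. A minimal standard-library sketch: it detects `venv`/`virtualenv` environments by comparing `sys.prefix` with `sys.base_prefix`; conda environments are not detected this way, because there the two are typically the same.

```python
import sys


def in_virtualenv() -> bool:
    """True inside a venv/virtualenv: sys.prefix points at the environment,
    sys.base_prefix at the Python installation it was created from."""
    return sys.prefix != sys.base_prefix


interpreter = sys.executable
print("interpreter:", interpreter)
print("inside a virtual environment:", in_virtualenv())
```

Running this once via the IDE's run button and once from a terminal with the environment activated should print the same interpreter path; if it doesn't, the IDE is configured to use a different interpreter than the one the packages were installed into.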
The next plan of action:
I will need to sit down tomorrow and get my Qt setup running.
continue with SQLAlchemy and its interactions with Python’s data types.
reminder to self: take a look at Redmine and other project management software
or second 435,153,374,899,987,524 after the mean photonic decoupling in this universe
Currently I am reading up on PySide2 and doing some tinkering in the editor while memtest is running on my main rig. Yesterday I finally got an additional 32 GB of RAM, and now I’m testing it to avoid surprises in the future. It might not be the smartest time to change something on my system, but I regularly run into the 16 GB limit and start filling up swap. So happy about the additional memory!
I am still not entirely sure whether to choose PySide2 or PyQt5 … decisions, decisions …
I think I’ll go with PySide2.
Thanks! I already know that I bit off more than I can chew with this one. I have never done anything remotely on that scale. If you have any feedback down the road, I’d appreciate it. I’m not even sure whether I managed to convey what I want to do, or whether people would find it useful.
This section will get heavily cut down; it’s only for brainstorming.
keywords
tagging file system
database file system
transactional file system
knowledge base
document management system
features
• Make links into documents, audio, video and changing PDFs
• version control
• Make it easily navigable by keyboard only
• menu bar with few symbols and more descriptive text
• Deduplication
• create files from inside and outside the software
• give every item tags as a substitute for a directory structure. For example, a PDF, tutorial video or link can be "used" in projects or in self-assembled learning material
• use rich text formats
• no floating windows, everything has a fixed place and does not cover other menu items or workspace
• make content "offline available" via button click, i.e. a video tutorial, or website
• archive button -> compresses files and moves them out of cache
• have a "primary tag" field where a project name can be typed in. It lists all the files with that tag, all links, comments
• reverse image search
• canvas to drag and drop images, files and flowcharts for brainstorming
• separate the software into a server part and a client part, so it can easily be used by more people
• monitor filesystem changes
• have an optional local cache
• option to prepare a question and link to forums, e.g. I have a question about some code that I am writing, and I'd like to have a "link" that opens a Reddit thread and notifies me about changes
• have an option to
• auto-group tags or associate them with each other
• warn when a new type of tag is created and recommend an alternative, already existing one
• make it easy to consolidate tags
• when a link is copied into a project file it automatically gets the tag of the project
• tag scans
• thumbnail of video, website or item
• mindmap
• auto-caption videos for in-video-search
• separate all settings into a control panel; no settings should be accidentally changeable from the main UI -> changing them should be a conscious decision
• subscription feature for websites (e.g. YouTube, Reddit) that notifies about changes
• make personalized "Workflows" and also have example workflows
• make the software modular, so if you only want a tagging filesystem or other single features, it is possible, or at least hide everything that is not necessary
• have a process that enforces documentation
• have the software take a screenshot of the file at each version (snapshot) to show the "version" visually
• make it easy to move items around in a list (drag and drop)
• make it easily possible to separate data (i.e. import and export only project related data)
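For the “warn when a new type of tag is created and recommend an existing one” idea, Python’s standard `difflib` module already provides fuzzy string matching. A minimal sketch; the helper name `suggest_existing_tag`, the lower-casing and the 0.8 similarity cutoff are assumptions to tune.

```python
import difflib


def suggest_existing_tag(new_tag, existing_tags, cutoff=0.8):
    """Return up to three existing tags that look like near-duplicates of new_tag
    (hypothetical helper; an empty list means no warning is needed)."""
    normalized = new_tag.strip().lower()
    if normalized in existing_tags:
        return []  # the tag already exists, nothing to warn about
    return difflib.get_close_matches(normalized, existing_tags, n=3, cutoff=cutoff)


existing = ["thesis", "time-series", "fits", "python"]
suggestions = suggest_existing_tag("Time series", existing)
print(suggestions)
```

`get_close_matches` ranks candidates by `SequenceMatcher` similarity, so it catches spelling and punctuation variants, but not synonyms such as “paper” vs. “article”.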
I kind of wasted a lot of time reading up on docs today. At this point I mostly have various code snippets from testing, and I am far past my time budget for today. In my opinion those are not of much use to anyone, so to avoid spamming useless stuff I will only post code when I think it makes sense.
Not quite; it might seem that way because version control is one of the features I want. I don’t want to reinvent the wheel; the plan is to use as much of the technology out there as possible. What I am going for is more of a piece of software that ties together the functionality of various other programs.
I will definitely do more research, look at other software projects, write a better feature list and start out with some minimal requirements.
Also, it is really not realistic that I will ever finish this project, but I am going to learn something along the way (I have never done databases or GUIs before). The ideal outcome is that I am able to present a bare-minimum, functional proof of concept that shows what I am going for, and maybe even find someone who finds it useful and wants to help with development. The worst-case outcome is that I will have learned some GUI and database programming and a bit about the complexity of real-world projects.
As a stopgap while I get my first (semi-useful) code cobbled together (I’m still running into dependency issues), I think it is a good idea to better explain what I am trying to do. I will also set some minimal requirements that I am aiming for. I already had a couple of “oh shit, I have no idea how I am going to do that” moments.
software that I am totally going to use or steal ideas from
find a unique identifier for any file (or watch the filesystem for changes)
associate every file identifier with any number of tags and the filesystem path (this will be done with a database)
make it possible to add comments to any file with a keyboard shortcut or a drop-down menu (this should work independently of file system and operating system)
navigation should be possible with keyboard only as well as mouse or touch screen only. (tab and arrow keys)
The GUI should have a search function and a “search bar” for tags that can be combined with logical operators
The GUI should display all “search results” and optionally associated comments in a window (details will follow)
reverse search, display all tags and projects that a certain file is part of
version control will be directly done with git
When copying files or text inside the org-software environment, there should be an optional prompt to add the appropriate tag. By default the source of the file (e.g. website, PDF, …) should be saved and associated with the tag
links into files that are also taggable (e.g. a page or chapter in a PDF/book, coordinates in an image or FITS file, and timestamp + coordinates in videos) (not sure what’s the best way to implement this, possibly double-checking with search and image recognition?)
…
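The core of the “file identifier ↔ tags ↔ path” requirement is a many-to-many relation, which maps naturally onto three tables. A minimal sketch using Python’s built-in `sqlite3` module instead of SQLAlchemy (the same tables could later be declared through SQLAlchemy). Table and column names are hypothetical, `file_key` stands for whatever unique file identifier ends up being used, and only AND-combination of tags is shown.

```python
import sqlite3

# Hypothetical schema: files, tags, and a link table so every file can carry any number of tags.
SCHEMA = """
CREATE TABLE files (
    id       INTEGER PRIMARY KEY,
    file_key TEXT UNIQUE NOT NULL,  -- the unique file identifier, however it is computed
    path     TEXT NOT NULL
);
CREATE TABLE tags (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE file_tags (
    file_id INTEGER NOT NULL REFERENCES files(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (file_id, tag_id)
);
"""


def add_file(conn, file_key, path, tags):
    """Register a file and attach tags to it, creating missing tags on the fly."""
    file_id = conn.execute(
        "INSERT INTO files (file_key, path) VALUES (?, ?)", (file_key, path)
    ).lastrowid
    for name in tags:
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        (tag_id,) = conn.execute("SELECT id FROM tags WHERE name = ?", (name,)).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO file_tags (file_id, tag_id) VALUES (?, ?)", (file_id, tag_id)
        )


def files_with_all_tags(conn, tags):
    """Tag search with AND semantics: paths of files that carry every tag in `tags`."""
    if not tags:
        return []
    placeholders = ",".join("?" * len(tags))
    rows = conn.execute(
        f"""SELECT f.path FROM files f
            JOIN file_tags ft ON ft.file_id = f.id
            JOIN tags t ON t.id = ft.tag_id
            WHERE t.name IN ({placeholders})
            GROUP BY f.id
            HAVING COUNT(DISTINCT t.name) = ?
            ORDER BY f.path""",
        (*tags, len(set(tags))),
    )
    return [path for (path,) in rows]


def tags_of_file(conn, path):
    """Reverse search: every tag attached to the file at `path`."""
    rows = conn.execute(
        """SELECT t.name FROM tags t
           JOIN file_tags ft ON ft.tag_id = t.id
           JOIN files f ON f.id = ft.file_id
           WHERE f.path = ?
           ORDER BY t.name""",
        (path,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_file(conn, "key-1", "papers/survey.pdf", ["project-a", "reading"])
add_file(conn, "key-2", "data/run1.fits", ["project-a", "project-b"])
add_file(conn, "key-3", "notes/ideas.md", ["reading"])
print(files_with_all_tags(conn, ["project-a", "reading"]))
print(tags_of_file(conn, "data/run1.fits"))
```

OR could be handled by dropping the `HAVING` clause, and NOT with an `EXCEPT` query; combining all three for a real search bar would need a small query builder on top.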
I will need to break this down further. My next steps are getting my feet wet with GUI design and SQLAlchemy, and solving the problem of ID-ing files. I will probably use the TMSU approach. I’ll also need to find out how best to “link” into files without changing them. This will take a lot of experimentation.
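One straightforward way to ID files is to hash their content. A minimal sketch using SHA-256 from `hashlib`; `content_id` and `find_duplicates` are hypothetical helper names. The trade-off: identical copies get the same ID, which is handy for deduplication, but the ID changes as soon as the file is edited, so tracking “the same document over time” still needs the path or a file-system watcher.

```python
import hashlib
import tempfile
from pathlib import Path


def content_id(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file's bytes, read in 1 MiB chunks so large files don't fill RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by content hash; every group with more than one path holds exact copies."""
    groups = {}
    for p in paths:
        groups.setdefault(content_id(p), []).append(p)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Usage: two identical CSV files and one that differs by a single character.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp, "a.csv")
    b = Path(tmp, "copy_of_a.csv")
    c = Path(tmp, "c.csv")
    a.write_text("t,flux\n0,1.0\n")
    b.write_text("t,flux\n0,1.0\n")
    c.write_text("t,flux\n0,1.1\n")
    same = content_id(a) == content_id(b)
    different = content_id(a) != content_id(c)
    dupes = [[p.name for p in group] for group in find_duplicates([a, b, c])]
    print(dupes)
```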
This probably already exists, but I don’t know what to search for; maybe someone can help me out. A cryptographic hash is designed to be completely different when even a tiny bit of a file changes. Is there something like a hash that is the same for files that are very similar?
I hope I’ll have time tomorrow to continue working on some code, but it will be tight. My overall time spent on this by far exceeds the two hours intended by the challenge, even though I have nothing to show for it yet, which is a bit frustrating.
If you are only interested in the “org-software” project skip this update, not much of interest here.
Quick update from the bus:
I don’t have anything coherent yet, mostly modified code snippets from various tutorials, but since I have not posted any code so far, here is at least something.
I try to keep to an hour because there is not a lot of time at the moment. For me it’s kind of hard to pick up the thread from the previous day; my natural work cycle is usually closer to 2 hours per item.
Anyway, I made some progress with:
planning the overall project
I realized how little I usually work with Python; it’s really mostly implementing or changing a bit of math
SQL is strange, and even though I don’t need to handle it directly, I need to get used to SQLAlchemy. I will probably need to come up with a couple of simple examples of my own.
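Random code: a random sample from playing around with SQLAlchemy and other stuff, mostly from tutorials
from sqlalchemy import create_engine, text

# In-memory SQLite database; echo=True logs every SQL statement that is sent
engine = create_engine("sqlite:///:memory:", echo=True)

# engine.begin() opens a connection and commits the transaction when the block exits
with engine.begin() as connection:
    connection.execute(text(
        """
        CREATE TABLE users (
            username VARCHAR PRIMARY KEY,
            password VARCHAR NOT NULL
        )
        """
    ))
    # idea for later: add a "tags VARCHAR" column
    # Raw SQL strings must be wrapped in text() in SQLAlchemy 2.x; :name marks a bound parameter
    connection.execute(
        text("INSERT INTO users (username, password) VALUES (:username, :password)"),
        {"username": "test_user_0", "password": "1234"},
    )
    result = connection.execute(text("SELECT username FROM users"))
    for row in result:
        print("username:", row.username)

# reminder: don't write useless comments
# With a tags column, the insert would need three columns for three values:
# connection.execute(
#     text("INSERT INTO users (username, password, tags) VALUES (:username, :password, :tags)"),
#     {"username": "test_user_0", "password": "1234", "tags": "test_tag"},
# )

# note: inotify works on Linux only
import pyinotify

wm = pyinotify.WatchManager()
mask = pyinotify.IN_MODIFY  # only react to file modifications


class EventHandler(pyinotify.ProcessEvent):
    # pyinotify calls process_<EVENT_NAME> for each matching event
    def process_IN_MODIFY(self, event):
        print("modified:", event.pathname)


notifier = pyinotify.Notifier(wm, EventHandler())
wm.add_watch("/tmp", mask)  # "/tmp" is just an example directory to watch
notifier.loop()  # blocks and dispatches events until interrupted (Ctrl+C)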
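Since inotify is Linux-only, a portable (if less efficient) fallback is polling: take periodic snapshots of modification times and sizes and compare them. A minimal standard-library sketch; `snapshot`, `diff_snapshots`, `watch` and the one-second interval are illustrative assumptions. For real use, the third-party `watchdog` package, which wraps the native change-notification APIs of Linux, macOS and Windows, is worth evaluating.

```python
import os
import tempfile
import time


def snapshot(root):
    """Map each file under root to (mtime in ns, size in bytes)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:  # file vanished between listing and stat
                continue
            state[full] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(old, new):
    """Return (created, deleted, modified) path lists between two snapshots.
    Note: an edit that keeps both size and timestamp identical goes unnoticed."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return created, deleted, modified


def watch(root, interval=1.0):
    """Poll root forever, yielding (created, deleted, modified) whenever something changed."""
    old = snapshot(root)
    while True:
        time.sleep(interval)
        new = snapshot(root)
        changes = diff_snapshots(old, new)
        if any(changes):
            yield changes
        old = new


def write(directory, name, content):
    with open(os.path.join(directory, name), "w") as f:
        f.write(content)


# Usage: simulate changes in a temporary directory and diff two snapshots.
with tempfile.TemporaryDirectory() as tmp:
    write(tmp, "a.txt", "one")
    write(tmp, "c.txt", "to be deleted")
    before = snapshot(tmp)
    write(tmp, "a.txt", "one two")           # modified (size changes)
    write(tmp, "b.txt", "new")               # created
    os.remove(os.path.join(tmp, "c.txt"))    # deleted
    after = snapshot(tmp)
    created, deleted, modified = (
        [os.path.basename(p) for p in paths] for paths in diff_snapshots(before, after)
    )
    print("created", created, "deleted", deleted, "modified", modified)
```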
Maybe I have some time later to make this post a bit more coherent and add the more interesting stuff. I have some handwritten notes. Usually I don’t write about what I do, so I have to get used to that.
Today it was hard for me to find a continuous free hour, so I did more research and continued a bit with the SQLAlchemy tutorial. But I found this gem:
The phrase I was looking for is “fingerprinting”. I’m not sure yet if I can use the same principle for what I have in mind, but it’s very interesting. Sometimes, when I find something cool, I get so distracted that I miss my timer going off. Anyway, hopefully more interesting updates will be coming soon.
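To make the fingerprinting idea concrete: one well-known technique for near-duplicate detection is MinHash. Each file is split into overlapping byte chunks (“shingles”), the chunks are hashed with many different hash functions, and only the minimum per function is kept; the fraction of positions where two such signatures agree estimates the Jaccard similarity of the two shingle sets. A minimal sketch; the shingle length of 8 bytes, the 128 hash functions and salted BLAKE2 as the hash family are illustrative assumptions. Byte-level similarity works reasonably for text-like files, but says little about re-encoded images or videos, which would need content-aware fingerprints.

```python
import hashlib


def shingles(data: bytes, k: int = 8) -> set:
    """All overlapping k-byte chunks of data (the whole input if it is shorter than k)."""
    return {data[i:i + k] for i in range(max(1, len(data) - k + 1))}


def minhash_signature(data: bytes, num_hashes: int = 128, k: int = 8) -> list:
    """For each of num_hashes salted hash functions, keep the smallest hash over all shingles."""
    chunks = shingles(data, k)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "little")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(hashlib.blake2b(c, digest_size=8, salt=salt).digest(), "little")
            for c in chunks
        ))
    return signature


def estimated_similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of agreeing positions; approximates the Jaccard similarity of the shingle sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


original = b"The quick brown fox jumps over the lazy dog. " * 20
edited = original.replace(b"lazy", b"sleepy", 1)  # small local change
unrelated = bytes(range(256)) * 4

sig = minhash_signature(original)
sim_edited = estimated_similarity(sig, minhash_signature(edited))
sim_unrelated = estimated_similarity(sig, minhash_signature(unrelated))
print(f"edited: {sim_edited:.2f}, unrelated: {sim_unrelated:.2f}")
```

Unlike a cryptographic hash, the signature degrades gradually: a small edit changes only a few shingles, so most of the 128 minima stay the same, while unrelated content shares almost none.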