The Ultimate Home Server - Component: Knowledge Repository

Yeah, same here, vi is my principal input tool.

My biggest issue is not collecting stuff, it’s finding what I’ve written later. I collect/write code, scripts, txt, html, “office” docs (MS xls or Libre odt or whatever the platform supplies, WordPerfect probably somewhere), PostScript, PDFs, images, music/sound, videos, zips of above and on and on. What I’m really looking for is an indexer so that I can do a local search through all of that, plus my browser’s tabs and bookmarks, my shell history files and various system headers, man pages, etc.

Back in the early 2000s I set up Swish-E to crawl my machines, so I could search around, but it was a chore and I was never really satisfied with how it worked. Biggest problem was parsing and indexing non-text files, like pdfs and office-like proprietary formats.

Now that I’m looking, I’ve got about 2 million files in my personal archives, dating back about 40 years to when I started saving crap on disk (everything before that is on paper or tape or punch cards, gah!). I’m sure there is a lot of duplication in there, but where? Where’s that C or C++ code that computes training impulse that I wrote in 1992-5 or whenever it was?
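
For the "lots of duplication, but where?" part, a minimal sketch of one common approach, assuming byte-identical copies are what you are after: bucket files by size first, then hash only the files that share a size. The function names here are made up for illustration, and on a couple of million files the first run will take a while.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups (lists of paths) of byte-identical files under root."""
    # Pass 1: bucket by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                if os.path.isfile(path) and not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable entry; skip it

    # Pass 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                continue
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicates("/path/to/archive")` returns one list per set of identical files. It will not catch near-duplicates (the same code with different line endings, say); that needs a different technique.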

I currently have ~2,500 items in my library, so yup, your amount would be fine.

By default it pulls in whatever is embedded inside the book. Sometimes that is just about everything, other times that can be nothing.

There are plugins to pull metadata from lots of different places (wikidata, barnes & noble, goodreads, etc).

Yeesh. I hear that. Usually if I’m searching for something I’ve got a shell with all the directories available abstracted to the system at hand. With computing power having gotten infinitely better over time it’s usually “grep -ir” or “find ./ | grep -i” recursively.
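
For anyone who wants the same thing from a script rather than a shell, here is a rough Python sketch of those two commands. It is an approximation, not a drop-in replacement: `grep_tree` and `find_names` are names invented for this example, binary files are read as text with undecodable bytes replaced, and only file paths (not directory entries) are matched.

```python
import os


def grep_tree(root, needle):
    """Yield (path, line_number, line) for case-insensitive matches under root.

    Roughly what `grep -ir needle root` does.
    """
    needle = needle.lower()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary files from raising decode errors.
                with open(path, "r", encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, start=1):
                        if needle in line.lower():
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file; skip it


def find_names(root, needle):
    """Yield file paths whose path relative to root contains needle.

    Roughly what `find ./ | grep -i needle` does, for files only.
    """
    needle = needle.lower()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if needle in os.path.relpath(path, root).lower():
                yield path
```

Both are generators, so results show up as they are found instead of after the whole tree has been walked.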

Young me would’ve shuddered at the prospect of the wait :).

Isn’t that just hoarding? How many of the 2,500+ items have you touched or read?

This is like Smaug, the dragon from The Hobbit.

First of all, this is my first post on the forum. Wendell’s video spurred me into making an account just so I could discuss this topic, so I’m sorry if my reply is somewhat inept.

I’m an academic (kinda, I’ve finished my Master’s, but I still live in an R&D world in my job), so taking notes and also annotating stuff is really important for me. Over the years I’ve been thinking about what I need for an organization system, and have come up with many of the same requirements as pfeiferj. I would say that I probably value PDF annotation above nearly everything else, since I need it basically all the time for annotating scientific articles. Also, since I’m mostly dealing with scientific articles, an included reference manager would be a must.

My dream solution would have as a starting point a system that allows for plain text annotations (I don’t trust the longevity of binary formats), with support for Markdown and at least basic LaTeX (numbered equations should really cover most of what I need). Ideally, this system could also be used as a layer on top of PDFs to allow for better annotation. Highlighting PDFs would be a nice-to-have. Finally, the ability to add and search tags is a must.

I recently started looking into possible solutions that satisfy all my needs, and took a basic look at the following software, categorized by adherence to the features I need:

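| Software  | Open Source | Self-Hosted | Plain text Notes | PDF Annotation | Reference Manager | LaTeX Support | Markdown Support | Mobile | Pen Input | Writing OCR |
|-----------|-------------|-------------|------------------|----------------|-------------------|---------------|------------------|--------|-----------|-------------|
| Zettlr    | y           | y           | y                | n              | y                 | y             | y                | n      | n         | n           |
| Joplin    | y           | y           | y                | n              | n                 | y             | y                | y      | ?         | ?           |
| Xournal++ | y           | y           | y                | y              | n                 | y             | n                | y      | y         | ?           |
| Polar     | y           | n           | n                | y              | y                 | n             | n                | y      | y         | ?           |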

You may notice that these options fall into different categories of software (text editors vs. note-taking apps). None of them really tick all of the boxes. This was just an initial search I performed, and I posted these here in hopes of receiving feedback from people who have used them, or at least starting a discussion on their suitability.

I’ve also recently looked at Hypothesis, but haven’t had a chance to test it more thoroughly.

Finally, looking into the future, I would also like something that would allow me to include code into my notes. For what something like that might look like, take a look at Quarto.

Thank you, pfeiferj, for giving me a great starting point for this reply. Thank you, Wendell, for starting this whole discussion. I am so incredibly happy to find people with whom I can discuss these kinds of things, because they have been bugging me for years now.

P.S.: There are some more things I would need from an academic standpoint, but they are out of the scope of this thread. If anyone wants to discuss them, though, please do reach out to me.

Joplin was an open source app I found that was compatible with my Evernote exports…

and you could use Nextcloud WebDAV to sync it… for a self-hosted thing.

And now that work has blocked my ability to sync notes with Evernote… I need to migrate off Evernote.

Is it hoarding? Maybe, but probably not?

In terms of the things I have tagged as read or not, I’ve read about 80% of the items and about 70% of the total word count. The actual read word count is somewhat less than that, as some things I have read only parts of (like textbooks and reference materials).

How many are worth keeping?

Oh, hmm, yes that is something that I would need at least occasionally as well. However, at least for my use cases, an extension/plugin would work, it would not have to be integrated deeply into how the program is written. How about you?

Based on pandoc, I’ll have to save this to look into.

Given that if I want to reread or reference something and I can’t find it, it will seriously bug me, probably 95% are worth keeping.

If it’s something not worth saving in the first place, it generally won’t get added to calibre.

Off the top of my head I don’t really know how deeply it would have to be integrated. I can think of two simple use cases for this feature:

  1. Referencing an article within my own notes. In this case, I would just like to reference the article by its bibliography entry name, instead of the file name, and have the software show me a nicely formatted title on hover (e.g. “On the aerodynamics of bovines” Smith et al. 1997), and give me an option to go to the associated PDF file and its annotations.
  2. Formatting the references for publishing blog posts (I don’t see markdown replacing LaTeX for peer-reviewed publications any time soon), i.e. basically the LaTeX \cite command.
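
To make the first use case a bit more concrete, here is a small sketch of resolving a citation key in a note to a formatted title. Everything in it is an assumption for illustration: the key `smith1997`, the PDF path, the `Entry` structure, and the choice of pandoc-style `[@key]` markers in the note text. A real setup would load the entries from a BibTeX file or a reference manager export instead of a hard-coded dict.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    title: str
    authors: str
    year: int
    pdf: str  # hypothetical path to the annotated PDF


# Hypothetical library; the key and path are made up for this example.
LIBRARY = {
    "smith1997": Entry(
        title="On the aerodynamics of bovines",
        authors="Smith et al.",
        year=1997,
        pdf="papers/smith1997.pdf",
    ),
}

# Matches pandoc-style single citations such as [@smith1997].
CITE = re.compile(r"\[@([A-Za-z0-9_:-]+)\]")


def render_citations(text, library):
    """Replace each [@key] with a formatted title; leave unknown keys as-is."""
    def repl(match):
        entry = library.get(match.group(1))
        if entry is None:
            return match.group(0)
        return f'"{entry.title}" ({entry.authors} {entry.year})'
    return CITE.sub(repl, text)
```

The same lookup could return `entry.pdf` to drive a "go to the PDF and its annotations" action, which is the part an editor plugin would have to supply.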

I’m not really familiar with Zotero, but I think that’s the kind of use case that could be solved with a simple integration.

The system I use currently consists of the following:

  1. Do good old-fashioned pen and paper note-taking, benefits of writing etc.
    (I also use physical books, but I keep an ebook copy in calibre-web for posterity)
  2. Scan the note pages into a pdf with OpenScan from F-Droid
  3. Review my notes, then edit and revise them into a blog post for my site. This helps me return to the original notes and curate/edit/revise the information into something I would be willing to share with others.

So far this has been working pretty well for getting knowledge jammed into my brain matter. I am looking to replace the need to carry around heavy textbooks and notebooks with a Supernote (https://supernote.com/); it won’t be in for another few weeks, but I can keep everyone posted on how I like it if it piques anyone’s interest.

What I’ve used so far is… the filesystem. Seriously, nothing other than an unlimited tree supporting unlimited filetypes even comes close.

How I write notes varies: text files, LyX, markdown, spreadsheets, scanned handwritten, more spreadsheets, some photographs, some CAD drawings, some pseudocode, some actual Matlab or Julia code, some more spreadsheets. It will always vary and a new format is always a possibility that needs to work in the same structure. And these notes of my own creation are interspersed with a similarly diverse set of information that I have copied from elsewhere.

It grows unwieldy and needs reorganization, in parts, from time to time. But I really can’t see anything being a reasonable substitute.

If there was a sane way to run version control on the entire thing, I’d like that! But I’d be highly surprised and doubtful. Sometimes the pdf is a scanned document to preserve, sometimes the pdf is a book that might be revised, sometimes the pdf is built from a source document and should be ignored. Maybe if we had a file attribute, directly in the filesystem, that told the version control system how to handle each file… I’m doubtful that a sane default for that attribute would even be possible.

Syncthing has been a good thing in my testing. I don’t always run Syncthing, but when I don’t, my laptop is practically never more than a few steps away from me. My phone tends to have Ghost Commander and Termux in support of this craziness.

I’m a bit crazy. I’m convinced it would be more than a bit were it not for computers; people around me tend to think I have that backwards. YMMV.

I am actually gonna try cheating with the note-taking apps by basically creating a Qt program that handles the organization of notes and can also create a new “section” (file) by launching an application. Basically, combine that program with, say, Xournal++. I could theoretically expand this to other programs too, so if I want two sections (aka files), one a Xournal++ file and the other, say, a LibreOffice document, I can do that. I could probably even include Krita in the organization setup.

I haven’t been on level1techs in a while; I just thought I’d revisit, and found some of these Ultimate Home Server posts very interesting, since I am also working on a home server but am not quite sure what to get for what exactly. It currently functions as just a file server, but I want it to do more. I had a self-hosted web server for a website, but I gave up on that and am moving the site to WordPress.

@wendell just heard your “Calling All Wizards” and I think we are looking for the same thing here.
I use(d) DEVONthink. For me it’s (sadly) the thing that comes closest to the solution I’m looking for. It builds databases on your machine which you can sync between devices with something like Seafile or Nextcloud.

Pros:

  • you can archive webpages (with a Chrome plugin) and emails to the local database
  • most things are searchable
  • a built-in OCR makes the rest searchable too
  • you can annotate PDFs (although I haven’t tried pen input)
  • no cloud service stuff or subscription
  • you can put anything in the database: Markdown, PDF, txt, EPUB, etc.
    • and convert one to another (also web archives to plain text or markdown)
    • set a preferred application to open and edit file types
  • scanning things and putting them in the database works great
  • you can create links to reference stuff (everything in the database and external web links) in documents

It would be the perfect solution for me. The UI, tag system and search capabilities really help me organise my brain. For me this thing really is a Knowledge Repository.
BUT it’s Mac only. There is a Server edition for running it on a Mac and accessing most of the functionality through a web browser (I haven’t tried that). But I want something that I can host properly.

(some rambling)
What I’ve learned on the way to a new solution.

For pure reading calibre is ok. I can sort books with tags and search to some capacity.
For some documents paperless-ng is really thought-through. No proprietary database, searchable. But it just isn’t built for collecting knowledge. It is built for archiving documents.
Things I have learned from searching for a solution:

  • being able to archive web pages is a very powerful tool to organize my thoughts
  • being able to collect scans, photos, PDFs, EPUBs and text files together makes things really easy
  • full-text search on everything is a very powerful tool to organize my thoughts
  • a system for getting my stuff back to port it to another solution is critical

DEVONthink made me realise what a good knowledge repository can do for you. Maybe someday an open source project will fill that enormous gap.

+1 for Hedgedoc

For something that’s self hosted with robust clients (web + native PC, tablet, phone), what about project management software?

It’s been a while since I was on a dev team, but I remember using a self-hosted Axosoft instance that was our collective brain across all projects. It’s a bit overkill for a personal todo list, but there’s built-in support for file storage, a notes wiki, workflows, and an API for extending it to whatever else you may need.

I don’t have a good solution and it sounds like others here aren’t happy with their solutions. But this post got me thinking, how do libraries do it? I mean, they’ve been making collaborative and accessible knowledge repositories for a long time. Here is some relevant stuff from a Wikipedia dive…

Library perspective

For a long time, written documents were incredibly valuable and sometimes chained in place so that they could not be taken away. Being small curated selections, they did not require complicated methods to organize them. Since the printing press, library organization has been revolutionized by classification systems such as the Dewey Decimal Classification, or DDC (which I learned in primary school). Amazingly, the DDC is over a century old and STILL proprietary, but the OCLC “generously” offers an API for some lookups. I think this type of classification could be useful for a personal knowledge repository of mixed media, as is often the case in modern libraries.

Could librarians’ tools actually help?

Certainly this type of categorization would be helpful for organizing purchased items that make their way into library databases. After all, why not take advantage of the work publishers do to make their content searchable?

But what about the notes in those books? Or personal notebooks? Or home photos and videos? Personal finance and ownership (bank statements, car titles, house deeds, surveying records, repair history, etc.)? Can we generate and organize labels to store these in a searchable way? I haven’t seen a library that addresses these things directly…

My dream knowledge base

To my mind the dream would be to have a knowledge base that is accessible in both a physical and digital way, while leveraging the computers in my home.

I’ve got a few hundred books - being able to search them in person on the shelves should be just as straightforward (arguably more difficult, but that’s okay) as searching on my phone / laptop / desktop and viewing on an e-reader.

I’ve been filling up a personal BookStack wiki with information on technology and woodworking and other interesting stuff. How can I reference this information physically, as well? Maybe a printed QR code pasted on a placeholder “book” for my shelves?

I’d like to be able to search and stream music and videos from my phone / computer / TV (e.g. Jellyfin, Plex, Emby) but I’d also like to be able to pull out a physical item and have it play. Could be its original physical media, but I also love this RFID jukebox, for example.

Users should be able to collaborate on this repository - could be a family member or even coworkers. Anything too “techy” will just get in the way - it should be accessible to a variety of people and skill sets so that Grandma can add her recipes too.

I wonder what that would actually look like? Maybe it’s simply too much work to be feasible?

+1 for the DEVONthink suggestion. A self-hosted, open-source DEVONthink competitor would be most welcome.

I also quite liked Zengobi Curio when I lived in Macland - it seems the mac is blessed in this regard.

I don’t really think I have anything I’ve been using for long in terms of a knowledge repository, on purpose, but if I think about it…

I’ve been using Google Keep, Office Lens, scattered Word documents, scattered Google Docs, scattered Excel files… nothing that fits “repository”. I’m trying to correct this with, right now at least, Obsidian. I don’t have much to say about it that probably hasn’t already been said. A knowledge database is a very personal thing, and it will take me time to figure out my own methodology.

In the past I’ve always heard people talk about this big, air-quotes ORG MODE concept in Emacs. I’ve tried to wrap my head around it, but it is just inscrutable to my brain. Also, there are tons of structured to-do or project planner type programs that also don’t fit with my brain. So I went and did the simplest thing I could think of to keep track of things for myself. I fired up the ol’ Visual Studio Code with GitHub Copilot installed in it and I made my own darn thing. I don’t know how to program, but I know how to tell a machine learning model what I want it to do. navjack/ConsoleJournal (github.com) is what I ended up making: a simple Python thing that asks you what you did; you tell it whatever you want, and it will update whatever text file you change the code to point to. I have it set right now to update my “Untracked to-do list” in Obsidian.
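
For readers who want the general shape of such a tool, here is a minimal sketch of the idea. This is not the ConsoleJournal code; it is a guess at the simplest version of "ask what you did, write it to a text file", and it assumes entries are appended as timestamped list items. The file name `journal.md` and both function names are placeholders.

```python
from datetime import datetime
from pathlib import Path


def append_entry(journal_path, text, now=None):
    """Append one timestamped entry to the journal file and return the line."""
    now = now or datetime.now()
    line = f"- {now:%Y-%m-%d %H:%M} {text.strip()}\n"
    # Mode "a" creates the file if it is missing and never overwrites old entries.
    with Path(journal_path).open("a", encoding="utf-8") as f:
        f.write(line)
    return line


def main(journal_path="journal.md"):
    """Prompt in a loop until a blank line is entered."""
    while True:
        text = input("What did you do? (blank to quit) ")
        if not text.strip():
            break
        append_entry(journal_path, text)
```

Calling `main("path/to/some-note.md")` from a terminal starts the prompt loop. Pointing it at a Markdown file inside an Obsidian vault should work, since a vault is a folder of plain files.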

Like I said, a knowledge repository is a new thing I’m trying to figure out for myself, and it’s a slow one to evaluate effectively. I need to get everything I have together and neat, and then have it all linked together by the Obsidian pages. I’m sure calibre will fit in here someplace too.

I too am searching for an improved workflow around this. This is what I have learned so far. Please share your thoughts and ideas.

The importance of documentation and notes - Wiki

I have found myself in situations where I know I have solved a particular problem, but am unable to recall how I did it. This is infuriating, and it is the reason I began taking notes in a wiki a few years back. This has served me well in my personal and professional life.

I most often take notes with Notepad++ and end up cleaning them up (or not) and adding them to the wiki. Sometimes I might take photos of a whiteboard or something else useful. The photos are uploaded to my Nextcloud and are available to be added to the wiki together with the notes.

Key features of a wiki

I chose DokuWiki because it is a simple platform that has many plugins available. Yes, it looks and sometimes feels clunky, but it is very functional.

What I like about it:

  • Support for pasting images and dragging files directly in the edit window
  • Strong emphasis on document structure
  • built in version control and change visualization
  • contextual editing - The ability to edit the text under a heading or subheading instead of the whole document - I did not know I needed this and now cannot live without
  • Good search functionality
  • Section templates - Ability to add a template document for a specific section in the tree navigation that will serve as a base for all future docs under that tree
  • Based on files, easy to back up
  • Themable PDF export to export some content with company letterhead for clients in my professional life

What I dislike

  • It is and feels old
  • Some of the modules are hard to understand
  • the learning curve is steeper than some other alternatives - I feel it was worth it
  • Updates are very infrequent
  • Not possible to paste in HTML with images (from webpages as rich text)

Task management and ingest - Clickup

A wiki by itself is not enough since there are many things that require my attention and they need to be prioritized. In the past I have used Trello to deal with this but found it was missing some features I needed. Trello is an easy and modern tool to get started.

I ended up choosing ClickUp for managing tasks, projects and incoming tasks. It has a rich feature set and the people behind it are improving it rapidly.

What I like about it:

  • The ability to create sub tasks - This is key in my process because it enables me to group related tasks into projects
  • Custom fields and attributes
  • board, list and other ways to view tasks
  • Calendar integration and good due dates
  • custom views of tasks with filtering support with many parameters

What I dislike

  • Not self hosted - Can live with this
  • slow UI, especially when there are hundreds of tasks
  • Sometimes there are bugs and issues - They do fix these
  • Sometimes it is complicated to use, since there are so many features - Learning curve

Bringing it all together

So I have a wiki and some task management, now what…?

In my experience this is not enough, since the mind works much faster than these tools and I find this combo lacking in speed and ease of use. I often find myself having quick ideas and thoughts that require some further processing. I find that if I do not capture them immediately, the ideas tend to evaporate back to the ether they came from.

Ingest workflow - Slack + Clickup = Wiki

In the past I have saved quick ideas as IM messages to myself and it has been a somewhat working solution.

Since I have my phone with me at all times I want to use it as the main ingest tool for quick ideas, notes and tasks. The process needs to be as quick and stress free as possible.

The plan

I am planning on including the Slack IM tool in this process, since it is a tool that I use all the time to communicate with different teams and colleagues. I will build a Slack API integration that will give me an ingest workflow for my thoughts, ideas and tasks.

The idea is to have a Slack integration that would let me add ClickUp tasks directly from Slack. Yes, I know there is a ClickUp integration already, but it is not fast enough for me since it requires the user to make multiple selections and choices when adding the task.

I will set it up so that in Slack there will be preset commands that add the text after the command as a task in ClickUp on a preset ingest list.

For example, “/idea quick description of the idea before i forget it” would create a task on the idea list with the header ‘quick description of the idea before i forget it’.

I am planning on having commands for around 5-10 different ingest lists, like: movies, work-related tasks for A, B and C, home-related tasks, books, etc.
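
A rough sketch of the routing step, to show how little logic is needed. Slack delivers a slash command as a form-encoded POST whose body includes `command` and `text` fields; everything else here is an assumption: the list IDs are placeholders, the command names other than `/idea` are invented, and the returned dict only carries a `name` field on the assumption that the ClickUp create-task call accepts one. The sketch makes no network calls, so the actual request to ClickUp is left out.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from slash command to ClickUp ingest list ID.
INGEST_LISTS = {
    "/idea": "LIST_ID_IDEAS",
    "/movie": "LIST_ID_MOVIES",
    "/book": "LIST_ID_BOOKS",
}


def route_slash_command(form_body):
    """Turn a Slack slash-command body into (list_id, task_payload).

    form_body is the raw application/x-www-form-urlencoded request body.
    Raises ValueError for unknown commands or empty text.
    """
    fields = parse_qs(form_body)
    command = fields.get("command", [""])[0]
    text = fields.get("text", [""])[0].strip()

    if command not in INGEST_LISTS:
        raise ValueError(f"unknown command: {command!r}")
    if not text:
        raise ValueError("nothing to capture: empty text")

    # The whole text becomes the task title; no further choices are asked for.
    return INGEST_LISTS[command], {"name": text}
```

Keeping this as a pure function means the mapping can be tested without Slack or ClickUp in the loop; the web handler around it only has to verify the request, call the function, and forward the payload.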

I think that this will work quickly and easily enough that I will be able to capture my incoming ideas and tasks effectively.

No ingest without outgest

Now that there is a place to collect ideas and tasks quickly, there needs to be work done around the tasks to make it useful.

I am planning on working on, enriching and organizing the tasks in ClickUp. Some of the tasks will rot away and eventually end up deleted. Some will be enriched with further information and moved to other ClickUp lists. Some will be enriched and moved to the wiki. Some will stay in the ingest list and be completed there.

I hope this will spark some thoughts and ideas as @wendell did in me with this topic.
