So, this summer I purchased a Bosgame M5 AI mini-PC with a Ryzen AI Max+ 395, a Radeon 8060S, and 128GB of unified memory. I want to take one of its NVMe (PCIe x4) slots out to an external PCIe board and run an R9700 Pro on it.
I have been developing an advanced RAG system that runs natively on Windows. Talk about a headache. I have tried different approaches along the way; I am not that great a coder, but I am excellent at troubleshooting hardware.
The most recent incarnation is an MCP server with a hybrid RAG configuration built on PostgreSQL 18 + pgvector and Neo4j. Ingestion runs in multiple stages: a docling parser (with Tesseract or RapidOCR for OCR); a custom chunker that works down to the paragraph level with footnote awareness (in progress), falling back to sliding windows when a paragraph must be split further to avoid truncation; and domain detection that selects the appropriate NER models (there are tons, organized by domain) in a multiphase NER system for entity extraction, before everything is run through GLiREL for relationship extraction. The idea is to keep a base core of entities and relationships, then layer domain-specific ones on top: you can define hundreds or thousands of types, but automatically narrow them to only those relevant to the scientific, legal, or medical text at hand, so the graph database is not bogged down with irrelevant entities and relationships while the environment stays rich rather than oversimplified. That includes sub-domains (legal breaking into civil and criminal, civil into civil-securities, down to individual statutes). The domain detection is also used to enrich the metadata for the Postgres + pgvector database.
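The narrowing step can be sketched as a hierarchical label lookup. This is a minimal illustration, not the actual code: the names (`CORE_LABELS`, `DOMAIN_LABELS`, `select_labels`) and the example label sets are hypothetical, and the detected domain is assumed to arrive as a dotted path like `legal.civil.securities`.

```python
# Hypothetical sketch of domain -> NER label selection; all names and
# label sets below are illustrative, not from the real pipeline.

CORE_LABELS = {"person", "organization", "location", "date"}

# Domain-specific labels keyed by a dotted domain path.
DOMAIN_LABELS = {
    "legal": {"court", "judge", "party"},
    "legal.civil": {"plaintiff", "defendant"},
    "legal.civil.securities": {"security", "issuer", "statute"},
    "medical": {"drug", "dosage", "condition"},
}

def select_labels(domain_path: str) -> set[str]:
    """Union the core labels with every level of the detected domain path,
    so 'legal.civil.securities' inherits 'legal' and 'legal.civil' too."""
    labels = set(CORE_LABELS)
    parts = domain_path.split(".")
    for i in range(1, len(parts) + 1):
        labels |= DOMAIN_LABELS.get(".".join(parts[:i]), set())
    return labels
```

An unknown domain simply falls back to the core set, which matches the goal of always keeping a base level of entities and relationships.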
Then another small model, like Qwen3 or granite4-tiny, goes through and generates summaries for the document as a whole (3 sentences), each section (1 sentence), and each paragraph (1 sentence), to further enrich the information used when selecting chunks relevant to a query.
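The three granularities can be sketched as a small prompt table plus a driver around whatever local model serves the summaries. Everything here is illustrative: the prompt wording, the function name, and the `generate` callable (standing in for a call to e.g. granite4-tiny) are assumptions.

```python
# Hypothetical sketch of multi-level summarization for metadata enrichment.

SUMMARY_PROMPTS = {
    "document": "Summarize the following document in exactly 3 sentences.",
    "section": "Summarize the following section in exactly 1 sentence.",
    "paragraph": "Summarize the following paragraph in exactly 1 sentence.",
}

def summarize_levels(texts: dict[str, str], generate) -> dict[str, str]:
    """`texts` maps a level name to its raw text; `generate` wraps the small
    local model. Returns level -> summary, ready to store as chunk metadata."""
    return {
        level: generate(f"{SUMMARY_PROMPTS[level]}\n\n{texts[level]}")
        for level in texts
    }
```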
And that is just the ingestion system. There are more models involved, but to recap: Qwen3-embed, DeBERTa-MNLI for domain classification, granite4-tiny for summarization, and domain-specific NER + GLiNER + GLiREL for the two-stage NER. Retrieval, meanwhile, uses ColBERT, Qwen3-embed, BM25 (built into PostgreSQL 18), and graph retrieval (repurposing and custom-training a GraphSAGE/GraphRAG combo, to come later), with RRF fusing the results before the Qwen3 reranker feeds them to the main model.
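The fusion step ahead of the reranker is standard Reciprocal Rank Fusion: each retriever contributes 1/(k + rank) per document, and the summed scores decide the order. A minimal sketch, with the function name and the conventional k = 60 default as my assumptions:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over retrievers of 1/(k + rank),
    where rank is the 1-based position of d in that retriever's list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; this order feeds the reranker.
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears in several retrievers' lists (say, both BM25 and the dense results) outranks one that appears high in only a single list, which is exactly why RRF is a good pre-reranker step.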
All of this runs on Windows 11 without WSL, which has meant living through the growing pains of the HIP SDK and migrating environments as the Python ROCm packages matured. I started when Ryzen AI 1.5.0 was on Python 3.10, where some AI/ML packages were unavailable and the VitisAI and DirectML execution providers were easy to break; I am now on Ryzen AI 1.6.1 with Python 3.12, ROCm 7.10 from repo.amd.com, and torch 2.9. I could regale you with stories of compiling a custom Triton for AMD on Windows, the headaches of trying to get FlashAttention-2 to work, falling back to SDPA's math backend for docling, and so on. Development on Windows has been… fun.
So let’s throw another variable into the mix: that R9700 Pro. I would connect it with a daughter board (OCuLink NVMe-to-PCIe), then offload some of the models to it (or possibly the main model, for something like a smaller AI coder such as Qwen3-Coder-30B, or granite4-tiny with a large context window) to ensure I do not run out of memory. Everything running today already uses about 66% of the 128GB; add a main local model and you are at the edge of what this machine can handle. I keep the models warm for now, but may have to move to lazy loading otherwise.
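If I do move off keeping everything warm, lazy loading could look something like the sketch below: models load on first use and can be evicted to reclaim memory. The class, method names, and loader convention are all hypothetical, not the current code.

```python
# Hypothetical sketch of lazy model loading instead of keeping all models warm.

class LazyModelRegistry:
    def __init__(self, loaders):
        # loaders: model name -> zero-argument callable that loads the model
        self._loaders = loaders
        self._cache = {}

    def get(self, name):
        """Load the model on first access; return the cached instance after."""
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]

    def evict(self, name):
        """Drop a model from the cache so its memory can be reclaimed."""
        self._cache.pop(name, None)
```

The trade-off is the usual one: warm models answer immediately, while lazy loading pays a first-use latency cost in exchange for headroom.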
I also have an advanced ingestion path specifically for code repositories: when that domain is detected, per-language custom AST-based analysis enriches the Neo4j graph database.
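For the Python case, that kind of AST pass can be sketched with the standard-library ast module: walk the tree, find function definitions, and emit caller/callee pairs that could become graph edges. The function name and the edge shape are illustrative; a real pass would also resolve imports, methods, and nested scopes.

```python
import ast

def extract_call_edges(source: str) -> list[tuple[str, str]]:
    """Sketch of a per-language AST pass (Python shown): yield
    (caller, callee) pairs for simple name calls inside each function."""
    tree = ast.parse(source)
    edges = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            for child in ast.walk(node):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    edges.append((node.name, child.func.id))
    return edges
```

Each edge would then map onto a relationship in Neo4j (e.g. a CALLS relationship between function nodes), alongside whatever the other per-language analyzers contribute.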
But this is a work in progress; it is not done. Still, the R9700 Pro would be an amazing addition, and the project shows how small, dedicated models can add up to a whole greater than the sum of its parts.
