I’ll give my very opinionated answer and assume you’re interested in more recent GenAI and LLM stuff. I’m a devops/sysadmin/robotics/software engineer generalist who is funemployed right now, and this is how I kick the tires on the new coolness and learn a lot along the way.
Setup
- Do your development on Linux (or Windows Subsystem for Linux if you must). Running this stuff mostly comes down to basic Python venv dependency management, compiling C programs, OS package management, and a little concurrent networking fun.
- You’ll likely need 24GB of VRAM to try out many of the local models (smaller models are possible on less).
- If you only want to consume 3rd-party APIs (e.g. Claude/OpenAI/Grok, or Groq, or whatever), you could do that too. I’m aware they exist, but I almost exclusively run models locally, hosting for myself and friends.
Image Diffusion
- Check out ComfyUI, a self-hosted Python server with a visual web app for wiring many models and tools together into complex node-based workflows (rough install sketch after this list).
- The best model to run at home currently is FLUX.1-dev, available on civit.ai. You can also run older Stable Diffusion models, and there are others still coming out. There are a million bizarre LoRAs too for any niche interests lmao…
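Getting ComfyUI going is the usual venv dance; a rough sketch assuming an NVIDIA card and the default host/port (install a torch build matching your CUDA version first, per the README):
git clone https://github.com/comfyanonymous/ComfyUI.git && cd ComfyUI
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# drop model weights under models/ (exact subfolder depends on the model/workflow)
python main.py --listen 127.0.0.1 --port 8188
# then open http://127.0.0.1:8188 in a browser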
LLMs
- The old standby is llama.cpp’s llama-server serving a GGUF quant, e.g.:
# 30~35 tok/sec or so generation speed, pretty good
# more advanced options for running parallel batched inferencing too
./llama-server \
--model "../models/bartowski/Qwen2.5-32B-Instruct-GGUF/Qwen2.5-32B-Instruct-Q3_K_M.gguf" \
--n-gpu-layers 65 \
--ctx-size 8192 \
--cache-type-k f16 \
--cache-type-v f16 \
--threads 16 \
--flash-attn \
--mlock \
--n-predict -1 \
--host 127.0.0.1 \
--port 8080
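Once that’s up, llama-server exposes an OpenAI-compatible API, so you can sanity check it with curl (the prompt is just filler, and the model field mostly doesn’t matter since only one model is loaded):
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local",
        "messages": [{"role": "user", "content": "Give me one fun fact about llamas."}],
        "max_tokens": 64
      }'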
- I just discovered aphrodite yesterday and have it running 4-bit AWQ quants with ~5 concurrent requests batched, hitting over 60 tok/sec in aggregate out of the box. It’s fairly easy to install and try out, e.g.:
mkdir aphrodite && cd aphrodite
# set up a virtual environment
# if you hit errors, try an older Python version, e.g. python3.10
python -m venv ./venv
source ./venv/bin/activate
# optional: use uv pip instead
pip install -U aphrodite-engine hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
# it auto-downloads models to ~/.cache/huggingface/
aphrodite run Qwen/Qwen2.5-32B-Instruct-AWQ \
--enforce-eager \
--gpu-memory-utilization 0.95 \
--max-model-len 4096 \
--dtype float16 \
--host 127.0.0.1 \
--port 8080
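aphrodite serves the same OpenAI-compatible API, so an easy way to see the aggregate batched throughput is to fire several requests at it concurrently; a throwaway sketch (the topics are obviously just filler):
# launch a handful of requests in parallel and wait for them all
for topic in coffee linux robots rain chess; do
  curl -s http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"Qwen/Qwen2.5-32B-Instruct-AWQ\", \"messages\": [{\"role\": \"user\", \"content\": \"Write a haiku about $topic\"}], \"max_tokens\": 64}" &
done
wait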
If you want LLMs on “easy mode”, check out LMStudio, koboldcpp, and other pre-built binary inference apps. They also provide an OpenAI-compatible API endpoint you can plug into anything that already consumes ChatGPT etc.
Misc
There are also impressive vision models, like Qwen2-VL-7B-Instruct, that can OCR handwritten images or describe a photo fairly well.
You can quite accurately transcribe podcasts/YouTube videos using OpenAI’s Whisper.
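For example, with the reference openai-whisper CLI (yt-dlp for grabbing the audio is my own addition here; faster-whisper or whisper.cpp also work and are quicker):
pip install -U openai-whisper yt-dlp
# grab just the audio track (URL is a placeholder)
yt-dlp -x --audio-format mp3 -o "episode.%(ext)s" "https://www.youtube.com/watch?v=..."
# transcribe it; the medium model is a decent speed/accuracy tradeoff
whisper episode.mp3 --model medium --output_format txt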
There are some open text-to-speech models like parler-tts with okay-ish voices, but you have to be careful how you batch your generations or it goes off the rails lol… Then you can take your TTS output and run it through an RVC v2 model to improve the quality…
Conclusion
That’s enough for now, but you get the picture.
Just create a new project folder or git repo and a Python virtual environment for each toy you want to play with. Then go through the README and see if you can get it to run.
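The pattern is almost always the same; a throwaway sketch (the repo name is just a placeholder):
git clone https://github.com/someone/shiny-new-toy && cd shiny-new-toy
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt   # or whatever the README actually says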
Once you have a decent LLM going you can ask it questions, but honestly the bots are not “there yet” imo; they’re occasionally useful, similar to a web search.
Have fun on your journey of exploration, and keep us posted what you decide to try out!