Sysadmin Mega Thread

not really…

dmesg told me that it’s using 01:00.0

and I can find that in lspci… but nothing else looked similar…

tried lstopo…
neat picture… but it doesn’t really help me answer my question… I’ll check the manual and see if its topo pic can give me insights…

actually… i see pci 08:00.0…
that is empty… it appears to be on the chipset and not the CPU… so I’m betting that’s the M.2 slot on the rear of the MB.
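
For anyone else chasing the same question: on Linux, each entry under /sys/bus/pci/devices is a symlink whose resolved path lists every bridge between the root complex and the device. Below is a minimal Python sketch that pulls that chain out of the path. The function names and the example addresses in the usage note are made up for illustration, and whether a given root port hangs off the CPU or the chipset still has to come from the board manual or block diagram.

```python
import os
import re

# Matches a full PCI address such as 0000:01:00.0 (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def pci_chain(sysfs_path):
    """Return the PCI addresses in a resolved sysfs device path, root side first."""
    return [part for part in sysfs_path.split("/") if PCI_ADDR.match(part)]


def resolve_device(addr, sysfs_root="/sys/bus/pci/devices"):
    """Resolve the sysfs symlink for addr (for example '0000:08:00.0')."""
    return os.path.realpath(os.path.join(sysfs_root, addr))
```

Calling pci_chain(resolve_device("0000:08:00.0")) on the machine in question should list the upstream bridge(s) first and 0000:08:00.0 last; comparing that with the chain for 0000:01:00.0 shows whether the two slots share a parent.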

Thanks everyone

3 Likes

I finally took a week off from work, and literally on the first day of my holidays, first thing in the morning as I went to turn off my work phone, I got an email about imminent infrastructure collapse.

Fucks sake

2 Likes

How to search the internet for help on Oracle Cloud issues. Works every time.

7 Likes

The number of sites which require something like that is kind of depressing. :neutral_face:

1 Like

The second (and last) week of training for my job started so well, but Friday was such a shit show. Friday was the first day we were expected to take calls without someone looking over our shoulders remotely, but if we needed help or had a question, there was supposed to be someone available in team chat to help us.

First thing in the morning, I couldn’t log into the company VPN and had to reach out to the company IT help desk. (I work in store support.) It turned out I hadn’t logged out of the VPN properly the day before, and some settings had changed, so the VPN software wouldn’t let me log in.

Because of the hurricane that hit part of the United States on Thursday, many company stores couldn’t close out their cash registers correctly Thursday evening because the power was out. When the power was restored Friday morning, all the stores that lost power Thursday had errors and needed to be closed out correctly before they could start a new day of sales. That forced the store managers to call store support, because when a register isn’t closed correctly there is a procedure that has to be followed, and a store manager can’t do it; store support has to. Every call I got except one had this problem.

To fix an incorrectly closed register, I would have to remote into it, reset it, tell the equipment to send yesterday’s sales reports to headquarters, and make sure the store manager isn’t locked out of the store’s registers. Unfortunately, for some reason, I don’t have the program on my computer that would let me log in remotely to any store’s equipment. I tried to reach out in the chat to get one of my coworkers to log into the store’s equipment but never got a response, so the only thing I could do was create a ticket for the manager and put them back into the store support queue.

I got assigned ten calls where the problem was an improperly closed register, and all I could do was fill out a ticket and send them back into store support’s queue. By the end of my shift Friday, I didn’t want to deal with technology at all. All I wanted to do was get drunk, and I finished the bottle of alcohol I got for my birthday. Have any other forum members had a first day as shitty as mine? Thanks for letting me vent. I feel better.

I thought the whole point of having an IT department was to help automate tasks, not devolve into more hard-copy paperwork.

But anyway, the password reset was due to updates requiring 2FA, so yay, still a win for security modernization. The problem is that the only options are a Titan security key or a cellphone number. Why not a software TOTP option? This is frustrating…

Just in case you are wondering, yes this is a government facility… :persevere::joy::rofl::sob:

2 Likes

I’d gladly trade slow ass shit unintelligible confusing Service Now request forms for good old sheets of paper.

1 Like

The place where we used to keep the best track of our items was one where we had double entries: paper and electronic. You could be sure that if something went wrong with our entries, that you would find the truth in papers, as those have to be signed by both logistics and the borrower. Same for returns.

Also, I don’t remember well, but I think it was an ISO requirement for keeping both paper and electronic track of items. I don’t recall if it was 27001 or 9000 (I think it was the former).

1 Like

Yeah. I started at a place that was on fire when I walked in. I managed to fix the issue on day 2 and then on day 3 I became the lead. Unfortunately, I am still at that place. The IT stuff is no longer an ess show but the customer and our ISEO counterparts are.

Ever work Federal Contracts? Most of those people that are in charge do not even have the credentials that you must have to retain your job.

Enough said. Thank you for your service!

3 Likes

First week back from a holiday and already had a jawdrop moment

So we have a second geographical site with a DC that I’m planning to build out as a disaster recovery site, but I had never seen it or any pictures of it and was just told “oh it’s two racks and a bunch of desktops”. I had one of the guys from that office hop on a video call with me on Wednesday and show me what there is.

It wasn’t a bunch of desktops…

It was 30 desktops on shelf racks, all connected together into a Beowulf cluster. This thing is tripping multiple breakers, and during the summer the AC was not able to keep up (how do ya like a 30°C/86°F DC room??). They’ve been demanding server upgrades and an infra redo for eight years and nobody listened until I came on board.

What in the actual fuck.

Told this to C-Suite, they went “good job” and told me to create an action plan for remedying the issues
Which means MORE PAPERWORK, SEARCHING FOR VENDORS AND EXCEL SPREADSHEETS AAARRGH!!!

At least we’re killing the Pentium III (Yes, turns out we had something older than P4 servers SOMEHOW) servers

IN OTHER NEWS!

  • Found out that we actually have local centos 7 repo mirror for updating hosts on the network, turns out that specific host is in the DMZ and can never update its own repo
  • Spent the entire week building a large ansible playbook that creates local mirrors for centos7 and extras, rocky8 and extras, Dell OMSA for 7-8, epel for 7 and 8 and docker for 7 and 8 and it just passed the final test at uuuh… 10pm just now. Yay my homelab with hypervisor clusters is being useful!
  • CTO insisted on just using a janky script that creates 4 SSH tunnels (two of them are literally SSH tunnelling to each other on the local system) to create a proxy and let the DMZ host update from the internet (WHAT’S THE POINT OF THE DMZ AT THAT POINT???). I insisted on building the mirrors because damn, I’m not manually updating stuff via a janky bash script
  • Had a chat with Dell solution architects regarding financing some HW because haha what even is a budget in this year and I think I’ll go Dell APEX, they weren’t happy about me choosing TrueNAS for backup storage and the storage guy went on a tangent how Dell stuff is better
  • Cybersecurity insurance renewal came with some new mandatory requirements, bricks were shat

Now I need some help regarding keeping things patched, SSH login security and keeping network inventory and auto-discovery.

Is there any free, self-hosted alternative to Ansible Tower? Preferably something that still uses Ansible, as all I’m writing at this point is Ansible playbooks for deploying stuff. The primary use would be automatic patch management and pushing people to use LDAP for authentication via SSH (key auth plus LDAP creds is considered 2FA SSH huehue)

What’s the ideal way of doing auto-discovery of hosts on a network with Zabbix? I’m in the middle of deploying it but I had to stop as it’s currently being used for SNMP crawls to have nodes disabled. I also want to have Security Onion or just Wazuh deployed as well, but I need to learn way more about them and I don’t have the capacity in my homelab for it

2 Likes

With the amount of ransomware hits from the past 2 years, no wonder. Every insurance company wants people to pay, but to never pay them back, so asking to up the security makes sense. Well, I wouldn’t buy cybersecurity insurance in the first place, I’d just be doing best practices and teaching our people not to click that link.

You could probably combine Rundeck with your Ansible playbooks. I used to use Rundeck to back up Oracle DBs on a schedule, get alerts if the backups failed or timed out, and kill the jobs if they ran past the backup window into working hours. Well, I didn’t use Ansible, I only used so-called “janky shell scripts” (because they worked and got the job done very well).

I don’t know about SNMP, I always configured it manually in Centreon when we migrated to it, but Zabbix has a very cool discovery feature when using the Zabbix-agent. I would highly suggest using the agent. After I enabled the discovery, it discovered VMs I didn’t know were not added to monitoring (I used to install the agent and later on enable SNMP on every first install). I wouldn’t have migrated from Zabbix IMO, but it was a 2 to 1 decision. Centreon was really neat when it came to custom building monitoring and auto-actions (using shell scripts, of course), because of its legacy from Nagios.

We had some custom scripts written in Ruby on Zabbix, that nobody understood. I don’t know how they ended up there, but they were monitoring stuff (duh!). Only later did I discover we could write shell scripts in Zabbix too, but it was too late. Me and my colleagues inherited the infrastructure, similar to how you did and we had to untangle a lot of stuff. Did some upgrades, mostly to the hardware and software side and some side-grades on the network side, until my colleagues did some real network upgrades after I left the company. We planned it together, but I just had other plans.

I agree the SSH tunnels should get out, but the DMZ doesn’t necessarily mean it should have no access to the internet. Although in all fairness, a proxy would definitely be a nice addon for security, while blocking http and https traffic if not going through the proxy. You could just add a proxy in the DMZ that has access to the internet and do a manual export of the proxy in the script that would launch the repo sync (http) on the local network mirror.
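
To make the "export the proxy in the script that launches the repo sync" idea concrete, here is a minimal Python sketch. It only builds an environment with the usual proxy variables set and runs a command with it. The proxy URL and the sync command in the usage note are placeholders, and whether your particular sync tool honours these variables is an assumption to verify (yum/dnf also has a proxy= setting in its config if it doesn’t).

```python
import os
import subprocess


def proxied_env(proxy_url, base_env=None):
    """Return a copy of the environment with the common proxy variables set."""
    env = dict(os.environ if base_env is None else base_env)
    for name in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        env[name] = proxy_url
    return env


def run_sync(command, proxy_url):
    """Run the given sync command (a list of arguments) through the proxy.

    Returns the command's exit code; nothing outside this one process
    gets the proxy settings.
    """
    completed = subprocess.run(command, env=proxied_env(proxy_url), check=False)
    return completed.returncode
```

Usage would look something like run_sync(["/path/to/your/sync-script.sh"], "http://proxy.example:3128"), with both values swapped for your own. The point of doing it this way is that only the mirror job talks to the proxy, and everything else on the box stays blocked.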

1 Like

Penny pinching? Greed? “But that is how Google started…”

You guys still do not have a CIO or CISO? You should ask for a raise or the Title.

Power wise, they are better than P4s. Ask me how I know.

You can usually build a nice Pizza oven with those, lawl.

In some business sectors it is required now, at least here in the USA. Legalized stealing if you ask me, but with all of the breaches and whatnot it is a prime market for the uninitiated.

2 Likes

VP gave me thumbs up for Rundeck after threatening to throw Ansible playbooks at the already pre-existing CI/CD pipelines for our product already over capacity.

I was told no to having a proxy as it was too much workload to maintain and we can’t afford to throw more bodies at the problem

Penny pinching and lack of care. Before Covid, the previous IT decided to just give each developer their own desktop under the table for running test VMs, instead of building out proper infrastructure for that site. When covid hit and everyone had to work from home, panic ensued in March 2020 and all of those boxes were put there like that and never touched after that

Yeah they won’t give me the CISO title because I’m the lone IT person in the company and there’s no budget for hiring a full time CISO, so I’m trying for the extremely dumb option of…

CISO as a Service.

Bingo. It’s not a government requirement but it’s a requirement from one of our large customers.

3 Likes

Welcome to the club. Just make sure that they make it worth your time while you are acting as one.

1 Like

So I tried Rundeck, and I can summarise the experience of trying, unsuccessfully, to run a simple “uname -a” for six hours straight in this gif:

trash-know-your-place

Rundeck’s way of handling SSH keys is a hot garbage mess

SSH keys need to be in the old RSA PEM format; modern SSH key formats like ed25519 are not supported because of an unmaintained library. Keys saved in the global key storage can’t be used in projects, and the Ansible integration completely ignores the built-in key storage, so the stored keys can’t be used with it.
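
A quick way to tell which format a private key file is in is to look at its first line. This is a rough Python sketch of that check; the labels on the right-hand side are my own naming, not anything Rundeck or OpenSSH uses. Recent ssh-keygen versions write the "openssh" format by default, and ed25519 keys only exist in that format.

```python
# Map the armor line at the top of a private key file to a short label.
# The labels are made up for this sketch.
KNOWN_HEADERS = {
    "-----BEGIN OPENSSH PRIVATE KEY-----": "openssh",
    "-----BEGIN RSA PRIVATE KEY-----": "legacy-pem-rsa",
    "-----BEGIN EC PRIVATE KEY-----": "legacy-pem-ec",
    "-----BEGIN DSA PRIVATE KEY-----": "legacy-pem-dsa",
    "-----BEGIN PRIVATE KEY-----": "pkcs8",
    "-----BEGIN ENCRYPTED PRIVATE KEY-----": "pkcs8-encrypted",
}


def private_key_format(key_text):
    """Classify a private key by its first non-empty line."""
    for line in key_text.splitlines():
        line = line.strip()
        if line:
            return KNOWN_HEADERS.get(line, "unknown")
    return "unknown"
```

If an RSA key comes back as "openssh", ssh-keygen -p -m PEM -f <keyfile> should rewrite it in the legacy PEM format (it changes the file in place, so work on a copy). That does nothing for ed25519 keys, which is the actual complaint here.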

All that I got from Google-fu were endless GitHub issues, either still open or closed by a bot for “being stale”, regarding SSH authentication. The official docs also gave me a bunch of 404s. Seems like it went super downhill once PagerDuty bought it.

So yes, still need recommendations for Ansible Tower (and Rundeck) alternatives for running Ansible playbooks.

1 Like

Holy water! I’m sorry to have recommended that. When I used it (versions from 2.something to 3.something-else IIRC), it was fine. I had a user that the rundeck server was running under, aptly named rundeck. It just had an RSA 4096 key in openssh format, generated by ssh-keygen. I got the pubkey copied to all the Oracle VMs and I created 4 jobs in rundeck to execute “ssh [email protected] /path/to/backup/script.sh”.

The commands were run on rundeck locally, using the VMs in each group in series (group1 whatever alphabetical order they had in series, with group2, 3 and 4 in parallel, but each having the oracle backups run in series). And we could execute the job on-demand if needed. It was running just fine for us, but I didn’t really like it that much. It was fine, but nothing I couldn’t do with crontab and pssh (well, it was a glorified crontab with on-demand stuff for us, and some nifty alerts in case parts of the jobs failed, or backups timed out and had to be killed).
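
The "glorified crontab with alerts and timeouts" part is small enough to sketch in Python. This is not how Rundeck does it, just the same idea under my own made-up names: run a command, kill it if it runs past a time limit, and complain on stderr if it failed or timed out.

```python
import subprocess
import sys


def run_job(command, timeout_seconds):
    """Run command (a list of arguments) with a time limit.

    Returns (status, detail) where status is 'ok', 'failed' or 'timeout'.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the local process at this point.
        return ("timeout", f"killed after {timeout_seconds}s")
    if result.returncode != 0:
        return ("failed", result.stderr.strip() or f"exit code {result.returncode}")
    return ("ok", result.stdout.strip())


def report(name, status, detail):
    """Print an alert line to stderr for anything that did not finish cleanly."""
    if status != "ok":
        print(f"ALERT {name}: {status}: {detail}", file=sys.stderr)
```

One caveat if the command is an ssh call like the one above: killing the local ssh process on timeout does not necessarily stop the script on the remote side, so the backup script may need its own guard.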

For what I needed it to do, I could probably recreate it myself in php, but I’d rather not introduce that to the mix. K.I.S.S., if I can use a terminal and I don’t have other users who want a fancy GUI, then I’ll use the CLI.

So you can’t use rundeck as a wrapper for your ansible playbooks like this first commenter on plebbit does?

I don’t know what has changed in rundeck, but 1.7 years ago when I last used it, it was fine. Again, running all the commands locally on rundeck as a glorified crontab and alerting in case of failure or timeout, not through other things that may have been added recently.

Semaphore seems to still be maintained, has seen a few commits the past days.

And this one is supposed to be an alternative to Ansible Tower. Never used it, so use it at your own risk.

During testing I was able to run commands and do ssh auth IF I did not use Ansible. When I followed the video guides from rundeck themselves for Ansible, I couldn’t even get the Ansible inventory file to populate the nodes in Rundeck unless I used YAML (Ansible supports both ini and YAML) and I disabled fact gathering.
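
If the inventory only loads as YAML, a simple INI inventory can be converted mechanically. Here is a rough Python sketch that handles plain [group] sections and host lines with key=value variables, and deliberately refuses [group:vars] and [group:children] sections. The function name is made up, variable values are kept as plain strings, and the result is dumped as JSON on the assumption that the YAML inventory plugin will read it (JSON being, for practical purposes, valid YAML); check it with ansible-inventory --list before trusting it.

```python
import json
import shlex


def ini_inventory_to_dict(text):
    """Convert a simple INI-style inventory into the nested YAML-inventory shape."""
    inventory = {"all": {"hosts": {}, "children": {}}}
    current = inventory["all"]["hosts"]  # hosts listed before any [group] header
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and whole-line comments
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            if ":" in name:
                # [group:vars] and [group:children] are out of scope here.
                raise ValueError(f"unsupported section: {name}")
            group = inventory["all"]["children"].setdefault(name, {"hosts": {}})
            current = group["hosts"]
            continue
        host, *pairs = shlex.split(line)
        current[host] = dict(pair.split("=", 1) for pair in pairs)
    return inventory


def to_json_text(inventory):
    """Serialise the inventory dictionary as indented JSON text."""
    return json.dumps(inventory, indent=2)
```

Write the output of to_json_text(ini_inventory_to_dict(open("hosts.ini").read())) to a .yml file (hosts.ini being whatever your INI inventory is called) and point the job at that instead.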

When running playbooks by hand they worked, but rundeck just spat out ssh key errors when it tried to run that same exact playbook. Turns out that damn thing is creating a copy (or something) of the ssh key to /tmp/rundeck/ and trying to use that one instead of the specified ssh key in the job, so I just gave up trying to get it to work

I believe the ssh issues are caused by the jsch library that has not been maintained for ages now.

Rundeck seems fine for regular scripts or shell commands, but the Ansible support is definitely underbaked as hell

2 Likes

AWX not suitable?

Not a recommendation. I haven’t gotten around to using it yet, so just curious.

1 Like

The main issue with AWX is that it’s not “production ready”, and updates will sometimes break the instance to the point where you have to start from scratch. AWX is not off the table yet, but I want to explore other options first so I don’t have to deal with AWX randomly exploding in my face after an update.

2 Likes

Yeah, and I assume no willingness to buy Tower from RH? If it were me, I’d give AWX a shot and just stay on top of snapshots/backups in case of a breaking update, but admittedly, that becomes untenable at a certain point if it’s seeing a lot of continual use.

2 Likes