Sysadmin Mega Thread

Dynamic_Gravity · May 21, 2022, 2:47pm

Yeah it definitely jank lol.

But thanks!

And the text was an unexpected issue. I don’t know if it’s because I’m going from digital to analog or if the basic output is the problem.

The text is illegible and I think if I put a DE on it might be able to tweak the resolution and settings.

I think it’s a 480i display but I’m not sure. The owners manual doesn’t mention the specs.

thunderysteak · May 22, 2022, 3:00pm

Hey! I’m the person that did a major refactoring of the Ansible Playbook for Mastodon and ported Mastodon to function on RHEL. Totally did not do it because Debian/Ubuntu always grenades itself on me and I wasn’t happy with some security stuff.

If you use the playbook, I’d recommend including this fix to fix issues with media

github.com/mastodon/mastodon-ansible

Fix for Mastodon 500ing on media upload due of incorrect file permissions

mastodon:main ← thunderysteak:main

opened 02:43PM - 22 May 22 UTC

thunderysteak

+22 -0

Turns out that file uploads are being handled by NGINX, and because we are runni…ng NGINX under a non-default user due of avoiding permission issues for the files in home folder, `/var/lib/nginx/` now has incorrect permissions and gets missed completely in testing until you have full instance deployed and try to upload media and causes 500. Just fixing permissions for `/var/lib/nginx/` to allow nginx to write files into `/var/lib/nginx/tmp/` fixes the 500 issue

I also 100% recommend throwing it behind a reverse proxy. I also got a playbook for that but only on my local gitlab instance as its missing letsencrypt

playbook.yml

---
- hosts: all
  remote_user: <YourUserHere>

  handlers:
    - name: restart nginx
      service: name=nginx state=restarted

  vars:
    reverse_proxy_ip: <IPOfMastodonHost>
    domain_name: <FQDN>

  pre_tasks:
    - name: Permit SELinux permission to allow NGINX to make proxy connections with httpd_can_network_connect
      become: yes
      shell: "setsebool -P httpd_can_network_connect 1"
  
    - name: Permit SELinux permission to allow NGINX to make proxy connections with httpd_can_network_relay
      become: yes
      shell: "setsebool -P httpd_can_network_relay 1"

  tasks:
    - name: Copy nginx config with RHEL folder stucture
      become: yes
      template: 
        src: ./files/nginx/mastodon-reverseproxy.conf.j2
        dest: /etc/nginx/conf.d/mastodon-reverseproxy.conf
        notify: restart nginx

/files/nginx/mastodon-reverseproxy.conf.j2

map $http_upgrade $connection_upgrade {
  default upgrade;
  ''      close;
}

server {
  listen 80;
  listen [::]:80;
  server_name {{ domain_name }};

  location / { 
    return 301 https://$host$request_uri; 
    }

}

server {

listen 443 ssl http2;
listen [::]:443 ssl http2;
server_name {{ domain_name }};

keepalive_timeout 70;
sendfile on;
client_max_body_size 80m;

location / {
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_pass https://{{ reverse_proxy_ip }}/;
proxy_read_timeout      90;
tcp_nodelay on;
  }
 
location /api/v1/streaming {
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto https;
    proxy_set_header Proxy "";

    proxy_pass https://{{ reverse_proxy_ip }};
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_buffering off;
    proxy_redirect off;

    tcp_nodelay on;
}

    ssl_certificate /etc/letsencrypt/live/{{ domain_name }}/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/{{ domain_name }}/privkey.pem; # managed by Certbot

}

There’s room for improvement, but it works “in production” for me

cotton · May 24, 2022, 5:20pm

Please let me know if this is a good use case for autofs?

We have hundreds of systems that have NFS mounts setup in /etc/fstab. DBA’s on occasion log into these systems and run database dump similar to sqldump to the NFS server.

However in a given year a particular server may only have this done once.

When we have to patch the NFS server and it requires it to reboot. When we do this many of the client systems cannot reestablish their connection to the NFS server and it causes the file system to hang. You have to umount -l the share and the reboot the system, but it cannot remount the drive.

I’d like to avoid this situation. The DBA won’t mount the share, run their db dump, and then unmount it.

Therefore, we have these mounted NFS shares on clients which are rarely used, however, when we patch the NFS server, many of them get hung up in a bad state which requires the system to be rebooted.

Therefore, to solve this issue, setting up autofs might be an option. In the sense that when the db dump takes place it mounts the NFS share until it’s done, at which point it unmounts it.

Is this a good use case for autofs, or is there a better approach? Also, if autofs is a good tool for this any gotchas you’ve faced in the past or docs you found helpful?

cotton

ThatGuyB · May 24, 2022, 11:41pm

I’m going to talk out of my ass a bit, but I’ll explain my reasoning.

The definition of reconciliation is to make 2 sides be friendly with one-another, or to convene 2 or more parties to come to an agreement.

In the context of IAM (Identity and Access Management), there are two key words usually going together: access grant and reconciliation. This goes together with the concept of least privilege. You grant access to whoever needs, only grant as much access as needed and only for a period of time. Then you reconcile, that means you “get together” and decide if the other party still requires said permission. If they don’t need them, you reconcile to revoke the previous grant. If they still need access, you reconcile to have the party keep their permission for another period of time.

TBH, I don’t remember reading about access reconciliation in ISO-27001, but that was more of a skimming through the standard, rather than an in-depth read and memorize. Although not really a standard in the US, the things mentioned there are very good best-practices. And the best thing about the standard is that you can always change how things are done in your infrastructure depending on the risks you want to assume. I’m not familiar with a US counterpart standard.

So, in that regard, reconciliation basically means revoking previously granted access or renegotiating access for a period of time.

Man, I remember when I used to get all of those permissions to give people on certain Samba shares involving different projects, but the access was more or less permanent, due to the nature of the company (everyone was working on everything). I could have probably automated the removal of the rights on the shares and would have been pretty easily given how I gave the rights on the projects (the Unix way, using group ownership and Samba shares config files), but I didn’t. Thankfully when people were leaving the company, it was just as easy as deleting a user and that was it. Now that’s a reconciliation.

oO.o · May 25, 2022, 1:09am

Yep, this was the conclusion we came to. Good call.

rcxb · May 25, 2022, 3:47am

autofs is fine, but also look into the soft and intr options to mount for NFS. They will help a great deal.

oO.o · May 25, 2022, 8:03pm

Anyone have experience with Aspera or Signiant? I think we’re going to end up with Signiant because I don’t think Aspera has been cleared by the infosec people which can take months/year+ as far as I can tell.

Use-case is large file availability across hybrid work for video postproduction. Obviously not much you can do about people’s internet connection at home but we’d like fast LAN connection to the same files people are accessing at home. Ideally the home user experience is equivalent to desktop OneDrive/Google Drive/Dropbox/etc where files are cached locally and optionally stored for offline but are also available locally at office over 10GbE.

Note that this is not in the environment I typically post about here. This is an international F500 situation. I will not be implementing anything, just advising.

thunderysteak · May 25, 2022, 9:33pm

Hey remember how I said that I accepted my first proper sysadmin job like a month ago? I started on Monday and uh…

regret

…this is not a dumpster fire, this is a rubbish island covered in oil and on fire.

I need help trying to unfuck the last 5 years worth of damage and neglect caused by the previous sysadmin, without either resigning or drinking myself to death. I don’t even know where to start…

This is some of the extremes I found from looking at this company’s infrastructure, and I’m here only 3 days and still lacking access to most things. I started pushing heavily for 2FAing everything, but I don’t know what to focs on next.

Either extremely old or missing documentation, and split across 3 different knowledge base systems that are public facing
Previous Sysadmin allowed personal devices on the corporate network and VPNs, and people keep using their personal laptops to this day even in the office today for work
No backups.
Everyone has local admin rights and nothing is regulated what can be and cannot be installed
No MDM for laptops or iOS devices (MDM exists only for Android)
No monitoring of any kind
No disaster recovery, incident response or “boot from zero” plans
Nobody locks their laptops when leaving it alone, resorted to starting to put passive aggresive “lock me pls” notes onto people’s laptops after not listening to me
No available corporate laptops or budget available for purchasing new laptops, but somehow still got a brand new M1 Pro Macbook Pro in box
Asset management is non-existent. We have a system for it, but its completely blank
2FA missing from almost everything, including a public facing knowledge base containing plaintext passwords. Massive push-back from getting 2FA enabled for things
Unpatched CentOS 6 and CentOS 7 boxes
Most of the infrastructure is unknown black box with no documentation
No antivirus or endpoint protection. Even HR during my induction brought it up that its weird that there’s no AV on corporate laptops
admin/admin or root/root credentials for out of band management for everything
We apparently have LastPass Business subscription, but literally nobody uses it or knows about it other than two different people
The previous admin applied the company for ISO 27001 certification before quitting and the security audit is soon

And I saved the best one for last:

Running stock regular nmap port scan against the internal infrastructure is enough to systemically crash everything

Today I channeled my inner BOFH and approved the third party security auditors to “throw the whole kitchen sink at us”, knowing well that it will cause everything to explode into an uncontrollable ball of hellfire just to get the management to allow me to give me free reign to fix everything.

redocbew · May 25, 2022, 9:50pm

Backups and documentation first I guess. Finding all the things you need to backup will probably help with some of the other things along the way.

And yeah, I’ve had similar experiences to the nmap thing. Guess what happens when someone sets the error threshold for automatic removal of servers in a load balanced pool, and then leaves only one server in it?

Yep. It’s dead Jim. D-E-D dead.

oO.o · May 25, 2022, 10:29pm

Sounds like years before you’re gonna pass that audit

SgtAwesomesauce · May 25, 2022, 10:31pm

If I were triaging it, I’d do:

Backups
Triage everything else

Once backups are done,

Verify Backups
Build your disaster recovery, incident response and boot-from-zero plans (Document your discoveries as you go)
Sort out those centos 6 and 7 boxes.
fix your OOBM credentials, and implement a password policy. 2FA comes later.
Three birds; one stone: Design endpoint GPO and enforce antivirus and lockdown all company devices. Any device that wants on the VPN must have that policy applied to it.
Roll out monitoring

Passive things you can work on sort of… as other things go:

Re-arrange the finance guys desktops, so they can’t quickly find the docs when they leave their workstation unlocked. A simple right click, arrange by, and whatever madness you’re feeling in the moment.
You’d be happy to add personal devices to the domain so you can install the AV and GPO on it, but that’s the only way they’re getting on the network. The whole company will be asking for corporate laptops, and you can just direct them to whoever’s in charge of your budget.

That’s my thought process. Should keep you busy until September.

Dynamic_Gravity · May 25, 2022, 10:40pm

I would say patching takes precedence before building out DR.

DR can be a time sink, especially since they don’t currently have one.

And while the pain of downtime is there for patching it is the fastest way to go from insecure to secure.

Right now their entire infra is a big honeypot that is only safe because of obscurity or the main firewall.

Other than than that the order would be the same.

Well spoken, Sgt.

SgtAwesomesauce · May 25, 2022, 10:43pm

Yeah, hindsight, I figure that’s probably a better priority. Only reason I recommended DR first is because backups are useless without a recovery plan to make use of them.

Dynamic_Gravity · May 25, 2022, 10:46pm

Yeah that makes sense.

I think the proper priority would be in the sense of:

Do stuff that prevents bad actors from getting inside your network

If they get inside, lock it down to where what they want to do is impossible/difficult

patching
permissions
isolation

If they find out a way to do what they want to do have DR ready to go should you need to do a black start or they wipe your backups.

That would be the general flow for risk assessment. I am not a SecOps guy though so someone more knowledgeable than me please feel free to correct. But as I understand it that would be a good starting point for risk assessment.

oO.o · May 25, 2022, 11:22pm

I would backup before patching because it’s likely patching will break things.

Mastic_Warrior · May 26, 2022, 12:02am

The SecOps (CISSP here) approach that I would take is get backups first and then once you trust them, then patch. If the bad actors are already in, your information has already been leaked. And if you patch it and break, now they have the only viable “backups” to your data.

Depending on your CISO or CIO’s policy, you can then start patching the system on a UAT environment using those backups and test it before you break it in production. At this point, the priority should to get a DR policy in place because the worst possible situation should be assumed.

2FA is only valid if the people on your network can be trusted. Remember the CIA Triad. If the people are that careless about what they are using at work and are already using personal assets, more than likely, their other assets that they would use for 2FA are also compromised.

Again, from a CISSP perspective, I would assume the worst, build a snapshot in time to be able to undo anything that I bork in the sake of “fixing” things, and then once I have that, then purposefully start restoring order (breaking things) to keep those on the outside out. After that, I would then start preventing inside actors.

I mean, there is a lot more that I would do, but the above should keep you busy and allows you to do your due diligence as a sysadmin until your CIO/CISO can come up with some real policies that give you the authority to start making things more secure.

oO.o · May 26, 2022, 12:07am

I get the impression that there isn’t one…

SgtAwesomesauce · May 26, 2022, 1:02am

Im operating on the assumption that they’re already there.

oO.o · May 26, 2022, 2:56am

Might be worth throwing a transparent filtering bridge in front of the gateway just to get some block lists in place and optionally suricata or something. Can do that without touching any existing config whatsoever.

Mastic_Warrior · May 26, 2022, 2:41pm

I assume so too, otherwise, that person should be fired.

@thunderysteak If there is an acting one, then that is the person that you need as your biggest ally.

If nobody is assuming that role, this could be a shoe in to get into the C-suite early, if you intend on staying there for the long haul.