Too much privilege?

By design, how bad of an idea is it to have a hybrid W2K16/RHEL environment where the app admins have rights to do stupid things outside of planned maintenance windows? In one environment I am familiar with, the app admins mostly just look up logs on the production environment on a daily basis to troubleshoot. Anything else would be unnecessary outside of applying new, infrequent code pushes.

An example would be applying updates manually to the wrong environment because there are no scripts in place to prevent human error.

What would be the best way to idiot-proof this? Obviously "don't do X in the wrong environment," but am I crazy to think there should be multiple levels of safeguards in place?

1 Like

Hi, Welcome,

This kind of thing reminds me of: Accidentally destroyed production database on first day of a job, and was told to leave, on top of this i was told by the CTO that they need to get legal involved, how screwed am i? : cscareerquestions

Essentially, if a human is expected to interact with the system, they're part of the system. You can design a system / deployment / software to be more tolerant of human error, the same way you can design a system to be more tolerant of machine errors or software bugs. You can fix bugs more easily than you can fix humans, typically.

At a large cloud hyperscaler where I'm employed, most hardware is touched by humans when it's first assembled and plugged into racks, and then some number of years later once it's obsolete and about to be decommissioned. Nobody ever SSHes into a machine to look around, install things, or update stuff; all of that is done by software, which is maintained by humans.

Typical safeguards we have are, for example: we don't start up, update, or turn down most workloads by hand. If workloads are not generated on the fly but are hand curated, then their configuration is checked into version control, where it gets peer reviewed. The system that automatically turns jobs up/down and/or updates them integrates with a system that performs automated canary analysis. Because most things are either sharded for capacity or replicated for availability, and often both, the canary analysis service can use heuristics to apply a statistical A/B test and judge whether the update comes with significant regressions.
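Just to make the idea concrete, here's a toy sketch of the kind of heuristic such a canary service might apply, in PowerShell since that's what most of this thread runs on. The metric, the replica numbers and the tolerance are all invented for illustration; the real service is far more sophisticated than an average comparison.

```powershell
# Toy canary check: compare the average error rate of the canary replicas
# against the baseline replicas and flag a significant relative regression.
# Metric, numbers and tolerance are invented for illustration.
function Test-CanaryHealthy {
    param(
        [double[]] $BaselineErrorRates,            # e.g. errors per 1k requests, per replica
        [double[]] $CanaryErrorRates,
        [double]   $AllowedRelativeIncrease = 0.10 # tolerate up to +10% before halting
    )

    $baselineAvg = ($BaselineErrorRates | Measure-Object -Average).Average
    $canaryAvg   = ($CanaryErrorRates   | Measure-Object -Average).Average

    if ($baselineAvg -eq 0) { return ($canaryAvg -eq 0) }

    return (($canaryAvg - $baselineAvg) / $baselineAvg) -le $AllowedRelativeIncrease
}

# Canary is roughly 3x worse than baseline here, so the rollout would be halted.
Test-CanaryHealthy -BaselineErrorRates 0.8, 1.1, 0.9 -CanaryErrorRates 2.9, 3.4, 3.1
```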

This is after regression/integration testing and so on.

On top of that, we have policies to artificially slow down rollouts in order to give humans enough time to detect potential issues before the issues get a chance to spread too widely.
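In smaller-shop terms, even something as crude as the following captures the spirit; the server names, wave sizes and bake times are made up, and Invoke-AppDeployment is a placeholder, not a real cmdlet.

```powershell
# Crude wave-based rollout: touch a small slice first, then wait ("bake") so a
# human has time to notice breakage before the change spreads any further.
# Server names, wave sizes and bake times are invented.
$waves = @(
    @{ Name = 'canary'; Servers = @('app01');                   BakeHours = 24 }
    @{ Name = 'wave-1'; Servers = @('app02', 'app03');          BakeHours = 24 }
    @{ Name = 'wave-2'; Servers = @('app04', 'app05', 'app06'); BakeHours = 0  }
)

foreach ($wave in $waves) {
    Write-Host "Deploying to $($wave.Name): $($wave.Servers -join ', ')"
    # Invoke-AppDeployment is a placeholder for whatever actually ships the change.
    # Invoke-AppDeployment -ComputerName $wave.Servers
    Start-Sleep -Seconds ($wave.BakeHours * 3600)
}
```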


In general, for privileges that allow one to do destructive stuff: if humans don't use them daily, take them away and build a sudo-like mechanism that lets them escalate to the level needed to do the potentially destructive stuff only when they actually need it.

If no sudo, then a separate user account is probably the right idea.
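On the Windows half of a hybrid setup like the OP's, something like JEA (Just Enough Administration) can play that sudo role: a constrained endpoint instead of standing full-admin rights. A very rough sketch, where the group name, endpoint name and visible cmdlets are only placeholders:

```powershell
# Rough JEA sketch: a constrained endpoint that lets app admins read logs without
# holding full admin rights. Group name, endpoint name and cmdlet list are placeholders.
New-PSRoleCapabilityFile -Path 'C:\JEA\LogReader.psrc' `
    -VisibleCmdlets 'Get-WinEvent', 'Get-Content'

New-PSSessionConfigurationFile -Path 'C:\JEA\LogReader.pssc' `
    -SessionType RestrictedRemoteServer `
    -RoleDefinitions @{ 'CONTOSO\AppAdmins' = @{ RoleCapabilityFiles = 'C:\JEA\LogReader.psrc' } }

Register-PSSessionConfiguration -Name 'LogReader' -Path 'C:\JEA\LogReader.pssc'
```

App admins then connect with `Enter-PSSession -ComputerName <server> -ConfigurationName LogReader` and only get what the role exposes; anything more destructive needs a separate, deliberate escalation.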

Over time, try to get humans out of the loop of doing this kind of thing, and switch to more declarative version controlled workflows for deploying changes.

Also, try to evolve out of maintenance windows. Replicate the services instead, and try to make partial, revertible changes only when there are folks around to detect and fix issues, i.e. try to move to a "don't deploy on Fridays" model. After that, as your confidence grows ("confidence" == look at historical metrics to determine the likelihood of failure and compare it to acceptable risk), start deploying that particular kind of change on Fridays or over the weekend.
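That comparison doesn't have to be fancy. Even a back-of-the-envelope version makes the decision explicit; the numbers below are invented:

```powershell
# Back-of-the-envelope "confidence" check: is the observed failure rate for this
# kind of change below what we're willing to risk on a Friday? Numbers invented.
$deploysLast90Days = 120
$failedDeploys     = 3
$acceptableRisk    = 0.05   # willing to accept roughly a 5% chance of a bad push

$observedFailureRate = $failedDeploys / $deploysLast90Days   # 0.025

if ($observedFailureRate -le $acceptableRisk) {
    Write-Host "Failure rate $observedFailureRate is within tolerance; widen the deploy window."
}
else {
    Write-Host "Keep this change type inside the safer window for now."
}
```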

2 Likes

Put test in a different AD forest with a one-way trust, so test trusts prod but not the other way around.

Give people testing stuff elevated credentials in test but not prod.

2 Likes

So much this ^, if you cannot do it the right way.

1 Like

Hence it always goes back to the one thing most people don't do, which is frequent backups!
It's absolutely essential in an environment where multiple people have admin privileges.
Otherwise the odds that someone will bork it up rise to astronomical rates.

Many times in industry, senior technicians will not teach adequately, for reasons of job security.
This leaves many juniors to learn by trial and error, and sometimes the errors can lead to their termination.

1 Like

Backups are all well and good, but time to restore is not zero, and corruption/misconfiguration/abuse of privilege etc. may not be identified until way after the event.

Even with the most complete backups in the world, if you need to use them you may be waiting many hours or days to return to operation, during which your service is broken.

2 Likes

If your admins are doing stoopid shit, get 'em into the classroom and teach 'em what they can do and how you want it done.
Sure, you can add a backup security layer, but teaching them what is and isn't allowed on your system would be the best option.

Also, penalties if they knowingly abuse the system afterwards.
(At an old job a boss caught one of the interns playing games on a work PC. He told him to stop and remove it… two days later he was caught playing again. He got fired there and then.)

Yeah, though even if you're qualified, having a test environment is just good practice.

It isn't just for admins, but also for developers, coders, etc. Test new stuff in TEST, not live.

3 Likes

I have a co-worker who is new to IT, but after I showed him some scripting things, he thinks he is the master of the known universe and is habitually putting "new" scripts that he has extensively "tested" into production. He will not tell anyone until something borks and we sleuth it out to find that, hey, this was caused by a script running on the WSUS server or a RHEL jump box. Then I open it up and see that he wrote it. I take it up with him and usually he says, "It has been running for a while, so I don't think it is my script. No one said there were any issues."

Drives me up the wall. If I don't know it is there, I am not looking for it. If it degrades over time, then did you really test it? Also, why unleash it onto production instead of using our testing environment? I even built an in-house lab to do crazy stuff in before even bringing it into the testing or production environment. My employer does nothing about this guy and he is our IT lead. I have vowed to stop fixing his messes, and now he has taken things personally.

I am the only one who knows how to use the backup systems and robots. I am the only one who is really GNU/Linux savvy. I maintain most of our systems, yet he keeps picking fights with me. I am about to go on vacation and I hope things fail, because my phone will be off. Maybe then management will listen to me and dump this IT noob.

Sorry, I needed to rant. Sorry for going off topic.

2 Likes

The title felt a little comical. The best thing would be to have a CD / CI / CD pipeline (what I call Continuous Development / Continuous Integration / Continuous Deployment). Everything should be automated: you develop stuff, your build servers or other automation systems push the update from testing to staging (UAT) servers, and then, if it doesn't get explicitly disapproved after a while, it gets pushed into production. If you don't have such a pipeline in place, at least make a procedure for everyone to follow - everything first has to go through a staging server that must be identical to production (except for the data in it, obviously). And always make a snapshot before pushing to prod, so that if an issue is found, you can revert quickly and just restore from a daily incremental backup.
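In script form, that "promote unless explicitly disapproved" gate is roughly this shape. Test-Disapproved, New-ProdSnapshot and Publish-ToProduction are placeholders for whatever your own tooling provides, not built-in cmdlets, and the release name and dates are made up:

```powershell
# Sketch of the "promote unless explicitly disapproved" gate described above.
# Test-Disapproved, New-ProdSnapshot and Publish-ToProduction are placeholders
# for your own tooling, not built-in cmdlets.
$release             = 'app-1.4.2'                   # invented release name
$soakPeriod          = New-TimeSpan -Hours 48
$deployedToStagingAt = Get-Date '2024-01-15 09:00'   # would come from your CI system

if (((Get-Date) - $deployedToStagingAt) -lt $soakPeriod) {
    Write-Host 'Still soaking in staging; not eligible for production yet.'
}
elseif (Test-Disapproved -Release $release) {
    Write-Host 'Release was explicitly disapproved; stopping.'
}
else {
    New-ProdSnapshot -Comment "pre $release"   # cheap rollback point before touching prod
    Publish-ToProduction -Release $release
}
```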

I remember it as well; in the same comment section there was also a guy at GitLab who nuked the whole GitLab.com DB on his first day at work, but instead people were glad he found that and fixed the issue. Such a big difference in company policy.

This goes without saying: backups are important even in staging if you want to save time by restoring broken stuff (instead of setting up a new environment similar to prod from scratch).

This ^. Just like Wendell said recently, “always ask the question <how long?>” multiple times.

I personally don't like hierarchies in IT departments, which is why I prefer working at medium-sized businesses. Some workplaces I've been through had a "boss," but my relationship with them was as "fellow computer janitors"; other places had no direct IT managers (basically me and my colleagues were bossless aside from higher management), so "ok, you have a higher position than me, like CTO / CEO, but who cares, we're both humans," and we used to go out for a beer from time to time. So my solution to this problem would be an actual, honest meeting in the department. Cancel all your activities for 1, 2 or 3 hours and have a serious discussion, no laptops or phones allowed (people from other departments don't care about your issues, they care about their own issues and getting them solved ASAP).

Which is why I find it really important to land in a workplace where you can understand each other and not have too many clashes. Clashes are unavoidable, so you need a system by which you can make decisions everybody on your team is OK with, and you need some rules in place from the beginning (be it democratic within the department, or the highest position gets to make the rules, a la monarchy; just make sure everyone is OK with them from the start). I've had my fair share of clashes, because I'm a bleeding-edge tinkerer: I played with stuff like WireGuard, CentOS 8 right when it launched, even Fedora Server and other "new software," and tried testing them in a semi-production environment (not really prod, but more of a "if it goes down, nobody will complain for a few hours while I put up something better" situation, or having an alternative already running side-by-side). But if I couldn't convince my teams to use, or at the very least further test, those solutions, I wouldn't complain about it. Well, it basically amounts to the need to find adults, which is hard to do these days, with all the overgrown children around us.

Good. If people cannot take responsibility for their own actions, then they need to learn a hard lesson in doing so. They can get pissed at you all they want, but when they can't find people like you around to fix their messes, they will get a taste of what responsibility means anyway, and maybe something harsher than just nobody helping them (like getting fired). It's a good lesson. However, I believe you should be more open about it with those people too. Tell them upfront, if you haven't already: "I won't solve your issues anymore; if I see something originating from you, you're on your own. And I'm doing this to actually help you, because my help until now has been dragging you down; you can do better than this," or something like that. Your co-worker has been privileged for far too long! :rofl:

tl;dr: do an automated pipeline which goes from testing → staging → prod; if you can't have it be automatic, make a procedure for the manual stuff that goes through the same testing methodology; have clear procedures in place and make sure everyone follows them.

2 Likes

Mandate code signing for all PowerShell via Group Policy, and only permit scripts signed by the CIO, IT manager, or equivalent technical review person/change board for your organisation.

Harsh? Yes, but eventually you get to the size where a cowboy with PowerShell can cause way too much damage, and it needs to be controlled.

Rolling stuff into prod here where I work without at least change advisory board review is a written warning.

If nothing else, mandate peer review, because everyone makes mistakes. Yes, including you (the reader, not aimed specifically at anyone posting in this thread), and myself.

Code signing PowerShell also avoids the "this isn't what I wrote" bullshit.

Code changes, sig breaks.
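The mechanics are just a couple of cmdlets once the approver has a code-signing cert in their store (the script name below is made up); Group Policy enforcing an AllSigned execution policy does the rest:

```powershell
# Sign a reviewed script with the approver's code-signing certificate, then check it.
# Any later edit to the file breaks the signature (Status flips away from 'Valid').
$cert = Get-ChildItem Cert:\CurrentUser\My -CodeSigningCert | Select-Object -First 1

Set-AuthenticodeSignature -FilePath .\Deploy-App.ps1 -Certificate $cert

(Get-AuthenticodeSignature -FilePath .\Deploy-App.ps1).Status   # 'Valid' until someone edits the file
```

With AllSigned enforced, an edited script simply refuses to run until it has been re-reviewed and re-signed.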

2 Likes

As an aside, code signing also tends to encourage more versatile scripts, because you pull variables out to parameters, CSV files, etc. to avoid continually having to re-sign your scripts.

Which promotes testing, putting functions into modules, and reuse.
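Something shaped like this, for example; the parameter names, CSV path and columns are invented, but the point is that the signed body never changes, only the unsigned inputs do:

```powershell
# Keep environment-specific values out of the signed script body: take them as
# parameters or from an external CSV, so the signature survives config changes.
# Parameter names, CSV path and columns are invented for the example.
param(
    [Parameter(Mandatory)]
    [ValidateSet('Test', 'Prod')]
    [string] $Environment,

    [string] $ServerList = '.\servers.csv'
)

$targets = Import-Csv $ServerList | Where-Object Environment -eq $Environment

foreach ($server in $targets) {
    Write-Host "Would patch $($server.Name) in $Environment"
}
```

The ValidateSet also doubles as a cheap guard against the "applied it to the wrong environment" mistake from the original post.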

2 Likes

Yup. If your company can be broken by a mistake, that's not the fault of the employee who made the mistake; your process is fucked.

edit

Oh, and yeah, if it sounds like I'm working for a mega corp… well, our change board has like 4 people.

The CIO, myself (head infrastructure nerd), the information systems head, and our ERP lead.

Total IT/IS department size, including help desk and developers: 14 people.

2 Likes

Good point on the PS code signing, but that does not help us with the bash scripts. And he tends to disguise his by taking stuff that has already been on the network and in use for a long time and replacing it with his own. This behavior is some serious insider-agent type stuff, but the company that I work for is utter garbage.

Yeah, we have all talked with him, but once he became the lead, he became the ultimate authority, and my company's management is computer illiterate and lacking in proper business processes and security practices. Needless to say, I recently reached out to our site manager and had him read the snarky and alarming things that the lead has put in the passdown, and basically this dude was told to apologize. So he did a sorry-not-sorry apology and then basically said that if we are going to run code and scripts without reading and understanding them, then we are on our own. NaniTF? So I reported this to our site manager and nothing has happened. This is a new site manager who has been here for maybe two months. The previous site manager was a goober, and the guy before him retired (great guy)!

Either way, the workplace is toxic AF and I am looking to jump ship, because I am tired of being the one to work mandatory overtime or change my shift throughout the week to fix other people's messes, even after having meetings and going over the implications of short-sighted actions.

1 Like

Why not set up a Git server (GitLab / Gitea) and force everyone to create new scripts or commit changes to existing scripts there? Every change can be monitored. GitLab also has an option for code review, so somebody other than the person who wrote the code has to review it (not sure how that works, I wasn't the one using it).

This is a military contract and we are COMS. We maintain what is here, and there is a CCB (selectively selective) that determines if and when we add new technologies or make overarching changes. Plus, we are isolated from the internet and we have UC and C systems/networks to deal with. Everything is purpose driven, and undocumented/unapproved systems raise eyebrows and spark investigations and audits. Which is why my coworker tries to sneak his changes in. He would rather ask for forgiveness than ask for permission, even to the detriment of the team. Ugh!

I would go insane in such an environment no matter how good the pay is, as I hate overreaching top-down control, working with governments, and not having (slightly) flexible work hours. Add to that a toxic working environment and a jackass team lead, and you're in for a lot of fun. I don't know what your situation is and I cannot tell you to switch; all I can wish you is the best of luck, buddy.

1 Like

Exactly the type of place I worked at:
production twits making changes and other people or equipment suffering for their stupidity.
I damn near lost a few fingers because someone bypassed a safety switch (e-stop control pressure release valve).

2 Likes

Sounds like someone needs to make the call to fire the guy.

There are definitely grounds for it. If you do not have an established change management process, you need to get one in place so that he has to comply with it. Rather than complaining about the guy, establish the policy and then nail him for violating it.

If he does not comply with it then he is failing in his responsibilities of employment.

Proper change management doesn't have to be draconian, but it does help ensure that due process is followed (i.e., notify stakeholders, have a back-out plan), or at the very least that changes are documented so that triage of any fallout is easier.

1 Like

There is a CCB controlled by the gov't. They just have their heads so far up their rectal tunnels that they don't know how to use it, except to tell us "No, you cannot do that because [they] said no." When asked who "they" is, there's no follow-up. This is usually in relation to us trying to streamline something or make something resemble a sane process. Our lead is friends with the ISEO team, and therefore he gets their support when we go to the gov't about the issue. This place is full of High School Musical drama.

1 Like