How do you work with remote systems?

gnif · November 3, 2018, 2:25pm

Just curious how others do this kind of thing, here is what I came up with and why.

The main part of my business is remote Linux server management and consulting services (Shameless plug: https://hostfission.com), and as such I have to maintain a method of storing critically sensitive information for my clients, specifically server access details.

A previous employer, who shall remain nameless, used a single SSH key and deployed its public key onto every server they needed access to. The obvious danger is that if a rogue employee takes a copy of the private key, all bets are off. This never sat well with me, so when I started working for myself I decided there had to be a better solution.

I wrote a management system to store my clients’ details in encrypted form. The protection model is based on how LUKS works: a single master private key, which is encrypted with a user’s private key, which is in turn encrypted using a very strong password only the user knows. This way the data in the database cannot be decrypted without breaking one of the users’ passwords, which is actually the key to their key.
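To make the layering concrete, here is a minimal sketch of a LUKS-style "key slot", not the actual implementation. It collapses the per-user private key layer into a single password-derived key for brevity, and it uses a one-time XOR pad plus an HMAC tag purely so the example runs on the standard library; a real system would use a vetted authenticated cipher. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def _derive(password: str, salt: bytes) -> tuple:
    """Derive a 32-byte wrapping pad and a 32-byte MAC key from the password."""
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=64
    )
    return material[:32], material[32:]


def make_slot(master_key: bytes, password: str) -> dict:
    """Wrap the 32-byte master key under one user's password."""
    if len(master_key) != 32:
        raise ValueError("master key must be 32 bytes")
    salt = secrets.token_bytes(16)  # fresh salt, so the pad is never reused
    pad, mac_key = _derive(password, salt)
    wrapped = bytes(a ^ b for a, b in zip(master_key, pad))
    tag = hmac.new(mac_key, salt + wrapped, hashlib.sha256).digest()
    return {"salt": salt, "wrapped": wrapped, "tag": tag}


def open_slot(slot: dict, password: str) -> bytes:
    """Recover the master key, or raise if the password is wrong."""
    pad, mac_key = _derive(password, slot["salt"])
    expected = hmac.new(
        mac_key, slot["salt"] + slot["wrapped"], hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, slot["tag"]):
        raise ValueError("wrong password or corrupted slot")
    return bytes(a ^ b for a, b in zip(slot["wrapped"], pad))
```

Each user gets their own slot for the same master key, so revoking a user means deleting one slot rather than re-encrypting the database.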

But I still don’t want a user to be able to access the data or the master private key directly, so once they have logged in a broker is forked for the duration of the session which brokers access to the encrypted data. This process is run under a special user account by means of setuid. A series of utility tools provide a means to query information from the broker.
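As a rough illustration of the broker idea, the sketch below keeps the secrets on one side of a socket and answers only a fixed set of requests, so the caller never sees private material. It is an assumption-laden stand-in: a thread and a socket pair take the place of the separate setuid process, and the JSON line protocol, the `get_public_key` operation and the record layout are all hypothetical.

```python
import json
import socket
import threading

# Hypothetical store; in the real system this would be the decrypted database.
EXAMPLE_DB = {
    "web01": {"public_key": "PLACEHOLDER-PUBLIC-KEY", "private_key": "PLACEHOLDER-SECRET"},
}


def handle_request(request: dict, db: dict) -> dict:
    """Answer only whitelisted operations; never return private material."""
    record = db.get(request.get("host"))
    if record is None:
        return {"ok": False, "error": "unknown host"}
    if request.get("op") == "get_public_key":
        return {"ok": True, "public_key": record["public_key"]}
    return {"ok": False, "error": "operation not permitted"}


def serve(conn: socket.socket, db: dict) -> None:
    """Broker loop: one JSON request per line in, one JSON reply per line out."""
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        for line in stream:
            stream.write(json.dumps(handle_request(json.loads(line), db)) + "\n")
            stream.flush()


def ask(stream, request: dict) -> dict:
    """Client helper: send one request and wait for the reply."""
    stream.write(json.dumps(request) + "\n")
    stream.flush()
    return json.loads(stream.readline())
```

A utility tool would then connect to the broker's socket and call `ask(...)`; anything outside the whitelist simply gets refused.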

OK, so now there is a way to store and load data in a secure system, but there is still the issue of authenticating with a remote host. Initially I was using the same method as my previous company, but I cringed every time I thought about it.

I had three goals here:

Do not allow the admin to ever have access to the server’s private key.
Do not use the same key on more than one host.
Do not interrupt the standard workflow; keep key auth automated.

So, here is what I came up with. I wrote an SSH wrapper which performs the following:

Forks a custom SSH agent
Sets the environment up to use the custom agent
Runs ssh with the appropriate arguments to connect to the correct host
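The last two wrapper steps can be sketched roughly as follows. This is an illustration under assumptions, not the real wrapper: the function name and parameters are hypothetical, and it assumes the custom agent is already listening on a Unix socket whose path is known. `SSH_AUTH_SOCK` is the standard variable the ssh client reads to locate its agent.

```python
import os


def build_ssh_invocation(host, user, agent_socket, port=22, base_env=None):
    """Return (argv, env) for running ssh against the custom agent.

    agent_socket is the path of the already-started custom agent's socket.
    """
    # Copy the environment so the caller's own agent setting is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["SSH_AUTH_SOCK"] = agent_socket
    argv = ["ssh", "-p", str(port), "-l", user, host]
    return argv, env
```

The result would be handed to something like `subprocess.run(argv, env=env)`, so only that one ssh process sees the per-host agent.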

The custom SSH agent obtains the public key for the server in question from the broker, combines it with any public keys offered by forwarded agents, and presents the combined list to the ssh client, which then offers them to the host. When the remote host selects one and sends a challenge, the agent does one of two things:

If the public key is the server key, it will ask the broker to sign the challenge with the private key; or
If the public key is a forwarded key, it will ask the forwarded agent to sign the challenge.
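The two-way routing described above might look something like this. It is a sketch only: the message numbers and the sign-request layout (key blob, data, flags) follow the SSH agent protocol, but `broker_sign` and `forwarded_sign` are hypothetical callables standing in for the broker and the upstream agent, and the outer length prefix is assumed to have been stripped already.

```python
import struct

# Message numbers from the SSH agent protocol.
SSH_AGENT_FAILURE = 5
SSH_AGENTC_SIGN_REQUEST = 13
SSH_AGENT_SIGN_RESPONSE = 14


def pack_string(data: bytes) -> bytes:
    """Encode an SSH 'string': uint32 length followed by the bytes."""
    return struct.pack(">I", len(data)) + data


def read_string(buf: bytes, offset: int):
    """Decode an SSH 'string' at offset; return (value, next_offset)."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    return buf[start:start + length], start + length


def handle_sign_request(message, server_key_blob, broker_sign, forwarded_sign):
    """Route a sign request to the broker or to the forwarded agent.

    message is the type byte plus payload, without the outer length prefix.
    Each signer takes (key_blob, data, flags) and returns a signature or None.
    """
    if not message or message[0] != SSH_AGENTC_SIGN_REQUEST:
        return bytes([SSH_AGENT_FAILURE])
    key_blob, pos = read_string(message, 1)
    data, pos = read_string(message, pos)
    (flags,) = struct.unpack_from(">I", message, pos)
    # Server key: the broker signs. Anything else: pass it upstream.
    signer = broker_sign if key_blob == server_key_blob else forwarded_sign
    signature = signer(key_blob, data, flags)
    if signature is None:
        return bytes([SSH_AGENT_FAILURE])
    return bytes([SSH_AGENT_SIGN_RESPONSE]) + pack_string(signature)
```

The point of the design is that the private key for the server never reaches the agent process; it only ever sees the signature that comes back.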

On connection a small python script is pushed to the client which is used to bootstrap an environment consisting of my admin tools & scripts. Some of these scripts require access to information in the management database, and as such I extended the SSH agent protocol to implement some side channel communications with the broker back on the other end.
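The post does not say how the agent protocol was extended, so the following is only one plausible shape for such a side channel. It uses the generic extension message described in the SSH agent protocol draft (`SSH_AGENTC_EXTENSION`, number 27, carrying an extension name followed by opaque contents). The extension name in the usage note is made up for illustration.

```python
import struct

SSH_AGENTC_EXTENSION = 27  # extension request type from the agent protocol draft


def build_extension_request(name: str, contents: bytes) -> bytes:
    """Frame an extension request: uint32 length, type byte, name string, contents."""
    encoded = name.encode("utf-8")
    body = (
        bytes([SSH_AGENTC_EXTENSION])
        + struct.pack(">I", len(encoded))
        + encoded
        + contents
    )
    return struct.pack(">I", len(body)) + body


def parse_extension_request(frame: bytes):
    """Return (name, contents) from a framed extension request."""
    (length,) = struct.unpack_from(">I", frame, 0)
    body = frame[4:4 + length]
    if len(body) != length or not body or body[0] != SSH_AGENTC_EXTENSION:
        raise ValueError("not an extension request")
    (name_len,) = struct.unpack_from(">I", body, 1)
    name = body[5:5 + name_len].decode("utf-8")
    return name, body[5 + name_len:]
```

A remote tool could send, say, `build_extension_request("get-authkey@example.com", b"")` over the forwarded agent socket, and the custom agent would pass the request on to the broker instead of treating it as a key operation.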

One such task I use this extra channel for is signing keys for backup server certificates. I deploy a script in the bootstrap environment that is able to generate a private key and csr, and then ask the broker via the ssh agent to sign it. After the broker inspects and verifies the CSR is valid, it signs and returns it. This way with a single command I can generate a valid certificate over encrypted communications without ever moving the private key over the wire.

Anyway, enough talk, here it is in action.

[Screenshot: 2018-11-04-004115_544x271_scrot]

This session shows the entire workflow.

On my local PC I listed the keys in my local agent.
I connected to the management server. This uses my local key to auth and immediately tries to log in to the management software.
Again I list the keys; you can see I forwarded the agent through to the management server.
I then search for a server to connect to (enter) with the name looking as a search term. I can also specify other fields, such as hostname or IP by using the syntax field:search.
I selected the host found to connect to it and in the background my custom agent is spawned and configured for use with that host.
Authentication is successful and the connection is established. This also bootstrapped the remote host with the management tools.
I list the SSH keys again; this time you can see that the custom SSH agent has added the public key for this specific server to the list.
I use the hf-get-authkey tool to ask the management server what this server’s public key should be via the SSH agent sidechannel. This is used when I am adding a new server to the database for management. After connecting via password auth, I simply run hf-get-authkey >> /root/.ssh/authorized_keys.
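The field:search syntax from the search step in this walkthrough could be parsed along these lines. This is a guess at the behaviour, not the real tool: the default field `name`, the record layout and the substring matching are all assumptions, and the sample addresses come from the documentation range.

```python
def parse_query(query: str, default_field: str = "name"):
    """Split 'field:term' into (field, term); a bare term searches the default field."""
    field, sep, term = query.partition(":")
    if sep and field.strip() and term.strip():
        return field.strip().lower(), term.strip()
    return default_field, query.strip()


def search(records, query: str):
    """Return records whose chosen field contains the term, case-insensitively."""
    field, term = parse_query(query)
    term = term.lower()
    return [r for r in records if term in str(r.get(field, "")).lower()]


# Hypothetical records, for illustration only.
SERVERS = [
    {"name": "looking-glass", "hostname": "lg.example.com", "ip": "192.0.2.10"},
    {"name": "backup", "hostname": "bk.example.com", "ip": "192.0.2.20"},
]
```

With these records, `search(SERVERS, "looking")` matches on the name, while `search(SERVERS, "ip:192.0.2.20")` matches on the address.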

This system has many more features I have added to it over the years, such as integration with monitoring, backup and VPN services.

So, this is how I manage my clients. I am interested to know how others do this.

Note: Many details of the security measures used here are intentionally omitted for security reasons, and the demo shown here does not expose any sensitive information, but for good measure the keys used here were retired before making this post.

risk · November 3, 2018, 3:18pm

Can’t go into detail but relatively short lived certs O(daily for users, few weeks for client machines in TPM themselves) + passwords + otp + jumphosts (ssh-in-ssh for easier audit and CRL distribution).

Also different groups of humans get configured with access to different things.

All wrapped into a ton of custom scripts^Wcode to manage and monitor the setup with tons of playbooks and regular drills.

No sensitive info leaves the servers except via a browser tab (client side https cert mandated by a proxy pointed at via wpad), or a terminal window, all “work” is remote and other than screen buffers nothing is really stored locally, the ssh tunneling and validation setup is just for accessing an on-site workstation dedicated to you (practically a VM) that you then use to do the work on real machines that are accessed pretty much through that very same hoopla.

Takes lots of effort, not really practical for a small shop.

gnif · November 3, 2018, 3:32pm

Completely understand

Same deal with the keys on managed servers: they are rotated daily and there is an HSM in the mix also.

Nothing is ever exposed via a browser in our setup, the software I wrote is 100% cli, not due to security even though it does increase it somewhat, but because terminal interfaces are generally more efficient.

risk · November 3, 2018, 3:53pm

We have a lot of webserver/browser stuff - now that I think about it, probably due to ergonomics, and the low marginal cost of adding an extra webserver to things once you already have a couple. (e.g. basic HTML is easier than terminal escape codes, and you can easily draw graphs and format tables and stuff, but yes getting carried away and having 100MB Django rendered status pages is a thing, sadly)

gnif · November 3, 2018, 4:03pm

Ah yes, we are graphing things and using browsers for that purpose, but it’s not considered extreme risk data like the management details.

BansheeHero · March 26, 2021, 10:02pm

I found this when searching for ssh-agent topics. I wonder how it fares in 2021?

Without going into details, forwarding was frowned upon and we used pssh to manage the keys.
The risks are the same as you described. (We have since upgraded from pssh to Ansible playbooks.)

I wonder how the world of security-minded people has changed. The environment I currently work in is completely different. But then again, now I am working on Windows DMZ solutions.

cotton · March 26, 2021, 11:00pm

Could you just set up a FreeIPA realm in their environment and join all the *nix machines to that realm?

If they’re using AD you could set up a one-way trust from the Linux realm to their AD environment. At that point the customer could control access to their *nix machines via AD security groups, and you’re off the hook for user management (including disabling accounts, password changes, etc.), plus they keep their current infrastructure workflows.

If it’s a strictly *nix environment, you just create host groups and user groups and create realm users for the clients and yourself. There’s a web GUI, so there may be some hand-holding to get the client to figure out how to add/remove users.

It’s pretty simple to deploy, depending on the distros involved, but it’s doable.

Just an alternative way of approaching the problem. You can treat ssh as a service which uses Kerberos.