60,000 Computer Challenge

I don’t want to take away from Wendell’s thread. He put in a lot of work and it’s crappy of me to start a separate discussion to take away from his VLOG/Tutorial sets.

Thought experiment, or case studies if you have them: you have a large enterprise, and 60,000 workstations need to go out. Do you use Linux? Windows? OS X? All three (???)?

From the other thread:

Use whatever you want: Kickstart, PXE boot, AD, SCCM, Chef, Octopus, Chocolatey, NuGet, Ninite Pro, Spacewalk, thin clients with AWS WorkSpaces or Azure Remote Desktop.

Provide reasons, don’t just spam fanboyisms and tools (like I just did).

None of this 644 MB of RAM BS either. These are workstations, emphasis on DESKTOP.

If you don’t want to play and were just circlejerking, let me know and I’ll close this.

1 Like

The most systems I have ever kickstarted at one time was ~70.

I use RHEL at work and Fedora at home for my desktop. I’d love to use Fedora for my work desktop too, but I haven’t found anything that can connect to Cisco Jabber yet…

The traffic on the /16 network needed for 60k clients (all being remote-installed at once) would be pretty intense.
Can you imagine a broadcast storm?
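
For a rough sense of scale (the 500 reserved infrastructure addresses below are just an assumed placeholder):

```bash
# Usable host addresses in a /16 (network and broadcast addresses excluded)
echo $(( 2**16 - 2 ))                  # 65534

# Headroom left after 60,000 workstations plus ~500 assumed infrastructure hosts
echo $(( 2**16 - 2 - 60000 - 500 ))    # 5034
```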

Arise from the dead @Zoltan

Watching thread. I don’t manage networks or clients but I’m interested.
We use lots of LDAP-capable software and I wonder what would be used other than AD.

heavily depends on the application, doesn’t it?

If I’m just administering a bunch of work computers, Apple has the lowest overall lifetime cost, so I’d probably just go with a VPP deal and use their enrollment program/Configurator, with a backend of whatever I need for compute/storage.

I tend to deal with servers rather than workstations, but the preferences end up mostly the same.

Debian, all the way down. I know it, I love it, and only two things about it piss me off enough to kick puppies. It also eliminates the entire licensing and true-up process from the scenario.

Initial boot is via PXE and the netinstall image; configuration is handled by a debian-installer preseed file, stored in source control.
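
A minimal sketch of that flow, assuming the preseed is served over HTTP; the hostnames and paths are placeholders, not anything from this setup:

```bash
#!/usr/bin/env bash
# Generate a d-i preseed file and drop it where the PXE/HTTP server hands it out.
set -euo pipefail

cat > /srv/http/preseed/workstation.cfg <<'EOF'
d-i debian-installer/locale string en_US.UTF-8
d-i netcfg/choose_interface select auto
d-i mirror/http/hostname string deb.debian.org
d-i mirror/http/directory string /debian
d-i passwd/root-login boolean false
d-i partman-auto/method string lvm
d-i partman-auto/choose_recipe select atomic
d-i pkgsel/include string openssh-server ca-certificates curl
# Fetch the provisioning script into the installed system for first boot
d-i preseed/late_command string in-target sh -c 'curl -fsSL -o /root/provision.sh http://provision.example.internal/provision.sh && chmod +x /root/provision.sh'
EOF
```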

I don’t use config management. I’ve got extensive experience using Salt, CFEngine, Ansible, Chef, and Puppet, and every one of them has challenges that eventually become workflow bottlenecks or scaling problems. When you’ve got a few hundred servers, they work fine. When your rolling restarts upgrade 20k servers at each invocation, the probability of network flapping or a drive failure approaches 1, and a lot of these tools don’t handle that well. Puppet was particularly bad at this, and the master nodes scale very badly in the first place.

There’s also the tendency among tech people to flat-out disable them. We tried using Puppet on the company-issued laptops at my last company, and every developer had a nightmare story about the company’s printer drivers (or something) hosing the version of Python they needed. It’s always easier to disable the thing that makes changes to your system than to fix the problems with it.

Bash scripts everywhere! Seriously. They work for bare metal, nearly every cloud provider offers running an arbitrary script at provision time, and you can build minimal containers from them quite easily. Simple tools work surprisingly well when you’ve eliminated complexity elsewhere.
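
A minimal sketch of what one of those provisioning scripts might look like; the package list, URL, and paths are illustrative assumptions rather than the actual scripts described here:

```bash
#!/usr/bin/env bash
# Provision a workstation from a clean netinstall to "in spec".
set -euo pipefail

# Pin the package set; the "spec" is just a versioned list in source control.
apt-get update
apt-get install -y --no-install-recommends \
    openssh-server xorg lightdm firefox-esr

# Pull site configuration from the same repo the preseed lives in.
curl -fsSL http://provision.example.internal/conf/sshd_config -o /etc/ssh/sshd_config
systemctl restart ssh

# Record which spec revision this box was built from so monitoring can compare.
echo "spec-rev: ${SPEC_REV:-unknown}" > /etc/provision-release
```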

Taking inspiration from the “immutable infrastructure” camp, we shoot any system in the head that isn’t operating in spec, and “spec” includes software versions. When file state is important (as it sometimes is), we abstract that away and solve it at the storage level (usually Ceph). Wiping the local disk shouldn’t ever be scary; anything important should be on redundant, remote storage.
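
A sketch of that “shoot it in the head” check; the spec file format and the reprovision flag are assumptions for illustration:

```bash
#!/usr/bin/env bash
# Compare installed package versions against the spec shipped at build time;
# anything out of spec gets flagged for reprovisioning, not converged in place.
set -euo pipefail

SPEC=/etc/provision-spec     # assumed format: one "package=version" per line
FAIL=0

while IFS='=' read -r pkg want; do
    [ -z "$pkg" ] && continue
    have=$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null || echo missing)
    if [ "$have" != "$want" ]; then
        echo "out of spec: $pkg have=$have want=$want"
        FAIL=1
    fi
done < "$SPEC"

if [ "$FAIL" -ne 0 ]; then
    # Placeholder hook: whatever triggers the PXE reinstall watches for this flag.
    touch /var/run/reprovision-me
fi
```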

Because of this, we don’t need to ensure that our systems upgrade properly; we never upgrade. Every system is in a fresh state every time the configuration script runs, and we run automated tests before putting the system back in rotation. The declarative benefits of config management aren’t super important under this paradigm. It also gives developers leeway to bring their systems out of spec as long as those deviations aren’t actually harmful. If they do something that violates policy (like open disallowed ports), the monitoring tests pull it from rotation. In the case of laptops and desktops, we just drop access to all but a limited subset of read-only systems on the internal network.
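
A sketch of one such monitoring test; the port allowlist and the out-of-rotation flag are placeholders for whatever actually manages rotation:

```bash
#!/usr/bin/env bash
# Pull a host from rotation if it is listening on a port outside the allowlist.
set -euo pipefail

ALLOWED="22 80 443"   # assumed policy, purely for illustration

# Collect the numeric TCP listening ports (skip the ss header line).
open=$(ss -tln | awk 'NR>1 {n=split($4, a, ":"); print a[n]}' | sort -u)

for port in $open; do
    if ! grep -qw "$port" <<< "$ALLOWED"; then
        echo "disallowed listener on port $port; pulling host from rotation"
        # Placeholder: however rotation is managed (load balancer API, DNS, ...)
        touch /var/run/out-of-rotation
    fi
done
```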

This same process works pretty well with graphical systems too. Ultimately, video drivers and Xorg are just packages that need to get installed. Giving devs the leeway to adapt their system also eliminates the preferred desktop environment debate.

As for centralized logins: we don’t do that either. I’ve had to fuss with far too many LDAP servers to put my trust in them; even redundant setups feel a lot like single points of failure. The user database is stored in Vault, and our provision scripts build the appropriate local configuration. This is immensely helpful for the rare times someone actually needs to get on the box rather than reprovision it: you can still authenticate when you don’t have a working network, and shared passwords aren’t ever needed.
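
A minimal sketch of that provisioning step, assuming the user list is a KV v2 secret of username-to-SSH-public-key pairs; the Vault path and field layout are made up for illustration:

```bash
#!/usr/bin/env bash
# Build local accounts from a user list stored in Vault (no LDAP involved).
# Expects VAULT_ADDR and a valid token (or other auth) in the environment.
set -euo pipefail

users_json=$(vault kv get -format=json secret/workstation-users)

echo "$users_json" \
  | jq -r '.data.data | to_entries[] | "\(.key) \(.value)"' \
  | while read -r username pubkey; do
        # Create the account if it does not exist, then install the SSH key.
        id "$username" >/dev/null 2>&1 || useradd -m -s /bin/bash "$username"
        install -d -m 700 -o "$username" -g "$username" "/home/$username/.ssh"
        echo "$pubkey" > "/home/$username/.ssh/authorized_keys"
        chown "$username:$username" "/home/$username/.ssh/authorized_keys"
        chmod 600 "/home/$username/.ssh/authorized_keys"
    done
```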

The only change I might make to this process, if I were dealing with only desktops, would be OSTree and Fedora Atomic. It’s got a lot of promise, but is still rough around the edges. When it matures, it might be a viable replacement for the server-centric setups I use, while still keeping the immutable-like nature of the infrastructure.

That hasn’t existed very long, so it hasn’t become a major consideration for anything I’ve built out, but I’d give it a shot for future deployments.
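
For reference, a sketch of the day-to-day rpm-ostree workflow on an Atomic host (generic commands, nothing specific to this build-out):

```bash
rpm-ostree status        # show the booted and pending deployments
rpm-ostree upgrade       # stage a new immutable tree for the next boot
rpm-ostree install htop  # layer an extra package on top of the base tree
rpm-ostree rollback      # boot back into the previous deployment
```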

2 Likes

That’s still pretty good for Kickstart.

Lol

Not at all. You have Accounting/Finance, Marketing, Sales, Pre-Sales, Client Services, IT, Engineering, Product Support, Product Development, Operations & Logistics, Market Research, Risk Management, Legal, and Human Resources. Total employee count is 60,000. They need workstations.

Not servers. Not VMs. Not super 1337 terminals. They need to do work, talk with clients, consultants, ticketing systems, e-mail, chat, etc.

gotcha, misunderstood the question

Yeah, I’d definitely use Apple VPP + thin imaging if long-term expense is one of my concerns.

Anything that needs serious compute, just let them remote into rented AWS capacity or in-house hardware.

edited it for clarity.

Ah man. I was really hoping that rise from the dead thing would work :wink:

1 Like

I actually manage something like 5,000 workstations and about 200 servers; this is how I would go about it:

60,000 workstations wouldn’t be that much of a management overhead. I use KACE over PXE to install a custom Windows 7 ISO with everything preconfigured. After the installation is complete, I just need to get a list of names from HR, convert that list so my PowerShell script can read it, and create the AD users, O365 accounts, and whatever folder access their function requires. All settings are preconfigured with GPOs and GPPs.

Our infrastructure would need some extra switches to get all 60,000 workstations on there, as well as a beefier firewall.

The workstations would just boot into Windows normally, and any applications (Citrix, Office, CAD, …) would be deployed with KACE as a managed installation. Package updates/patches would also be deployed through KACE.

Pretty boring, but hey, it gets the job done FAST.

1 Like