Major setup overhaul, no idea how to title this or where to post it

this is going to be long, and I’m sorry it’s my first post. I don’t normally think I have much to add here, so I mostly just read

I need to totally revamp my servers at work and the phone system, and I need some new software to organise orders/deliveries and track bills/payments.

EDIT: sorry to go so long/heavy, there’s a lot to do long term, I don’t want to miss important info out - I set/built all this up myself so far

I’m a geek but more advanced server stuff is a bit out of my league

I need to sort the hardware side of things first before I start on software

current setup (because everyone loves photos)

Dell R730
2x Xeon E5-230 v3
128gig ram
10x 1.2tb sas drives (in raid 6)
1x 1tb SSD boot drive

Dell R710
2x Xeon X5670
144gig ram
6x4tb sas (raid 6)

3 NetApp disk shelves (plugged into the R710)
24x 3tb sas (raid 6)
24x 3tb sas (raid 6)
24 x 4tb sas (raid 6)

ups with… sexy(?) external batteries

there’s a 24port 1gig switch in the server rack

then 3x 48port 1gig poe switches in the small rack

I used to run 3x R710 servers
windows server 2019 for cctv and traccar
pfsense
freepbx (for phones)

I just about had freepbx set up when the pfsense server died, so I swapped them over and went without freepbx. Now the replacement pfsense server has died too, so I’m just using the ISP router

the running R710 is 99% for cctv (hence all the storage)

it’s running traccar gps software too, but that’s low overhead

the R730 is running hyper-v and two VMs running winxp for MS MapPoint (route planning)

I got a bit click happy on ebay so I have two more R730s I can use (identical to the 1st)

I’d really like to have two R730s running everything in VMs in an automatic failover setup, so if one dies the other automatically picks up the slack. I’m lost as to how to do this - everywhere I look either assumes a minimum level of existing server knowledge and misses important stuff out, or expects PowerShell knowledge and doing everything via a command prompt. I’m dyslexic anyway, but then I fell on my head a few years back, so I suck at the command prompt

all the drives are SAS so I think both servers can plug into the disk shelves? (I have 4x sas SSDs to use as boot drives in raid)

VMs would be
1x win10 for cctv software
1x pfsense
1x freepbx
1x winxp for traccar
1x new one for route planning / order tracking / accounts

I have a backup of my old pfsense config, hopefully I can get that working in a vm

once I have that done, I’ll find someone to setup freepbx for me

and find someone to write me software for tracking customer accounts / placing orders / bills / route planning

I’d really like to integrate that with freepbx so when someone calls in we can use caller ID to automatically pull up their account info

anyone have any insight for me? ideas? pointers? anything at all I’m really struggling here :frowning:

but… bonus photos because everyone loves photos

server “room”

which is on top of a (big) fridge

which you climb up a ladder and walk across a freezer to get to

this was the best ‘out of the way’ spot to put everything

it’s been running like that for years, I’m going to section it off a bit and add aircon, it’s hot up here in the summer and too cold in the winter (I get high temp alarms in the summer and low temp alarms in the winter!)

7 Likes

Solid ghetto server room… hi hi :slight_smile:
:+1:
When a customer asks why something doesn’t work… show him/her the pictures.

P.S
What exactly does the company do?

3 Likes

Given its entire IT dept. is on top of a freezer, the easy conclusion is that it has something to do with food :stuck_out_tongue: (there, how about kicking in the proverbial open door :wink: )

Anyway, I’m not running any critical IT stuff so this is at face value, but what you’d need to look for is info on how to set up a “high availability cluster”. Mind, it needs lots of (additional) networking & storage, more battery power for your UPS’s and a good think on how to provide power redundantly to the IT rack(s).

Also, it might be a good idea to find yourself more efficient AMD EPYC systems over the ageing Xeons you now have (although they’ll do fine for now) to save power and thus extend the UPS fail-over time. While at it, consider physically separating both halves of the HA cluster to reduce loss of data in case of a catastrophic event (fire, flood, theft) and using fibre optics to connect the racks. Dual 10Gbit cards aren’t that expensive and switches for those speeds are also relatively affordable, especially for enterprise buyers.

Best of luck and keep us posted!

I hope the food is not stored/prepared in similar conditions… :wink:

I can understand a lot and turn a blind eye to a lot, but when I see leftover cables lying on the floor and no one wants to clean them up, that’s one step too far for me. :wink:

hi

it’s a dog food factory

factory is nice and clean/tidy :slight_smile:

server rack was only for CCTV but has been extended out over the last couple of years as we’ve expanded

I know it’s a total mess, I’m a bit embarrassed about it tbh, but as I’ve made changes / added things etc. I don’t want to go too far until I know what it’ll look like at the end… plus in the past, if something’s gone wrong I’ve needed to fix it quick and get back to making dog food :slight_smile:

I guess I posted a wall of text when I should have kept it short and asked for an idiot’s guide to failover clustering etc.

Have you looked towards CARP…

1 Like

OK, you got me hooked, I wonder if it’s something my dog likes to eat.

(Especially if it’s any of JR, Anko, Woofs, Fish4Dogs, PaleoRidge, … “level1dog” promo code please :slight_smile: )


Good news is, what you’re asking for isn’t unusual - plenty have done it before. They’ve set up 2-3 server highly available setups without paying an arm and a leg for cloud stuff.

Bad news is, it takes a bit of effort, and you’ll need to make notes for yourself as you go along. … and 3 hosts is the minimum for storage.

Folks here can help you with the fundamentals; I think it’s probably better to have separate threads for your other solutions.

1.0.0 Power

Those batteries are a bit cringy. I applaud the effort, but get one (or more) of those Pylontech US5000 batteries and one of those “smart inverters” that people use for solar, and spend time talking to people about how to set it up so that you get inverter power when the grid is off.

You need to cycle those batteries a little bit once in a while, it’s not good to keep them charged all the time, that’s just a matter of planning.

1.0 Network

You have plenty of 1Gb ports. I’m going to assume you have the two servers connected directly with some cheap 40Gbps QSFP+ NICs.

If you don’t, get them cheap from ebay - and a DAC - high speed storage is super useful.

Additionally, carve out a group of 4 ports on one of your switches and call it WAN - your ISP router goes there, and so does one of the network cards on each of your servers.

1.1 Proxmox (w/ basic built-in Ceph)

This is a VM platform that you install on bare metal.

(alternatives are VMware, XCPng, ovirt, windows server/hyper-v … they’re all too much work relatively)

Goes on each server - you can manage your VMs across multiple hosts in a single web ui – easy peasy for basic use, and how you get HA going eventually is well documented.
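If it helps to see the shape of it, here’s a rough sketch of the clustering commands, wrapped in Python subprocess calls just so it reads as one runnable note - the cluster name and the IP are made-up placeholders, and the Datacenter → Cluster web UI can do all of this too:

```python
# Rough sketch only - cluster name and IP below are placeholders.
# Proxmox's own "Datacenter -> Cluster" web UI does the same thing.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))          # show each command before running it
    subprocess.run(cmd, check=True)    # stop if anything fails

# On the FIRST host only: create the cluster.
run(["pvecm", "create", "dogfood-cluster"])

# On EACH additional host: join it, pointing at the first host's IP.
# run(["pvecm", "add", "192.168.1.11"])

# On any host afterwards: confirm all nodes are in and quorum is happy.
run(["pvecm", "status"])
```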

You’ll need a third proxmox-ish host to get high availability, doesn’t have to be a big server, it can be a small proxmox host on an old NUC or a thin client.

technically...

…you only need a corosync vote on e.g. a raspberry pi, but raspberry pis are unobtainium, and for 50 quid you can get a discarded old dell/hp small office machine instead. Add that to the cluster as a proper third node and you don’t need to mess about with a corosync-only device… plus it can take part in reliable distributed storage.

Since you have 3 disk shelves see if this third proxmox host can get a $50 9201-8e or similar controller – make that your third Ceph and truenas host.

This way, you have all your most critical data written on each of the 3 hosts, … and on each host it’s raidz2.

If any 1 of your two VM hosts dies (try unplugging the network) … and/or comes back wonky, all your most critical data is still fully read-writeable … your VMs will spin up on the other host as if there was a power cut, with all the data still there on Ceph - never going anywhere.

If you do a planned migration, it’ll only take a few seconds, even if you have a 50/100G OS drive.
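To make the “picks up the slack automatically” part concrete, this is roughly what marking a VM as an HA resource and doing a planned live migration looks like - VM ID 100 and the node name “pve2” are placeholders, and again the web UI has buttons for all of it:

```python
# Sketch only - VM ID 100 and target node "pve2" are placeholders.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Tell the cluster this VM should be restarted elsewhere if its host dies.
run(["ha-manager", "add", "vm:100"])
run(["ha-manager", "status"])

# Planned move with the VM kept running - this is the "only takes seconds"
# case, because the disk already lives on Ceph and only RAM state moves.
run(["qm", "migrate", "100", "pve2", "--online"])
```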

1.2 pfSense

You’ve seen it before - goes in a VM on each server.

It comes with pfSync / CARP / VRRP abilities.

https://docs.netgate.com/pfsense/en/latest/recipes/high-availability.html

You’ll permanently have more than one of them running, … so if a server host goes down, you’ll be able to get into your network and maybe get access to repair one host through the other, and vice-versa.

you don’t need to migrate these.
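The actual CARP/pfSync setup is all done in the web UI per that Netgate doc, so there’s no config to show here - but a tiny watchdog like this (all three IPs are invented examples) is a handy way to see that the shared CARP address and both firewall VMs are answering:

```python
# Minimal reachability check - the three IPs are made-up examples,
# substitute your real CARP VIP and the two pfSense VM addresses.
import subprocess

HOSTS = {
    "CARP VIP (the gateway everything uses)": "192.168.1.1",
    "pfSense VM on server A": "192.168.1.2",
    "pfSense VM on server B": "192.168.1.3",
}

for name, ip in HOSTS.items():
    # one ping, 2 second timeout (Linux ping flags)
    alive = subprocess.run(
        ["ping", "-c", "1", "-W", "2", ip],
        stdout=subprocess.DEVNULL,
    ).returncode == 0
    print(f"{name:40} {ip:15} {'up' if alive else 'DOWN'}")
```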

1.3. TrueNAS Scale

You have lots of disks, so use ZFS. TrueNAS lets you set up ZFS, export SAMBA (windows) and iSCSI shares, and maintain your disks from a clicky web ui.

It’s also an apps platform (especially Scale variant), it’s useful if you want to export storage in some other format.

TrueNAS wants you to pay for “high availability”, but your local storage can be pretty “durable” at relatively low cost for stuff that doesn’t need 99.99% availability (think backups, yesterday’s CCTV footage and so on).
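For a feel of what the pool wizard is doing underneath, this is roughly the plain-ZFS equivalent of a 6-wide raidz2 pool with a footage dataset. On TrueNAS itself you’d click through the web UI instead, and the disk names and the pool name “cctv” are placeholders:

```python
# Sketch of the plain ZFS commands behind a raidz2 pool - on TrueNAS you'd
# normally do this via the web UI. Disk and pool names are placeholders.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

disks = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf", "/dev/sdg"]

# 6-wide raidz2: any two of these disks can die without losing data.
run(["zpool", "create", "cctv", "raidz2", *disks])

# A dataset for footage, with cheap transparent compression turned on.
run(["zfs", "create", "cctv/footage"])
run(["zfs", "set", "compression=lz4", "cctv/footage"])
run(["zpool", "status", "cctv"])
```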




The thing with Ceph is that it’s really more of a “datacenter” tech, it’s not very efficient at low scale.

Ideally, you’d do something with migratable Ceph OSDs and multi-path disk shelves, and you’d be fencing access to the disk shelves… but that’s more complicated than Ceph’s share-nothing approach.


Good news is you can probably bring up a couple of VMs to play with proxmox today (as nested VMs) and see how you get along.

1 Like

That’s awesome! Thank you!

And you’re in luck, we make frozen dog food and dry treats, I’ll pm you right after this and sort you out a big box full of freebies :slight_smile:

I’m not home tonight so I’m on my phone but can’t resist replying

I have 3x Dell R730s so can run 3 of those, I need ram for one of them so I’ve just ordered that

For storage (was a video link missing?), do you mean I connect 1 disk shelf to each server and they share the drives they’re connected to with each other directly, or do they each have a separate disk shelf and independent storage with copies of the same data (a bit like a 3 drive raid1)?

Most of the storage is for CCTV and none of that is mission critical; there will be some mission critical stuff (the VMs). I went with the disk shelves and lots of drives because at the time that was by far the cheapest way to get a lot of storage.

I’d lose too much storage if the disk shelves were essentially a 3 drive raid 1 system

I could mount drives directly in each server for mission critical stuff and then let them have a disk shelf each if they’ll share them? If I lost 1/3 of CCTV footage till I fixed a server I’d be ok with that

1/3 of CCTV footage can be offline for days while I fix it… Phone system can’t

1 Like

Somehow missed it, added v1 and v2 above.

===

Ceph is hugely complicated, it’s a major miracle proxmox has a web ui for it, and that it’s good enough to ship.

The simplest data setup people run, and the setup in those 2 videos, is all data replicated 3 times, with a minimum of 2 OSDs confirming they got each write before the client is told the write happened successfully (haven’t pixel peeped the second video, but I guess that’s what they’re running as well).

This means you can take down one of your hosts to upgrade proxmox, anything you write will still have at least 2 copies on 2 nodes.
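If you want to confirm those settings on your own pool later, something like this does it - the pool name “vm-pool” is a placeholder for whatever the Proxmox Ceph wizard called yours:

```python
# Confirm the "3 copies / minimum 2 acks" settings - pool name is a placeholder.
import subprocess

POOL = "vm-pool"

for setting in ("size", "min_size"):
    subprocess.run(["ceph", "osd", "pool", "get", POOL, setting], check=True)

# Cluster-wide health and replication overview:
subprocess.run(["ceph", "-s"], check=True)
```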

I think this also means if something fails, while you’re upgrading the first node, your virtual block device ends up hanging on writes (needs testing).

It’s also super expensive (bytes) relative to e.g. 6 drive raid6.

It’s also super slow relative to TrueNAS/ZFS over network, because for every write, in addition to shuffling the data, there’s a few things that need to happen over the network with respect to metadata.

So … I’d only really recommend it for VM OS drives, or data that is “active” / “currently being read or written”, because OSes don’t take kindly to their storage being yanked away, and this is the easiest way to have uninterrupted storage and keep things running.

For example, for your CCTV VM, you can make an OS drive and a separate “short term data” drive. That way, when your OS drive is running windows updates and the NTFS filesystem is syncing data to the virtual disk left and right, spamming Ceph with tiny barrier writes (operations that prevent further writes until all existing writes have made their way onto stable storage, which takes a while because metadata over the network takes a while), your other virtual disk that’s only writing video can keep writing video independently, and the windows update won’t slow it down.
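Splitting the disks like that is just attaching a second virtual disk to the VM - something along these lines, where VM ID 101, the storage name “cephpool” and the 500 GiB size are all placeholders for whatever yours end up being:

```python
# Sketch: give the CCTV VM a separate "short term footage" disk.
# VM ID 101, storage "cephpool" and the 500 (GiB) size are placeholders.
import subprocess

# Allocate a new 500 GiB volume on the Ceph-backed storage and attach it
# to the VM as a second SCSI disk (scsi0 stays as the OS drive).
subprocess.run(["qm", "set", "101", "--scsi1", "cephpool:500"], check=True)

# Check the VM's hardware now lists both disks.
subprocess.run(["qm", "config", "101"], check=True)
```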

Then, for example, every hour, or every day (not sure how much data you have), move CCTV data over from Ceph to raidz2 on a samba share.

If this fails, samba throws an error, and you keep recording new CCTV and you haven’t lost anything.
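A bare-bones version of that mover job could be as simple as the below - the paths and file extension are invented, and in practice you’d run it from the CCTV VM’s task scheduler (or cron) every hour or day:

```python
# Move footage older than a day from the Ceph-backed disk to the samba
# archive share. Paths and the .mp4 extension are invented placeholders.
import shutil, time
from pathlib import Path

SRC = Path(r"D:/footage")         # the small Ceph-backed "short term" disk
DST = Path(r"Z:/cctv-archive")    # the mapped samba share on TrueNAS
MAX_AGE = 24 * 3600               # seconds - move anything older than a day

def main():
    if not SRC.exists():
        print("source folder not found - nothing to do")
        return
    if not DST.exists():
        # Share unreachable: keep recording locally and just try again later.
        print("archive share not available - skipping this run")
        return
    now = time.time()
    for f in SRC.glob("*.mp4"):
        if now - f.stat().st_mtime > MAX_AGE:
            shutil.move(str(f), str(DST / f.name))
            print("moved", f.name)

if __name__ == "__main__":
    main()
```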

ZFS RAIDZ2 is both cheaper than Ceph, and higher performance, but lower availability.
It’s higher performance because all the metadata stuff does not have to be coordinated over the network - everything just runs on one machine - making a lot of stuff faster.

However, unfortunately, everything running on one machine means that when you need to upgrade TrueNAS / ZFS … your filesystem and your data will be unavailable during that time.

In contrast, with Ceph, you can upgrade one node at a time, and you can keep reading and writing while one of the nodes is being upgraded.

If the SAMBA/ZFS host “goes poof” for whatever reason, the network share becomes unavailable to the client, and that throws an error that stops the app from using the network share.


What storage do you have exactly, and what controllers are you using?

You mentioned hyper-v … are you running RAID / or using your controllers as HBAs ?

right now I’m running a Dell Perc H800 sas raid card, wired externally to the 3 disk shelves, using its hardware raid to run the drives in raid6

I don’t care about losing all the saved data, that doesn’t matter (EDIT I mean saved data while changing stuff around, it’s fine to format and start fresh)

there’s 2 shelves with 3tb drives in - about 50tb of storage each (after raid6)
1 shelf with 4tb drives - about 80tb of storage (after raid6)

(the 3tb shelves have 4 hot swap drives each, the 4tb shelf has 2 hot swap drives)

100tb gets me about 3 months storage… so call it 1tb per day of cctv

it’s not great but I could drop down to 3 servers with 1 shelf each and ~50tb per server… I’ll do that if it’ll save a lot of hassle

is the hardware raid ok to use? or am I better off getting cards / flashing cards to HBA mode and letting proxmox have direct access to the drives?

(cctv is running on the R710 right now which I’d rather remove)

new system would be 3x R730

I already have 2x 400gig SAS SSD drives for them (I was planning to raid them as boot drives)

and I have 20x 1.2tb drives to go in them (they take 2.5" drives) - I could get 10 more so they have 10 each

I’m totally open to buying new bits / changing stuff to suit - I’ll need the fast network cards etc.

That is going to be a long march if management is not aware that this is edging on… falling through the freezer I suppose?

Depending on how much backing you have from the money-wielders, maybe sit down every manager in a room for a “wish for the future” idea gathering. Sizing future plans before buying VM-Hosts helps to avoid the “Yo boss, I need yet another thing” walk of shame :wink:

:zap: :warning:

There are fancy closed-loop server rack ACs, then you just add your cooling power elsewhere (= outside) and off to the races! That way your servers are not inhaling dust all the time.


Network

I would tie all servers together via 10Gig (SFP+ hardware is dirt cheap), then let servers talk to the inside world (NOT the internet!) through a very angry looking firewall. That firewall should also segment office (especially those expected to get potentially bad emails), technical (controlling machines) and production floor.
You should end up with one VLAN per department (depending on size), plus a management VLAN for servers and switches and at least one for the production floor.

This is going to cause a rough start and be a huge pain for a few days, but you are going to be very relieved when Jane from accounting admits to having clicked on a spicy email.

Security
I like firewall appliances. Lots of ports, depending on brand not too fussy and sometimes with a base license included.
Could go pfSense too, of course.

Servers
Hardware boxes:

  • At least two VM-Hosts :white_check_mark:
  • Storage Server :white_check_mark:
  • Backup Server :question:

Virtual machines: Proxmox works, VMware works, not sure I recommend M$ Server and Hyper-V.

I own the company so there’s no-one to beg/answer to but myself - I’m ok with spending money on stuff, I just don’t want to spend a fortune when the cash can be used elsewhere (it’s only a small company)

I’m thinking if I need one disk shelf per server, I could upgrade to NetApp DE6600s (60 drive bays) vs the DS4246s (24 bay) that I use now - then I could buy a bunch more drives and still re-use the old ones

1 Like

Curious why you’re retaining so much footage. Is the goal multiple years of footage or just 3 months’ worth?

a few months worth

I fell through a roof and landed on my head 7 years ago - 3 skull fractures, a bleeding brain and a load of other injuries

I’m ok now, but I was a bit dopey for a while after that… and guys working for me decided to take the opportunity to rob me blind… it’s left me a bit paranoid

tbh it’s overkill… a month will do

1 Like

1TB a day feels like a lot unless you have a lot of cameras. You could possibly reduce your storage footprint significantly while still giving you the level of video retention that you want. How many cameras are recording? Are you recording audio? What VMS are you using?

1 Like

I’ve given this a lot of thought - the CCTV stuff isn’t mission critical, the VMs will be

I’ll leave the CCTV running on the old server and have the 3 new servers running just the VMs

ordered 40Gbps QSFP+ NICs and cables - no switch - I can just wire them direct, right?

ordered a bunch of SAS SSDs and some more 2.5" sas drives

will have 4x 400gig sas SSDs and 8x 1.2tb spinning disks in each new server - should be plenty of space for what I need

I’ll update more once I get some stuff installed. I wanted to update firmwares before installing anything but the Dell lifecycle thingie wasn’t playing ball, it didn’t want to connect to the Dell ftp, I’ll look at it later :slight_smile:

On the Dell R730(XD) you can mass update from the Lifecycle Controller, no need for an account with Dell.
If you have really old firmware, register on the Dell site and pull (using your service tag) the DVD ISO image for a mass upgrade, and after that use the Lifecycle Controller to upgrade to the final version. Do you need a YT link?
HDD (and NVMe) firmware, if I remember correctly, upgrades from a different ISO.