So I acquired 20 or so Dell OptiPlex 330s from my workplace and figured I'd build a cluster out of them. I'm currently designing and building a custom rack with cooling, with the help of some friends.
The idea is that I'll use roughly 10 for the first cluster. I would love to know what Linux software or OS I could use to run a VM across all the machines (parallel processing / Beowulf). So in theory the VM would show up as 20 cores and 20 GB of RAM (not counting the master computer, which will probably be a quad-core Intel machine I have with 8 GB of RAM).
I have all the networking gear required, including a gigabit switch and 1,000 ft of Cat 5e cable; I might get Cat 6.
Some background information on me.
I'm 19 and work at a brand-new high school as their IT person (graduated 2014). I sadly manage roughly 5xx MacBooks (Mid 2010) along with 100 or so Windows computers. I'm also starting up a small local company for computer builds, repairs, etc., with the help of some friends. Since I got my first computer at about 8, I've made it my goal to learn everything about them, and I have since gathered more knowledge than most graduates (so I've been told).
Thanks if you're reading this; any ideas or suggestions from the community are welcome. By the way, I've been |grepping| since I was 8 or so.
http://hadoop.apache.org/ This and Fedora 21 are worth a look. You can do a lot with them. The distributed file system in the Hadoop suite will be handy for you. You can actually learn some valuable job skills with this.
You can also try a matrix Ethernet thingie, where you build a 2- or 3-dimensional grid for connectivity, e.g. by adding some cheap Ethernet cards, blah blah blah. But to start, just get a cheap switch and see what happens.
Matrix Ethernet thingie? Could you elaborate? I don't see it, unless I misunderstood. (You can use proper network jargon; network engineer here.)
@thread
I don't see the point; if nothing else, it's a waste of electric power to run at home. But...
1) To create a stable VM/cloud-based system you need some NAS storage that all the machines will share (all the cloud magic is in its storage), and a good one is going to be expensive, even very expensive. (I use NetApp storage; it's awesome.)
2) Network: I recommend having 1 cluster to keep this as cheap as possible. You need at least 1 high-speed managed switch (and I doubt those machines support 1 gig), and maybe a load balancer (don't go for InfiniBand; there are lots of problems there that overshadow the pluses). This switch will have to be connected via its console and utility ports to your vCenter management. From there you will be able to create ACLs, add IPs, and whatever else you need. (The switch/router needs to be on the compatibility list if you want to go with VMware.) I recommend that the 1 cluster contain about 4-6 machines; it's going to be much easier on your pocket.
Still, it's an expensive game, and it doesn't behave as you think it would.
I've only ever done 2-, 3-, and 4-link Ethernet matrices with a Beowulf cluster, the largest being 64 machines. The "matrix" refers to the arrangement.
With 4 machines and 2 Ethernet links per machine, you might have one switch for rows and one switch for columns. With three switches and 8 machines (a 2x2x2 cube), you might have one switch for x, one for y, and one for z, instead of a single switch for all devices.
At the time, and this was a while ago, this was a more efficient setup for providing interconnectivity to all machines on the network, and it did provide some level of redundancy in case of a problem. I think low-latency communication was the name of the game too, so some folks recommended disabling store-and-forward on the switch and setting everything to cut-through, I think.
With 4 Ethernet links it is a bit harder to visualize than with 2 or 3. Looks a bit like a hypercube lol
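To make the arrangement above a bit more concrete, here is a rough Python sketch of a cabling plan. It takes the description literally (one switch per axis, one NIC per axis on every machine); the names cabling_plan, ports_per_switch, and the "switch-x" style labels are made up for illustration and not from any real tool.

```python
from itertools import product
from math import prod

# Hypothetical labels: one switch per axis of the grid.
AXIS_NAMES = "xyzw"


def cabling_plan(dims):
    """Map each machine's grid coordinates to its (nic_index, switch_name) links.

    dims is the grid shape, e.g. (2, 2) for 4 machines or (2, 2, 2) for 8.
    Assumption: NIC number i on every machine is cabled to the switch for axis i.
    """
    if len(dims) > len(AXIS_NAMES):
        raise ValueError("add more axis names for higher-dimensional grids")
    plan = {}
    for coords in product(*(range(d) for d in dims)):
        plan[coords] = [
            (nic, f"switch-{AXIS_NAMES[nic]}") for nic in range(len(dims))
        ]
    return plan


def ports_per_switch(dims):
    """Each machine has one link to every switch, so each switch needs one port per machine."""
    return prod(dims)


if __name__ == "__main__":
    shape = (2, 2, 2)  # the 8-machine cube from the post
    for coords, links in cabling_plan(shape).items():
        print(coords, links)
    print("ports needed on each switch:", ports_per_switch(shape))
```

Passing (2, 2, 2, 2) gives the 4-link case, which is the one that starts to look like a hypercube. A variant with a separate switch for each individual row and column would need a different mapping; this sketch does not cover that.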
RE network/SAN: check out Hadoop's distributed file system. You may be pleasantly surprised. It's a distributed file system that kind of sort of has crap hardware in mind.
Google/Facebook/etc. are buying the most crap hardware they can, and software from the Apache Foundation (not just Hadoop and its subprojects, but other Apache Foundation stuff) is making it possible to get 5/6/7 9s of reliability out of a pile of crap hardware.
EqualLogic/EMC/Dell/NetApp prospects are bleak on a 10-year time scale because of this. No one does big iron anymore. Look at how expensive/hard it is to get a Xeon E7. Have you even heard of Xeon E7? No. You can get two dual-socket E5 systems for half the cost of a single E7 and match it in compute power. So companies are not buying these "big iron" solutions at anywhere near the pace they once were.
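The "several 9s out of crap hardware" point comes down to replication arithmetic. Here is a back-of-the-envelope sketch in Python. It assumes node failures are independent (they often aren't in practice) and uses a made-up per-node availability of 99%; the replica count of 3 matches HDFS's default replication factor. The function names are just for illustration.

```python
import math


def combined_availability(node_availability, replicas):
    """Chance that at least one replica of a block is reachable.

    Assumption: each node fails independently of the others.
    """
    return 1 - (1 - node_availability) ** replicas


def nines(availability):
    """Express an availability as a 'number of nines' (0.999 is about 3)."""
    return -math.log10(1 - availability)


if __name__ == "__main__":
    per_node = 0.99  # hypothetical: each cheap box is up 99% of the time
    for replicas in (1, 2, 3):
        a = combined_availability(per_node, replicas)
        print(f"{replicas} replica(s): roughly {nines(a):.1f} nines")
```

Under those assumptions each extra copy adds about two nines, which is why a pile of 99% machines can look far more reliable than any single one of them. Real clusters have correlated failures (shared power, shared switch), so treat the numbers as an upper bound.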
OK, I think I get it. I just lost myself trying to compare this to what I have done. I use IBM blade chassis (12+ blades) with an added network card (4x 1 gig ports), where 2 ports go to one network and the other 2 go to a different network; my vCenter software points to a managed load balancer, so if one segment is down it migrates traffic without downtime to the same virtual server on the other network.
I just don't see why someone would need an additional 2 or 4 ports :| ; I usually use the 2nd port as a backup anyway, saving resources on switches.
I'm currently researching Hadoop, Elasticsearch, and MongoDB, but more just setting them up than actually messing around with them, since we want to deploy PredictionIO ... and our programmer wants it to run on Ubuntu ;/ (tears)
We're actually still using previous-gen Xeons :) (never heard of the new ones, or seen them). If it needs more resources, we just add another physical CPU to the VM.
Thanks for all the replies. I'll be taking a look at Fedora 21 and Hadoop (I've looked at it before, I believe). I'm currently building a custom desk solution as well as a rack for all my equipment, and I'm hoping to throw up a small test rig for this soon!
@CyklonDX all the machines' NICs are gigabit, and I already have gigabit switches as well as all the cables and tools needed. The only thing I really needed to know about was an OS/software best suited to what I was looking for :) At one point I had 15 of them running Beowulf with Rocks, but it was not what I wanted. Also, my home router, which handles all the devices in my house, runs a quad-core Intel with 8 GB of RAM and 2 high-end NICs. It runs everything: DNS, DHCP, Samba, CUPS, firewall, active-connection virus scanning, intrusion detection, and much more. It's the gateway after my modem to even get to my network. :)