Need help building a GPU workstation

I need to build a workstation to run and test my CUDA code before deploying it on a bigger CUDA cluster. I need help choosing the system, as I have little knowledge of current hardware, but Xeon and Tesla are what I was looking at.
Any info will be highly appreciated. Budget is $5000 max.

What type of code is it? (What type of calculations?)

Are you loading a lot into memory?

Well, do you need per-core performance, or a lot of cores and RAM? Otherwise, if it's only for simple testing, just buy a cheap Quadro?

It would honestly be surprising if anyone here knows much about this subject.

I assume you'll be using icc? Will you be using architecture-dependent compiler flags (e.g. -xHost)? If so, wouldn't it matter what your cluster is running? For the GPU, I think it would also depend on what your cluster is running, since certain code will only run on certain platforms. Just as an example, TensorFlow will only run on platforms with CUDA compute capability 3.5+.
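
To make the compute capability point concrete, here is a rough sketch of the kind of check I mean. The helper name and the two example capability values are my own placeholders, not something taken from your cluster, so look up the real numbers for your cards in NVIDIA's tables.

```python
# Hypothetical helper: does a GPU meet a minimum CUDA compute capability?
# Capabilities are (major, minor) tuples, so plain tuple comparison works.
def meets_min_capability(device_cc, required_cc=(3, 5)):
    return tuple(device_cc) >= tuple(required_cc)


# Assumed example values, for illustration only.
dev_gpu_cc = (3, 0)      # placeholder for an older development card
cluster_gpu_cc = (3, 5)  # placeholder for the cluster cards

for name, cc in [("dev box", dev_gpu_cc), ("cluster", cluster_gpu_cc)]:
    verdict = "ok" if meets_min_capability(cc) else "too old"
    print(f"{name}: compute {cc[0]}.{cc[1]} -> {verdict}")
```

If the dev card and the cluster cards land on different sides of a check like this, code that builds and runs on one may not run on the other.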

Thanks a lot, everyone, for taking the time to look into my problem.

It is CUDA C code with a lot of matrix manipulation. Yes, it loads a lot into memory.

In fact, I do not need per-core performance. I developed the code on a Dell mobile workstation with a Quadro 1100M. But the code does not scale well on a larger cluster (with K40s), as I was testing with a smaller and different problem set. Now I would like to use the same test cases but with a coarse mesh.

I guess I have answered everything in my previous comment.
Thanks again for the input. Waiting for a build suggestion.

K80 or K1

I assume you've used CUDA in the past? Memory bandwidth is a huge limiting factor. The Xeon Phi is a nice platform if you're not stuck on CUDA.
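
A back-of-the-envelope way to see the bandwidth limit is a roofline-style estimate: attainable throughput is the smaller of the peak compute rate and arithmetic intensity times memory bandwidth. The sketch below is only that, a sketch. The peak and bandwidth figures are made-up placeholders rather than specs of any real card, and the matrix multiply case assumes ideal data reuse.

```python
# Roofline-style estimate: min(peak compute, intensity * memory bandwidth).
def attainable_gflops(flops, bytes_moved, peak_gflops, bandwidth_gbs):
    intensity = flops / bytes_moved  # FLOPs per byte of memory traffic
    return min(peak_gflops, intensity * bandwidth_gbs)


PEAK_GFLOPS = 1000.0   # placeholder, not a real card's spec
BANDWIDTH_GBS = 200.0  # placeholder, not a real card's spec
n = 4096               # matrix dimension, double precision (8 bytes/element)

# Element-wise C = A + B: n*n FLOPs, reads A and B, writes C.
add_rate = attainable_gflops(n * n, 3 * n * n * 8, PEAK_GFLOPS, BANDWIDTH_GBS)

# C = A @ B with ideal reuse: 2*n^3 FLOPs over the same three matrices.
matmul_rate = attainable_gflops(2 * n**3, 3 * n * n * 8, PEAK_GFLOPS, BANDWIDTH_GBS)

print(f"matrix add:      {add_rate:.1f} GFLOP/s (bandwidth bound)")
print(f"matrix multiply: {matmul_rate:.1f} GFLOP/s (compute bound)")
```

With these placeholder numbers the element-wise add reaches under 1% of peak, which is why matrix manipulation that streams a lot of data tends to depend more on memory bandwidth than on core count.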

So then a 40-lane X99 chip with 4 cheap Quadros to test its scaling?

This just happened to be the cheapest CPU for X99 with 40 PCIe lanes. It only runs at a 1.6 GHz base clock, so yeah, probably upgrade that, given the budget.
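
For a rough idea of how 40 CPU lanes split across cards, here is a simplified sketch. It assumes every card gets at least x8, that spare lanes are handed out as x16 upgrades, and that there is no PCIe switch on the board, which may not hold for a workstation board. The function name is just something I made up.

```python
# Simplified lane budget: every card starts at x8, then cards are
# upgraded to x16 one at a time while at least 8 spare lanes remain.
def split_lanes(total_lanes, num_cards):
    if num_cards * 8 > total_lanes:
        raise ValueError("not enough CPU lanes for x8 per card")
    widths = [8] * num_cards
    spare = total_lanes - 8 * num_cards
    for i in range(num_cards):
        if spare >= 8:
            widths[i] = 16
            spare -= 8
    return widths


for cards in (2, 4, 5):
    print(cards, "cards ->", split_lanes(40, cards))
```

Under those assumptions, four cards on 40 lanes end up with one x16 link and three x8 links, so only one card gets full bandwidth to the host.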

The motherboard has a ton of PCIe slots, plus dual gigabit and dual 10-gigabit Ethernet, I guess for some reason.

How do you guys hook all the cards up together anyway? Too bad there's no OpenCL; the FirePro cards seem better for the money as far as hardware specs go.

Wait, couldn't you build a bunch of cheap tiny systems and cluster those to test things?

PCPartPicker part list:
Price breakdown by merchant:

CPU: Intel Xeon E5-2603 V3 1.6GHz 6-Core Processor ($209.99 @ SuperBiiz)
Motherboard: ASRock X99 WS-E/10G EATX LGA2011-3 Motherboard ($645.99 @ SuperBiiz)
Total: $855.98
Prices include shipping, taxes, and discounts when available
Generated by PCPartPicker 2016-06-20 10:26 EDT-0400
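
For reference, a quick tally of the list above against the $5000 budget from the first post. This only adds up the two listed prices; nothing is assumed about the cost of the GPU, RAM, storage, PSU, or case.

```python
from decimal import Decimal

# Prices copied from the part list above.
parts = {
    "CPU": Decimal("209.99"),
    "Motherboard": Decimal("645.99"),
}
budget = Decimal("5000")

total = sum(parts.values(), Decimal("0"))
remaining = budget - total

print(f"Listed parts: ${total}")
print(f"Left for GPU, RAM, storage, PSU, case: ${remaining}")
```

Decimal is used instead of float so the cents add up exactly.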

Let me know what you think of this build, and add one K40 to it.