@heffjos -
I apologize for the wall of text. Hopefully there are some useful bits in there for you.
Intro/Background
I have never developed in R before, but my undergraduate degree was in software engineering and I'm working on a Master's in BI & Analytics (Stats), so I'm not too worried about learning a new language. The challenges I see right now are figuring out our dev stack, learning R and Azure, and data collection/integrity.
I'll probably get some Microsoft hate, but I work at a full Microsoft shop. I mean top to bottom: O365, Azure, etc. The only big Microsoft products we don't use are their CRM and ERP. I work for a decent-size auto manufacturing company, with a whole slew of engineers and manufacturing plants in the US and across the globe.
My Mission
I've got my marching orders: utilize the Azure stack to its fullest. The cool thing is we're just getting into Machine Learning, Data Mining, Neural Network type stuff, but that's really a small fraction of what I need to do. Mostly I'm researching the tools, processes and best practices, and setting the development standard.
I've been looking at Microsoft R Open, Microsoft R Server, Microsoft R Client, SQL Server 2016 R Services, R Tools for Visual Studio, R Server for HDInsight and Azure ML Studio, and I'm trying to get into the beta for Open Mind Studio (not sure whether there is R in there or not).
Proof-of-Concept
I'm working on retrieving data recorded from the PLCs (Programmable Logic Controllers) and CMMs (Coordinate Measuring Machines); both sets live in an MS SQL Server database. A third source, which won't be ready for the pilot, is our Material Science Lab (MetLab). This is all for one product line, and we will start with one baseline multiple regression model.
The data isn't too huge yet. We're pulling data for June 2016: 91K rows x 900+ columns for the PLC data, which is serialized (tied to part serial numbers). The CMM data and part of the MetLab data can only be associated temporally (by date/time), so we have some work to do there. The data is fairly clean, but will need some scrubbing.
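Associating records only by date/time usually means an "as-of" join: each CMM measurement is matched to the most recent PLC record at or before its timestamp, within some tolerance. Below is a minimal stdlib sketch, assuming rows are dicts with a `ts` datetime key; the function name `asof_join`, the field names (`serial`, `bore_mm`) and the 5-minute tolerance are all hypothetical. In practice, pandas `merge_asof` (with `direction="backward"` and `tolerance`) or data.table rolling joins in R (`roll = TRUE`) do the same thing at scale.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def asof_join(plc_rows, cmm_rows, tolerance=timedelta(minutes=5)):
    """Pair each CMM row with the latest PLC row at or before its timestamp.

    Rows are dicts with a 'ts' datetime key (hypothetical schema).
    A CMM row with no PLC row within `tolerance` is paired with None.
    """
    plc_sorted = sorted(plc_rows, key=lambda r: r["ts"])
    plc_times = [r["ts"] for r in plc_sorted]
    pairs = []
    for cmm in sorted(cmm_rows, key=lambda r: r["ts"]):
        # Index of the last PLC timestamp <= the CMM timestamp (-1 if none).
        i = bisect_right(plc_times, cmm["ts"]) - 1
        match = None
        if i >= 0 and cmm["ts"] - plc_times[i] <= tolerance:
            match = plc_sorted[i]
        pairs.append((cmm, match))
    return pairs


# Hypothetical sample rows, for illustration only.
plc = [
    {"ts": datetime(2016, 6, 1, 8, 0), "serial": "A1"},
    {"ts": datetime(2016, 6, 1, 8, 10), "serial": "A2"},
]
cmm = [
    {"ts": datetime(2016, 6, 1, 8, 3), "bore_mm": 50.01},
    {"ts": datetime(2016, 6, 1, 8, 30), "bore_mm": 49.98},
]
pairs = asof_join(plc, cmm)
```

Here the 08:03 measurement pairs with part A1, while the 08:30 one is 20 minutes past the last PLC record and stays unmatched. The tolerance is the key design choice: too loose and measurements get attributed to the wrong part.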
After the pilot, the data will get biggish: terabytes to start with, and potentially even larger as we add more sources. Honestly, size isn't the REAL issue; it's the stuff that's still being recorded on paper, AND data integrity. In the manufacturing production environment, data comes in 3rd place at best. Gotta change that.
We're working with the Azure platform to ingest, stash, prep, model and visualize the data. This is all to get a better understanding of the process and the data, and to get buy-in for bigger projects.
As you can see, Microsoft put its acquisition of Revolution Analytics to work.
I have MUCH to learn.