Data Science!

Hey guys, I have started down the path of Data Science. I am currently loving R and the Shiny framework for web apps.
I'd like to know who's gotten into it. What has your experience been? Any production work in data analysis? What language do you prefer? Python, R, Matlab (yuck), or other?

2 Likes

I know enough R to make pretty data visualizations, and to embarrass myself in serious work.

I prefer Lua to Python for data stuff. I've used Torch, Caffe, and TensorFlow, and Matlab in production.

I ended up pivoting from data science to machine learning in my work/research.

Torch is the future.

4 Likes

This is incredibly interesting, thank you very much!! This is what I love about these forums.

I am just starting to get into machine learning. It feels like the next logical step from data analysis, then AI, maybe.

1 Like

Let me know how that goes. I will head down that path one day.

1 Like

Sorry to bump/necro this thread, but I thought the below information would be pertinent to the discussion.

Through work, I've poked at Shiny a bit. There are some good things there for sure, and it certainly makes it easy to present your findings. We're still evaluating it for surfacing some data to our engineers.

The company I work for is a Microsoft shop, so I'm becoming versed in the Microsoft Cloud stack. I currently work with Azure ML Studio and cloud data storage. As of yet, our projects haven't required a Hadoop Cluster (HDInsight) to solve.

The best piece of advice I've received: "Broaden your tool belt - a hammer alone isn't going to get this done!"

Don't get too wrapped up in individual tools - unless of course your company has selected them for you. Know what's out there tool-wise, know what the limitations are, and start solving real problems! I know, easier said than done.

Unless you're going to have someone prep your data for you (which I don't recommend), know how to work with SQL and JSON, and how to parse text. It sucks, but sometimes there's no alternative.
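To make that concrete, here is a minimal sketch of the kind of prep work I mean, using only the Python standard library. The sensor names, the "21.5 C" reading format, and the `readings` table are all made up for illustration; real data will be messier.

```python
import json
import re
import sqlite3

# Hypothetical raw payload: numeric readings buried in text fields.
raw_json = '[{"sensor": "a1", "reading": "21.5 C"}, {"sensor": "b2", "reading": "19.0 C"}]'


def parse_reading(text):
    """Pull the leading number out of a string like '21.5 C'; None if there isn't one."""
    match = re.match(r"\s*(-?\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None


def load_readings(conn, payload):
    """Decode the JSON payload, clean each reading, and store the rows in SQLite."""
    rows = [(item["sensor"], parse_reading(item["reading"])) for item in json.loads(payload)]
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()


def average_reading(conn):
    """Let SQL do the aggregation; AVG skips NULL values."""
    return conn.execute("SELECT AVG(value) FROM readings").fetchone()[0]


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
load_readings(conn, raw_json)
print(average_reading(conn))
```

Nothing fancy, but it touches all three: JSON decoding, a regex to clean up a text field, and SQL for the aggregate.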

For languages, it depends on what you want to do. I'm interested in Deep Learning and GPU compute. In the graphic below it looks like Python and C++ would be a reasonably safe bet. Personally, I would lean towards Python. Python and R, at least right now, are gaining momentum. In the Microsoft space R is edging ahead, but only slightly. For many things, Microsoft supports both R and Python. In the open-source space, I would argue Python is leading.

There are other caveats here as well. I know there are packages in R that support MXNet and some of the other popular frameworks, so it's not always clear-cut just looking at the base offerings.

Another tip I've picked up from others in the industry is the use of notebooks (Jupyter). These allow you to provide reproducible results AND document as you go.

I was initially thinking about building an NVIDIA DIGITS box, but now I'm not so sure. It comes configured with a web interface and would provide Caffe, Theano, Torch, and BIDMach. If I did go this route, it would be a cut-down version of it. Otherwise, I'll roll my own monstrosity.

When my classes break in August, I want to consider either doing the Microsoft Stack EdX courses or fast.ai's MOOC. The former takes more hours than I have time for (200-300+), so I'll likely do the latter (70+) and learn some Python at the same time. I want to accelerate my learning!

Hope this helps.

2 Likes

Helps a lot, thank you. Very interesting and informative. :+1::+1::+1:

Modeling Random Samples from Normal Distributions with OpenOffice Calc & C++ Part I

Modeling Random Samples from Normal Distributions with OpenOffice Calc & C++ Part II

Yeah, it’s pretty trivial, but I have to do something with too much time on my hands.
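The linked posts use OpenOffice Calc and C++. For anyone following along in Python instead, a rough equivalent of the basic idea (draw random samples from a normal distribution, then check the sample mean and standard deviation) might look like the sketch below. The function name, the seed, and the mu/sigma values are placeholders of mine, not taken from the posts.

```python
import random
import statistics


def sample_normal(n, mu=0.0, sigma=1.0, seed=None):
    """Draw n samples from a normal distribution with mean mu and standard deviation sigma."""
    rng = random.Random(seed)  # private generator so a fixed seed gives repeatable runs
    return [rng.gauss(mu, sigma) for _ in range(n)]


samples = sample_normal(10_000, mu=5.0, sigma=2.0, seed=42)

# With this many samples, both values should land close to mu and sigma.
print("mean: ", statistics.mean(samples))
print("stdev:", statistics.stdev(samples))
```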