From Our CTO: Getting Started With Data Science

stephanjou Contributor.

Interset’s internal mission statement starts with “We catch bad guys with math”! As a company that places a lot of emphasis on a principled, analytics-first approach to threat detection, I’m very proud of the Data Science team at Interset, as well as the culture of principled statistics and education they foster here.

As a result of this, I am occasionally asked how someone could get started in data science. This is a great question, as some of the best data scientists in the world, and at Interset, do not have a traditional data science background (whatever that means). In fact, they were more likely to have moved into data science from some other quantitative or technical field.

I thought I’d share my suggestions here, in case they help others who have an interest in learning about data science and its applications. Each section has a number of suggestions, sorted in order from most accessible to most advanced.

Courses and Videos
This is my favorite video on AI, its history, and deep learning: AI, Deep Learning, and Machine Learning: A Primer (Andreessen Horowitz). It’s only an hour, and worth the watch if you haven’t seen it before. Frank Chen does a great job of describing the history of AI and its limitations, but also why it has such fantastic promise today. Frank’s follow-up video is also great: The Promise of AI.

This is the best Coursera course on machine learning: Machine Learning. The course is taught by the famous Andrew Ng, of Stanford, Google, and Baidu, who is considered one of the fathers of deep learning. The tooling is a bit dated, but Ng does a fantastic job laying out the foundational math, with exercises, in a very accessible way.

A deep-dive course into neural networks and deep learning: Neural Networks for Machine Learning. If you really want to dive into neural networks (deep learning), you want to learn from the master, Professor Geoffrey Hinton. He’s my professor from the University of Toronto who taught me neural networks in the first place! Hinton has always been a great lecturer, and he is also considered one of the fathers of neural networks and deep learning.

Books

Foundational textbook on statistical learning: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Required only if you really want a textbook with math. If you do, this is the one to get, in my opinion. You can actually download the PDF of an early edition of Hastie, Tibshirani, and Friedman’s book for free. As an aside, Hastie and Tibshirani have a free online course on statistical learning, hosted by Stanford. I have no personal experience with this course, but it looks fantastic.

R Resources
The great news is that there are a lot of freely available tools out there to experiment right away with machine learning and data science. My favorite environment remains R. It’s what most academic researchers use, and it’s what we ourselves use at Interset.

If you’re going to do anything in R, you’ll want to become familiar with Hadley’s R packages, known affectionately as the Hadleyverse. Here’s a guide: The Hitchhiker’s Guide to the Hadleyverse.

Of course, you’ll want to download RStudio itself. There are other IDEs, but this is the best one.

Your First Data-Science Pet Project
Here’s something very important to me: The best, fastest way for you to go from a data scientist who just has “book knowledge” to an experienced, effective data scientist is to have actually solved a problem you cared about. You need to have felt the pain of dealing with large volumes of messy data, wrestled through long nights of feature engineering trying to separate the good columns from the bad, and spent days staring at a sea of numbers and text to find the hidden signal.

From trying to solve or better understand a problem that impacts you or your family, to figuring out the best way for your favorite sports team to win the next competition—if you can find that challenge you’d like to solve using data science, and spend time trying to squeeze every ounce of value from a data set, you will learn more from that exercise than any other exercise I can think of.

There are a lot of data sets out there, some cleaner than others. Here’s a very small subset to inspire some thought:

About the Author
Two decades and over ten new 1.0 products and solutions: architecting, designing, and inventing algorithms, software, and technology, from small startups to one of the largest software development companies in the world. Specialties: development leadership, big data, analytics, software architecture, web service architecture, mobile development, cloud computing, visualization, and Windows development.