Founded Data Hackers, Data Science Club w/ Kevin Huang and Anni Dong.
- Sysomos
- NYU
- Stitchfix
My journey into data science was a while ago and probably does not reflect the current climate of the industry. That being said, it's probably both easier and harder now. In particular, data science is now its own thing, rather than software engineers being picked for their math skills and passing engineering interviews just to write SQL.
Each role, as a previous manager put it, is really the job of the interpreter and translator for the business. What each one looks like depends on the size of the company and the product it offers.
I tried to find the "{n} types of data scientists" posts we often see online, and I realised that they are not written by data scientists and that the list is different every year.
I want to outline three clear paths I think students should consider and try out before deciding to specialize, if that's something they are interested in doing.
Analytics tools are mostly SQL and data visualization tools. Our goal in this profession revolves around taking the data that customers and the organization produce and studying patterns to make business suggestions.
One example of analytics work that I've done in the past was at Facebook. There I studied mechanisms to identify bad content and take it down as quickly as possible. My responsibility was to come up with dashboards and metrics to track how well we were doing, and then have our engineering team set goals against the metrics I designed.
I personally don't think this is true to what data science is, and it should primarily be called business intelligence and analytics work. The only reason it's not called either of those is the independence that data scientists are given. As an analyst you are usually told what to investigate through a request from business managers, but as a DS you take responsibility for the paths you take. This is probably the highest-impact type of data scientist at a large company, since you're driving big teams to set goals based on metrics you set.
My issue with this is that if you're someone with lots of interest in machine learning and writing code, like me, this isn't the best role, since you may often feel unable to fix things yourself.
Algorithms, on the other hand, is all about being the one who tries to fix the problem or improve the metrics. I find that here, what you build is often the product itself: for example, the person building the 'People You May Know' feature on Facebook.
An example of this kind of work might be improving fraud detection algorithms at Uber, or building price prediction models at Airbnb.
It's also the one that probably requires the most training and experience, but don't be intimidated! I think if you focus on studying the subject matter and really find the relevant bits in the courses here at Waterloo, you can find success.
If you notice, the transition from Analytics -> Production Engineer is kind of parallel to the arts -> math -> engineering gradient that I think exists. Analytics is more product management, algorithms is more mathematics + coding, and engineering is more systems work.
I also do a bit of this at Stitchfix. There I decide on a project I think is important, define what I need to collect to build out the algorithm, and even build production systems and applications to collect the data, along with the algorithms that leverage it.
I think this is probably the easiest way for a software engineer to get into this space. Help set up systems that empower data scientists and do some data science on the side. I find that this is much more accepted than the data scientist trying to write shitty code.
Now that I've outlined the types of roles that exist, I think it's important to try both algorithms and analytics work to really figure out which one you like, as they are very different and work different parts of your brain. I personally don't believe data science is a real thing; I'm just having fun writing code and doing math for money.
Moreover, it's important to note that interviewing for each role is very different. Although I have not interviewed in a long time, I know that Facebook is analytics and very product focused; they ask questions like "How do you know you're making the best decision? If we want this type of complex outcome, what is a simple metric to work off of?" Whereas at Stitchfix, I had 6 hours of 1:1 interviews where I had stats-based problems, systems-based methods, product questions, coding questions, etc.
Some courses I recommend, with comments:
- STAT 230, 231 - Don't worry about marks, just really understand the meaning of the distributions and hypothesis tests; you'll use them forever
- STAT 331, 333, 334, 341 - Just do it, these built the foundations of my knowledge
- STAT 441, 442, 440 (by Ali Ghodsi or Martin Lysy) - The project-based final lets you explore new topics
- MATH 235 - Just know how to think about matrices, and look up the applications of singular value decomposition
- CS 371/475 - Super helpful for knowing the pitfalls of numerics. Also look into linear algebra and computational mathematics (MATH if you're into algorithms