Data Science & Analysis has become one of the most important career skills you can learn. It is high paying (the median data scientist salary is above $100k) and in heavy demand in the workplace (as Datanami says, demand far outpaces qualified applicants).
So if you’re looking for a skill to learn, this is a great one to choose. You don’t need advanced schooling or special degrees… the best learning sources are actually online and done through self-study. And with so many technology advances right around the corner (machine learning, robotics, etc.), there has never been a better time to get into this field.
The issue is how to learn it. Data & Analysis is a big field and there are a lot of learning sources. So we’re here with a learning path and career guide to help you maximize your professional outcome.
Each of the subjects here should be taken to build a comprehensive data science education. We use “data analyst” and “data scientist” interchangeably for the sake of this guide, as the skill sets are very similar. However, we’re working toward a data scientist career outcome, with the ability to take a more analyst-type role if you choose. Let’s learn!
The best way to learn data science is to get right into it. You will lose interest and not understand the practicalities of the subject if you keep learning theory. Therefore, we want you to learn to code and simultaneously do projects for the majority of your initial learning.
R is by far the most widely used programming language in statistics and data science. There are other commonly used languages in this industry, like Python and Scala. But start with R, get to an advanced level, and then supplement. By strictly learning R and having the statistical background we recommend, you’ll already be qualified for data analyst jobs. And then you can decide later if Python, SAS, or an entirely different specialization is more suitable to your interests.
Learning R comprehensively will take a long time and we don’t advise rushing through it. Depending on your dedication, allocate 3-6 months to mastering it.
Where to learn? The best and most comprehensive source is DataCamp R Learning, which can get you to a high proficiency level.
As you learn, you will want to do projects for two purposes:
First, to apply your learning. If you don’t write actual code and apply it to real business scenarios, you’ll never retain the course knowledge.
Second, to prove your knowledge. Rather than restricting yourself to small code snippets, try building up to a full analysis and then add it to your portfolio.
The nice thing about programming and data science is the massive learning community. There is significant documentation, reusable code packages, help forums, and collaborative tools to work with other learners. And for data science, there are many repositories of raw data to practice analysis with. We like Kaggle Datasets, GitHub Repository 1, GitHub Repository 2, and the Dataset Subreddit.
If you want to collaborate with other programmers, GitHub is usually the go-to place.
Finally, to use existing code packages or to create new ones, you’ll need to access CRAN. The tricky part is figuring out what exists in their massive library. Here’s a good guide to help you out.
And when it comes to projects for R or any language, we wrote our own article: Get Data Science Experience.
After you’ve finished with R, it’s time to start thinking about how data scientists work at bigger companies. Obviously they are working on much larger issues, with significantly more data, and doing more sophisticated functions. This requires the right data architecture to make it possible.
Companies invest a lot of resources in this topic. They have to capture and extract the right data, store it in a way that’s logical for eventual use, and have it available and in a format that is consumable for users. For your typical companies, this data may be exclusively used by analysts and non-technical users. For others, it may just be stored for future use. And for the tech companies nowadays, it may be programmed to provide an automated function.
Data architecture determines the optimal way for each company to do this process. We won’t go into excessive detail, but it’s an important subject to gain familiarity and perspective on. Data architecture sets the potential, or the limits, of what a data scientist is capable of doing down the line. In terms of learning it, we like Data Warehousing Tutorial and YouTube’s Data Warehouse Videos (obviously for data warehousing) and then A Beginner’s Guide to Warehousing Big Data Architecture (for big data solutions).
R is the most widely used data science language but Python is definitely second. It is more versatile, is efficient as an object-oriented programming language, and offers greater functionality when your code gets large-scale (in web apps or machine learning, for example). If your career is going the developer route rather than the business analyst route, learning either Python or Scala (covered below) is a smart move.
The best place for a comprehensive Python education is definitely Dataquest Python Learning.
Going one level deeper than the overall data architecture, you will store your data within a series of databases. That data is stored for a purpose… to retrieve information as part of a business function, for analysis, or as documentation. The information behind what you see on a website sits in a database. For example, when you search for flight availability, the site is searching a database. Then when you pay, your customer information and transactions are stored in a different database. Go to Intro to Relational Databases to learn this topic.
SQL is then the standard querying language to retrieve or modify that information in a way that is usable. Because data is almost always stored in an optimized database structure and SQL is used to retrieve it, this is a critical part of the learning path and almost always required in job descriptions. It’s not overly complex, so we like W3Schools’ SQL Tutorial for the purpose.
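To make the flight search example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `flights` table, its columns, and the helper function names are all hypothetical and chosen only for illustration; a real airline system would be far more involved. It shows the two things SQL is described as doing above: retrieving information (a SELECT) and modifying it (an UPDATE).

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up flights table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flights ("
        "id INTEGER PRIMARY KEY, origin TEXT, destination TEXT, seats_left INTEGER)"
    )
    conn.executemany(
        "INSERT INTO flights (origin, destination, seats_left) VALUES (?, ?, ?)",
        [("JFK", "LAX", 3), ("JFK", "LAX", 0), ("BOS", "SFO", 5)],
    )
    conn.commit()
    return conn


def available_flights(conn, origin, destination):
    """Retrieve flights on a route that still have seats (a SELECT query)."""
    return conn.execute(
        "SELECT id, seats_left FROM flights "
        "WHERE origin = ? AND destination = ? AND seats_left > 0 "
        "ORDER BY id",
        (origin, destination),
    ).fetchall()


def book_seat(conn, flight_id):
    """Modify stored data: take one seat if any remain (an UPDATE query)."""
    cur = conn.execute(
        "UPDATE flights SET seats_left = seats_left - 1 "
        "WHERE id = ? AND seats_left > 0",
        (flight_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # True only if a row was actually changed


if __name__ == "__main__":
    conn = build_demo_db()
    print(available_flights(conn, "JFK", "LAX"))
    print(book_seat(conn, 1))
    print(available_flights(conn, "JFK", "LAX"))
```

The `?` placeholders pass values to the query separately from the SQL text, which is the usual way to avoid building queries by string concatenation.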
Scala is a newer programming language and has some advantages over Python in terms of speed and ability to handle more complex functions. Spark, the popular data processing framework, can handle various languages but is actually built on Scala.
So if you’re looking to become a true developer, learning Scala can be fruitful. We can’t recommend it for everyone, as it strictly depends on your preferences and your specific use requirements. So we’ll provide the top learning sources but consider this an optional part of the learning path.
SAS is a data analytics software package with its own language. However, it is not open source. It requires expensive licensing but has some great features and is widely used in the enterprise analytics space. SAS is very versatile, fast, and integrates well with large database solutions.
This is another optional one but something to consider depending on your goals. If you aspire to work for a bigger company, there’s a good chance they’ll use SAS for analytics, and acquiring the skill would be a valuable addition to your resume.
Where can you learn? For beginners, we suggest reviewing the Tutorial Point SAS Tutorial. For people looking at more intermediate levels, we suggest The Little SAS Book and Complete & Practical SAS, Statistics & Data Analysis Course.
Hadoop is the most widely used data infrastructure framework for big data. It stores, processes, and analyzes massive datasets without requiring expensive internal hardware. There are many applications built on the Hadoop framework to process and mine the data you’ve stored there.
Chances are strong you’ll work with Hadoop in your career, as more and more companies are migrating their data here. Just look at the average data scientist job posting and most will require good knowledge of it.
Spark hasn’t been around for long but has quickly become the most widely used data processing framework. It works on Hadoop and is significantly faster than Hadoop’s built-in processing engine (MapReduce), and is thus widely used for bigger jobs, streaming data requirements, or machine learning.
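For a feel of what a processing engine on Hadoop actually does, here is a rough single-machine sketch of the MapReduce programming model (the model behind Hadoop's built-in engine) in plain Python. This is only an illustration under simplifying assumptions: the function names are hypothetical, nothing here uses Hadoop or Spark, and a real cluster would spread the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: turn one document into (word, 1) pairs."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "Big jobs"]))
```

Because each document is mapped independently and each key is reduced independently, the work can be split across machines, which is what makes the approach suitable for massive datasets.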
Remember when you learned Scala? Spark is built on top of Scala so that will now come in handy (although you can also use Python or Java). You will likely interact with Spark in your data science career.
Up until now, we’ve focused on the most technical aspects of data science. However, data is the product and visualization is the sales method. The reason you’re doing all this analysis is so it can be consumed by others. It might be your bosses that need data to make decisions. It might be a customer that’s reading your data analysis or using it for their own purpose. So you need to jazz it up to appeal to a non-technical audience. The consumers of your analysis may not have ever heard of R or Python. They just want something easy to read and friendly for use.
You can create nice graphs, visuals, dashboards, and more using any application (R, Python, SAS, Excel). However, the average person at your company won’t be opening up R to read your analysis. They also might need to be self-sufficient with reporting. So the answer is to use a data visualization tool. For you as a developer, it may or may not be necessary but for general use, it’s often highly necessary.
Tableau is one of many great data visualization tools and the most widely used one (after Excel). It interfaces with everything we’ve talked about, including Hadoop, so you can visualize data sets of any size or type. So you might cleanse and manipulate data but then store it for Tableau to work its magic.
We encourage you not to underestimate this part of data science. Simply Google “awesome data visualizations” and you can see how powerful this can be. This is how your great technical work reaches the masses and earns you accolades. It’s also significantly easier to learn compared to the other topics here.
The best source for learning it is Tableau Training.
One of the big changes under way in data science is how data is moving to the cloud. Software was the first step in the cloud computing evolution and data is now following. Rather than store data using expensive hardware and complex internal architecture, cloud data services offer cost efficiencies and better performance for big data requirements. Hadoop and Spark are the obvious choices for this but it goes beyond that.
Many companies are migrating to cloud services like AWS and Azure. As a data scientist, you need to have good familiarity with what these services offer and how they fit your data needs. You don’t need a course in this, but in your learning you ought to become familiar with their big data resources and applications. So spend some learning time on AWS and Azure. Check out their database and analytics products. And develop a perspective on how they can help an organization satisfy its increasing data needs. You might lead a migration or be working primarily from these tools in the future.
By now, you have probably been looking for statistics learning material, since so much in data science has a statistical foundation. For intermediate statistics, you can find great learning material on our Statistics Learning Page.
If you get into data science, you’ll have a large advantage by having advanced statistics and math skills. While you don’t need to learn these sequentially and can save them for later, they are important topics to have mastery over. Here are some of these topics and corresponding courses…
Probably the biggest topic in data science right now is machine learning. For decades, we’ve been imagining what the world would be like with artificial intelligence that could get smarter through vast quantities of data. Now that big data technology exists and we’ve generated all these applications to program and process that data, the day is finally here. And this is why you’ve seen Watson defeat Jeopardy champions or the driverless cars actually work in tests.
Machine learning builds on everything you’ve done in this learning path. It’s just the next advanced step in the ladder. You don’t have to go further and become a machine learning expert but it is a fascinating field and is attracting significant investment. So going down this learning path has tons of potential for your career.
If you make it this far, you can take our suggestion: the famous Machine Learning course from Andrew Ng.
We commend you for your choice to learn data science skills. This will help you take the next step up in your career. But how can you turn skills into real job outcomes?
The following guides will help you apply your new skills and see significant career benefit…
It’s pretty tough to differentiate yourself in the job market. But now that you’ve learned a valuable new skill, we think you can do it with your resume.
For more information and to sign up, visit our Building a Top Resume Guide Page
Cover letters are the first thing employers will see when you’re applying to them. So you certainly want yours to look good and make a strong case for you to get the job!
For more information and to sign up, check out our page Writing Amazing Cover Letters Guide
Now that you have better qualifications, it’s time to PROVE it to employers. This is what they love even more than resumes! Portfolios are your edge compared to other applicants.
The Portfolio Guide will help with…
For more information or to sign up, visit our page – Creating a Professional Portfolio Guide
The interview is the equivalent of closing the sale. This is when the process gets more personal, and it thus requires substantial preparation.
Here’s what this guide covers…
For more information or to sign up, visit our page Owning the Interview Guide
Everyone should know how to do a business proposal but few do! Yet it’s so important to advance in your career (job applications, bidding on work, proposing ideas, etc).
For more information or to sign up, visit our page The Business Proposal Guide
Sales can be a freelancer’s biggest challenge. But it doesn’t have to be so bad. We’re here to help! This guide will help you improve with:
For more information or to sign up, visit our page – Increasing Freelance Sales Guide
Rather than go from gig to gig, you can do better. Your freelancing can be built into a full business and earn much better, more consistent revenue.
This guide helps the freelancer:
For more information or to sign up for this guide, visit our page at Freelance Gigs to Lucrative Business Guide
Now that you’ve improved your data skills, it’s time to focus on how to write about them specifically. What do employers look for when hiring analysts, engineers, and data scientists?
This guide is one of a kind… we don’t know of anyone else providing such specific advice! And best of all, it’s super cheap! For more information or to sign up, check out our page:
We want to work with YOU directly. What can we focus on?
For more information or to sign up for a time slot today, visit our page Reverse Tide Personalized Career Services
No matter what career you pursue or how you utilize your data knowledge, your learning shouldn’t stop here. Good professionals will continue their education on a perpetual basis. So we want to get you started with some of the best sources in this subject, across many different mediums.
There are many websites dedicated to the latest news, perspectives, and analysis in data science related topics. On these sites, you will find some of the most prominent names in the industry, and they are well worth following: KDNuggets, Smart Data Collective, Data Science Central, Flowing Data, and What’s the Big Data.
For each, you can choose how to subscribe to their content. Sign up for their RSS feed, email list, or social media accounts and you’ll get a regular dose of high quality content.
Podcasts are one of the best ways to continually learn new content. With each, simply subscribe to their channel and then listen in the car, on the train, while going for a walk, or working out. There is some outstanding content in this subject and it’s an effortless way to learn. The best in data science are O’Reilly Podcasts and Data Skeptic.
We think books are a great way to reinforce how you can use data to solve real life issues. So our book suggestions are not learning materials but focus on ways to apply your learnings in data and statistics to the modern world. Here are our three favorites…
Forums or message boards are a great way to learn data science more collaboratively. You can ask questions about your studies or job scenarios, get advice on technical subjects, or read along and learn from your peers. There are a lot of good options for data subjects. Our favorites for data science are Cross Validated and the data subreddits.
Data Science is an amazing field. But you’re best off pairing the skill with various others. That will make you a better data scientist and better job candidate. Here are our favorite subjects to complement your data skills…