I am assuming that you are a fresher, so if you are planning to begin your career in Data Science, be prepared for a long journey.
If you want to become a data scientist, you probably have the following questions in mind:
How do I go about becoming one?
Where should I start from?
What is my learning roadmap?
Which tools and techniques do I need to know?
How will I know when I have achieved my goal?
Let me help you with them. Here is a roadmap of the skills required to become a Data Scientist.
Now, let me explain these skills one by one.
- Fundamentals: This covers the mathematical and computing basics, such as Matrices and Linear Algebra Functions, Hash Functions and Binary Trees, Relational Algebra, Database Basics, ETL (Extract, Transform, Load), and Reporting vs. BI (Business Intelligence) vs. Analytics.
- Statistics: This includes Descriptive Statistics (Mean, Median, Range, Standard Deviation, Variance), Exploratory Data Analysis, Percentiles and Outliers, Probability Theory, Bayes' Theorem, Random Variables, the Cumulative Distribution Function (CDF), Skewness and other statistics fundamentals.
I would suggest you pick a dataset from the UCI Machine Learning Repository and start right now!
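To make the descriptive statistics above concrete, here is a minimal sketch using only Python's standard `statistics` module. The sample values are made up for illustration, and the 1.5 × IQR rule for flagging outliers is one common rule of thumb, not the only option; with a real dataset you would swap in one of its numeric columns.

```python
import statistics

# Toy sample, made up for illustration; replace with a numeric column from your dataset.
values = [12, 15, 14, 10, 18, 20, 16, 13, 95, 17]

mean = statistics.mean(values)
median = statistics.median(values)
value_range = max(values) - min(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)      # sample standard deviation

# Quartiles, i.e. the 25th, 50th and 75th percentiles.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Rule of thumb (an assumption here): anything beyond 1.5 * IQR from the quartiles is an outlier.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]

print("mean:", mean, "median:", median, "range:", value_range)
print("variance:", round(variance, 2), "std dev:", round(std_dev, 2))
print("quartiles:", q1, q2, q3)
print("outliers:", outliers)
```

Notice how the single extreme value pulls the mean well above the median. That gap is a quick hint of skewness, and it is exactly the kind of thing exploratory data analysis is meant to surface.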
- Programming: Build expertise in at least one programming language. I would suggest R or Python.
- Database knowledge: You should know the basics of storing and querying data, for example with MySQL or Cassandra.
- Machine Learning and Advanced Machine Learning (Deep Learning): Understand the different types of Machine Learning techniques, such as Supervised, Unsupervised and Reinforcement Learning. You should also be familiar with the fundamentals of Neural Networks for deep learning, and with any one library used for creating Deep Learning models, such as TensorFlow or Keras. It would be a plus if you learn how Convolutional Neural Networks, Recurrent Neural Networks, Restricted Boltzmann Machines (RBMs) and Autoencoders work.
- Big Data: Data analytics is at the frontier of IT. Big Data analytics has become crucial because it helps improve business and decision-making and provides an edge over competitors. This applies to organizations as well as to professionals in the analytics domain. As a Data Scientist, it is very important to know the frameworks that can process Big Data. Two of the best known are Hadoop and Spark.
- Data Ingestion & Munging: Data ingestion is the process of importing, transferring, loading and processing data for later use or storage in a database. This involves loading data from a variety of sources. A few data ingestion tools are Apache Flume and Apache Sqoop. Data munging, in turn, is everything you do to the raw data to make it "clean" enough to feed into your analytical algorithm. You can use R and Python packages for that.
- Data Visualization: You need good hands-on knowledge of various visualization tools. You can even use a programming language for this purpose. Some of the visualization tools are Tableau, Kibana, Google Charts and Datawrapper.
- Data-Driven Problem Solving: Everything we have discussed so far covers tools and technologies that you can learn. A data-driven problem-solving approach, however, is something you need to develop. It will only come with experience.
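As a small illustration of the data munging step from the list above, here is a hedged sketch in plain Python. The CSV text, the column names (`name`, `city`, `age`) and the `clean_rows` helper are all hypothetical, invented for this example; on a real project you would more likely reach for a package such as pandas.

```python
import csv
import io

# Hypothetical raw export, made up for illustration: stray whitespace,
# inconsistent casing, a missing age, and a duplicate row.
RAW_CSV = """name,city,age
 Alice ,new york,34
Bob,BOSTON,
 Alice ,new york,34
Carol,Chicago,29
"""


def clean_rows(raw_text):
    """Return de-duplicated rows with trimmed text, title-cased cities and integer ages."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        name = row["name"].strip()
        city = row["city"].strip().title()
        age_text = row["age"].strip()
        age = int(age_text) if age_text else None  # keep missing ages as None
        key = (name, city, age)
        if key in seen:  # skip exact duplicates
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city, "age": age})
    return cleaned


rows = clean_rows(RAW_CSV)
for r in rows:
    print(r)
```

What to do with the missing age (drop the row, fill in a default, or keep it as missing, as this sketch does) depends on the analysis you plan to run, so treat that choice as an assumption of the example.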
You can also go through the video below for a detailed walk-through of all these steps. Let me know your thoughts about it. :-)
A Data Scientist needs to know how to productively approach a problem.
This means
- identifying a situation’s salient features,
- figuring out how to frame a question that will yield the desired answer,
- deciding what approximations make sense, and
- consulting the right co-workers at the appropriate junctures of the analytic process.
All of that in addition to knowing which data science methods to apply to the problem at hand.
Hope this answer helps. :-)