How do I become a data scientist step by step?

I am a novice Python learner and I work as a software developer. How should I move ahead to become a data scientist?
Jul 26, 2018 in Data Analytics by Anmol

1 answer to this question.

I am assuming that you are new to the field, so if you are planning to begin your career in Data Science, be prepared for a long journey.

Now, to become a data scientist, you probably have the following questions in mind:

How do I go about becoming one?

Where should I start?

What is my learning roadmap?

Which tools and techniques do I need to know?

How will I know when I have achieved my goal?

Let me help you with them. Here is a complete roadmap of the skills required to become a Data Scientist.

[Image: roadmap of the skills required to become a Data Scientist]

Let me explain all of these skills one by one.

  1. Fundamentals: This includes the fundamentals of mathematics and computing, such as matrices and linear algebra functions, hash functions and binary trees, relational algebra, database basics, ETL (Extract, Transform, Load), and reporting vs. BI (Business Intelligence) vs. analytics.
  2. Statistics: This includes Descriptive Statistics (Mean, Median, Range, Standard Deviation, Variance), Exploratory Data Analysis, Percentiles and Outliers, Probability Theory, Bayes' Theorem, Random Variables, the Cumulative Distribution Function (CDF), Skewness, and other statistics fundamentals.
    I would suggest you pick a dataset from the UCI Machine Learning Repository and start right now!
  3. Programming: Build expertise in at least one programming language. I would suggest ‘R’ or ‘Python’.
  4. Database knowledge: You should have the basic knowledge needed to store and analyze data, for example with MySQL or Cassandra.
  5. Machine Learning and Advanced Machine Learning (Deep Learning): Understand the different types of Machine Learning techniques, such as Supervised, Unsupervised, and Reinforcement Learning. You should also be familiar with the fundamentals of Neural Networks for deep learning and with at least one library used for creating Deep Learning models, such as TensorFlow or Keras. It would be a plus if you learn how Convolutional Neural Networks, Recurrent Neural Networks, Restricted Boltzmann Machines (RBMs), and Autoencoders work.
  6. Big Data: Big Data Analytics has become crucial because it helps improve business decision-making and provides an edge over competitors. This applies to organizations as well as to professionals in the analytics domain. As a Data Scientist, it is very important to know the frameworks that can process Big Data. Two of the best known are ‘Hadoop’ and ‘Spark’.
  7. Data Ingestion & Munging: Data ingestion is the process of importing, transferring, loading, and processing data for later use or storage in a database. It involves loading data from a variety of sources. Two data ingestion tools are Apache Flume and Apache Sqoop. Data munging is all the work you do on the raw data to make it “clean” enough to feed into your analytical algorithm. You can use ‘R’ and ‘Python’ packages for that.
  8. Data Visualization: Good hands-on knowledge of various visualization tools is required. You can also use a programming language for this purpose. Some visualization tools are Tableau, Kibana, Google Charts, and Datawrapper.
  9. Data-Driven Problem Solving: Everything we have discussed so far involves tools and technologies that you can learn. A data-driven problem-solving approach, however, is something that you need to develop, and it only comes with experience.
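
Since you are already learning Python, here is a minimal sketch of the descriptive statistics from item 2 (mean, median, range, variance, standard deviation, and outliers), using only the standard library's `statistics` module. The sample values are made up for illustration, and the 1.5 × IQR outlier rule is just one common convention that I am assuming here; it is not the only way to define an outlier.

```python
import statistics

# Hypothetical sample data, invented for illustration only.
values = [12, 15, 11, 14, 13, 15, 98, 12, 14, 13]


def describe(data):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(data),      # sample standard deviation
    }


def iqr_outliers(data):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


summary = describe(values)
outliers = iqr_outliers(values)
print(summary)
print("outliers:", outliers)
```

Notice how the single extreme value pulls the mean well above the median. Spotting that kind of gap is a typical first step in exploratory data analysis, and you can try the same functions on a numeric column from whichever dataset you pick.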

You can also go through the video below to see all of these steps discussed in detail. Let me know your thoughts about it. :-)

A Data Scientist needs to know how to productively approach a problem.

This means

  • identifying a situation’s salient features,
  • figuring out how to frame a question that will yield the desired answer,
  • deciding what approximations make sense, and
  • consulting the right co-workers at the appropriate junctures of the analytic process.

All of that is in addition to knowing which data science methods to apply to the problem at hand.

Hope this answer helps. :-)

answered Jul 26, 2018 by ANMOL