How good at SQL does a data scientist really need to be?

+1 vote
Aug 9, 2018 in Data Analytics by Anmol
• 1,610 points
40 views

1 answer to this question.

0 votes

SQL is a standardized query language for requesting information from a database.[1]

On a scale of 1–10, where a 1 only knows select * from table and a 10 can fluently build stored procedures and views, a data scientist should be at least a 7.
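To make the two ends of that scale a bit more concrete, here is a rough sketch using Python's built-in sqlite3 module. The orders table, its columns, and the customer_totals view are made up for illustration, and SQLite has no stored procedures, so only the view half of the "10" end is shown.

```python
import sqlite3

# In-memory database with a small, hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.0), ("alice", 5.0)],
)

# The "1" end of the scale: pull every row and every column.
all_rows = conn.execute("SELECT * FROM orders").fetchall()

# Further up the scale: wrap a reusable aggregation in a view.
conn.execute(
    """
    CREATE VIEW customer_totals AS
    SELECT customer, SUM(amount) AS total, COUNT(*) AS n_orders
    FROM orders
    GROUP BY customer
    """
)

# The view can now be queried like any other table.
totals = conn.execute(
    "SELECT customer, total, n_orders FROM customer_totals ORDER BY customer"
).fetchall()
print(totals)  # [('alice', 15.0, 2), ('bob', 25.0, 1)]
```

Someone around a 7 would be comfortable writing the GROUP BY query and the view without looking anything up.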

Why?

SQL is THE language for working through a database environment. It’s not the language to perform “science” on the data, but it is the language to pull and manipulate the data. A DATA scientist needs to be fluent in DATA. Being fluent in data means that they should have a proper understanding of the final stage of data governance.

Data governance is the capability that enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data.[2] The final stage of data governance is querying the data.

If a data scientist fully relied on a data engineer or an ETL developer to get all of the data they needed, they would have a tough time finding an employer who wants them.

Are you going to develop a statistical approach on a table that contains 2 billion rows? What’s your plan? Store all of that in R or Python memory? Come on…
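The usual alternative is to let the database do the heavy lifting and pull back only the summary. A minimal sketch of that idea, again with Python's sqlite3 module: the measurements table and its columns are hypothetical, and 30,000 generated rows stand in for the 2 billion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sensor TEXT, reading REAL)")

# Stand-in data: a real table would be far too large to load at once.
rows = ((f"s{i % 3}", float(i % 10)) for i in range(30_000))
conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)

# Aggregate inside the database; only one row per sensor comes back.
summary = conn.execute(
    """
    SELECT sensor, COUNT(*) AS n, AVG(reading) AS mean_reading
    FROM measurements
    GROUP BY sensor
    ORDER BY sensor
    """
).fetchall()

for sensor, n, mean_reading in summary:
    print(sensor, n, mean_reading)
```

Here three summary rows reach Python instead of 30,000 raw ones. The same pattern applies whether the client is R, Python, or SAS: the query decides how much data leaves the database.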

All of that aside, SQL is an easy language to learn. Its syntax honestly mirrors the English language.

A data scientist, who is typically expected to be fluent in one of R, Python or SAS, could and should be able to learn SQL and become proficient in it in a relatively short amount of time.

answered Aug 9, 2018 by Anmol
• 3,620 points
