How good at SQL does a data scientist really need to be?

+1 vote
Aug 9, 2018 in Data Analytics by Anmol
• 1,780 points
420 views

1 answer to this question.

0 votes

SQL is a standardized query language for requesting information from a database.[1]

On a scale of 1–10, where a 1 knows only select * from table and a 10 can fluently build stored procedures and views, a data scientist should be at least a 7.

Why?

SQL is THE language for working through a database environment. It’s not the language to perform “science” on the data, but it is the language to pull and manipulate the data. A DATA scientist needs to be fluent in DATA. Being fluent in data means that they should have a proper understanding of the final stage of data governance.

Data governance is the capability that enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data.[2] The final stage of data governance: querying the data.

If a data scientist fully relied on a data engineer or an ETL developer to get all of the data they needed, they would have a tough time finding an employer who wants them.

Are you going to develop a statistical approach on a table that contains 2 billion rows? What’s your plan? Store all of that in R or Python memory? Come on…
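One way to picture the alternative is to let the database do the heavy lifting and pull back only the summary. Here is a minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical `sales` table (the table, columns, and values are made up for illustration, and a real warehouse would use its own driver):

```python
import sqlite3

# Hypothetical in-memory database standing in for a large production table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 5.0), ("south", 15.0)],
)

# The database computes the per-region summary; only one row per region
# travels back to Python, no matter how many rows the table holds.
summary = conn.execute(
    """
    SELECT region, COUNT(*) AS n, AVG(amount) AS mean_amount
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()

for region, n, mean_amount in summary:
    print(region, n, mean_amount)
conn.close()
```

The same idea applies with any database: filter and aggregate in SQL first, then bring the much smaller result into R or Python for the actual modelling.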

All things aside, SQL is an easy language to learn. It honestly mirrors the English language.
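To illustrate how closely a query can read like a sentence, here is a small sketch, assuming Python's built-in sqlite3 module and a hypothetical `employees` table invented for this example:

```python
import sqlite3

# Hypothetical table and rows, used only to have something to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Asha", "analytics", 90000.0),
        ("Ben", "sales", 60000.0),
        ("Chen", "analytics", 70000.0),
    ],
)

# Reads almost like: "select the name from employees where the department
# is analytics and the salary is above 80000, ordered by name".
query = """
    SELECT name
    FROM employees
    WHERE department = 'analytics' AND salary > 80000
    ORDER BY name
"""
names = [row[0] for row in conn.execute(query)]
print(names)
conn.close()
```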

A data scientist, who is typically expected to be fluent in one of R, Python or SAS, could and should be able to learn and be proficient in SQL in a relatively short amount of time.

answered Aug 9, 2018 by Abhi
• 3,720 points
