.SD in data.table in R


What does .SD stand for? How is it helpful, and when should it be used?

According to the data.table documentation, .SD is a data.table containing the subset of x's data for each group, excluding the group column(s).

It can be used when grouping by i, when grouping by by, keyed by, and ad hoc by.

Does that mean that the subset data.tables are held in memory for the next operation?

Apr 12, 2018 in Data Analytics by kappa3010

1 answer to this question.


.SD stands for "Subset of Data.table". The leading dot has no special meaning; it only keeps the name from clashing with a user-defined column name.

Consider your data.table as follows:

library(data.table)
DT = data.table(a=rep(c("x","y","z"),each=2), b=c(1,3), p=1:6)
setkey(DT, b)
DT
#    a b p
# 1: x 1 1
# 2: y 1 3
# 3: z 1 5
# 4: x 3 2
# 5: y 3 4
# 6: z 3 6

Try the code below to see what .SD does:

DT[ , .SD[ , paste(a, p, sep="", collapse="_")], by=b]
#    b       V1
# 1: 1 x1_y3_z5
# 2: 3 x2_y4_z6

The by=b argument splits the original data.table into two sub-data.tables, one per distinct value of b,

DT[ , print(.SD), by=b]
# 1st sub-data.table, called '.SD' while it's being operated on:
#    a p
# 1: x 1
# 2: y 3
# 3: z 5
# 2nd sub-data.table, called '.SD' while it's being operated on:
#    a p
# 1: x 2
# 2: y 4
# 3: z 6
# Final output, since print() doesn't return anything
# Empty data.table (0 rows) of 1 col: b
and data.table operates on each of them in turn.

While it is operating on any one of the subsets, it lets you refer to the current sub-data.table through the symbol .SD.

So, you can access and operate on the columns very easily.
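A minimal sketch of operating on the columns of .SD, assuming the same DT as above. Here lapply(.SD, ...) applies a function to each column of the current group, and .SDcols restricts which columns .SD contains. The extra numeric column q is hypothetical and exists only for this illustration.

```r
library(data.table)

# Same table as above, plus a hypothetical numeric column q (illustration only)
DT <- data.table(a = rep(c("x", "y", "z"), each = 2), b = c(1, 3), p = 1:6)
DT[, q := p * 10]

# Mean of the selected columns within each group of b.
# .SDcols limits .SD to p and q, so the character column a is left out.
res <- DT[, lapply(.SD, mean), by = b, .SDcols = c("p", "q")]
print(res)
```

This pattern is a common reason to reach for .SD: the same function is applied to many columns per group without naming each column in j.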

data.table carries out the operation on every sub-data.table defined by the grouping columns in by, "pastes" the per-group results back together, and returns them as a single data.table.
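The split, operate, and paste-together idea can be sketched by hand, assuming the same DT as above. This is only a conceptual illustration using split() and rbindlist(); it is not how data.table implements grouping internally, and the names pieces and manual are made up for the example.

```r
library(data.table)

DT <- data.table(a = rep(c("x", "y", "z"), each = 2), b = c(1, 3), p = 1:6)
setkey(DT, b)

# Grouped version: .SD is the current group's rows without the column b
grouped <- DT[, .SD[, paste(a, p, sep = "", collapse = "_")], by = b]

# Rough manual equivalent: one sub-table per value of b (b itself dropped),
# the same operation applied to each piece, then the pieces bound together
pieces <- split(DT, by = "b", keep.by = FALSE)
manual <- rbindlist(
  lapply(pieces, function(sd) {
    data.table(V1 = sd[, paste(a, p, sep = "", collapse = "_")])
  }),
  idcol = "b"  # group label taken from the list names, so b is character here
)

print(grouped)
print(manual)
```

Both results hold the same V1 values; the grouped form does the splitting and recombining for you and keeps b in its original type.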

answered Apr 12, 2018 by nirvana
