How to create excellent examples in R

0 votes

As everyone knows, an excellent example is always helpful when you are solving a problem or asking for guidance.

How do I create an example? What details should I include? How do I paste data structures from R in a text format?

Are there any tips and tricks available in addition to using dput(), dump() or structure()?

When should you include library() or require() statements?

Which names should one avoid for variables, in addition to c, df, data, etc.?

Any help is highly appreciated!

Apr 10, 2018 in Data Analytics by DataKing99
• 8,240 points

recategorized Apr 10, 2018 by DataKing99 482 views

1 answer to this question.

0 votes

An excellent example must consist of the following items:

  • A small dataset
  • Runnable code, as short as possible, that reproduces the error on that dataset
  • The necessary system information: the R version and the packages used, with their versions
  • You can also look at the examples in help files, as they are often good models.

In general, the example code in help files fulfills these requirements: the data is provided and the code is minimal.

How to produce a minimal dataset?

In most cases you can use one of the built-in datasets.

Alternatively, a simple data set can be built from a vector or data frame with a few values.

You can run library(help = "datasets") to see a list of the built-in data sets with a short description of each. For more information on a particular one, put a question mark in front of its name.

Example: ?mtcars, where 'mtcars' is one of the datasets in the list.
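As a rough sketch of the lookup described above, the snippet below lists the built-in datasets without opening the help viewer and then trims one of them down. The variable names `available` and `small` are only illustrative and not part of the original answer.

```r
# data() returns an object whose $results matrix has one row per dataset;
# the "Item" column holds the dataset names.
available <- data(package = "datasets")$results[, "Item"]
head(available)

# Inspect the structure of one dataset before using it in an example.
str(mtcars)

# A handful of rows is usually enough for a question.
small <- head(mtcars, 5)
small
```

Printing `small` (or passing it to dput()) gives readers something they can run without downloading anything.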

Sometimes it is easier to make a vector yourself. Several functions generate or shuffle values, such as:

  • x <- rnorm(20) for a normal distribution, x <- runif(20) for a uniform distribution
  • sample() to randomize a vector: x <- sample(1:10) gives the vector 1:10 in random order
  • letters is a useful vector containing the alphabet. It can be used for making factors, like: x <- sample(letters[1:4], 20, replace = TRUE)
  • For matrices, you can use matrix(1:30, ncol = 3)
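The functions listed above draw random values, so two people running the same code will normally see different data. A minimal sketch, assuming you want everyone to get the same numbers, is to call set.seed() first. The seed value 42 and the variable names are arbitrary choices for illustration.

```r
# Fix the random number generator so the example is repeatable.
set.seed(42)  # any integer works; 42 is an arbitrary choice

x_norm <- rnorm(20)        # 20 draws from a standard normal distribution
x_unif <- runif(20)        # 20 draws from a uniform distribution on [0, 1]
x_perm <- sample(1:10)     # the integers 1 to 10 in random order

# sample() on letters returns a character vector; wrap it in factor()
# if the example actually needs a factor.
x_fac <- factor(sample(letters[1:4], 20, replace = TRUE))

# A 10 x 3 matrix filled column by column.
m <- matrix(1:30, ncol = 3)
```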

If you want to make a data frame, use data.frame(). Keep the column names simple.

For example:

Data <- data.frame(X = sample(1:20), Y = sample(c("yes", "no"), 20, replace = TRUE))

How to copy your data? 

If you have a large dataset, you can always create a subset of your original data using head(), subset() or indices.

After that, use dput() to produce output that others can paste into R immediately.

 dput(head(iris,4)) 
structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6), Sepal.Width = c(3.5, 3, 3.2, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5), Petal.Width = c(0.2, 0.2, 0.2, 0.2), Species = structure(c(1L, 1L, 1L, 1L), .Label = c("setosa", "versicolor", "virginica"), class = "factor")), .Names = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"), row.names = c(NA, 4L), class = "data.frame") 
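Before posting, it can be worth checking that the dput() text really rebuilds the same object. The round trip below is one possible way to do that. The names `small`, `txt` and `restored` are made up for this sketch, and the exact text dput() prints may differ between R versions.

```r
small <- head(iris, 4)

# Capture what dput() would print, as a single string.
txt <- paste(capture.output(dput(small)), collapse = "\n")

# Parse and evaluate that string, as a reader pasting it into R would.
restored <- eval(parse(text = txt))

# Compare the rebuilt object with the original.
all.equal(restored, small)
```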

If your data frame has a factor with many levels, the dput() output can be unwieldy, because it still lists all the possible factor levels even if they aren't present in the subset you chose.

To avoid this, you can use the droplevels() function:

dput(droplevels(head(iris, 4)))
structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6), Sepal.Width = c(3.5, 3, 3.2, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5), Petal.Width = c(0.2, 0.2, 0.2, 0.2), Species = structure(c(1L, 1L, 1L, 1L), .Label = "setosa", class = "factor")), .Names = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"), row.names = c(NA, 4L), class = "data.frame") 
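To see the effect of droplevels() without reading through the full dput() output, you can compare the factor levels before and after. This is only an illustrative check, and `sub` and `sub_clean` are hypothetical names.

```r
sub <- head(iris, 4)

# The subset still carries every level of the original factor.
levels(sub$Species)

# droplevels() removes levels that no longer occur in the data.
sub_clean <- droplevels(sub)
levels(sub_clean$Species)

# The cleaned subset gives a shorter dput() result.
dput(sub_clean)
```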

Also, dput() may not work as expected on keyed data.table objects or on grouped tbl_df objects (class grouped_df) from dplyr.

In these cases you can convert the data back to a regular data frame before sharing: dput(as.data.frame(my_data)).

You can give a text representation that can be read in using the text parameter of read.table :

zz <- "Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa"
Data <- read.table(text = zz, header = TRUE)
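A text representation like the one above does not have to be typed by hand. One possible approach, sketched here with the illustrative names `txt` and `Data`, is to let write.table() print the rows and then confirm that read.table() can read them back.

```r
# write.table() with no file argument prints to the console;
# capture.output() collects those lines as a character vector.
txt <- capture.output(write.table(head(iris), quote = FALSE))

# This is the text you would paste into your question.
cat(txt, sep = "\n")

# Check that the text can be read back in. The header has one field fewer
# than the data rows, so read.table() treats the first column as row names.
Data <- read.table(text = paste(txt, collapse = "\n"), header = TRUE)
str(Data)
```

Whether Species comes back as a character vector or a factor depends on the R version, so say which one you need if it matters for your problem.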

How to produce minimal code?

Let me start with what you should not do:

  • Do not include data in all kinds of formats; provide it in the form your code needs (unless the format itself is the problem, of course).
  • Do not copy-paste a whole function or chunk of code that gives an error. First try to locate which lines actually cause it.

Now, what you should do:

  • State which packages are actually used, with library() calls.
  • If you open connections or create files, add some code to close them or delete the files (using unlink()).
  • If you change options, make sure the code contains a statement to revert them to the original ones.
  • Test run your code in a new, empty R session to make sure the code is runnable.
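The clean-up points in the list above could look roughly like this. The file is a temporary one created only for the illustration, and `old_opts`, `tmp`, `con` and `first_line` are hypothetical names.

```r
# options() returns the previous values, so they can be restored later.
old_opts <- options(digits = 3)
print(pi)
options(old_opts)   # revert to the original settings

# Create a file, read from it through a connection, then clean up.
tmp <- tempfile(fileext = ".csv")
write.csv(head(mtcars), tmp)

con <- file(tmp, open = "r")
first_line <- readLines(con, n = 1)
close(con)          # close the connection you opened

unlink(tmp)         # delete the file you created
file.exists(tmp)
```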

Give the required information 

  • Give complete information on the R version, operating system, etc.
  • If you are running R in RStudio, rstudioapi::versionInfo() can help you report your RStudio version.
  • If you have a problem with a specific package, provide its version by giving the output of packageVersion("name of the package").
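A short sketch of how this information might be collected is shown below. sessionInfo() is not mentioned in the answer above; it is a base R function that reports the R version, platform and attached packages in one go. The package name "stats" is used only as a stand-in for whichever package you are asking about.

```r
# R version as a single string.
r_version <- R.version.string
r_version

# Operating system name.
os_name <- Sys.info()[["sysname"]]
os_name

# Version of one specific package ("stats" is just a placeholder).
pkg_version <- packageVersion("stats")
pkg_version

# R version, platform and attached packages together.
info <- sessionInfo()
info

# RStudio version, only if the rstudioapi package is installed
# and the code is actually running inside RStudio.
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  print(rstudioapi::versionInfo()$version)
}
```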
     

answered Apr 10, 2018 by kappa3010
• 2,090 points

edited Apr 12, 2018 by kappa3010
