An excellent example must consist of the following items:
- A small dataset
- The minimal runnable code necessary to reproduce the error on that dataset
- The necessary information on the system, the R version, and the packages used
You can also look at the examples in the help files, as they are often helpful.
In general, all the code given there fulfills the requirements: the data is provided, and so is minimal code.
How to produce a minimal dataset?
There are several options, and in many cases one of the built-in datasets is enough.
A simple data set can be built by providing a vector/data frame with some values.
You can use library(help = "datasets") to see a description of every built-in dataset. For more information on a particular one, put a question mark before its name.
Example: ?mtcars, where 'mtcars' is one of the datasets in the list.
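As a rough sketch of how you might look through the built-in datasets before picking one (the variable names available and first_rows are only illustrative):

```r
# List the names of the datasets shipped with the 'datasets' package
available <- data(package = "datasets")$results[, "Item"]
head(available)

# Inspect a candidate dataset before using it in an example
str(mtcars)                    # column names and types
first_rows <- head(mtcars, 3)  # a small slice is usually enough
first_rows
```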
You can also make a vector yourself. Several functions generate random values or shuffle a vector, such as:
- x <- rnorm(20) for a normal distribution, x <- runif(20) for a uniform distribution
- sample() to randomize a vector: x <- sample(1:10) gives the vector 1:10 in random order
- letters is a useful vector containing the alphabet. This can be used for making factors like: x <- sample(letters[1:4], 20, replace = TRUE)
- For matrices, you can use matrix(1:30,ncol=3)
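Random data differs on every run, so anyone reproducing the example would see different values. A minimal sketch of one common remedy is to fix the seed with set.seed() before generating the data. The seed value 42 and the names x, g and x_again are arbitrary choices for illustration:

```r
# Fix the random seed so everyone gets the same "random" values
set.seed(42)
x <- rnorm(20)                                 # 20 draws from a standard normal
g <- sample(letters[1:4], 20, replace = TRUE)  # a grouping variable drawn from a-d

# Re-running with the same seed reproduces the data exactly
set.seed(42)
x_again <- rnorm(20)
identical(x, x_again)
```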
To make a data frame, use data.frame(). Keep the column names simple.
For example:
Data <- data.frame( X = sample(1:20), Y = sample(c("yes", "no"), 20, replace = TRUE) )
How to copy your data?
If you have a large dataset, you can always create a subset of your original data using head(), subset() or the indices.
You can then use dput() to produce output that others can paste into R immediately:
dput(head(iris,4))
structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6), Sepal.Width = c(3.5, 3, 3.2, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5), Petal.Width = c(0.2, 0.2, 0.2, 0.2), Species = structure(c(1L, 1L, 1L, 1L), .Label = c("setosa", "versicolor", "virginica"), class = "factor")), .Names = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"), row.names = c(NA, 4L), class = "data.frame")
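To illustrate why dput() output is convenient for readers, the following sketch captures the output as text and rebuilds the object from it, which is roughly what happens when someone pastes it into their own session. The names small, txt and rebuilt are illustrative only.

```r
small <- head(iris, 4)

# Capture what dput() would print, as a character vector of lines
txt <- capture.output(dput(small))

# Rebuild the object from that text, as a reader pasting it would
rebuilt <- eval(parse(text = txt))

# Check that structure and values survive the round trip
all.equal(small, rebuilt)
```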
If your data frame has a factor with many levels, the dput() output can be unwieldy, because it still lists all the possible factor levels even if they aren't present in the subset of your data.
To avoid this, you can use the droplevels() function.
dput(droplevels(head(iris, 4)))
structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6), Sepal.Width = c(3.5, 3, 3.2, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5), Petal.Width = c(0.2, 0.2, 0.2, 0.2), Species = structure(c(1L, 1L, 1L, 1L), .Label = "setosa", class = "factor")), .Names = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"), row.names = c(NA, 4L), class = "data.frame")
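A quick way to see the difference, sketched here with the same iris subset (the name small is just for illustration), is to compare the factor levels before and after droplevels():

```r
small <- head(iris, 4)

# Before: all three species are still listed as levels
levels(small$Species)

# After: only the level actually present in the subset remains
levels(droplevels(small)$Species)
```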
Note also that dput() may not work as expected on keyed data.table objects or on a grouped tbl_df (class grouped_df) from dplyr.
In these cases you can convert the data back to a regular data frame before sharing: dput(as.data.frame(my_data)).
You can give a text representation that can be read in using the text parameter of read.table :
zz <- "Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa"
Data <- read.table(text = zz, header = TRUE)
How to produce minimal code?
Let me start with what you should not do:
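- You should not add all kinds of data conversions. Provide the data already in the correct format (unless the conversion is the problem, of course).
- You should not copy-paste a whole function or chunk of code that gives an error. First try to locate exactly which lines produce the error.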
Now, what you should do:
- State which packages are actually used.
- If you open connections or create files, add some code to close them or delete the files (using unlink()).
- If you change options, make sure the code contains a statement to revert them back to the original ones.
- Test run your code in a new, empty R session to make sure the code is runnable.
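The clean-up points above can be sketched roughly as follows. The file, the option being changed and the names tmp and old_opts are all hypothetical stand-ins, not part of any particular problem:

```r
# Create a temporary file instead of writing into the reader's working directory
tmp <- tempfile(fileext = ".csv")
write.csv(head(mtcars, 3), tmp)

# Change an option, keeping the old value so it can be restored later
old_opts <- options(digits = 3)
print(pi)

# Clean up: restore the original options and delete the file
options(old_opts)
unlink(tmp)
file.exists(tmp)
```

Because options() returns the previous values of whatever it changes, passing that result back to options() is enough to undo the change.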
Give the required information
- Make sure you give the complete information on R version, operating system etc.
- If you are running R in RStudio, rstudioapi::versionInfo() can help you report your RStudio version.
- If you have a problem with a specific package you can provide the version of the package by giving the output of packageVersion("name of the package").
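One possible way to collect this information in base R is sketched below. The package name "stats" is only a stand-in for whichever package you are asking about, and pkg_version and info are illustrative names:

```r
# R version as a single string
R.version.string

# Operating system and machine details
Sys.info()[c("sysname", "release", "machine")]

# Version of a specific package ('stats' ships with R and is used as a stand-in)
pkg_version <- packageVersion("stats")
pkg_version

# Everything at once: R version, platform, locale and attached packages
info <- sessionInfo()
info
```

Pasting the printed output of sessionInfo() into your question usually covers the R version, the operating system and the versions of the attached packages in one go.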