Create vector matrix of movie ratings using R project

0 votes

Suppose I am using this data set of movie ratings: http://www.grouplens.org/node/73

It contains ratings in a file formatted as userID::movieID::rating::timestamp

Given this, I want to construct a feature matrix in R, where each row corresponds to a user and each column holds the rating that the user gave to one movie (NA if the user did not rate it).

For example, if the data file contains:

1::1::1::10
2::2::2::11
1::2::3::12
2::1::5::13
3::3::4::14

Then the output matrix would look like:

UserID, Movie1, Movie2, Movie3
1, 1, 3, NA
2, 5, 2, NA
3, NA, NA, 4

Is there a built-in way to achieve this in R? I wrote a simple Python script to do the same thing, but I bet there are more efficient ways to accomplish this.

Jun 30, 2018 in Data Analytics by Sahiti
• 6,370 points
893 views

1 answer to this question.

0 votes

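Why don't you try the dcast function in the reshape2 package? It reshapes long data (one row per rating) into a wide table with one row per user and one column per film.

library(reshape2)

# header = FALSE: otherwise the first rating would be consumed as a header row
d <- read.delim(
  "u1.base",
  header = FALSE,
  col.names = c("user", "film", "rating", "timestamp")
)
# One row per user, one column per film; unrated films become NA
d <- dcast( d, user ~ film, value.var = "rating" )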
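As a quick check of what dcast produces, here is a minimal sketch that applies it to the five sample ratings from the question. The data frame name `ratings` and the result name `wide` are only illustrative, and the sketch assumes the reshape2 package is installed.

```r
library(reshape2)

# The five sample ratings from the question, in long format
# (the timestamp column is left out because it is not needed here).
ratings <- data.frame(
  user   = c(1, 2, 1, 2, 3),
  film   = c(1, 2, 2, 1, 3),
  rating = c(1, 2, 3, 5, 4)
)

# One row per user, one column per film; user/film pairs
# that have no rating are filled with NA.
wide <- dcast(ratings, user ~ film, value.var = "rating")
print(wide)
```

The film columns of the result are named after the film IDs ("1", "2", "3"), so a cell can be looked up with, for example, wide[wide$user == 2, "1"].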

If your fields are separated by double colons instead, you cannot pass "::" as the sep argument of read.delim, because sep must be a single character.

In that case, read the file as a single column, split each string on "::", and bind the resulting vectors together as the rows of a matrix.

d <- read.delim( "a", header = FALSE, stringsAsFactors = FALSE )  # No header row; keep lines as strings
d <- as.character( d[,1] )               # Vector of strings
d <- strsplit( d, "::", fixed = TRUE )   # List of character vectors
d <- lapply( d, as.numeric )             # List of numeric vectors
d <- do.call( rbind, d )                 # Matrix, one row per rating
d <- as.data.frame( d )
colnames( d ) <- c( "user", "film", "rating", "timestamp" )  # "film" matches the dcast formula above
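The steps above can also be tried end to end in base R without any file on disk. The sketch below is an illustration, not part of the original answer: it uses the question's five sample lines as an in-memory vector, and it swaps dcast for base R's tapply as the pivoting step. Using mean as the aggregation function is an assumption that only makes sense because each user rates a film at most once.

```r
# Sample lines in the userID::movieID::rating::timestamp format.
# For a real file, readLines() on its path would produce a vector like this.
lines <- c("1::1::1::10", "2::2::2::11", "1::2::3::12",
           "2::1::5::13", "3::3::4::14")

# Split every line on the literal "::" and stack the pieces as rows.
parts <- strsplit(lines, "::", fixed = TRUE)
d <- as.data.frame(do.call(rbind, lapply(parts, as.numeric)))
colnames(d) <- c("user", "film", "rating", "timestamp")

# Pivot: rows are users, columns are films. With one rating per
# user/film pair, mean() simply returns that rating, and pairs
# with no rating come out as NA.
wide <- tapply(d$rating, list(user = d$user, film = d$film), FUN = mean)
print(wide)
```

Here wide is a plain numeric matrix whose row and column names are the user and film IDs, so wide["2", "1"] gives the rating that user 2 gave to film 1.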
answered Jun 30, 2018 by anonymous
