Transforming a key/value string into distinct rows in R

0 votes

I am working with a data-set which has key-value strings.

The data-set looks like this:

quest<-data.frame(city=c("Atlanta","New York","Atlanta","Tampa"), key_value=c("rev=63;qty=1;zip=45987","rev=10.60|34;qty=1|2;zip=12686|12694","rev=12;qty=1;zip=74268","rev=3|24|8;qty=1|6|3;zip=33684|36842|30254"))

The data-set basically looks like this:

   city                                  key_value
1  Atlanta                     rev=63;qty=1;zip=45987
2 New York       rev=10.60|34;qty=1|2;zip=12686|12694
3  Atlanta                     rev=12;qty=1;zip=74268
4    Tampa rev=3|24|8;qty=1|6|3;zip=33684|36842|30254

Now, I would want to rearrange the above data-set into distinct rows, so the final data-set should look like this:

      city  rev qty   zip
1  Atlanta 63.0   1 45987
2 New York 10.6   1 12686
3 New York 34.0   2 12686
4  Atlanta 12.0   1 74268
5    Tampa  3.0   1 33684
6    Tampa 24.0   6 33684
7    Tampa  8.0   3 33684

The number of rows to be created will be determined by this common delimiter "|"

Mar 26, 2018 in Data Analytics by coldcode
• 2,020 points
70 views

2 answers to this question.

+1 vote

We would start off by loading the "tidyverse" package. It is a collection of R packages designed specifically for the purpose of data science.

  • Our first task would be to expand the number of rows with "seperate_rows()" function, we will split the 'key_value' column by ';'
  • Then, we will go ahead seperate the 'key_value' column into two individual columns and we will name them as 'key' and 'value'
  • Again we will expand the number of rows by splitting the 'value' column by '|'
  • Then will group by 'city' and 'key' columns
  • Finally we will get the sequence number and 'spread' to wide format
Below is the code to implement all of the above steps:
library(tidyverse)
separate_rows(quest, key_value, sep=";") %>% 
     separate(key_value, into = c("key", "value"), sep="=") %>% 
     separate_rows(value, sep="[|]", convert = TRUE) %>% 
     group_by(city, key) %>% 
     mutate(rn = row_number()) %>% 
     spread(key, value) %>%
     select(-rn)
answered Mar 26, 2018 by Bharani
• 4,550 points
0 votes

You can use this r package called tidyverse. Something like this:

library(tidyverse)
separate_rows(quest, key_value, sep=";") %>% 
     separate(key_value, into = c("key", "value"), sep="=") %>% 
     separate_rows(value, sep="[|]", convert = TRUE) %>% 
     group_by(city, key) %>% 
     mutate(rn = row_number()) %>% 
     spread(key, value) %>%
     select(-rn)
answered Dec 3, 2018 by Kalgi
• 41,660 points

Related Questions In Data Analytics

0 votes
1 answer

How to convert a text mining termDocumentMatrix into excel or csv in R?

By assuming that all the values are ...READ MORE

answered Apr 5, 2018 in Data Analytics by DeepCoder786
• 1,720 points
135 views
0 votes
1 answer

How to evaluate expression given as a string in R?

The eval() function evaluates an expression, but "5+5" is a string, ...READ MORE

answered Jun 7, 2018 in Data Analytics by DataKing99
• 8,130 points
183 views
0 votes
1 answer
0 votes
1 answer

Finding frequency of observations in R

You can use the "dplyr" package to ...READ MORE

answered Mar 26, 2018 in Data Analytics by Bharani
• 4,550 points
151 views
0 votes
1 answer

Left Join and Right Join using "dplyr"

The below is the code to perform ...READ MORE

answered Mar 26, 2018 in Data Analytics by Bharani
• 4,550 points
111 views
0 votes
1 answer

Plotting multiple graphs on the same page in R

If you want to plot 4 graphs ...READ MORE

answered Mar 27, 2018 in Data Analytics by Bharani
• 4,550 points
56 views
+1 vote
2 answers

Custom Function to replace missing values in a vector with the mean of values

Try this. lapply(a,function(x){ifelse(is.na(x),mean(a,na.rm = TRUE) ...READ MORE

answered Aug 14 in Data Analytics by anonymous
85 views
0 votes
1 answer

Discarding duplicate rows from a data.frame - R

You can use distinct() function along with ...READ MORE

answered May 4, 2018 in Data Analytics by Bharani
• 4,550 points
39 views
0 votes
2 answers

Locating row index of a column which has the maximum value - R

Hi, Nirvana You can also try this. which(iris$Sepal.Length == ...READ MORE

answered Aug 21 in Data Analytics by anonymous
• 28,040 points
112 views