Transforming a key/value string into distinct rows in R

0 votes

I am working with a data-set which has key-value strings.

The data-set looks like this:

quest<-data.frame(city=c("Atlanta","New York","Atlanta","Tampa"), key_value=c("rev=63;qty=1;zip=45987","rev=10.60|34;qty=1|2;zip=12686|12694","rev=12;qty=1;zip=74268","rev=3|24|8;qty=1|6|3;zip=33684|36842|30254"))

The data-set basically looks like this:

   city                                  key_value
1  Atlanta                     rev=63;qty=1;zip=45987
2 New York       rev=10.60|34;qty=1|2;zip=12686|12694
3  Atlanta                     rev=12;qty=1;zip=74268
4    Tampa rev=3|24|8;qty=1|6|3;zip=33684|36842|30254

Now, I would want to rearrange the above data-set into distinct rows, so the final data-set should look like this:

      city  rev qty   zip
1  Atlanta 63.0   1 45987
2 New York 10.6   1 12686
3 New York 34.0   2 12686
4  Atlanta 12.0   1 74268
5    Tampa  3.0   1 33684
6    Tampa 24.0   6 33684
7    Tampa  8.0   3 33684

The number of rows to be created will be determined by this common delimiter "|"

Mar 26, 2018 in Data Analytics by coldcode
• 1,980 points
27 views

2 answers to this question.

Your answer

Your name to display (optional):
Privacy: Your email address will only be used for sending these notifications.
+1 vote

We would start off by loading the "tidyverse" package. It is a collection of R packages designed specifically for the purpose of data science.

  • Our first task would be to expand the number of rows with "seperate_rows()" function, we will split the 'key_value' column by ';'
  • Then, we will go ahead seperate the 'key_value' column into two individual columns and we will name them as 'key' and 'value'
  • Again we will expand the number of rows by splitting the 'value' column by '|'
  • Then will group by 'city' and 'key' columns
  • Finally we will get the sequence number and 'spread' to wide format
Below is the code to implement all of the above steps:
library(tidyverse)
separate_rows(quest, key_value, sep=";") %>% 
     separate(key_value, into = c("key", "value"), sep="=") %>% 
     separate_rows(value, sep="[|]", convert = TRUE) %>% 
     group_by(city, key) %>% 
     mutate(rn = row_number()) %>% 
     spread(key, value) %>%
     select(-rn)
answered Mar 26, 2018 by Bharani
• 4,550 points
0 votes

You can use this r package called tidyverse. Something like this:

library(tidyverse)
separate_rows(quest, key_value, sep=";") %>% 
     separate(key_value, into = c("key", "value"), sep="=") %>% 
     separate_rows(value, sep="[|]", convert = TRUE) %>% 
     group_by(city, key) %>% 
     mutate(rn = row_number()) %>% 
     spread(key, value) %>%
     select(-rn)
answered Dec 3, 2018 by Kalgi
• 35,750 points

Related Questions In Data Analytics

0 votes
1 answer

How to convert a text mining termDocumentMatrix into excel or csv in R?

By assuming that all the values are ...READ MORE

answered Apr 5, 2018 in Data Analytics by DeepCoder786
• 1,700 points
60 views
0 votes
1 answer

How to evaluate expression given as a string in R?

The eval() function evaluates an expression, but "5+5" is a string, ...READ MORE

answered Jun 7, 2018 in Data Analytics by DataKing99
• 8,100 points
55 views
0 votes
1 answer
0 votes
1 answer

Finding frequency of observations in R

You can use the "dplyr" package to ...READ MORE

answered Mar 26, 2018 in Data Analytics by Bharani
• 4,550 points
36 views
0 votes
1 answer

Left Join and Right Join using "dplyr"

The below is the code to perform ...READ MORE

answered Mar 26, 2018 in Data Analytics by Bharani
• 4,550 points
64 views
0 votes
1 answer

Plotting multiple graphs on the same page in R

If you want to plot 4 graphs ...READ MORE

answered Mar 27, 2018 in Data Analytics by Bharani
• 4,550 points
29 views
+1 vote
1 answer

Custom Function to replace missing values in a vector with the mean of values

You have missed out on "na.rm=TRUE" inside ...READ MORE

answered Mar 27, 2018 in Data Analytics by Bharani
• 4,550 points
31 views
0 votes
1 answer

Discarding duplicate rows from a data.frame - R

You can use distinct() function along with ...READ MORE

answered May 4, 2018 in Data Analytics by Bharani
• 4,550 points
26 views
0 votes
1 answer

Locating row index of a column which has the maximum value - R

which.max(iris$Sepal.Length) This command will give you the desired ...READ MORE

answered May 16, 2018 in Data Analytics by Bharani
• 4,550 points
45 views

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.
"PMP®","PMI®", "PMI-ACP®" and "PMBOK®" are registered marks of the Project Management Institute, Inc. MongoDB®, Mongo and the leaf logo are the registered trademarks of MongoDB, Inc.