Transforming a key value string into distinct rows in R

Question

I am working with a data-set which has key-value strings.

The data-set looks like this:

quest<-data.frame(city=c("Atlanta","New York","Atlanta","Tampa"), key_value=c("rev=63;qty=1;zip=45987","rev=10.60|34;qty=1|2;zip=12686|12694","rev=12;qty=1;zip=74268","rev=3|24|8;qty=1|6|3;zip=33684|36842|30254"))

The data-set basically looks like this:

   city                                  key_value
1  Atlanta                     rev=63;qty=1;zip=45987
2 New York       rev=10.60|34;qty=1|2;zip=12686|12694
3  Atlanta                     rev=12;qty=1;zip=74268
4    Tampa rev=3|24|8;qty=1|6|3;zip=33684|36842|30254

Now, I would want to rearrange the above data-set into distinct rows, so the final data-set should look like this:

      city  rev qty   zip
1  Atlanta 63.0   1 45987
2 New York 10.6   1 12686
3 New York 34.0   2 12686
4  Atlanta 12.0   1 74268
5    Tampa  3.0   1 33684
6    Tampa 24.0   6 33684
7    Tampa  8.0   3 33684

The number of rows to be created will be determined by this common delimiter "|"

Bharani · Answer 1 · Mar 26, 2018

We would start off by loading the "tidyverse" package. It is a collection of R packages designed specifically for the purpose of data science.

Our first task would be to expand the number of rows with "seperate_rows()" function, we will split the 'key_value' column by ';'
Then, we will go ahead seperate the 'key_value' column into two individual columns and we will name them as 'key' and 'value'
Again we will expand the number of rows by splitting the 'value' column by '|'
Then will group by 'city' and 'key' columns
Finally we will get the sequence number and 'spread' to wide format

Below is the code to implement all of the above steps:

library(tidyverse)
separate_rows(quest, key_value, sep=";") %>% 
     separate(key_value, into = c("key", "value"), sep="=") %>% 
     separate_rows(value, sep="[|]", convert = TRUE) %>% 
     group_by(city, key) %>% 
     mutate(rn = row_number()) %>% 
     spread(key, value) %>%
     select(-rn)

answered Mar 26, 2018 by Bharani
• 4,660 points

Kalgi · Answer 2 · Dec 4, 2018

You can use this r package called tidyverse. Something like this:

library(tidyverse)
separate_rows(quest, key_value, sep=";") %>% 
     separate(key_value, into = c("key", "value"), sep="=") %>% 
     separate_rows(value, sep="[|]", convert = TRUE) %>% 
     group_by(city, key) %>% 
     mutate(rn = row_number()) %>% 
     spread(key, value) %>%
     select(-rn)