Welcome to Vigges Developer Community - Open, Learning, Share

dplyr - r data transform separate columns

I have a dataset with three columns. The third column holds date values mixed with other strings.

 ID     Col1        Value
 123    Start.Date  2011-06-18
 123    Stem        A1
 123    Stem_1      A6
 123    Stem_2      NA
 321    Start.Date  2014-08-05
 321    Stem        C1
 321    Stem_1      C4
 321    Stem_2      NA
 677    Start.Date  NA
 677    Stem        NA
 677    Stem_1      NA
 677    Stem_2      NA

How can I separate out the dates and store them in their own column, like this?

 ID     Col1        Value       Start.Date
 123    Stem        A1          2011-06-18 
 123    Stem_1      A6          2011-06-18
 123    Stem_2      NA          2011-06-18 
 321    Stem        C1          2014-08-05
 321    Stem_1      C4          2014-08-05
 321    Stem_2      NA          2014-08-05
 677    Stem        NA          NA
 677    Stem_1      NA          NA
 677    Stem_2      NA          NA

Thanks.

Question from: https://stackoverflow.com/questions/65911890/r-data-transform-separate-columns

1 Answer

Create a new column that takes its entry from the Value column where Col1 == 'Start.Date' and is NA otherwise. Then, within each ID, fill the NA values downward from the preceding date and remove the 'Start.Date' rows.

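library(dplyr)
library(tidyr)

df %>%
  # Keep Value only on the 'Start.Date' rows, blank the rest, then parse as Date
  mutate(Start.Date = as.Date(replace(Value, Col1 != 'Start.Date', NA))) %>%
  group_by(ID) %>%
  # Carry each ID's date down onto its remaining rows
  fill(Start.Date) %>%
  ungroup() %>%
  # Drop the rows that only held the date
  filter(Col1 != 'Start.Date')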

#    ID Col1   Value Start.Date
#  <int> <chr>  <chr> <date>    
#1   123 Stem   A1    2011-06-18
#2   123 Stem_1 A6    2011-06-18
#3   123 Stem_2 NA    2011-06-18
#4   321 Stem   C1    2014-08-05
#5   321 Stem_1 C4    2014-08-05
#6   321 Stem_2 NA    2014-08-05
#7   677 Stem   NA    NA        
#8   677 Stem_1 NA    NA        
#9   677 Stem_2 NA    NA        
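
To see what the mutate() step does before any filling happens, here is a minimal base R sketch on the rows of a single ID. The vector names (col1, value, masked, start_date) are made up for illustration and are not part of the answer above.

```r
# Rows for ID 123 only, written as plain vectors (illustrative names)
col1  <- c("Start.Date", "Stem", "Stem_1", "Stem_2")
value <- c("2011-06-18", "A1", "A6", NA)

# Blank out every entry that does not sit on a 'Start.Date' row
masked <- replace(value, col1 != "Start.Date", NA)
print(masked)
# "2011-06-18" NA NA NA

# Only date strings and NA remain, so as.Date() never sees "A1" or "A6"
start_date <- as.Date(masked)
print(class(start_date))
# "Date"
```

After this step only the first row of each ID carries a date, which is why the grouped fill() is needed to copy it down to the other rows.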

data

df <- structure(list(ID = c(123L, 123L, 123L, 123L, 321L, 321L, 321L, 
321L, 677L, 677L, 677L, 677L), Col1 = c("Start.Date", "Stem", 
"Stem_1", "Stem_2", "Start.Date", "Stem", "Stem_1", "Stem_2", 
"Start.Date", "Stem", "Stem_1", "Stem_2"), Value = c("2011-06-18", 
"A1", "A6", NA, "2014-08-05", "C1", "C4", NA, NA, NA, NA, NA)), 
class = "data.frame", row.names = c(NA, -12L))
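
If loading dplyr and tidyr is not an option, roughly the same result can be sketched in base R. Note that this swaps the fill technique for a lookup with match(), and it assumes each ID has exactly one 'Start.Date' row. The names is_date_row, dates and out are illustrative only.

```r
# Same data as above, rebuilt compactly so the sketch runs on its own
df <- data.frame(
  ID    = rep(c(123L, 321L, 677L), each = 4),
  Col1  = rep(c("Start.Date", "Stem", "Stem_1", "Stem_2"), times = 3),
  Value = c("2011-06-18", "A1", "A6", NA,
            "2014-08-05", "C1", "C4", NA,
            NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

is_date_row <- df$Col1 == "Start.Date"
dates <- df[is_date_row, ]    # assumed: one 'Start.Date' row per ID
out   <- df[!is_date_row, ]   # the rows to keep

# Look up each remaining row's ID among the date rows, then parse as Date
out$Start.Date <- as.Date(dates$Value[match(out$ID, dates$ID)])
rownames(out) <- NULL

print(out)
```

Unlike the pipeline above, this returns a plain data.frame rather than a tibble, and it does not depend on the 'Start.Date' row coming first within each ID.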
