Welcome to Vigges Developer Community - Open, Learning, Share

columnname - Replacing the same part of multiple column names in R

After creating some dummy variables with dummy_cols(), I end up with unhelpful column names: they all start with ".data_".

library(fastDummies)

a <- as.factor(c("green", "yellow", "blue"))
b <- as.factor(c("blue", "yellow", "green"))
df <- data.frame(a, b)

# Dummy-code each factor separately
dummy1 <- dummy_cols(df$a, remove_selected_columns = TRUE)
dummy2 <- dummy_cols(df$b, remove_selected_columns = TRUE)

I need to put the dummies back together in one data frame. How do I replace the ".data_" part of each column name with the name of the variable it belongs to (e.g. a_blue, a_green, a_yellow for dummy1 and b_blue, b_green, b_yellow for dummy2)?

I found rename(), but I would have to apply it to every variable by hand. Is there a more automated way?

EDIT: dummy_cols() returns a data frame with one new column per category of the input variable. So a, with its 3 categories yellow, blue and green, becomes a data frame with 3 columns called .data_blue, .data_green and .data_yellow. These new variables are binary. Maybe this helps to illustrate what I mean.


1 Answer

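dummy_cols() is meant to get the whole data frame at once. Pass it all the factor columns together and it names each dummy after its source column, so no renaming is needed: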

cols <- c("a", "b")
dummy_cols(df[cols], remove_selected_columns=TRUE)
#   a_blue a_green a_yellow b_blue b_green b_yellow
# 1      0       1        0      1       0        0
# 2      0       0        1      0       0        1
# 3      1       0        0      0       1        0
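If you already have the separate dummy1 and dummy2 objects from the question, a minimal base-R sketch is to swap the prefix with sub() and then cbind() the pieces. The rename_dummies() helper is hypothetical, and the two data frames below are hand-built stand-ins for the .data_* output described in the question (an assumption), so the snippet runs without fastDummies.

```r
# Hypothetical helper: replace a leading ".data_" in every column name
# with "<prefix>_", e.g. ".data_blue" -> "a_blue".
rename_dummies <- function(dummies, prefix) {
  names(dummies) <- sub("^\\.data_", paste0(prefix, "_"), names(dummies))
  dummies
}

# Stand-ins for dummy_cols(df$a, ...) and dummy_cols(df$b, ...),
# assuming the ".data_<level>" naming reported in the question.
dummy1 <- data.frame(.data_blue   = c(0, 0, 1),
                     .data_green  = c(1, 0, 0),
                     .data_yellow = c(0, 1, 0))
dummy2 <- data.frame(.data_blue   = c(1, 0, 0),
                     .data_green  = c(0, 0, 1),
                     .data_yellow = c(0, 1, 0))

# Rename both sets and put them side by side in one data frame
combined <- cbind(rename_dummies(dummy1, "a"), rename_dummies(dummy2, "b"))
print(combined)
```

The anchored pattern ^\\.data_ only rewrites the leading prefix, so any later occurrence of ".data_" inside a level name is left alone. If you prefer dplyr, rename_with() can apply the same sub() call to every column in one step.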
