Welcome to Vigges Developer Community: Open, Learning, Share
Welcome To Ask or Share your Answers For Others


r - Regression with multiple binary variables?

I'm new to data science, and I'm trying to write a small program in R to make predictions about fragrances (perfumes). I have created a dataset of all my own perfumes, with one column per attribute: the notes of each perfume, such as lime, vanilla, iris, etc. All of these are binary variables, and for each perfume I personally assigned a continuous "Like" score in the range 0-10. How can I regress the continuous variable (Like) on all of these binary variables? I imagine I need as many dummy variables as there are notes.

But I have a problem in the prediction phase. I fit the model with all the note variables as factors, and I wanted to test it by predicting "Like" for a single new row. Of course, that new row has only one value (0 or 1) per note, so R tells me that the training and test sets have a different number of factor levels (2 in the training set, 1 in the test set). How can I solve this?
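The setup described in the question can be sketched as follows, assuming a small toy data frame (the perfume rows, the note columns `lime`, `vanilla`, `iris`, the `"no"`/`"yes"` coding and the `Like` scores are all hypothetical). It shows that when each note is stored as a two-level factor, `lm()` builds the dummy variables itself: one 0/1 indicator column per note, with `"no"` as the baseline absorbed into the intercept.

```r
# Hypothetical toy data: one row per perfume, binary notes stored as factors.
# Like is built as 4 + 2*lime + 3*vanilla + 1*iris so the fit is easy to check.
yn <- c("no", "yes")
perfumes <- data.frame(
  lime    = factor(c("yes", "no",  "yes", "no",  "yes", "no"),  levels = yn),
  vanilla = factor(c("no",  "yes", "yes", "no",  "no",  "yes"), levels = yn),
  iris    = factor(c("no",  "no",  "yes", "yes", "no",  "yes"), levels = yn),
  Like    = c(6, 7, 10, 5, 6, 8)
)

# Regress the continuous score on every note column
fit <- lm(Like ~ ., data = perfumes)

# Each two-level factor becomes a single dummy column named "<note>yes"
colnames(model.matrix(fit))
# "(Intercept)" "limeyes" "vanillayes" "irisyes"
```

So there is no need to create the dummy columns by hand; the coefficient on `limeyes` is the estimated change in `Like` when lime is present, holding the other notes fixed.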


He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back into you...

1 Answer


This is a little bit of a guess, but I think what you're looking for is setting all the factor levels in your test set explicitly:

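# Rebuild each note column as a factor over the full set of training levels,
# so a new row that contains only "yes" (or only "no") still has both levels
for (x in note_names) {
  test[[x]] <- factor(test[[x]], levels = c("no", "yes"))
}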
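A self-contained sketch of restoring the levels on a one-row test set, using hypothetical toy data (`train`, `new_perfume`, `fit` and the two notes are illustrative names, not from the question). It also shows why the levels are best set with `factor(..., levels = ...)`: assigning `levels(x) <- c("no", "yes")` renames the existing levels by position, so a column whose only level is `"yes"` would silently turn into `"no"`. Once the levels match the training data, the one-row data frame is passed to `predict()`.

```r
# Hypothetical training data; Like is exactly 4 + 2*lime + 3*vanilla
train <- data.frame(
  lime    = factor(c("yes", "no", "yes", "no"), levels = c("no", "yes")),
  vanilla = factor(c("no", "yes", "yes", "no"), levels = c("no", "yes")),
  Like    = c(6, 7, 9, 4)
)
fit <- lm(Like ~ lime + vanilla, data = train)

# Hypothetical new perfume: both notes present, so each factor has one level
new_perfume <- data.frame(lime = "yes", vanilla = "yes", stringsAsFactors = TRUE)
levels(new_perfume$lime)        # "yes"

# Pitfall: levels<- relabels levels by position, so "yes" becomes "no"
relabelled <- new_perfume$lime
levels(relabelled) <- c("no", "yes")
as.character(relabelled)        # "no"

# Fix: re-create each note column over the full set of training levels
note_names <- c("lime", "vanilla")   # hypothetical list of note columns
for (x in note_names) {
  new_perfume[[x]] <- factor(new_perfume[[x]], levels = c("no", "yes"))
}
levels(new_perfume$lime)        # "no" "yes"
as.character(new_perfume$lime)  # "yes"

# Predict the Like score for the new perfume
predict(fit, newdata = new_perfume)
```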

While it is generally best practice in R to represent categorical variables with factors rather than dummy variables or integer codes (this is what factors are meant for, and it means you don't need a separate codebook to remember that, e.g., 1 = male and 2 = female), in this case I think you might as well code 'absent' as 0 and 'present' as 1: this is what any statistical/ML method is going to transform your categorical variable into anyway, and it's unambiguous.
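A minimal sketch of that 0/1 coding, on hypothetical toy data (the column names and scores are illustrative only): each note is a numeric column, so a new perfume is just a row of numbers and there are no factor levels to keep in sync between training and test data.

```r
# Hypothetical toy data with notes coded 0 = absent, 1 = present.
# Like is built as 4 + 2*lime + 3*vanilla + 1*iris so the fit is easy to check.
perfumes <- data.frame(
  lime    = c(1, 0, 1, 0, 1, 0),
  vanilla = c(0, 1, 1, 0, 0, 1),
  iris    = c(0, 0, 1, 1, 0, 1),
  Like    = c(6, 7, 10, 5, 6, 8)
)

# Each coefficient is the estimated change in Like when that note is present
fit <- lm(Like ~ lime + vanilla + iris, data = perfumes)
coef(fit)

# A new perfume is a single row of 0/1 values; no factor levels are involved
new_perfume <- data.frame(lime = 1, vanilla = 0, iris = 1)
predict(fit, newdata = new_perfume)
```

With real (noisy) scores the coefficients will of course not be recovered exactly; the toy data is constructed to make the result easy to verify.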



...