Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others


0 votes
323 views
in Technique [Technology] by (71.8m points)

python - Even after applying oversampling, classifier still predicts only one class (the majority class from before oversampling)

I have a dataset with 240 entries divided into 4 classes of 60 each. I am performing one-vs-all classification, so I take one class as class 0 and the other three as class 1. Since the data is now imbalanced, I use SMOTE to oversample it. But even after oversampling, my classifier predicts only class 1 (the majority class from before). Treat the dataset as any generic classification dataset. Here is the code:

import numpy
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# One-vs-all labels: the first class (60 samples) becomes 0, the rest become 1
X = dataset.iloc[:, :-1]
y = numpy.empty(240, dtype=int)
y[:60] = 0
y[60:] = 1

oversample = SMOTE(random_state=0, sampling_strategy=0.8)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Oversample only the training split (fit_resample in current imblearn;
# fit_sample is the deprecated name)
X_balanced, y_balanced = oversample.fit_resample(X_train, y_train)
X_balanced = pd.DataFrame(X_balanced, columns=X.columns)
X_train, y_train = X_balanced, y_balanced

svc = SVC(kernel='rbf', C=10, gamma=3)
svc.fit(X_train, y_train)
pred = svc.predict(X_test)

And here is the value of the pred variable:

[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]

As you can see, it's all '1', which makes the confusion matrix:

[[ 0 11]
 [ 0 37]]

One more thing to note: if I oversample before splitting the data, the results come out fine, but that is a bad way of dealing with the data because we would be creating a synthetic test set. Any help on this will be appreciated.



1 Answer

0 votes
by (71.8m points)
Waiting for an expert to reply.

