I have a dataset of 240 entries divided into 4 classes of 60 each. I am performing one-vs-all classification, so I take one class as class0 and the other 3 together as class1. Since the data is now imbalanced, I am using SMOTE to oversample it. But even after oversampling, my classifier predicts only class1 (the majority class from before). Take the dataset as any generic classification dataset. Here is the code:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from imblearn.over_sampling import SMOTE

X = dataset.iloc[:, :-1]
y = np.empty(240, dtype=int)
y[:60] = 0
y[60:] = 1

oversample = SMOTE(random_state=0, sampling_strategy=0.8)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# fit_sample was removed in newer imbalanced-learn releases; fit_resample is the current API
X_balanced, y_balanced = oversample.fit_resample(X_train, y_train)
X_balanced = pd.DataFrame(X_balanced, columns=X.columns)
X_train, y_train = X_balanced, y_balanced

svc = SVC(kernel='rbf', C=10, gamma=3)
svc.fit(X_train, y_train)
pred = svc.predict(X_test)
And here is the value of the pred variable:
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
As you can see, it is all 1s, which gives the confusion matrix:
[[ 0 11]
[ 0 37]]
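For reference, a matrix like the one above can be reproduced with scikit-learn's confusion_matrix. This is a minimal sketch using hypothetical labels with the same class counts (11 zeros and 37 ones in the test set, every prediction equal to 1), not my actual data:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical test labels: 11 samples of class 0, 37 of class 1.
y_test = np.array([0] * 11 + [1] * 37)
# A degenerate classifier that outputs class 1 for every sample.
pred = np.ones(48, dtype=int)

cm = confusion_matrix(y_test, pred)
print(cm)
# [[ 0 11]
#  [ 0 37]]
```

Rows are true labels and columns are predicted labels, so all 11 class-0 samples land in the top-right cell as false positives for class 1.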
One more thing to note: if I oversample before splitting the data, the results come out fine, but that is a bad way of handling the data because synthetic samples leak into the test set.
Any help on this will be appreciated.