LFD Book Forum


galo 04-12-2016 08:41 PM

Q2 - Classifier only correctly predicts a few classes
 
I can't seem to get Q2 right. I'm using the support vector classifier from the sklearn package (svm.SVC) in Python. I've set the parameters to the right values, but Ein (which I'm reading as 1 - recall in the output) is far too high for most classes. I don't think pandas is the cause, but since it was loading the class labels as float64 by default, I cast them to int anyway.

Code:

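import pandas as pd
from sklearn import svm, metrics

# Split on runs of one or more spaces; "[ ]*" can also match the
# empty string, which newer Python versions split on as well.
train_df = pd.read_csv(
    filepath,
    sep = r"[ ]+",
    engine = "python",
    header = None
    )
train_df.columns = ["Digit", "Intensity", "Symmetry"]
train_df["Digit"] = train_df["Digit"].astype(int)

# Second-order polynomial kernel: (1 + x.x')^2
clf = svm.SVC(
    C = 0.01,
    kernel = 'poly',
    degree = 2,
    gamma = 1.0,
    coef0 = 1.0
    )

# Features are intensity and symmetry; labels are the digits 0-9.
# (.iloc replaces .ix, which recent pandas versions no longer provide.)
X = train_df.iloc[:, [1, 2]].values
y = train_df.iloc[:, 0].values

# Note: this fits a single 10-class model on the raw digit labels.
clf.fit(X, y)

# In-sample evaluation: predict on the same data used for training.
expected = y
predicted = clf.predict(X)

print("Classification report for classifier %s:\n%s\n"
      % (clf, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))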

No 5s or 8s are predicted correctly, very few 4s and 6s are, and only a few 3s and 7s. This seems far too strange to be right.

Can someone show me where I'm doing something wrong?
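
One possible explanation, offered as a hedged sketch rather than a confirmed answer: if Q2 is asking for the in-sample error of separate "one digit versus all" binary classifiers, then fitting a single 10-class model on the raw digit labels measures something different. The helper name `one_vs_all_ein` and the synthetic `X_demo` / `y_demo` arrays below are assumptions made for illustration only; they stand in for the real intensity and symmetry features.

```python
import numpy as np
from sklearn import svm


def one_vs_all_ein(X, y, digit, C=0.01, Q=2):
    """Fit a binary 'digit versus all' SVM and return its in-sample error.

    Hypothetical helper: relabels the chosen digit as +1 and every
    other digit as -1, then trains on those binary labels.
    """
    y_bin = np.where(y == digit, 1, -1)
    clf = svm.SVC(C=C, kernel="poly", degree=Q, gamma=1.0, coef0=1.0)
    clf.fit(X, y_bin)
    # score() is the fraction classified correctly, so Ein = 1 - score.
    return 1.0 - clf.score(X, y_bin)


# Synthetic stand-in data (NOT the homework data set).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = rng.integers(0, 10, size=200)

for digit in range(10):
    print(digit, one_vs_all_ein(X_demo, y_demo, digit))
```

With this labeling, Ein counts both kinds of mistakes (the digit predicted as "other" and "other" predicted as the digit), whereas 1 - recall for a class only counts the first kind.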

Output:

Code:

Classification report for classifier SVC(C=0.01, cache_size=200, class_weight=None, coef0=1.0,
decision_function_shape=None, degree=2.0, gamma=1.0, kernel='poly',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False):
             precision    recall  f1-score   support

          0       0.54      0.83      0.65      1194
          1       0.93      0.96      0.95      1005
          2       0.22      0.55      0.31       731
          3       0.27      0.12      0.17       658
          4       0.12      0.02      0.04       652
          5       0.00      0.00      0.00       556
          6       0.09      0.00      0.00       664
          7       0.26      0.16      0.19       645
          8       0.00      0.00      0.00       542
          9       0.21      0.56      0.30       644

avg / total       0.32      0.40      0.33      7291


Confusion matrix:
[[987  41  65  45   2   0   0   5   0  49]
 [ 35 969   0   0   0   0   0   0   0   1]
 [ 65   3 404  48  16   0   0  48   0 147]
 [204   1 165  79  17   0   0  22   0 170]
 [ 79   9 163  11  16   0   1  81   0 292]
 [ 14   1 361  21  17   0   3  51   0  88]
 [ 38   0 282  35  22   0   1  53   0 233]
 [ 22   1 178   5  29   0   2 100   0 308]
 [298  16  89  26   1   0   2   8   0 102]
 [ 83   0 133  24  17   0   2  23   0 362]]
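
As a cross-check on the "1 - recall" reading, the per-class recall in the report can be recomputed from the confusion matrix: each row is a true digit, each column a predicted digit, so recall is the diagonal entry divided by its row sum. The sketch below simply re-enters the matrix posted above; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true digit, columns: predicted digit).
cm = np.array([
    [987,  41,  65,  45,   2,   0,   0,   5,   0,  49],
    [ 35, 969,   0,   0,   0,   0,   0,   0,   0,   1],
    [ 65,   3, 404,  48,  16,   0,   0,  48,   0, 147],
    [204,   1, 165,  79,  17,   0,   0,  22,   0, 170],
    [ 79,   9, 163,  11,  16,   0,   1,  81,   0, 292],
    [ 14,   1, 361,  21,  17,   0,   3,  51,   0,  88],
    [ 38,   0, 282,  35,  22,   0,   1,  53,   0, 233],
    [ 22,   1, 178,   5,  29,   0,   2, 100,   0, 308],
    [298,  16,  89,  26,   1,   0,   2,   8,   0, 102],
    [ 83,   0, 133,  24,  17,   0,   2,  23,   0, 362],
])

support = cm.sum(axis=1)            # number of true samples per digit
recall = np.diag(cm) / support      # fraction of each digit predicted correctly
one_minus_recall = 1.0 - recall     # the quantity the post calls Ein per class

# Overall in-sample error of the 10-class model: 1 - accuracy.
overall_ein = 1.0 - np.trace(cm) / cm.sum()

for digit in range(10):
    print(digit, support[digit], round(recall[digit], 2))
print("overall Ein:", round(overall_ein, 2))
```

The columns for 5 and 8 are all zero, which means the model never predicts those digits at all; that is why their precision and recall both come out as 0.00.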


