LFD Book Forum  

Course Discussions > Online LFD course > Homework 8

04-12-2016, 09:41 PM
galo
Q2 - Classifier only correctly predicts a few classes

I can't seem to get Q2 right. I'm using the support vector classifier from the sklearn package (svm.SVC) in Python. I've set the parameters to the right values, but E_in (1 - recall in the output) is far too high for most classes. I don't think pandas is the cause, but I still converted the class labels to int, since pandas reads them as float64 by default.
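As a sanity check on the kernel parameters: assuming the homework's kernel is K(x, x') = (1 + xᵀx')^Q, it coincides with scikit-learn's polynomial kernel (gamma·⟨x, x'⟩ + coef0)^degree when gamma = 1, coef0 = 1 and degree = Q. The sketch compares the two on a few arbitrary points. `course_poly_kernel` is a hypothetical helper written only for this comparison; `sklearn.metrics.pairwise.polynomial_kernel` evaluates the same formula that `SVC(kernel='poly')` uses.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def course_poly_kernel(A, B, Q):
    """Hypothetical helper: (1 + a.b)^Q for every pair of rows a in A, b in B."""
    return (1.0 + A @ B.T) ** Q


# Arbitrary 2-D points (intensity/symmetry-like), chosen only for illustration
A = np.array([[0.34, -4.53], [0.44, -5.50]])
B = np.array([[0.12, -2.00], [0.20, -3.10], [0.50, -1.00]])

K_course = course_poly_kernel(A, B, Q=2)
# scikit-learn's form: (gamma * <a, b> + coef0) ** degree
K_sklearn = polynomial_kernel(A, B, degree=2, gamma=1.0, coef0=1.0)

print(np.allclose(K_course, K_sklearn))  # True
```

So `degree=2, gamma=1.0, coef0=1.0` reproduces (1 + xᵀx')², and the kernel settings themselves are not the source of the high error.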

Code:
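import pandas as pd
from sklearn import svm, metrics

# filepath: path to the training data file
# Columns are whitespace-separated: digit, intensity, symmetry
train_df = pd.read_csv(
    filepath,
    sep=r"\s+",      # one or more whitespace characters; leading spaces are ignored
    header=None
    )
train_df.columns = ["Digit", "Intensity", "Symmetry"]
# pandas reads the labels as float64, so convert them to int
train_df["Digit"] = train_df["Digit"].astype(int)

# Polynomial kernel (gamma * <x, x'> + coef0) ** degree = (1 + x.x') ** 2
clf = svm.SVC(
    C=0.01,
    kernel='poly',
    degree=2,        # degree must be an integer
    gamma=1.0,
    coef0=1.0
    )

# .ix is deprecated (and removed in newer pandas); select columns by name
X = train_df[["Intensity", "Symmetry"]].values
y = train_df["Digit"].values

# Fits a single multiclass SVC on all ten digit labels
clf.fit(X, y)

# In-sample evaluation: predict on the training points themselves
expected = y
predicted = clf.predict(X)

print("Classification report for classifier %s:\n%s\n"
      % (clf, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))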
None of the 5s or 8s are predicted correctly, very few of the 4s and 6s, and only a few of the 3s and 7s. This seems very strange.

Can someone show me what I'm doing wrong?
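One possible source of the mismatch, offered as an assumption since the post does not quote the question: if Q2 concerns binary "digit versus all" classifiers, then E_in for digit d is the fraction of *all* training points misclassified by a binary SVM trained with label +1 for d and -1 for every other digit. That is not the same as 1 - recall of class d in a 10-class report, which only counts errors on the points whose true label is d. A minimal sketch with a hypothetical helper `one_vs_all_ein`, run on random stand-in data rather than the real training file:

```python
import numpy as np
from sklearn import svm


def one_vs_all_ein(X, y, digit, C=0.01, Q=2):
    """Hypothetical helper: in-sample error of a 'digit versus all' SVM."""
    # Binary labels: +1 for the chosen digit, -1 for every other digit
    y_bin = np.where(y == digit, 1, -1)
    clf = svm.SVC(C=C, kernel="poly", degree=Q, gamma=1.0, coef0=1.0)
    clf.fit(X, y_bin)
    # E_in: fraction of ALL training points the binary classifier gets wrong
    return float(np.mean(clf.predict(X) != y_bin))


# Random stand-in data (NOT the course data): 2 features, labels 0-9
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = rng.integers(0, 10, size=300)

for d in range(10):
    print(f"{d} versus all: E_in = {one_vs_all_ein(X, y, d):.4f}")
```

With the post's data, X and y would be the arrays built from train_df; the loop simply trains one separate binary classifier per digit.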

Output:

Code:
Classification report for classifier SVC(C=0.01, cache_size=200, class_weight=None, coef0=1.0,
decision_function_shape=None, degree=2.0, gamma=1.0, kernel='poly',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False):
             precision    recall  f1-score   support

          0       0.54      0.83      0.65      1194
          1       0.93      0.96      0.95      1005
          2       0.22      0.55      0.31       731
          3       0.27      0.12      0.17       658
          4       0.12      0.02      0.04       652
          5       0.00      0.00      0.00       556
          6       0.09      0.00      0.00       664
          7       0.26      0.16      0.19       645
          8       0.00      0.00      0.00       542
          9       0.21      0.56      0.30       644

avg / total       0.32      0.40      0.33      7291


Confusion matrix:
[[987  41  65  45   2   0   0   5   0  49]
 [ 35 969   0   0   0   0   0   0   0   1]
 [ 65   3 404  48  16   0   0  48   0 147]
 [204   1 165  79  17   0   0  22   0 170]
 [ 79   9 163  11  16   0   1  81   0 292]
 [ 14   1 361  21  17   0   3  51   0  88]
 [ 38   0 282  35  22   0   1  53   0 233]
 [ 22   1 178   5  29   0   2 100   0 308]
 [298  16  89  26   1   0   2   8   0 102]
 [ 83   0 133  24  17   0   2  23   0 362]]
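The report can be checked against the confusion matrix. In scikit-learn's convention, row i counts the points whose true label is i and column j the points predicted as j, so the recall of class i is the diagonal entry divided by the row sum, and the overall multiclass training error is 1 minus the trace divided by the total. A short sketch using the matrix exactly as printed above:

```python
import numpy as np

# Confusion matrix copied from the output (rows: true digit, columns: predicted digit)
cm = np.array([
    [987,  41,  65,  45,   2,   0,   0,   5,   0,  49],
    [ 35, 969,   0,   0,   0,   0,   0,   0,   0,   1],
    [ 65,   3, 404,  48,  16,   0,   0,  48,   0, 147],
    [204,   1, 165,  79,  17,   0,   0,  22,   0, 170],
    [ 79,   9, 163,  11,  16,   0,   1,  81,   0, 292],
    [ 14,   1, 361,  21,  17,   0,   3,  51,   0,  88],
    [ 38,   0, 282,  35,  22,   0,   1,  53,   0, 233],
    [ 22,   1, 178,   5,  29,   0,   2, 100,   0, 308],
    [298,  16,  89,  26,   1,   0,   2,   8,   0, 102],
    [ 83,   0, 133,  24,  17,   0,   2,  23,   0, 362],
])

support = cm.sum(axis=1)                         # points per true digit
recall = np.diag(cm) / support                   # fraction of each digit predicted correctly
multiclass_error = 1 - np.trace(cm) / cm.sum()   # overall in-sample error

print(support.tolist())  # [1194, 1005, 731, 658, 652, 556, 664, 645, 542, 644]
print(np.round(recall, 2))
print(round(multiclass_error, 4))  # 0.5998
```

The support-weighted average recall of 0.40 in the report is just the overall accuracy, 2918 correct out of 7291 points. Note also that scikit-learn's SVC handles more than two classes by combining one-versus-one classifiers internally, so these per-class figures do not describe any single "digit versus all" classifier.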

Last edited by galo; 04-13-2016 at 12:13 AM. Reason: title
All times are GMT -7.


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.