LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 8 (http://book.caltech.edu/bookforum/forumdisplay.php?f=137)
-   -   HW 8, A case of data being hopelessly inseparable? (http://book.caltech.edu/bookforum/showthread.php?t=566)

 AqibEjaz 05-27-2012 10:32 AM

HW 8, A case of data being hopelessly inseparable?

I post this question assuming my analysis is correct. If it is not, just ignore this line of reasoning.

The data provided for HW 8 seems to be hopelessly inseparable, particularly in the one-vs-all classification case. The classification accuracy that I am sure every one of us is getting looks spectacular, but it is deceptive. Our training data is heavily skewed: about 90% of it comes from one class (say class 'All' in 7-vs-All, where y=-1) and about 10% from the other class (say class '7' in 7-vs-All, where y=+1), so even a final hypothesis as ridiculous as h(x)=-1 will have about 90% accuracy. And this is exactly what I am getting for most of the one-vs-all classifications.
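
To make the arithmetic concrete, here is a minimal sketch in Python. The label counts (900 and 100) are made up to mirror the roughly 90/10 split described above; they are not the actual homework data. It shows that the constant hypothesis scores well on plain accuracy while never detecting the minority class.

```python
# Hypothetical label counts mirroring a ~90/10 class split (not the real HW data).
labels = [-1] * 900 + [+1] * 100

def constant_hypothesis(x):
    """The constant hypothesis h(x) = -1, which ignores its input."""
    return -1

predictions = [constant_hypothesis(None) for _ in labels]

# Plain accuracy: fraction of points whose prediction matches the label.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def recall(target):
    """Fraction of points with true label `target` that are predicted as `target`."""
    preds_for_class = [p for p, y in zip(predictions, labels) if y == target]
    return sum(p == target for p in preds_for_class) / len(preds_for_class)

recall_pos = recall(+1)   # how many true '7's are found
recall_neg = recall(-1)   # how many true 'All' points are found
balanced_accuracy = (recall_pos + recall_neg) / 2

print(f"accuracy          = {accuracy:.2f}")
print(f"recall on y=+1    = {recall_pos:.2f}")
print(f"balanced accuracy = {balanced_accuracy:.2f}")
```

A per-class measure such as recall or balanced accuracy is one way to tell a useful classifier apart from the constant one when the classes are this skewed.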

Now this brings me to my actual question and that is how do you approach such a problem where data is so hopelessly inseparable. Perhaps one should look into extracting other kinds of features from the original images. An official comment here would be appreciated.

Thanks.

 yaser 05-27-2012 11:31 AM

Re: HW 8, A case of data being hopelessly inseparable?

Quote:
 Originally Posted by AqibEjaz (Post 2585)
 Now this brings me to my actual question and that is how do you approach such a problem where data is so hopelessly inseparable.

With a finite resource of training examples, your best bet is often to accept that the data will not be perfectly separated, and find a compromise between a non-zero in-sample error and a not-so-complex fit that would give good generalization.
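
As a rough illustration of that compromise, the sketch below compares two fits on a toy problem. Everything in it is an assumption made for illustration: the 1-D data, the 20% label noise, the sample sizes, and the choice of a nearest-neighbour rule versus a single threshold. It is not the SVM setup from the homework. The nearest-neighbour rule memorises the training set, while the threshold accepts some training error.

```python
import random

random.seed(0)

def make_data(n, noise=0.2):
    """1-D points labelled sign(x - 0.5); each label is flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = random.random()
        y = 1 if x > 0.5 else -1
        if random.random() < noise:
            y = -y
        data.append((x, y))
    return data

def error(h, data):
    """Fraction of points in `data` that hypothesis `h` misclassifies."""
    return sum(h(x) != y for x, y in data) / len(data)

train, test = make_data(200), make_data(2000)

def nearest_neighbour(x):
    """Complex fit: return the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def fit_stump(data):
    """Simple fit: the threshold t minimising training error for h(x) = +1 if x > t else -1."""
    def stump(t):
        return lambda x: 1 if x > t else -1
    candidates = [0.0] + [x for x, _ in data]
    best_t = min(candidates, key=lambda t: error(stump(t), data))
    return stump(best_t)

stump = fit_stump(train)

nn_train_err, nn_test_err = error(nearest_neighbour, train), error(nearest_neighbour, test)
stump_train_err, stump_test_err = error(stump, train), error(stump, test)

print(f"1-NN : train {nn_train_err:.3f}, test {nn_test_err:.3f}")
print(f"stump: train {stump_train_err:.3f}, test {stump_test_err:.3f}")
```

In this toy setting the memorising rule reaches zero error on the training set but fits the label noise, whereas the threshold tolerates training mistakes and does better on the held-out points.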

 dudefromdayton 05-27-2012 02:01 PM

Re: HW 8, A case of data being hopelessly inseparable?

Or leave the box entirely; that is, find or generate data that you can use. Real life is sometimes friendlier than homework in this respect. Few today would try to classify digits solely on an R^2 reduction of symmetry and intensity. So you'd add something else, and see if that data might be separable.
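
A small sketch of the "add something else" idea, under loud assumptions: the four XOR-style points and the extra product feature x1*x2 are hypothetical and have nothing to do with the digit data, and the perceptron is used only as a convenient separability check. No line separates the points in R^2, but they become linearly separable once the third feature is appended.

```python
def perceptron(points, labels, max_epochs=100):
    """Perceptron learning algorithm; returns (weights, converged)."""
    w = [0.0] * len(points[0])
    for _ in range(max_epochs):
        mistakes = 0
        for z, y in zip(points, labels):
            # A point is misclassified if it lies on the wrong side or on the boundary.
            if y * sum(wi * zi for wi, zi in zip(w, z)) <= 0:
                w = [wi + y * zi for wi, zi in zip(w, z)]
                mistakes += 1
        if mistakes == 0:
            return w, True   # a full pass with no mistakes: the data is separated
    return w, False          # gave up after max_epochs passes

# Hypothetical XOR-style toy data, not the homework's symmetry/intensity features.
raw = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
xor_labels = [+1, +1, -1, -1]

# Original two features plus a constant bias coordinate.
two_features = [(1, x1, x2) for x1, x2 in raw]
# Same points with one added feature, the product x1 * x2.
three_features = [(1, x1, x2, x1 * x2) for x1, x2 in raw]

_, separable_raw = perceptron(two_features, xor_labels)
w_extended, separable_extended = perceptron(three_features, xor_labels)

print("separable with (x1, x2):        ", separable_raw)
print("separable with (x1, x2, x1*x2): ", separable_extended)
```

Whether any particular extra feature helps on real digit images is an empirical question; the sketch only shows that separability depends on the chosen representation.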

 All times are GMT -7.