LFD Book Forum Q5 Least Squares Behaviour
#1
06-03-2016, 07:29 AM
 Nick Torenvliet Junior Member Join Date: Apr 2016 Posts: 2
Q5 Least Squares Behaviour

wrt Q5

I've written a python script with some matplotlib to visualize and compare the various f and g in the 1000 run simulation.

In terms of process...
1- choose a population N of 100 random points (x1,x2) where x1 and x2 are each > -1 and < +1
2- solve for the slope f_m and intercept f_b of a line joining another two similarly chosen random points
3- classify points in N as +1 or -1 based on comparison of x2 and f_m*x1 + f_b to get a vector of classifications f_y
4- perform a linear least squares regression with numpy.linalg.lstsq and get g_m and g_b
5- classify points in N as +1 or -1 based on comparison of x2 and g_m*x1 + g_b to get a vector of classifications g_y
6- compare f_y and g_y to get E_in
7- repeat steps 1-6 1000 times to get the average E_in
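
For reference, the steps above can be sketched roughly as follows. This is only a minimal sketch, not the original script: the function names `run_trial` and `average_e_in` are made up for illustration, and step 5 here classifies with the sign of the fitted linear function (X @ w) rather than converting to g_m and g_b first, which is an assumption that differs from the list above.

```python
import numpy as np

def run_trial(rng, n_points=100):
    # Step 1: N random points with x1, x2 uniform in (-1, +1).
    pts = rng.uniform(-1.0, 1.0, size=(n_points, 2))
    # Step 2: target line f through two more random points.
    (ax, ay), (bx, by) = rng.uniform(-1.0, 1.0, size=(2, 2))
    f_m = (by - ay) / (bx - ax)
    f_b = ay - f_m * ax
    # Step 3: label each point by which side of f it falls on.
    f_y = np.where(pts[:, 1] > f_m * pts[:, 0] + f_b, 1.0, -1.0)
    # Step 4: least squares fit of the labels on the features [1, x1, x2].
    X = np.column_stack([np.ones(n_points), pts])
    w, *_ = np.linalg.lstsq(X, f_y, rcond=None)
    # Step 5 (assumption): classify with the sign of the fitted function.
    g_y = np.where(X @ w > 0, 1.0, -1.0)
    # Step 6: fraction of points where g disagrees with f.
    return float(np.mean(g_y != f_y))

def average_e_in(n_trials=1000, n_points=100, seed=0):
    # Step 7: average E_in over many independent trials.
    rng = np.random.default_rng(seed)
    return float(np.mean([run_trial(rng, n_points) for _ in range(n_trials)]))
```

Calling `average_e_in()` runs the full 1000-trial experiment; the seed is only there to make runs repeatable.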

I am finding that when f cuts N such that there are very many of one class and very few of the other, g will often misclassify all of the smaller set in favor of properly classifying all of the larger set.

Sometimes g will lie completely outside of the viewing window bounded by +2, -2 all around.

That g might misclassify all of the smaller set, in these imbalanced cases, I can accept... I think. That g would lie very far away from the box bounded by +1,-1 all around troubles me. Am I right to think something is wrong here?
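
A boundary far outside the box is not necessarily a sign of a bug. The sketch below is a hypothetical extreme case (a single -1 point near a corner, chosen for illustration and not taken from the original script). It fits the labels with numpy.linalg.lstsq and measures how far the line w0 + w1*x1 + w2*x2 = 0 lies from the origin; any line farther than sqrt(2) from the origin cannot touch the box bounded by +1,-1.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(100, 2))
X = np.column_stack([np.ones(len(pts)), pts])

# Hypothetical extreme imbalance: only the point nearest the (+1, +1)
# corner is labelled -1, every other point is +1.
y = np.ones(len(pts))
y[np.argmax(pts.sum(axis=1))] = -1.0

w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Distance from the origin to the line w0 + w1*x1 + w2*x2 = 0.
distance = abs(w[0]) / np.hypot(w[1], w[2])
# The farthest corner of the box is sqrt(2) from the origin.
misses_box = distance > np.sqrt(2.0)

# Labels that the fitted g assigns to the same points.
predictions = np.where(X @ w > 0, 1.0, -1.0)
```

With nearly all labels equal to +1, the least squares fit is close to the constant function 1 with small slopes, so its zero crossing can sit well away from the data, and g then assigns the majority class to every point.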

The error is large enough to lead to the wrong answer for question 5, but only by a hair.

I did a fair amount of debugging, and I cannot see anything wrong other than the sometimes large difference between f_m/f_b and the g_m/g_b that the linear solver spits out when there is a large class imbalance.
#2
06-13-2016, 04:52 PM
 Nick Torenvliet Junior Member Join Date: Apr 2016 Posts: 2
Re: Q5 Least Squares Behaviour

So... haha... systematic error.

Just as another student (sandeep) was, I was getting an average E_in of ~0.13 over 1000 trials of 100 points.

I believe this is indicative of not accounting for the case where the slope of the linear regression solution g has the opposite sign to that of the target function f.

In a naive approach to classification... you will get 100% error in that case. The case seems to occur predictably enough to bias the average you would otherwise get up to ~0.13.

Oddly enough... I am very satisfied with that... at least it confirms the law of large numbers.
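
One way a 100% error can arise is sketched below. This is an illustration under assumptions, not a reconstruction of the actual script: the helper names `classify_by_sign` and `classify_by_line` are hypothetical. Converting the weights (w0, w1, w2) to a slope and intercept throws away the information about which side of the line is +1, so a rule that always labels "above the line" as +1 flips every label whenever w2 is negative.

```python
import numpy as np

def classify_by_sign(w, pts):
    # Uses the sign of w0 + w1*x1 + w2*x2, so the orientation is kept.
    X = np.column_stack([np.ones(len(pts)), pts])
    return np.where(X @ w > 0, 1, -1)

def classify_by_line(w, pts):
    # Naive rule: convert to slope/intercept and call "above the line" +1.
    g_m = -w[1] / w[2]
    g_b = -w[0] / w[2]
    return np.where(pts[:, 1] > g_m * pts[:, 0] + g_b, 1, -1)

pts = np.array([[0.0, 0.5], [0.0, -0.5]])
w_up = np.array([0.0, 0.0, 1.0])     # line x2 = 0, +1 side is above
w_down = np.array([0.0, 0.0, -1.0])  # same line, +1 side is below

agree_up = classify_by_sign(w_up, pts) == classify_by_line(w_up, pts)
agree_down = classify_by_sign(w_down, pts) == classify_by_line(w_down, pts)
```

Here `agree_up` is true for both points, while `agree_down` is false for both: the two weight vectors describe the same line, but the naive rule gets every point wrong for the second one.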

Everything does though...

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.