LFD Book Forum  

06-03-2016, 07:29 AM
Nick Torenvliet
Q5 Least Squares Behaviour

Regarding Q5:

I've written a Python script, with some matplotlib, to visualize and compare f and g across the 1000-run simulation.

In terms of process...
1- choose a population N of 100 random points (x1,x2), where x1 and x2 each lie between -1 and +1
2- solve for the slope f_m and intercept f_b of a line joining another two similarly chosen random points
3- classify the points in N as +1 or -1 by comparing x2 with f_m*x1 + f_b, giving a vector of classifications f_y
4- perform a linear least squares regression with numpy.linalg.lstsq to get g_m and g_b
5- classify the points in N as +1 or -1 by comparing x2 with g_m*x1 + g_b, giving a vector of classifications g_y
6- compare f_y and g_y to get E_in
7- repeat steps 1-6 1000 times to get the average E_in
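
The steps above can be sketched roughly as follows. This is not the original script, only a minimal sketch under stated assumptions: the regression is assumed to be set up with inputs [1, x1, x2] and targets f_y, and the names run_experiment and average_e_in are hypothetical. In step 5 the sketch classifies with the sign of X @ w instead of comparing x2 with g_m*x1 + g_b.

```python
import numpy as np

def run_experiment(n_points=100, rng=None):
    """One run of steps 1-6; returns the in-sample error E_in."""
    rng = np.random.default_rng() if rng is None else rng

    # Step 1: N random points in the box [-1, 1] x [-1, 1].
    pts = rng.uniform(-1, 1, size=(n_points, 2))

    # Step 2: target line f through two more random points.
    p, q = rng.uniform(-1, 1, size=(2, 2))
    f_m = (q[1] - p[1]) / (q[0] - p[0])
    f_b = p[1] - f_m * p[0]

    # Step 3: label each point by which side of f it falls on.
    f_y = np.where(pts[:, 1] > f_m * pts[:, 0] + f_b, 1.0, -1.0)

    # Step 4: least squares on inputs [1, x1, x2] (assumed setup).
    X = np.column_stack([np.ones(n_points), pts])
    w, *_ = np.linalg.lstsq(X, f_y, rcond=None)

    # Step 5: classify with the sign of the fitted linear function.
    g_y = np.where(X @ w > 0, 1.0, -1.0)

    # Step 6: fraction of points where g disagrees with f.
    return np.mean(g_y != f_y)

def average_e_in(n_runs=1000, seed=0):
    """Step 7: average E_in over independent runs."""
    rng = np.random.default_rng(seed)
    return float(np.mean([run_experiment(rng=rng) for _ in range(n_runs)]))
```

On the design choice: with weights w = (w0, w1, w2), the boundary is the line with g_m = -w1/w2 and g_b = -w0/w2, but those two numbers do not record the sign of w2. Comparing x2 with g_m*x1 + g_b therefore labels the points correctly only when w2 is positive, and flips every label when w2 is negative. Using the sign of X @ w avoids that case split.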

I am finding that when f cuts N such that there are very many points of one class and very few of the other, g will often misclassify all of the smaller set in favor of properly classifying all of the larger set.

Sometimes g will lie completely outside of the viewing window, which is bounded by -2 and +2 on each axis.

That g might misclassify all of the smaller set in these imbalanced cases, I can accept... I think. That g would lie very far away from the box bounded by -1 and +1 all around troubles me. Am I right to think something is wrong here?
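
One way to check whether least squares by itself can produce this behaviour is a small hand-built case. The sketch below uses a made-up data set (not the data from the simulation): a 5x5 grid over the box in which every point is +1 except the corner (1, 1). It again assumes the regression uses inputs [1, x1, x2] with the +1/-1 labels as targets.

```python
import numpy as np

# Hypothetical, hand-built data set: a 5x5 grid over [-1, 1] x [-1, 1].
ticks = np.linspace(-1, 1, 5)
x1, x2 = np.meshgrid(ticks, ticks)
pts = np.column_stack([x1.ravel(), x2.ravel()])

# Every point is +1 except the single corner (1, 1), which is -1.
y = np.ones(len(pts))
y[np.all(pts == [1.0, 1.0], axis=1)] = -1.0

# Least squares fit on inputs [1, x1, x2] (assumed setup).
X = np.column_stack([np.ones(len(pts)), pts])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Slope and intercept of the boundary w0 + w1*x1 + w2*x2 = 0.
g_m = -w[1] / w[2]
g_b = -w[0] / w[2]

# Classify with the sign of the fitted function and measure E_in.
g_y = np.where(X @ w > 0, 1.0, -1.0)
e_in = np.mean(g_y != y)
```

Because the grid is symmetric, the fit can be worked out by hand: w is approximately (0.92, -0.16, -0.16), so the boundary is x2 = -x1 + 5.75. That line misses both the +1/-1 box and a +2/-2 viewing window, every point is classified +1, and the lone -1 point is the only error (E_in = 1/25). In this constructed case, at least, a far-away g with the whole minority class misclassified comes from the least squares fit itself and not from a coding error.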

The error is large enough to lead to the wrong answer for question 5, but only by a hair.

I did a fair amount of debugging, and I cannot see anything wrong other than the sometimes large difference between f_m/f_b and the g_m/g_b that the linear solver spits out when there is a large class imbalance.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.