LFD Book Forum Probability calculation in Q8 and Q10

#1
07-17-2012, 05:24 PM
 nellie Junior Member Join Date: Jul 2012 Posts: 4
Probability calculation in Q8 and Q10

I correctly approximated the P(f(x) != g(x)) by generating a lot of sample points. However, how do you calculate the probability exactly? Thank you in advance for your help!
#2
07-17-2012, 08:22 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477
Re: Probability calculation in Q8 and Q10

Quote:
 Originally Posted by nellie I correctly approximated the P(f(x) != g(x)) by generating a lot of sample points. However, how do you calculate the probability exactly? Thank you in advance for your help!
You can compute the total area of misclassification divided by 4 (the total area of the input space). The area of misclassifcation will be that of points for which and give different values. In one of the theory lectures, there will be a figure depicting this area. It is easier to compute it numerically as you did.
__________________
Where everyone thinks alike, no one thinks very much
#3
10-10-2012, 09:19 AM
 ketchers Junior Member Join Date: Oct 2012 Posts: 5
Re: Probability calculation in Q8 and Q10

I would be curious on how to calculate - or estimate - the average number of iterations for convergence - as well as the probability - perhaps these are really the same question in disguise. It is a very simple procedure, but there is enough random elements to it so that it leaves the realm of what I am familiar with calculating:

As I understand it ...

1) choose N random points in the unit square
2) choose a random line

This gives the target

4) run algorithm where in computing w(n+1) a random selection from the mis-classified points is used. (perhaps the random choice here is not-relevant)

A pointer to where a similar calculation is made would be fine. Thanks!
#4
10-10-2012, 09:49 AM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477
Re: Probability calculation in Q8 and Q10

Quote:
 Originally Posted by ketchers I would be curious on how to calculate - or estimate - the average number of iterations for convergence - as well as the probability
Calculating these quantities analytically is not tractable. Estimating them using Monte Carlo methods, i.e., by running many random instances of the problem and averaging, is what we are after here. As you point out, there are many sources of randomness and some will result in significant variation. However, repeating the experiment a large number of times will overcome that variance. The numbers given for this problem were chosen to achieve that.
__________________
Where everyone thinks alike, no one thinks very much
#5
10-15-2012, 02:59 PM
 gah44 Invited Guest Join Date: Jul 2012 Location: Seattle, WA Posts: 153
Re: Probability calculation in Q8 and Q10

Quote:
 Originally Posted by yaser Calculating these quantities analytically is not tractable. Estimating them using Monte Carlo methods, i.e., by running many random instances of the problem and averaging, is what we are after here. As you point out, there are many sources of randomness and some will result in significant variation. However, repeating the experiment a large number of times will overcome that variance. The numbers given for this problem were chosen to achieve that.
It should be possible to give an order of magnitude estimate, though if that was easy then you wouldn't have to write the program to answer the problem.

It looks a little similar to the relaxation algorithms for solving partial differential equations. In that case, the theory is well studied.

Even without knowing that, you could find the scaling law. How the number of iterations changes, on the average, with the number of points. But once you have the MC program, it is easy enough to change the number of points.
#6
10-15-2012, 03:04 PM
 gah44 Invited Guest Join Date: Jul 2012 Location: Seattle, WA Posts: 153
Re: Probability calculation in Q8 and Q10

Quote:
 Originally Posted by yaser You can compute the total area of misclassification divided by 4 (the total area of the input space). The area of misclassifcation will be that of points for which and give different values. In one of the theory lectures, there will be a figure depicting this area. It is easier to compute it numerically as you did.
The area is a sum of two figures that are either triangles or quadrilaterals. You have to figure out the intersection of each of the lines with the region boundary, then remember which is in and which is out, and compute the area of the triangle or quadrilateral. Getting all the different possible cases right is what would take time. (I didn't try actually doing it, just figuring out how I might do it.)
#7
04-06-2013, 07:12 PM
 mvellon Junior Member Join Date: Apr 2013 Posts: 9
Re: Probability calculation in Q8 and Q10

Your value of g, assuming it doesn't exactly match f, will be a line with a slope and intercept close, though not exactly that of f. Imagine then, two intersecting lines. The area in between the lines correspond to values in X that g misclassifies. The ratio between that area and that bounded by the region [-1,-1]x[1,1] (area 4) will be the probability of error. Alas, calculating the area between the lines is ugly. The problem would have been much simpler if the domain for X had been points inside the circle at (0,0) with radius 1.
#8
09-02-2015, 05:06 AM
 henry2015 Member Join Date: Aug 2015 Posts: 31
Re: Probability calculation in Q8 and Q10

Quote:
 Originally Posted by yaser You can compute the total area of misclassification divided by 4 (the total area of the input space). The area of misclassifcation will be that of points for which and give different values. In one of the theory lectures, there will be a figure depicting this area. It is easier to compute it numerically as you did.
Hi, could you elaborate why we need to divide the total area of misclassification by 4?

If the input space is 2D, I can see that there are two triangles formed by f(x) and g(x), and the total area of the two triangles include all the errors. So I thought the P(f(x) != g(x)) = area of error / total area of input space...no?

#9
09-02-2015, 10:22 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477
Re: Probability calculation in Q8 and Q10

Quote:
 Originally Posted by henry2015 Hi, could you elaborate why we need to divide the total area of misclassification by 4?
The total area of the input space is 4 and that corresponds to total probability 1, so one needs to normalize areas by 4 in order to get the corresponding probabilities.
__________________
Where everyone thinks alike, no one thinks very much
#10
09-02-2015, 10:48 PM
 henry2015 Member Join Date: Aug 2015 Posts: 31
Re: Probability calculation in Q8 and Q10

Argh, I see.

Missed the boundaries stated in the problem. My bad.

Thanks.

 Thread Tools Display Modes Linear Mode

 Posting Rules You may not post new threads You may not post replies You may not post attachments You may not edit your posts BB code is On Smilies are On [IMG] code is On HTML code is Off Forum Rules
 Forum Jump User Control Panel Private Messages Subscriptions Who's Online Search Forums Forums Home General     General Discussion of Machine Learning     Free Additional Material         Dynamic e-Chapters         Dynamic e-Appendices Course Discussions     Online LFD course         General comments on the course         Homework 1         Homework 2         Homework 3         Homework 4         Homework 5         Homework 6         Homework 7         Homework 8         The Final         Create New Homework Problems Book Feedback - Learning From Data     General comments on the book     Chapter 1 - The Learning Problem     Chapter 2 - Training versus Testing     Chapter 3 - The Linear Model     Chapter 4 - Overfitting     Chapter 5 - Three Learning Principles     e-Chapter 6 - Similarity Based Methods     e-Chapter 7 - Neural Networks     e-Chapter 8 - Support Vector Machines     e-Chapter 9 - Learning Aides     Appendix and Notation     e-Appendices

All times are GMT -7. The time now is 10:48 AM.