LFD Book Forum

LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 1 (http://book.caltech.edu/bookforum/forumdisplay.php?f=130)
-   -   Probability calculation in Q8 and Q10 (http://book.caltech.edu/bookforum/showthread.php?t=876)

nellie 07-17-2012 04:24 PM

Probability calculation in Q8 and Q10
 
I correctly approximated the P(f(x) != g(x)) by generating a lot of sample points. However, how do you calculate the probability exactly? Thank you in advance for your help!

yaser 07-17-2012 07:22 PM

Re: Probability calculation in Q8 and Q10
 
Quote:

Originally Posted by nellie (Post 3485)
I correctly approximated the P(f(x) != g(x)) by generating a lot of sample points. However, how do you calculate the probability exactly? Thank you in advance for your help!

You can compute the total area of misclassification divided by 4 (the total area of the input space). The area of misclassifcation will be that of points for which f and g give different values. In one of the theory lectures, there will be a figure depicting this area. It is easier to compute it numerically as you did.

ketchers 10-10-2012 08:19 AM

Re: Probability calculation in Q8 and Q10
 
I would be curious on how to calculate - or estimate - the average number of iterations for convergence - as well as the probability - perhaps these are really the same question in disguise. It is a very simple procedure, but there is enough random elements to it so that it leaves the realm of what I am familiar with calculating:

As I understand it ...

1) choose N random points in the unit square
2) choose a random line

This gives the target

3) start with w(0) = [0;0]
4) run algorithm where in computing w(n+1) a random selection from the mis-classified points is used. (perhaps the random choice here is not-relevant)


A pointer to where a similar calculation is made would be fine. Thanks!

yaser 10-10-2012 08:49 AM

Re: Probability calculation in Q8 and Q10
 
Quote:

Originally Posted by ketchers (Post 6250)
I would be curious on how to calculate - or estimate - the average number of iterations for convergence - as well as the probability

Calculating these quantities analytically is not tractable. Estimating them using Monte Carlo methods, i.e., by running many random instances of the problem and averaging, is what we are after here. As you point out, there are many sources of randomness and some will result in significant variation. However, repeating the experiment a large number of times will overcome that variance. The numbers given for this problem were chosen to achieve that.

gah44 10-15-2012 01:59 PM

Re: Probability calculation in Q8 and Q10
 
Quote:

Originally Posted by yaser (Post 6251)
Calculating these quantities analytically is not tractable. Estimating them using Monte Carlo methods, i.e., by running many random instances of the problem and averaging, is what we are after here. As you point out, there are many sources of randomness and some will result in significant variation. However, repeating the experiment a large number of times will overcome that variance. The numbers given for this problem were chosen to achieve that.

It should be possible to give an order of magnitude estimate, though if that was easy then you wouldn't have to write the program to answer the problem.

It looks a little similar to the relaxation algorithms for solving partial differential equations. In that case, the theory is well studied.

Even without knowing that, you could find the scaling law. How the number of iterations changes, on the average, with the number of points. But once you have the MC program, it is easy enough to change the number of points.

gah44 10-15-2012 02:04 PM

Re: Probability calculation in Q8 and Q10
 
Quote:

Originally Posted by yaser (Post 3490)
You can compute the total area of misclassification divided by 4 (the total area of the input space). The area of misclassifcation will be that of points for which f and g give different values. In one of the theory lectures, there will be a figure depicting this area. It is easier to compute it numerically as you did.

The area is a sum of two figures that are either triangles or quadrilaterals. You have to figure out the intersection of each of the lines with the region boundary, then remember which is in and which is out, and compute the area of the triangle or quadrilateral. Getting all the different possible cases right is what would take time. (I didn't try actually doing it, just figuring out how I might do it.)

mvellon 04-06-2013 06:12 PM

Re: Probability calculation in Q8 and Q10
 
Your value of g, assuming it doesn't exactly match f, will be a line with a slope and intercept close, though not exactly that of f. Imagine then, two intersecting lines. The area in between the lines correspond to values in X that g misclassifies. The ratio between that area and that bounded by the region [-1,-1]x[1,1] (area 4) will be the probability of error. Alas, calculating the area between the lines is ugly. The problem would have been much simpler if the domain for X had been points inside the circle at (0,0) with radius 1.

henry2015 09-02-2015 04:06 AM

Re: Probability calculation in Q8 and Q10
 
Quote:

Originally Posted by yaser (Post 3490)
You can compute the total area of misclassification divided by 4 (the total area of the input space). The area of misclassifcation will be that of points for which f and g give different values. In one of the theory lectures, there will be a figure depicting this area. It is easier to compute it numerically as you did.

Hi, could you elaborate why we need to divide the total area of misclassification by 4?

If the input space is 2D, I can see that there are two triangles formed by f(x) and g(x), and the total area of the two triangles include all the errors. So I thought the P(f(x) != g(x)) = area of error / total area of input space...no?

Thanks in advance.

yaser 09-02-2015 09:22 PM

Re: Probability calculation in Q8 and Q10
 
Quote:

Originally Posted by henry2015 (Post 12020)
Hi, could you elaborate why we need to divide the total area of misclassification by 4?

The total area of the input space is 4 and that corresponds to total probability 1, so one needs to normalize areas by 4 in order to get the corresponding probabilities.

henry2015 09-02-2015 09:48 PM

Re: Probability calculation in Q8 and Q10
 
Argh, I see.

Missed the boundaries stated in the problem. My bad.

Thanks.


All times are GMT -7. The time now is 06:39 AM.

Powered by vBulletin® Version 3.8.3
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.