
#1




Concepts leading to Q6 and Q8
I would appreciate some clarification with the concepts leading up to Q6 and Q8. I will build it up, as I understand it, starting with a perceptron. Please correct me where I have erred...
A perceptron learns to separate data that is separable. The break point k is the smallest number of points such that the hypothesis set has no way of shattering (producing every possible ±1 pattern on) any arrangement of k points. I understood the k = 3 for a line (1D) and k = 4 for a plane (2D).

Positive rays (Example 2.2, p. 43 of the book). With the positive rays there seems to be a concept similar to the 1D perceptron, in that the data falls on a line and is separable. To the left of "a", h(x) = -1, and to the right of "a", h(x) = +1. It is clearly separable. The hypotheses are the possible locations of "a".

The "positive intervals" exercise (p. 44 of the book, cont. of Example 2.2) is where my confusion starts. I understood (from Lecture 5) the N + 1 regions and the "(N+1 choose 2) + 1" formula. The "choose 2" part refers to the fact that each end point of the interval lands in a region, so you need two regions to define your interval. However, I do not see the points as being separable. If they are all on a line, I cannot separate with a perceptron the case shown in the book (blue interval in the middle with red regions on either side). I see it as similar to a k = 3 case in one dimension. Not separable.

Q6 extends the concept of the "positive intervals" example and asks about two intervals (instead of just one interval as in the example). If I cannot see the case of just one interval as being linearly separable, then I cannot see the possibility of it being linearly separable with two intervals. Q8 asks to consider M intervals, which I cannot see either; it is the same visualization issue as with two or one intervals. Q6 and Q8 ask for break points, so I guess that my issue is that I am not visualizing properly how a break point can be associated with intervals that lie on a line. I associate this case immediately with a break point of k = 3. I hope that my misconception makes sense to you so that you may assist me. Thank you. Juan 
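To make the 1D-perceptron break point concrete to myself, I wrote this small Python check (my own sketch, names are mine, not the book's). On N points ordered along a line, sign(wx + b) can only produce label patterns with at most one sign change, so we can just count those:

```python
from itertools import product

def realizable_by_1d_perceptron(labels):
    """On points sorted along the line, h(x) = sign(w*x + b) produces
    labels with at most one sign change (all -1 then all +1, or the
    reverse, or constant)."""
    changes = sum(labels[i] != labels[i + 1] for i in range(len(labels) - 1))
    return changes <= 1

def num_dichotomies(n):
    """Count the +-1 patterns on n ordered points that a 1D perceptron
    can actually produce (out of the 2**n possible patterns)."""
    return sum(realizable_by_1d_perceptron(labels)
               for labels in product([-1, +1], repeat=n))
```

For N = 1, 2 this gives 2 and 4 (all patterns, so those sizes are shattered), but for N = 3 it gives only 6 of the 8 patterns, which matches k = 3.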
#2




Re: Concepts leading to Q6 and Q8
Just a quick remark to clarify things, and I'll let others discuss other parts. When we look for a break point, the constellations of input points that we consider are not restricted to being separable. In fact, the points are not labelled a priori, so some label patterns may make them separable by the model and others may not. The fact that perceptrons and the PLA work on separable data does not affect this, since a break point, by definition, is based on where the model fails to produce some pattern on the points.
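To make this concrete, here is a quick Python check (a sketch of my own, not from the book) that counts, for N unlabelled points on a line, how many of the 2^N possible ±1 patterns a positive interval can actually produce. A pattern is achievable exactly when its +1 labels form one contiguous block (possibly empty):

```python
from itertools import product

def realizable_by_positive_interval(labels):
    """On ordered points, an interval hypothesis produces +1 on a
    contiguous block of points and -1 elsewhere (the block may be empty)."""
    plus = [i for i, y in enumerate(labels) if y == +1]
    if not plus:
        return True  # empty interval: everything classified -1
    return plus[-1] - plus[0] + 1 == len(plus)  # +1's are contiguous

def growth_function(n):
    """m_H(n) for positive intervals: count achievable +-1 patterns."""
    return sum(realizable_by_positive_interval(labels)
               for labels in product([-1, +1], repeat=n))
```

Running it reproduces the book's formula (N+1 choose 2) + 1: for N = 2, 3, 4 it gives 4, 7, 11. Since 7 < 8 = 2^3, no set of 3 points can be shattered, so k = 3 is a break point, even though no single labelled data set needs to be "separable" for this count to make sense.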
__________________
Where everyone thinks alike, no one thinks very much 
#3




Re: Concepts leading to Q6 and Q8
Professor, thank you for answering. I am starting to see an opening; however, I think that I am mainly confused about the learning objective in Example 2.2, i.e., what exactly is the machine trying to learn? The learning goal of the perceptron algorithm is easy to visualize because the learning that is taking place is easily expressed and recognizable. For example, I could learn how to distinguish a penny from a nickel given a bunch of these coins. I could test different hypotheses on the training set of coins and come up with a super accurate "g" hypothesis based on the size or weight measurements of a training set of coins.
Going back to Example 2.2, specifically the case of the positive rays, which seems to be the easiest: what is it exactly that we want the machine to learn? We know that we have a constellation of N points that divide a line into N + 1 regions, and that we do not know a priori whether each point has the value +1 or -1, as you mentioned. Is the objective of learning to determine the location of the point "a" so that we can separate the N points into two groups (+1's and -1's)? Is this a case similar to the perceptron, in which we expect separation to occur? If this is the case, do the hypotheses consist of changing the location of "a" to each of the N + 1 regions, then at each location of "a" checking each point "x" with the formula sign(x - a), and comparing the result of the formula with the value (-1 or +1) of each point (supervised learning)? If separation is expected to occur (the goal of learning) but does not happen with the particular set of input points given, have we still learned something? I.e., have we learned that the particular set of input points is not separable? So I guess that I am struggling with trying to understand the most basic issue: what is the machine trying to learn in each of the three models presented in Example 2.2 of the book (pp. 43-44)? I think that if I can achieve some insight on this, then I will be able to better visualize the situation. I hope that I am not being too dense here. If I am, please accept my apologies. I am most grateful for this opportunity. I intend to continue working very hard to understand this very captivating field even if I do not perform very well in the quizzes. Thank you. Juan 
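Here is how I currently picture that procedure in Python (just my own sketch, please correct me if it is wrong): try one candidate threshold "a" per region and keep the one that makes the fewest mistakes on the labelled training points.

```python
def fit_positive_ray(xs, ys):
    """Fit h(x) = sign(x - a), taking sign(x - a) = +1 when x > a.
    Candidate thresholds: one per region, i.e. one below all points,
    one between each consecutive pair, and one above all points."""
    pts = sorted(zip(xs, ys))
    sx = [x for x, _ in pts]
    candidates = ([sx[0] - 1.0]
                  + [(sx[i] + sx[i + 1]) / 2 for i in range(len(sx) - 1)]
                  + [sx[-1] + 1.0])

    def in_sample_error(a):
        return sum((1 if x > a else -1) != y for x, y in pts)

    return min(candidates, key=in_sample_error)  # best threshold "a"
```

On a separable sample like sizes [1, 2, 3, 4] labelled [-1, -1, +1, +1], this picks a threshold between 2 and 3 and achieves zero in-sample error; on a non-separable sample it still returns the best achievable threshold, just with some errors remaining.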
#4




Re: Concepts leading to Q6 and Q8
Let's pose this in terms of the coins example. If you are trying to distinguish nickels from pennies, but you represent each coin with only one variable (say the size), positive rays would be a model that tries to learn the 'threshold' size above which you classify the coin as a nickel. Now, let's say you are trying to distinguish nickels from all other coins, and you again represent the coin with just its size. The positive interval model tries to learn the range (lower and upper limit) where you would classify the coin as a nickel as opposed to anything else.
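A minimal sketch of that interval-learning idea (an illustration with hypothetical coin sizes, not code from the course): take one candidate boundary per region, try every pair of boundaries as (lo, hi), and keep the pair with the fewest in-sample errors.

```python
from itertools import combinations

def fit_positive_interval(xs, ys):
    """Fit h(x) = +1 iff lo < x < hi, else -1, by exhaustive search
    over region boundaries. Returns the chosen (lo, hi)."""
    pts = sorted(zip(xs, ys))
    sx = [x for x, _ in pts]
    # One candidate boundary per region: N + 1 of them.
    bounds = ([sx[0] - 1.0]
              + [(sx[i] + sx[i + 1]) / 2 for i in range(len(sx) - 1)]
              + [sx[-1] + 1.0])

    def err(lo, hi):
        return sum((1 if lo < x < hi else -1) != y for x, y in pts)

    # (N+1 choose 2) interval choices, plus the "empty" interval that
    # classifies everything as -1 -- matching the growth-function formula.
    best = min(combinations(bounds, 2), key=lambda b: err(*b))
    empty = (bounds[0], bounds[0])
    return best if err(*best) <= err(*empty) else empty
```

With sizes [1, 2, 3, 4] labelled [-1, +1, +1, -1] (nickels in the middle of the size range), it recovers an interval around 2 and 3 with zero in-sample error.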
__________________
Where everyone thinks alike, no one thinks very much 
#5




Re: Concepts leading to Q6 and Q8
The "learning" part is about using a particular model to give you a good performance on your sample (read: small E_in), with the consequence that E_out will also be reduced, based on certain properties and principles explained in the lectures.
Take the example of the convex set. Your points are scattered on a 2D plane, which means we have 2 features (e.g. the diameter and weight of coins, if you want a concrete example). Suppose you want to model only a single class of coins (e.g. dimes). You can then bound all the points in the graph that correspond to dimes with a convex set. Say you come up with a triangle; now your hypothesis can be described with 3 points, one for each vertex of your triangle. 
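For instance, a triangle hypothesis classifies a point as +1 exactly when it lies inside the triangle. A standard point-in-triangle test (my own sketch, not from the lectures) makes this explicit:

```python
def in_triangle(p, a, b, c):
    """h(p) = True (i.e. +1) iff p lies inside the triangle with
    vertices a, b, c. Uses the sign of the 2D cross product of p
    against each directed edge; points on an edge count as inside."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = min(d1, d2, d3) < 0
    has_pos = max(d1, d2, d3) > 0
    return not (has_neg and has_pos)  # p is on the same side of all edges
```

So with features (diameter, weight), a hypothesis like `lambda p: in_triangle(p, v1, v2, v3)` says "+1 (dime)" inside the triangle and "-1 (not a dime)" outside, and learning amounts to choosing the three vertices.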
#6




Re: Concepts leading to Q6 and Q8
Professor, thank you for relating the coin example to the positive rays and intervals. It clarified what the model is trying to learn.
Ziad, if I understood your convex-set example correctly, the learning that would take place would be to determine as accurately as possible (E_in = 0) the locations of the three vertices of a triangle that contains only dimes within it.

If I may, I will just create this one extra post so as not to take too much time away from you... I am now better understanding WHAT the "machine" is trying to learn. I still need to clarify HOW it learns as it relates to the learning-setup diagram discussed at the beginning of the class. Could we say that positive rays, positive intervals and convex sets represent three distinct and different learning models that could be used in the learning-setup diagram? If so, does each of these models have its own hypothesis set "H"? If that is the case, then what are the possible concrete hypotheses "h" that go into "H" in the coins problem for each of the models? What exactly goes on in the learning process? Thank you. Juan 
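The way I currently picture it in code (just my own sketch, please correct me if this is wrong): each model is a whole family of functions indexed by its parameters, that family is the hypothesis set H, and a concrete "h" is one member of it, picked by the learning algorithm to minimize in-sample error.

```python
# Each model defines a hypothesis set H; each parameter choice picks one h in H.

def positive_ray(a):
    """One member of the positive-ray H: h(x) = +1 iff x > a."""
    return lambda x: +1 if x > a else -1

def positive_interval(lo, hi):
    """One member of the positive-interval H: h(x) = +1 iff lo < x <= hi."""
    return lambda x: +1 if lo < x <= hi else -1

# "Learning" = searching these families for the member with smallest E_in.
h = positive_ray(2.5)        # a concrete hypothesis: nickels are size > 2.5
g = positive_interval(1.5, 3.5)  # a concrete hypothesis: nickels are mid-sized
```

So for the coins problem, H for positive rays is {sign(x - a) : a real}, H for positive intervals is the set of all (lo, hi) interval classifiers, and H for convex sets is the set of all convex regions in the (diameter, weight) plane, if I have understood the setup correctly.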
#7




Re: Concepts leading to Q6 and Q8
I must say that your book is a "must have", as it complements the lecture videos excellently. I got it in the middle of the 2nd week; I wish I had had it since day one. Thank you. Juan 