Meaning of the variables y1, y2, ..., yN?
(Note: I sent this question directly to Professor Abu-Mostafa, and he suggested that I post both the question and his reply to the forum.)
Question:
>>>
Given the data set
(x1,y1), (x2,y2), ..., (xN,yN)
where, in a credit card approval application for example, x1 ... xN might mean income, age, debt, etc.,
and where Y = h(x1,x2,x3,...,xN), with Y a binary function mapping the vector X to the scalar +1 or -1 and h a candidate hypothesis,
what do the values y1,y2,y3,...,yN in the data set stand for, how are they determined, and what is their role in the mapping h as defined above?
<<<
Reply by Professor Yaser S. Abu-Mostafa:
>>>
The source of confusion is that boldface x (let's call it xx) stands for a full vector, which is the total input (all the information about a particular credit card applicant), while italic x (let's call it x) stands for a single coordinate (salary or years in residence, for example) of the input.
Therefore, xx_1,...,xx_N are different customers, while x_1,...,x_d are coordinates of the same customer. The notation N for number of examples and d for dimensionality of the input space is standard in the course.
With this in mind, y_1,...,y_N are simply the credit behavior of the N different customers (whether each of them was a good or a bad credit customer).
<<<
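To make the N-versus-d distinction concrete, here is a small sketch. Everything in it is hypothetical: the four applicants, their three coordinates (salary in $1000s, years in residence, debt in $1000s), the recorded labels, and the weighted-threshold form chosen for h with its weights are all made up for illustration, not taken from the course. Each row of X is one full input vector xx_n (one customer), each column is one coordinate x_i, and y_n is the credit behavior that was actually observed for customer n. The labels are given data, not computed by h; a candidate h is judged by how often h(xx_n) agrees with y_n.

```python
# Hypothetical data set: N = 4 customers, each described by d = 3 coordinates
# (salary in $1000s, years in residence, current debt in $1000s).
# Each row is one full input vector xx_n; each column is one coordinate x_i.
X = [
    [75.0, 4.0, 10.0],   # xx_1: customer 1
    [32.0, 1.0, 25.0],   # xx_2: customer 2
    [120.0, 10.0, 5.0],  # xx_3: customer 3
    [45.0, 2.0, 30.0],   # xx_4: customer 4
]
# y_n: recorded credit behavior of customer n (+1 = good, -1 = bad).
y = [+1, -1, +1, -1]

N = len(X)     # number of examples (customers)
d = len(X[0])  # dimensionality of the input space (coordinates per customer)


def h(x, w, threshold):
    """Hypothetical candidate hypothesis: +1 if the weighted score exceeds
    the threshold, otherwise -1."""
    score = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return +1 if score > threshold else -1


# Illustrative weights and threshold, chosen by hand (not learned).
w = [1.0, 2.0, -1.5]
threshold = 50.0

# Apply h to each full input vector and compare with the recorded y_n.
predictions = [h(x_n, w, threshold) for x_n in X]
agreements = sum(p == y_n for p, y_n in zip(predictions, y))
print(N, d, predictions, agreements)  # prints: 4 3 [1, -1, 1, -1] 4
```

Here h happens to agree with all four recorded labels; a different choice of weights could disagree on some customers, and learning amounts to picking an h that agrees with the y_n as well as possible.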
