LFD Book Forum  

LFD Book Forum > Book Feedback - Learning From Data > Chapter 1 - The Learning Problem

03-20-2019, 10:11 PM
Fromdusktilldawn
Junior Member
Join Date: Sep 2017
Posts: 5
The concept "h is fixed before you generate the data set" is extremely vague

Can someone please explain to me the concept of "h is fixed before you generate the data set" as it appears on page 22 of the text?
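For reference, as far as I can tell the statement accompanies the book's Hoeffding bound for a single hypothesis h. Written out in the book's notation (E_in is the in-sample error, E_out the out-of-sample error, N the number of data points, and epsilon a tolerance), it reads roughly as follows. This is my own transcription, so the typesetting may differ from the book's:

```latex
\mathbb{P}\big[\,|E_{\text{in}}(h) - E_{\text{out}}(h)| > \epsilon\,\big] \le 2e^{-2\epsilon^{2}N}
\quad \text{for any } \epsilon > 0
```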

As it stands, this is an extremely vague statement. What is meant by "fixed", and what is meant by "generate"?

Here is the machine learning pipeline that most students typically follow today:

Find some data somewhere, typically on Kaggle. (You don't generate it at all; someone else does it for you through unknown means.)

Observe the data and get a sense of its dimensionality and the number of data points. If the data set is too large, it cannot even be loaded into a computer. Therefore, the parameters associated with the data MUST be known in order to do machine learning.

Based on the data, categorize it as a typical problem type, for example classification or prediction.

Pick a hypothesis h known to do well on the problem, say an SVM. Tune the hypothesis h so that it can at least accept the data. For example, the dimensionality of the weights in the hypothesis is obtained from the dimensionality of the data. Otherwise, MATLAB will throw a dimension mismatch error and no machine learning can be done.

Train your hypothesis h, parameterized by the weights w, until h achieves the lowest in-sample error. Call that the final hypothesis g.

Use the final hypothesis g on the test set.
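To make the steps above concrete, here is a minimal Python sketch of the pipeline as I understand it. Several things are assumptions for illustration only: `load_given_data` is a hypothetical stand-in for a downloaded data set (it synthesizes points so the snippet is self-contained), a linear classifier trained with a pocket-style perceptron replaces the SVM, and the 150/50 train/test split is arbitrary.

```python
import numpy as np


def load_given_data(seed=0):
    """Hypothetical stand-in for a downloaded data set (e.g. a Kaggle CSV)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(200, 3))
    hidden_w = np.array([1.0, -2.0, 0.5])  # unknown to the learner
    y = np.sign(X @ hidden_w + 0.3)
    y[y == 0] = 1.0  # guard against an exact zero
    return X, y


def add_bias(X):
    """Prepend a constant-1 column so the threshold is part of w."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def error_rate(w, X, y):
    """Fraction of points that sign(w . x) misclassifies."""
    preds = np.where(add_bias(X) @ w >= 0, 1.0, -1.0)
    return float(np.mean(preds != y))


def train_pocket(X, y, max_updates=1000):
    """Perceptron updates, keeping the weights with the lowest in-sample error."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])  # weight dimension is read off the data
    best_w, best_err = w.copy(), error_rate(w, X, y)
    for _ in range(max_updates):
        preds = np.where(Xb @ w >= 0, 1.0, -1.0)
        wrong = np.flatnonzero(preds != y)
        if wrong.size == 0:
            break  # no misclassified points remain
        i = wrong[0]
        w = w + y[i] * Xb[i]  # standard perceptron update
        err = error_rate(w, X, y)
        if err < best_err:
            best_w, best_err = w.copy(), err
    return best_w


# Steps 1-2: obtain the data and inspect its size and dimensionality.
X, y = load_given_data()
n_samples, n_features = X.shape

# Steps 3-4: treat it as binary classification with a linear hypothesis set.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# Step 5: train to get the final hypothesis g (represented by its weights).
g = train_pocket(X_train, y_train)
e_in = error_rate(g, X_train, y_train)

# Step 6: use g on the test set.
e_test = error_rate(g, X_test, y_test)
print(f"N={n_samples}, d={n_features}, E_in={e_in:.3f}, E_test={e_test:.3f}")
```

Note that in this sketch the weight vector is sized from `X.shape` and selected by looking at the training labels, which is exactly the "after the data" ordering I am asking about.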

In this pipeline, the data is not generated; it is given. h is not fixed; it is adjusted based on the data (type of data, dimensionality of data). If we do not know the data at all, we cannot possibly construct a hypothesis. It would be akin to using a low-pass filter for 1D signals when your data is actually a continuous stream of 3D video frames. The data must be given prior to constructing h, and h must be adjusted based on the problem at hand. This is not a "before"; it is clearly an "after".

Why does this typical learning pipeline not seem to fit the learning model described in the book? What does "h is fixed before you generate the data set" mean in a practical sense?

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.