LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Chapter 5 - Three Learning Principles (http://book.caltech.edu/bookforum/forumdisplay.php?f=112)

 PCdimension 12-15-2017 08:04 AM

Lets say two reserachers are locked up in 2 separate rooms provided with the same training set. One smart researcher (A) learnt about neural network and SVM, the other B only know about neural network. Lets say the ultimate truth is, neural network is the best model for this particular learning problem and both research A and B submitted the same neural network model.

B happens to have a smaller VC dimension than A, since B searched a smaller hypothesis set, yet both end up choosing the same neural network model.

It looks paradoxical that the less educated researcher B submitted the "better" model (smaller VC dimension, and hence a smaller sample-size requirement).
______________
Another scenario: a researcher C developed a great algorithm for a particular learning problem. In later years, countless researchers tried different models, but all failed to improve the learning performance. The problem's effective VC dimension has grown over time as the total hypothesis space explored has increased; practically, as time passes, the hypothesis set grows toward infinity. This all sounds paradoxical.

How can we charge for the VC dimension accordingly?
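To make the "smaller VC dimension means a tighter guarantee" intuition concrete, here is a minimal sketch (my own illustration, not from the thread) of the VC generalization bound from the book, E_out <= E_in + sqrt((8/N) ln(4 m_H(2N)/delta)), using the polynomial growth-function bound m_H(N) <= N^d_vc + 1. The specific d_vc values are hypothetical stand-ins for B's smaller and A's larger hypothesis sets.

```python
# Sketch of the VC generalization bound from Learning From Data:
# with probability >= 1 - delta,
#   E_out <= E_in + sqrt( (8/N) * ln( 4 * m_H(2N) / delta ) ),
# where the growth function is bounded by m_H(N) <= N^d_vc + 1.
import math

def vc_bound(N, d_vc, delta=0.05):
    """Bound on E_out - E_in for sample size N and VC dimension d_vc."""
    growth = (2 * N) ** d_vc + 1        # m_H(2N) <= (2N)^d_vc + 1
    return math.sqrt((8.0 / N) * math.log(4.0 * growth / delta))

# Hypothetical numbers: B searched a smaller hypothesis set (d_vc = 10)
# than A (d_vc = 50). Same final hypothesis, same data, but B's bound
# on the generalization gap is tighter.
N = 10_000
eps_B = vc_bound(N, d_vc=10)
eps_A = vc_bound(N, d_vc=50)
assert eps_B < eps_A
```

This is exactly the sense in which the bound "charges" for the whole hypothesis set that was searched, not just for the single hypothesis that was ultimately submitted.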

 Burine 12-20-2017 11:00 AM

Quote:
 Originally Posted by PCdimension (Post 12881) ...both researchers A and B submitted the same neural network model. It looks paradoxical that the less educated researcher B submitted a better model (smaller VC dimension, requiring fewer samples).
B's model cannot be better, because A and B used the same model.

If you mean that B happened to choose a simpler network than A did (e.g. fewer layers), then since A is more educated, he would presumably know how to apply weight regularization, dropout, etc. to avoid overfitting.
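As an aside, the "weight regularization" mentioned above can be sketched in its simplest closed form. This is my own minimal illustration (not Burine's actual setup) of L2 weight decay for a linear model, w = (XᵀX + λI)⁻¹Xᵀy, which constrains the effective complexity of a flexible hypothesis set; the data here are synthetic.

```python
# Minimal sketch of L2 weight decay (ridge regression, closed form):
#   w = (X^T X + lambda * I)^{-1} X^T y
# lambda = 0 recovers ordinary least squares.
import numpy as np

def ridge_fit(X, y, lam):
    """Least squares with an L2 weight-decay penalty of strength lam."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Hypothetical synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))
y = X @ np.ones(10) + rng.normal(scale=0.1, size=20)

w_plain = ridge_fit(X, y, lam=0.0)
w_reg = ridge_fit(X, y, lam=5.0)
# The penalty shrinks the weight vector toward zero.
assert np.linalg.norm(w_reg) < np.linalg.norm(w_plain)
```

Dropout and other stochastic regularizers serve the same purpose in larger networks: they restrict the effective complexity of the model that is actually fit.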
______________
Quote:
 Originally Posted by PCdimension (Post 12881) In later years, countless researchers tried different models, but all failed to improve the learning performance.
On what grounds could we assume that countless other models failed while C's model is sub-optimal?

 Yew Lee 01-25-2019 06:46 AM