LFD Book Forum  

  #1  
11-28-2012, 10:43 AM
ketchers
Junior Member
 
Join Date: Oct 2012
Posts: 5
Sampling and averaging for training/validation?

This is a fairly open-ended question, so pointers to the literature on this would be great.

Directly attacking HW8 problems 1-3 with Matlab ground my computer to a halt; however, it is easy to sample the data and then average the models (although exactly what to call the support vectors in this case is unclear). I'm writing this before comparing my results with what libsvm would give, so perhaps it is premature.

My question is:

How does this (training/validating on samples and averaging) affect the underlying math, generalization, Ein, Eout, etc.? For example, one might train on randomly selected "mini batches" and average the outcomes. One might randomly select validation sets rather than partitioning the data, and iterate training on a random training set while validating on a random validation set, etc. Would such a technique work? If so, should sampling use replacement, or does it matter? How is the math affected, i.e., error, tradeoffs (bias/variance, VC dim, etc.), overfitting, etc.?
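To make the question concrete, here is a minimal sketch of "train on random subsamples, then average" using only the Python standard library. It is an illustration under stated assumptions, not the homework setup: the base learner is a plain perceptron standing in for an SVM, "averaging" is taken to mean a majority vote over the models, and the names `train_perceptron`, `subsampled_ensemble`, `vote`, `error_rate`, and `make_data` as well as the toy data are all hypothetical. The `replace` flag switches between sampling with replacement (as in bagging) and without.

```python
import random

def train_perceptron(X, y, epochs=20, seed=0):
    """Stand-in base learner: a plain perceptron. Returns [b, w1, ..., wd]."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            s = w[0] + sum(wj * xj for wj, xj in zip(w[1:], X[i]))
            if y[i] * s <= 0:  # misclassified: nudge the boundary toward the point
                w[0] += y[i]
                for j, xj in enumerate(X[i]):
                    w[j + 1] += y[i] * xj
    return w

def predict(w, x):
    s = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 if s >= 0 else -1

def subsampled_ensemble(X, y, n_models=15, sample_size=50, replace=True, seed=0):
    """Train one model per random subsample of the training data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for m in range(n_models):
        if replace:
            idx = [rng.randrange(n) for _ in range(sample_size)]  # with replacement
        else:
            idx = rng.sample(range(n), sample_size)               # without replacement
        models.append(train_perceptron([X[i] for i in idx],
                                       [y[i] for i in idx], seed=m))
    return models

def vote(models, x):
    """Aggregate by majority vote (assumption: this is what 'averaging' means here)."""
    total = sum(predict(w, x) for w in models)
    return 1 if total >= 0 else -1

def error_rate(models, X, y):
    return sum(vote(models, x) != t for x, t in zip(X, y)) / len(X)

def make_data(n, seed):
    """Hypothetical toy data: labels are sign(x1 + x2), with a small gap at the boundary."""
    rng = random.Random(seed)
    X, y = [], []
    while len(X) < n:
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        if abs(x[0] + x[1]) > 0.1:
            X.append(x)
            y.append(1 if x[0] + x[1] > 0 else -1)
    return X, y

if __name__ == "__main__":
    X, y = make_data(400, seed=1)
    X_train, y_train, X_out, y_out = X[:300], y[:300], X[300:], y[300:]
    for replace in (True, False):
        models = subsampled_ensemble(X_train, y_train, replace=replace)
        print("replace=%s  Ein=%.3f  held-out error=%.3f" % (
            replace, error_rate(models, X_train, y_train),
            error_rate(models, X_out, y_out)))
```

The held-out points are never sampled for training, so their error is a rough stand-in for Eout. Running both settings of `replace` on your own data is one way to check empirically whether the choice matters.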

If you were doing this for SVMs, would you take all SVs in your samples as the set of SVs, or would you rerun using the SVs found along the way as your training data and then take the SVs from that run?
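The two options can be put side by side in a rough sketch. Several assumptions are baked in: the SVM is linear and trained with a Pegasos-style stochastic subgradient solver written here for illustration (not libsvm), the bias is folded in as a constant feature, and "support vector" is approximated as any point whose margin y(w·x + b) is at most 1. The names `train_linear_svm`, `approx_support_indices`, `two_stage_svs`, and `make_blobs` are hypothetical, as is the toy data.

```python
import random

def train_linear_svm(X, y, lam=0.01, n_iters=5000, seed=0):
    """Pegasos-style stochastic subgradient solver for a linear soft-margin SVM.
    The bias is the last weight, attached to a constant feature (so it is regularized)."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    for t in range(1, n_iters + 1):
        i = rng.randrange(len(X))
        xi = list(X[i]) + [1.0]
        m = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
        eta = 1.0 / (lam * t)                      # decaying step size
        w = [(1.0 - eta * lam) * wj for wj in w]   # shrink (regularization step)
        if m < 1.0:                                # hinge loss is active for this point
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w

def margin(w, x, label):
    return label * sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))

def approx_support_indices(w, X, y, idx, tol=1e-6):
    """Indices of points on or inside the margin (an approximation of the SVs)."""
    return [i for i in idx if margin(w, X[i], y[i]) <= 1.0 + tol]

def two_stage_svs(X, y, n_samples=10, sample_size=60, seed=0):
    rng = random.Random(seed)
    candidates = set()
    # Stage 1: train on random subsamples and pool every approximate SV found.
    for m in range(n_samples):
        idx = rng.sample(range(len(X)), sample_size)
        w = train_linear_svm([X[i] for i in idx], [y[i] for i in idx], seed=m)
        candidates.update(approx_support_indices(w, X, y, idx))
    pool = sorted(candidates)  # option 1: the union of SVs over all samples
    if not pool:
        raise ValueError("no approximate support vectors found in any subsample")
    # Stage 2: retrain on the pooled SVs only and keep the SVs of that run.
    w_final = train_linear_svm([X[i] for i in pool], [y[i] for i in pool], seed=seed)
    final_svs = approx_support_indices(w_final, X, y, pool)  # option 2
    return pool, final_svs, w_final

def make_blobs(n, seed):
    """Hypothetical toy data: two overlapping Gaussian classes in the plane."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        X.append([rng.gauss(label, 1.0), rng.gauss(label, 1.0)])
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_blobs(300, seed=2)
    pool, final_svs, w_final = two_stage_svs(X, y)
    print("pooled SVs:", len(pool), " SVs after retraining:", len(final_svs))
```

By construction the second set is a subset of the first. Neither is guaranteed to match the SVs of a single run on the full data, so comparing both against a full libsvm run on a problem small enough to solve directly would be the real check.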


I assume this has been well thought out, since data sets must sometimes be too large to analyze at once, certainly on generic computers anyway.
  #2  
11-28-2012, 02:39 PM
htlin
NTU
 
Join Date: Aug 2009
Location: Taipei, Taiwan
Posts: 601
Re: Sampling and averaging for training/validation?

Subsampled aggregation, such as averaging the hypotheses obtained from training on subsampled data, is a very popular approach in practice for large-scale data mining.

For instance, this is an earlier work that couples SVMs with subsampled aggregation.

www.springerlink.com/index/0mj1u4f6ph8e07jk.pdf

I don't follow the topic closely, but from my observation it continues to be an interesting direction as massive data sets become more common.

Hope this helps.
__________________
When one teaches, two learn.
  #3  
11-28-2012, 05:07 PM
ketchers
Junior Member
 
Join Date: Oct 2012
Posts: 5
Re: Sampling and averaging for training/validation?

Thanks for the pointer.
All times are GMT -7.


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.