#1
This is a fairly open-ended question, so pointers to literature on this would be great.

Directly attacking Hw8 problems 1-3 with MATLAB ground my computer to a halt, but it is easy to sample the data and then average the models (although exactly what to call the support vectors in that case is unclear). I'm writing this before comparing my results with what libsvm would give, so perhaps it is premature.

My question is: how does this (training/validating on samples and averaging) affect the underlying math: generalization, Ein, Eout, etc.? For example, one might train on randomly selected "mini-batches" and average the outcomes. One might randomly select validation sets rather than partitioning the data, and iterate training on a random training set while validating on a random validation set, and so on. Would such a technique work? If so, should sampling use replacement, or does it matter? How is the math affected, i.e., error, tradeoffs (bias/variance, VC dimension, etc.), overfitting?

If you were doing this for SVMs, would you take all the support vectors found across your samples as the final set of support vectors, or would you rerun using the support vectors found along the way as your training data and then take the support vectors from that run? I assume this has been well thought out, since data sets must sometimes be too large to analyze at once - certainly on generic computers anyway.
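To make the first part concrete, here is a rough sketch of what I mean by "train on random subsamples, then aggregate", written in Python with scikit-learn's SVC (which wraps libsvm) rather than MATLAB. The subsample size, number of models, and kernel parameters are just placeholders, not a recommended setup.

[code]
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def subsampled_svm_ensemble(X, y, n_models=10, subsample=2000, replace=True):
    """Train one SVM per random subsample; return the list of fitted models."""
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=min(subsample, len(X)), replace=replace)
        clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # placeholder settings
        clf.fit(X[idx], y[idx])
        models.append(clf)
    return models

def aggregate_predict(models, X_test):
    """Average the models' decision values and take the sign (labels in {-1, +1})."""
    scores = np.mean([m.decision_function(X_test) for m in models], axis=0)
    return np.where(scores >= 0.0, 1, -1)
[/code]

With replace=True each model sees a bootstrap-style sample; with replace=False the subsamples are drawn without replacement - which of the two is preferable is exactly one of the things I'm asking about.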
#2
Subsampled aggregation, such as averaging the hypotheses obtained from training on subsampled data, is a very popular approach in practice for large-scale data mining.
For instance, here is an earlier work that couples SVMs with subsampled aggregation: www.springerlink.com/index/0mj1u4f6ph8e07jk.pdf I don't follow the topic closely, but from my observation it continues to be an interesting direction as data sets keep getting more massive. Hope this helps.
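For the "keep only the support vectors and retrain" variant raised above, the simplest version looks roughly like the sketch below - again Python/scikit-learn with placeholder parameters, and only an illustration of the general idea, not the specific method in the paper linked above: fit SVMs on chunks of the data, pool their support vectors, and fit one final SVM on the pooled set.

[code]
import numpy as np
from sklearn.svm import SVC

def sv_filter_then_retrain(X, y, n_chunks=10, C=1.0, gamma="scale"):
    """Fit SVMs on disjoint chunks, pool their support vectors, retrain on the pool.

    Assumes each chunk contains examples from both classes.
    """
    sv_idx = []
    for chunk in np.array_split(np.arange(len(X)), n_chunks):
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X[chunk], y[chunk])
        sv_idx.extend(chunk[clf.support_])  # map chunk-local SV indices back to X
    sv_idx = np.unique(sv_idx)
    # One more SVM on the pooled support-vector set, which is usually much smaller
    # than the full data set and hence much cheaper to train on.
    return SVC(kernel="rbf", C=C, gamma=gamma).fit(X[sv_idx], y[sv_idx])
[/code]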
__________________
When one teaches, two learn.
#3
Thanks for the pointer.