LFD Book Forum


DASteines 04-20-2012 05:39 AM

A Modification to the Learning Diagram
 
How does the learning problem change if the training samples are drawn from an indexed set of distributions? That is, suppose our training samples, x and y, are drawn from:

p(x, y \mid \theta), where \theta \in \{1, 2, \ldots, k\}

Suppose I am trying to classify groups of pixels in images. I have 10 images from which I can draw groups of pixels; the images are indexed by \theta, with k = 10. How do we account for this grouping of the training data? What strategies exist for building a "good" (unbiased) training set in cases like this?

dudefromdayton 04-20-2012 12:04 PM

Re: A Modification to the Learning Diagram
 
I may need more details about your situation to give a correct answer. But as I understand your problem, I would try to produce a training set that is representative of your images, perhaps by sampling from all of them or from an (ideally) unbiased subset. If your sampling is representative of actual use, the relationships between E_in and E_out should all hold.
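One way to make "sampling from all of the images" concrete is stratified sampling: treat each image (each value of theta) as a stratum and decide up front how many pixel groups to draw from each, either equally or in proportion to how often each image is expected to appear in actual use. The sketch below is a minimal illustration under those assumptions; the function name, the `weights` argument, and the placeholder data are all hypothetical, not part of the original discussion.

```python
import random
from collections import Counter

def stratified_sample(groups_by_image, n_total, weights=None, seed=0):
    """Draw about n_total pixel groups, split across images indexed by theta.

    groups_by_image: dict theta -> list of candidate pixel groups from that image.
    weights: optional dict theta -> relative frequency with which that image
             appears in actual use (hypothetical; defaults to equal weights).
    Returns a list of (theta, pixel_group) pairs; a stratum that has fewer
    groups than its allocation contributes all of its groups.
    """
    rng = random.Random(seed)
    thetas = sorted(groups_by_image)
    if weights is None:
        weights = {t: 1.0 for t in thetas}
    total_w = sum(weights[t] for t in thetas)
    # Ideal (fractional) number of draws per image, proportional to its weight.
    exact = {t: n_total * weights[t] / total_w for t in thetas}
    alloc = {t: int(exact[t]) for t in thetas}
    # Hand out the draws lost to rounding, largest fractional part first.
    leftover = n_total - sum(alloc.values())
    by_fraction = sorted(thetas, key=lambda t: exact[t] - alloc[t], reverse=True)
    for t in by_fraction[:leftover]:
        alloc[t] += 1
    sample = []
    for t in thetas:
        pool = groups_by_image[t]
        # Sample without replacement inside each image.
        k = min(alloc[t], len(pool))
        sample.extend((t, g) for g in rng.sample(pool, k))
    return sample

# Usage with placeholder data: 10 images (theta = 1..10), 100 groups each.
groups = {t: [f"img{t}_group{i}" for i in range(100)] for t in range(1, 11)}
train = stratified_sample(groups, n_total=50)
counts = Counter(t for t, _ in train)
print(sorted(counts.items()))  # five groups from each image
```

Equal weights guard against any single image dominating the training set; if some images are known to be more common in use, passing those frequencies as `weights` keeps the training distribution closer to the test distribution, which is the condition the E_in/E_out argument relies on.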

magdon 04-22-2012 03:20 PM

Re: A Modification to the Learning Diagram
 
This is an interesting example. What you describe is a restriction of the paradigm from a general P(x,y) to one of the form you mention, which arises by mixing 10 different distributions. This additional knowledge about the nature of your problem can inform how you choose your hypothesis set, and one model tailored to situations like this is (appropriately) called a mixture model.
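A mixture model writes the overall distribution as a weighted sum of the k component distributions, P(x, y) = sum over theta of pi_theta * p(x, y | theta), and corresponds to a two-stage draw: first pick theta with probability pi_theta, then draw (x, y) from that component. The sketch below only illustrates that form with made-up toy components (a scalar Gaussian x and a label y in {+1, -1}, assumed independent given theta); the component parameters and weights are hypothetical, and fitting them from data (for example with EM) is not shown.

```python
import math
import random

# Hypothetical toy components: for each theta, x ~ Normal(mu, sigma) and
# y = +1 with probability p_pos, else -1 (x and y independent given theta).
COMPONENTS = {
    1: {"mu": 0.0, "sigma": 1.0, "p_pos": 0.8},
    2: {"mu": 3.0, "sigma": 0.5, "p_pos": 0.2},
}
MIX_WEIGHTS = {1: 0.5, 2: 0.5}  # pi_theta; must sum to 1

def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def component_density(x, y, c):
    """p(x, y | theta) for a single component c."""
    p_y = c["p_pos"] if y == 1 else 1.0 - c["p_pos"]
    return normal_pdf(x, c["mu"], c["sigma"]) * p_y

def mixture_density(x, y):
    """P(x, y) = sum over theta of pi_theta * p(x, y | theta)."""
    return sum(MIX_WEIGHTS[t] * component_density(x, y, COMPONENTS[t])
               for t in COMPONENTS)

def sample_point(rng):
    """Two-stage draw: choose theta with probability pi_theta, then (x, y) from it."""
    thetas = list(MIX_WEIGHTS)
    theta = rng.choices(thetas, weights=[MIX_WEIGHTS[t] for t in thetas])[0]
    c = COMPONENTS[theta]
    x = rng.gauss(c["mu"], c["sigma"])
    y = 1 if rng.random() < c["p_pos"] else -1
    return theta, x, y

rng = random.Random(0)
print(sample_point(rng), mixture_density(0.0, 1))
```

In a learning setting, the hypothesis set would be the family of such mixtures, with the weights and component parameters chosen to fit the observed (x, y) pairs.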

I did not understand the question about the training data. Typically, the training data is given. Or is your task to develop an algorithm that separates the observed 'signal' into the components coming from each image? That is called a source separation problem, and it is different from a multi-class problem. In a multi-class problem, each data point belongs to one of the classes and the goal is to determine which one.
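To illustrate the multi-class reading (each point came from exactly one image, and the goal is to say which), the sketch below assigns a point x to the theta with the highest posterior P(theta | x), computed with Bayes' rule from assumed priors pi_theta and assumed 1-D Gaussian components p(x | theta). All parameters are hypothetical. Note that this does not attempt source separation: it never splits a point into contributions from several images, it only picks one label.

```python
import math

# Hypothetical 1-D components: theta -> (prior pi_theta, mean, std) of p(x | theta).
COMPONENTS = {
    1: (0.5, 0.0, 1.0),
    2: (0.5, 3.0, 1.0),
}

def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(x):
    """P(theta | x): pi_theta * p(x | theta), normalized over all theta."""
    joint = {t: pi * normal_pdf(x, mu, s) for t, (pi, mu, s) in COMPONENTS.items()}
    z = sum(joint.values())
    return {t: v / z for t, v in joint.items()}

def classify(x):
    """Multi-class decision: the single theta with the largest posterior."""
    post = posterior(x)
    return max(post, key=post.get)

print(classify(0.2), classify(2.9), posterior(1.5))
```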




The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.