Quote:
Originally Posted by Ubermensch
In the digit recognition example, the pixels of the image are split into two features, density and symmetry, to identify the digits. In the case of raster graphics, the pixels are the primary units on which ML can be done, and they are uniform.
But in the case of vector graphics like SVG, or coordinate systems, the coordinates and paths are explicit or denoted by a mathematical function. How could the image recognition analogy apply to this problem?
Taking the example of an SVG file, I can parse the coordinates and path transformations, but how could I put them into matrix form, since each SVG file would have a different number of coordinates and path transformations (and of course attributes for color, strokes, etc.)?

Interesting question, and I believe it is an ongoing research issue. Of course, we can try to extract fixed-length feature vectors using ideas similar to those used in pixel-based image learning. Alternatively, there are some learning algorithms that do not necessarily require fixed-length vectors for representing the data, and can thus possibly be applied to the SVG learning case you are describing. For instance,
* Nearest Neighbor models, which generally only require a suitable similarity function.
* Kernelized Support Vector Machines, which require the instances to map to a "computable" inner-product space
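To make the first idea concrete, here is a minimal sketch of a 1-nearest-neighbor classifier that works directly on variable-length sequences of points (such as coordinates parsed from SVG paths), using a simple dynamic-time-warping distance as the similarity function. The point lists, labels, and the choice of DTW are all illustrative assumptions, not part of any standard SVG-learning recipe.

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two sequences of (x, y)
    points. Sequences may have different lengths, so no fixed-length
    feature vector is required."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean cost of matching point a[i-1] with point b[j-1]
            cost = ((a[i-1][0] - b[j-1][0]) ** 2
                    + (a[i-1][1] - b[j-1][1]) ** 2) ** 0.5
            d[i][j] = cost + min(d[i-1][j],    # skip a point in a
                                 d[i][j-1],    # skip a point in b
                                 d[i-1][j-1])  # match the two points
    return d[n][m]

def nn_classify(query, training_set):
    """1-NN: return the label of the training sequence closest to query.
    training_set is a list of (sequence, label) pairs."""
    return min(training_set, key=lambda t: dtw_distance(query, t[0]))[1]

# Toy example with hypothetical "digit stroke" sequences of unequal length
train = [([(0, 0), (1, 0), (2, 0)], "horizontal"),
         ([(0, 0), (0, 1), (0, 2)], "vertical")]
print(nn_classify([(0, 0), (1, 0.1), (2, 0)], train))  # → horizontal
```

The same similarity function could also be plugged into a kernel (after suitable transformation to ensure positive semi-definiteness) for use with an SVM.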
Some related discussions can be found in the following workshop:
http://www.dsi.unive.it/~icml2010lngs/
Hope this helps.