Question 2  Method of averaging
Our learning algorithms return a weight vector w, but our hypotheses are usually functions of w.
Just to clarify, then: if we are averaging two hypotheses h1 and h2, our average hypothesis is (h1 + h2) / 2, and not the hypothesis form h evaluated at the averaged weights (w1 + w2) / 2, even if our h's are of some arbitrary form, for example
h(x) = exp(wT . x) / (1 + exp(wT . x))
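
To make the distinction concrete, here is a minimal sketch that compares the two kinds of averaging for the logistic form above. The weight vectors w1, w2 and the input x are made-up values chosen only for illustration; they do not come from any particular learning algorithm.

```python
import math

def h(w, x):
    """Logistic hypothesis h(x) = exp(w.x) / (1 + exp(w.x))."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    # 1 / (1 + exp(-s)) is algebraically the same as exp(s) / (1 + exp(s)).
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical weights and input (assumptions for this example only).
w1 = [2.0, 0.0]
w2 = [0.0, -6.0]
x = [1.0, 1.0]

# Averaging the hypotheses: evaluate each h, then average the outputs.
avg_of_hypotheses = (h(w1, x) + h(w2, x)) / 2

# Averaging the weights: average w first, then evaluate h once.
w_avg = [(a + b) / 2 for a, b in zip(w1, w2)]
hypothesis_of_avg_w = h(w_avg, x)

print(avg_of_hypotheses, hypothesis_of_avg_w)
```

With these values the two results differ noticeably, because h is nonlinear in w. If h were linear in w, as in h(x) = wT . x, the two averages would coincide.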
