I also get 1.42 with the formula you mention, which comes from differentiating the mean squared error with respect to "a" and setting the derivative to zero. I also got 1.4 by finding the "a" for each pair of values x1 and x2 and then selecting the "a" with the smallest mean squared error.
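For anyone who wants to reproduce the numbers, here is a rough sketch of both routes for a single data set of two points, plus a Monte Carlo average of the closed-form slope. Setting the derivative of (a*x1 - y1)^2 + (a*x2 - y2)^2 to zero gives a = (x1*y1 + x2*y2) / (x1^2 + x2^2). The target sin(pi*x), the uniform inputs on [-1, 1], the grid range, and the names slope_closed_form, slope_grid_search and average_slope are my own assumptions for illustration; they are not stated in this thread. The grid search is one possible reading of the brute-force route.

```python
import numpy as np

# ASSUMPTION: the target and input distribution are not given in this post;
# sin(pi * x) with x uniform on [-1, 1] is used purely for illustration.
def target(x):
    return np.sin(np.pi * x)

# Candidate slopes for the brute-force search (step 0.001, covers |a| <= pi).
SLOPE_GRID = np.linspace(-4.0, 4.0, 8001)

def slope_closed_form(x1, y1, x2, y2):
    # d/da [(a*x1 - y1)^2 + (a*x2 - y2)^2] = 0  =>  a = (x1*y1 + x2*y2) / (x1^2 + x2^2)
    return (x1 * y1 + x2 * y2) / (x1**2 + x2**2)

def slope_grid_search(x1, y1, x2, y2, grid=SLOPE_GRID):
    # Evaluate the two-point MSE for every candidate slope and keep the best one.
    mse = ((grid * x1 - y1) ** 2 + (grid * x2 - y2) ** 2) / 2.0
    return grid[np.argmin(mse)]

def average_slope(n_trials=100_000, seed=0):
    # Average the closed-form slope over many random two-point data sets.
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n_trials, 2))
    y = target(x)
    a = slope_closed_form(x[:, 0], y[:, 0], x[:, 1], y[:, 1])
    return a.mean()

if __name__ == "__main__":
    print(f"average slope a over many data sets: {average_slope():.3f}")
```

For a single pair, the grid search lands within one grid step of the closed form, so any gap between 1.4 and 1.42 is most likely down to grid resolution or rounding rather than a different answer.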

The answer (a = 1.4) is higher than in the case discussed in class, and that makes sense to me because the line is forced to pass through the origin, so it can only rotate about it. This removes the translational degree of freedom that is available when the intercept is included (y = ax + b). In that case the line is simply drawn through the two points (the case discussed in class), and many more pairs produce a slope close to zero, so I would expect the average "a" over many data sets to be lower for y = ax + b. It also makes sense that the average b tends to zero.
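The intuition about the intercept can be checked numerically with a small Monte Carlo sketch. Again, the target sin(pi*x), the uniform inputs on [-1, 1], and the name compare_models are assumptions made only for illustration. For each random pair of points it computes the least-squares slope of y = ax (line pinned at the origin) and the slope and intercept of y = ax + b (line through both points), then averages them.

```python
import numpy as np

# ASSUMPTION: illustrative target sin(pi * x) with x uniform on [-1, 1];
# restated here so this block runs on its own.
def target(x):
    return np.sin(np.pi * x)

def compare_models(n_trials=200_000, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n_trials, 2))
    y = target(x)
    x1, x2, y1, y2 = x[:, 0], x[:, 1], y[:, 0], y[:, 1]

    # Model 1: y = a*x, least-squares slope with the line pinned at the origin.
    a_origin = (x1 * y1 + x2 * y2) / (x1**2 + x2**2)

    # Model 2: y = a*x + b, the line passes exactly through both points.
    a_free = (y2 - y1) / (x2 - x1)
    b_free = y1 - a_free * x1

    return a_origin.mean(), a_free.mean(), b_free.mean()

if __name__ == "__main__":
    a0, a1, b1 = compare_models()
    print(f"y = ax     : mean a = {a0:.3f}")
    print(f"y = ax + b : mean a = {a1:.3f}, mean b = {b1:.3f}")
```

Under this assumed target, the average slope for y = ax + b comes out well below the one for y = ax, and the average b sits near zero because sin(pi*x) is odd, which matches the reasoning above.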

Assuming 1.42 is the correct value, should the answer be [d] or [e]?

To "d", or not to "d", that is the question