I understood why the answer to Q10 is as given.
However, I'm a little confused: the given function is not differentiable, so we won't be able to use it for SGD. Can this be overcome, the way we did in logistic regression by using the properties of the sigmoid function?

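The error function can be softened into a smooth version, just as the sigmoid was used to soften a hard threshold. Alternatively, one can keep the functional form and, at the kink where the derivative does not exist, use the left or right derivative instead. Everywhere else the function is differentiable, so SGD only needs a rule for that one case.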
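Both fixes can be sketched in a few lines. This is a minimal sketch, assuming (the post does not spell it out) that the Q10 error is the perceptron-style e_n(w) = -min(0, y_n w^T x_n), which has its kink at y_n w^T x_n = 0. All function names and the toy data are hypothetical. `hard_grad_left` keeps the original error and takes the left derivative at the kink, so an SGD step on a misclassified or boundary point is w ← w + η y_n x_n, the PLA update. `soft_error` is one possible smooth surrogate, (1/β) ln(1 + exp(-β y_n w^T x_n)). It tends to the hard error as β grows, and at β = 1 it is exactly the logistic regression error.

```python
import math

# ASSUMPTION: the error in question is e_n(w) = -min(0, s), s = y_n * w.x_n,
# which is not differentiable at s = 0. All names here are illustrative.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def hard_error(w, x, y):
    return -min(0.0, y * dot(w, x))

def hard_grad_left(w, x, y):
    """Gradient of hard_error, using the LEFT derivative at the kink s = 0.

    For s < 0 the gradient is -y*x; for s > 0 it is 0. Choosing the left
    derivative at s = 0 treats boundary points as misclassified, so SGD
    started from w = 0 still moves (the right derivative would be 0 there).
    """
    s = y * dot(w, x)
    return [-y * xi for xi in x] if s <= 0 else [0.0] * len(x)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_error(w, x, y, beta=1.0):
    """Smooth surrogate (1/beta) * ln(1 + exp(-beta * s)).

    Approaches -min(0, s) as beta grows; beta = 1 gives ln(1 + exp(-y w.x)).
    """
    z = -beta * y * dot(w, x)
    # Stable softplus: ln(1 + e^z) = max(0, z) + ln(1 + e^-|z|).
    return (max(0.0, z) + math.log1p(math.exp(-abs(z)))) / beta

def soft_grad(w, x, y, beta=1.0):
    # d/dw soft_error = -y * x * sigmoid(-beta * s), defined for every w.
    g = sigmoid(-beta * y * dot(w, x))
    return [-y * xi * g for xi in x]

def sgd(data, grad, eta=1.0, epochs=20):
    """Plain SGD visiting points in order (no shuffling, for reproducibility)."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            w = [wi - eta * gi for wi, gi in zip(w, grad(w, x, y))]
    return w

# Hypothetical separable toy data; the first coordinate is a constant 1 (bias).
data = [([1.0, 2.0, 1.0], 1), ([1.0, -1.0, -2.0], -1),
        ([1.0, 1.5, 0.5], 1), ([1.0, -2.0, -0.5], -1)]

w_pla = sgd(data, hard_grad_left, eta=1.0)       # behaves like PLA
w_soft = sgd(data, soft_grad, eta=0.5, epochs=50)  # smooth surrogate
print(w_pla)  # -> [1.0, 2.0, 1.0]
print(w_soft)
```

The left derivative is the practical choice here: with the right derivative, every point has zero gradient at w = 0, so SGD would never leave its starting point.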