I was a little confused at first by what Q3 was asking, so I thought I would describe my reasoning here, in case anybody else had the same question.
So what the question is saying in layman's terms is:
You have a hypothesis function h(x) that you train to approximate a deterministic target function y = f(x), but it makes an error with probability mu (oh well, at least you tried). Now, after finding h(x), you apply it to a noisy data set (real-world data), where the label y does not always equal f(x).
So the probability that h(x) makes an error on noiseless data is mu, and the probability that it doesn't is 1 - mu.
In addition, the probability that you make an error simply due to noise (the label gets flipped, y != f(x)) is 1 - lambda, and with probability lambda the noise produces no error (y = f(x)).
Since these are binary functions, an error due to noise and an error due to h(x) cancel each other out if both happen (flipping a binary label twice gets you back where you started). Therefore, the probability that you actually make an error when you apply h(x) to a noisy version of f(x) is: the probability that there is an error due to noise (1 - lambda) AND no "deterministic" error (1 - mu), OR the probability that there is no error due to noise (lambda) AND there is a "deterministic" error (mu).
Note that the "mu" error and the "lambda" noise are assumed to be statistically independent; that assumption is what lets us multiply the probabilities below.
Therefore: P_{error} = P_{noise error}*P_{no mu error} + P_{no noise error}*P_{mu error} = (1 - lambda)*(1 - mu) + lambda*mu.
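If you want to sanity-check the formula, here is a small Python sketch (my own, not part of the assignment) that computes the closed-form answer and compares it with a Monte Carlo simulation. The function names error_probability and simulate are made up for illustration. Like the reasoning above, it assumes the h(x) mistakes and the label noise are independent, and it relies on the binary-label fact that h(x) != y exactly when one, but not both, of the two errors happens.

```python
import random


def error_probability(mu: float, lam: float) -> float:
    """Closed-form P(h(x) != y).

    Assumes h disagrees with f w.p. mu, and the noisy label y equals f(x)
    w.p. lam, with the two events independent.
    """
    # Noise error with no h error, plus h error with no noise error.
    return (1 - lam) * (1 - mu) + lam * mu


def simulate(mu: float, lam: float, trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability (hypothetical helper)."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        h_wrong = rng.random() < mu            # h(x) != f(x), probability mu
        label_flipped = rng.random() >= lam    # y != f(x), probability 1 - lam
        # With binary labels, two errors cancel: h(x) != y iff exactly one happened.
        if h_wrong != label_flipped:
            errors += 1
    return errors / trials


if __name__ == "__main__":
    for mu, lam in [(0.1, 0.9), (0.3, 0.7), (0.2, 0.5)]:
        print(f"mu={mu}, lambda={lam}: "
              f"exact={error_probability(mu, lam):.4f}, "
              f"simulated={simulate(mu, lam):.4f}")
```

The simulated values should land within a fraction of a percent of the exact ones; the XOR-style check (`h_wrong != label_flipped`) is just the "error due to noise AND no mu error, OR no noise error AND mu error" sentence written as code.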

For question 4, you can see that if you set lambda = 0.5, P_{error} reduces to 1/2 and mu drops out. Intuitively, lambda = 0.5 means your noise is so bad that half of the labels are flipped, so the labels are no better than coin tosses. In this situation you don't expect mu to influence the outcome, because your data is already uniformly random.
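Expanding the expression from Q3 makes it explicit why lambda = 1/2 is the only value at which mu drops out:

```latex
\begin{aligned}
P_{\text{error}} &= (1-\lambda)(1-\mu) + \lambda\mu \\
                 &= 1 - \lambda - \mu + 2\lambda\mu \\
                 &= (1-\lambda) + \mu\,(2\lambda - 1).
\end{aligned}
```

The coefficient of mu is (2 lambda - 1), which vanishes only at lambda = 1/2, leaving P_{error} = 1 - 1/2 = 1/2. For lambda > 1/2 the error grows with mu, and for lambda < 1/2 it actually shrinks with mu (the noise flips labels more often than not, so a hypothesis that is "wrong" about f ends up right about y).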
Please feel free to chime in with corrections and comments as necessary.