Re: question 2
OK, great question. Let's look at this from the point of view of what we already know, i.e. the three points in the 'essence of machine learning'.
Problem: Optimal Cycle for Traffic Lights
1. A pattern exists: If you think about it intuitively, there should be some correlation (speaking loosely) between the traffic at a light in the intersection and the amount of time that a car spends at that light. Right? If there are a lot of cars, you'd expect that light to stay green for far longer than a light where there are only one or two cars.
2. We cannot pin it down mathematically: You could, with a great amount of difficulty, try to pin it down mathematically, but heck, that's what learning's for, right? You make the machine do the dirty work and you collect the check at the end of the day.
3. We have data on it:
It's quite evident that collecting data for this kind of problem isn't much of a big deal. You could put up a bunch of cameras and have some grad students or interns write a computer vision program to count how many cars there are at a light, measure how long that light is green, and so on (this is just one example of how you might go about collecting data; it's ultimately up to you).
Problem: Calculating how much time a body takes to fall to the ground
1. A pattern exists:
Newton's laws of motion and all that have shown us that a pattern does indeed exist and that...
2. It can be described mathematically:
Since there is a closed-form solution to your problem, wouldn't you rather use that? Learning from data inherently carries a generalization error. There are so many things that could possibly go wrong: you might choose the wrong hypothesis set (you might think that the relationship between the time of flight and the height of the object from the ground is linear when it is actually quadratic). Do you see what I am saying? Why would you want to settle for an approximate solution when an exact solution is well within your reach?
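To make that concrete, here is a rough sketch in Python (my own toy setup, not anything from the course). It assumes an object dropped from rest with no air resistance and g = 9.81 m/s^2, so the exact answer is t = sqrt(2h/g). The "learned" model is a straight line fitted by least squares to heights from 1 m to 10 m, which is exactly the wrong-hypothesis-set mistake described above. The names fall_time, fit_line, and learned_fall_time are just made up for this example.

```python
import math

G = 9.81  # m/s^2; assumed constant gravity, air resistance ignored


def fall_time(height_m):
    """Exact time to fall from rest: h = g*t^2/2, so t = sqrt(2h/g)."""
    return math.sqrt(2.0 * height_m / G)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (the 'wrong' linear hypothesis)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Toy "training data": heights of 1..10 m, with times taken from the formula itself.
train_heights = [float(h) for h in range(1, 11)]
train_times = [fall_time(h) for h in train_heights]
a, b = fit_line(train_heights, train_times)


def learned_fall_time(height_m):
    """Prediction of the fitted line; only an approximation of fall_time."""
    return a * height_m + b


# Compare both answers at a height well outside the training range.
test_height = 100.0
exact = fall_time(test_height)
approx = learned_fall_time(test_height)
print(f"exact:   {exact:.3f} s")
print(f"learned: {approx:.3f} s")
print(f"error:   {abs(approx - exact):.3f} s")
```

The line looks passable on the heights it was trained on, but at 100 m it lands a long way from the exact answer. That gap is the price of learning when you didn't have to.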
