Got it. I guess the learning algorithm cares most about the number of parameters, and forcing normalization would only reduce that by two.
Point of learning is to
not have to guess the target function or even its form, but it's hard to resist micromanaging the process
Thanks!