Quote:
Originally Posted by dvs79
Yes, we don't need to:
1. compute the delta for the output (because it doesn't require any of the operations that count as operations in this particular task)
2. compute deltas for the constants (because they're constants)
3. compute deltas for the inputs (because they're just features (x), whereas a delta is the derivative of the error with respect to s)
So for computing deltas you only need 3 operations.

Thank you! Your answer preempted one of my questions, specifically the one about calculating the final-layer delta, since technically none of that calculation seems to count as an operation under our definition.
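
For anyone who wants to sanity-check the count, here is a rough sketch of the bookkeeping. It assumes the only operations counted in the delta step are weight-times-delta products, and that deltas are needed only for the non-constant hidden units, as in the quoted list. The function name `count_delta_products` and the layer sizes `[5, 3, 1]` are my own illustrative assumptions (the thread does not restate the architecture), chosen because they reproduce the figure of 3.

```python
def count_delta_products(layer_sizes):
    """Count the w * delta products needed to backpropagate deltas.

    layer_sizes lists the NON-constant units per layer, input layer first
    and output layer last (hypothetical example: [5, 3, 1]).
    """
    count = 0
    # Only hidden layers need deltas computed this way:
    # the input layer (index 0) holds features, not signals,
    # and the output-layer delta uses no w * delta product at all.
    for layer in range(1, len(layer_sizes) - 1):
        # Each non-constant hidden unit sums w * delta over every
        # unit in the next layer; constant units are skipped.
        count += layer_sizes[layer] * layer_sizes[layer + 1]
    return count


print(count_delta_products([5, 3, 1]))  # 3
```

With a single hidden layer of 3 units feeding 1 output, each hidden unit needs exactly one product, which is where the 3 comes from under these assumptions.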