I am interested in implementing a somewhat complex custom TensorFlow operation. Let's say (for the purpose of this question) that the operation is similar to a convolution with stride=2, dilation=2, and padding. To use this op in the training loop, I also have to implement a gradient op.
But the problem is that I do not know how to express the gradient of this op as a closed-form formula. I imagine this would be an issue for most non-trivial custom operations, because deriving the gradient is a non-trivial task (at least for me).
Is it possible to implement the gradient op by using the fundamental definition of partial derivatives?
f'(x) ≈ [f(x + dx) - f(x)] / dx
(for every input parameter x to the custom TensorFlow operation)
This seems to be a more generic approach (compared to deriving a closed-form formula for the gradient). Roughly, I am imagining something like the sketch below.
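Here is a rough sketch of what I have in mind, using tf.custom_gradient. Note that my_custom_op below is just a scalar-valued stand-in for the real compiled kernel (which would be loaded via tf.load_op_library), and EPS is an arbitrary step size I picked; this is only meant to illustrate the forward-difference idea, not a real implementation:

```python
import tensorflow as tf

EPS = 1e-3  # finite-difference step size (arbitrary choice for this sketch)

def my_custom_op(x):
    # Stand-in for the opaque custom kernel; in reality this would be the op
    # loaded via tf.load_op_library. A scalar output keeps the sketch simple.
    return tf.reduce_sum(tf.square(x))

@tf.custom_gradient
def op_with_finite_difference_grad(x):
    y = my_custom_op(x)

    def grad(upstream):
        # Estimate df/dx_i with a forward difference for every scalar element
        # of x. Each element needs one extra forward evaluation of the op.
        flat = tf.reshape(x, [-1])

        def partial(i):
            bumped = tf.tensor_scatter_nd_add(
                flat, tf.reshape(i, [1, 1]), tf.constant([EPS], flat.dtype))
            y_plus = my_custom_op(tf.reshape(bumped, tf.shape(x)))
            return (y_plus - y) / EPS

        grads = tf.map_fn(partial, tf.range(tf.size(flat)),
                          fn_output_signature=flat.dtype)
        return upstream * tf.reshape(grads, tf.shape(x))

    return y, grad

# Usage: the tape should pick up the finite-difference gradient.
x = tf.random.normal([4, 4])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = op_with_finite_difference_grad(x)
print(tape.gradient(y, x))
```

Even in this toy form it is clear that the gradient call runs the forward op once per input element, which is what worries me for a convolution-sized input (and for a tensor-valued output the full Jacobian would be needed, which is even worse).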
I am wondering why I cannot find this sort of implementation anywhere on the internet. The possible reasons I can think of are:
- It might be computationally expensive to calculate the partial derivative for each input variable to the custom op.
- There may be numerical convergence issues with this approach (see the small experiment after this list).
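Regarding the second point, here is a small self-contained experiment (using np.sin as a stand-in, since its exact derivative cos is known) showing how the finite-difference error behaves as the step size shrinks: truncation error falls at first, but floating-point round-off eventually dominates, and a central difference only pushes the problem out rather than removing it:

```python
import numpy as np

f, x, exact = np.sin, 1.0, np.cos(1.0)  # stand-in function with known derivative

for eps in (1e-2, 1e-5, 1e-8, 1e-12):
    forward = (f(x + eps) - f(x)) / eps              # O(eps) truncation error
    central = (f(x + eps) - f(x - eps)) / (2 * eps)  # O(eps^2) truncation error
    print(f"eps={eps:.0e}  forward_err={abs(forward - exact):.2e}  "
          f"central_err={abs(central - exact):.2e}")
```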
Any insights would be helpful!