C++ Finite difference differentiation - design

379 Views Asked by At

let A be:

class A {
  std::vector<double> values_;
public:
  A(const std::vector<double> &values) : values_(values){};

  void bumpAt(std::size_t i, const double &df) {
    values_[i] += df;

  virtual method1();
  virtual method2();
  ...
}

class B : public A {
  overrides methods
  ...
}

for the sake of simplicity consider the function:

double foo(input1, input2, ..., const A &a, const B &b, inputK, ...) {
 /* do complex stuff */
 return ...;
}

we want to differentiate foo() with respect to its arguments. Therefore the first order sensitivity d foo/d a is a std::vector<double> with size equal to a.size(). Same reasoning goes for d foo/d b.

A naive implementation would go as follows:


// compute d foo/d a 
std::vector<double> computeDfDa(input1, input2, ..., const A &a, const B &b, inputK, ..., double da = 1.0){
  std::vector<double> dfda = {};

  auto aUp = a.copy();
  auto aDown = a.copy();
  for (auto i = 0; i < a.size(); ++i) {

      // bump up
      aUp.bumpAt(i, da);
      // bump down
      aDown.bumpAt(i, -da);

      auto up = foo(input1, input2, ..., aUp, b, inputK, ...);
      auto down = foo(input1, input2, ..., aDown, b, inputK, ...);

      auto derivative = (up - down) / 2.0 / da;
      dfda.pushback(derivative);

      // revert bumps
      aUp.bumpAt(i, -da);
      aDown.bumpAt(i, da);
  }
   return dfda;
}

// compute d foo/d b 
std::vector<double> computeDfDb(input1, input2, ..., const A &a, const B &b, inputK, ..., double db = 0.01){
  std::vector<double> dfdb = {};

  auto bUp = b.copy();
  auto bDown = b.copy();
  for (auto i = 0; i < a.size(); ++i) {

      // bump up
      bUp.bumpAt(i, db);
      // bump down
      bDown.bumpAt(i, -db);

      auto up = foo(input1, input2, ..., a, bUp, inputK, ...);
      auto down = foo(input1, input2, ..., a, bDown, inputK, ...);

      auto derivative = (up - down) / 2.0 / db;
      dfdb.pushback(derivative);

      // revert bumps
      bUp.bumpAt(i, -db);
      bDown.bumpAt(i, db);
  }
   return dfdb;
}

This works well however we have basically the same code for computeDfDa() and for computeDfDb().

Is there any design pattern that would allow to have a unique (maybe templated) function that would understand automatically which input to bump?

Please note the position of a and b in the inputs is not commutative.

If the complexity and the number of inputs of foo() are much greater the naive solution would generate a lot of useless code as we'd have to write a computeDfDx() function for every input x of foo().

2

There are 2 best solutions below

3
Louis Go On

Since compute order is the same but iteration loop through different containers, you may refactor this function.

std::vector<double> computeLoop( std::vector<double> &v, std::vector<double> const &computeArg1, std::vector<double> const &computeArg2, double d = 1.0 )
{
  std::vector<double> dfd = {};
  for (auto i = 0; i < v.size(); ++i) {
      // bump up
      v[i] += d;
      auto up = compute(computeArg1, computeArg2);
      v[i] -= d;

      // bump down
      v[i] -= d;
      auto down = compute(computeArg1, computeArg2);
      v[i] += d;

      auto derivative = (up - down) / 2.0 / d;
      dfd.pushback(derivative);
  }
}

Actual call.

auto dfda = computeLoop( a, a, b );
auto dfdb = computeLoop( b, a, b );

Let v pass by reference but it might cause maintenance problem. Because v might be the same reference as computeArg1 or computeArg2, however in computeLoop this is not obvious. Someone might unconsciously break the code in the future.

1
BitTickler On

IMHO, the problem is, that there is a shifting of the level of abstraction.

Those classes A, B are either ...

  1. just vectors in disguise
  2. not vectors but something else entirely.

In case (1), it should be possible to rewrite to a form similar to this:

#include <... the usual suspects ... >

using VecF64 = std::vector<double>;
using VecVecF64 = std::vector<VecF64>;

// foo, dropping allusions of encapsulations. It KNOWS those A, B are vectors.
// Hence we can write it without knowledge of A, B types.
double foo(const VecVecF64& args) {
  return <result-of-complicated-computations>;
}

// With that, it is also easier to write the differentiation function
VecVecF64 fooGradients( const VecVecF64& args, double delta = 0.01) {
  VecVecF64 result;
  result.reserve(args.size());
  // for each arg vector in args, do your thing.
  // .. 
  return result;
}

In case (2), if you are all encapsulated and secretive about the nature of A, B, how do you know the number of doubles, foo can use for its computation? Which leads to the question of the size of your gradient vector for A and renders that bump idea impossible to realize.

My guess is, you have a case 1 kind of problem.