Here's a summary of how to implement a gradient function manually in Stan 2.10 in order to override autodiff. I wanted to do this because my run-times were long, and I knew I could save some time by hard-coding my gradient function. That being said, autodiff is great, and is almost always fast enough.
Two reasons to start by writing and compiling your model in Stan as usual:
- Stan will generate a .cpp file for you that contains the basic structure you need (see the example command after this list).
- It gives you a reference for testing: your hand-coded version should produce output identical to what you get from Stan's autodiff.
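To get that generated file, CmdStan's translator will emit the C++ for you; something like `bin/stanc --o=my_model.cpp my_model.stan` should do it, though the exact flags differ a bit between versions (check `bin/stanc --help`).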
I found it easiest to write and test my gradient function in R first, especially since you will probably need to pass in data to do the testing.
Define your own `log_prob_grad` function. This function will return the log probability density `lp` and will fill the gradient vector.
```cpp
template <bool propto, bool jacobian>
double log_prob_grad(Eigen::VectorXd& params_r,
                     Eigen::VectorXd& gradient,
                     std::ostream* pstream = 0) const {
  ...
  return lp;
}
```
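To make the shape concrete, here is a minimal sketch of what the body might look like for a toy model: a single unconstrained location parameter and iid unit-scale normal data. The data members `y_` and `N_` are placeholders for whatever stanc generated for your model, and constants of the log density are dropped as they would be when `propto` is true.

```cpp
template <bool propto, bool jacobian>
double log_prob_grad(Eigen::VectorXd& params_r,
                     Eigen::VectorXd& gradient,
                     std::ostream* pstream = 0) const {
  const double mu = params_r[0];      // single unconstrained parameter
  double lp = 0.0;
  double dlp_dmu = 0.0;
  for (int n = 0; n < N_; ++n) {      // N_ and y_ are hypothetical data members
    const double z = y_[n] - mu;
    lp += -0.5 * z * z;               // normal log kernel, constants dropped
    dlp_dmu += z;                     // d/dmu of -0.5 * (y - mu)^2
  }
  gradient.resize(1);
  gradient[0] = dlp_dmu;
  return lp;
}
```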
####Step 5: Specialize, twice
You need to specialize the `log_prob_grad` function twice: once so that it gets called when you sample, and once so that it gets called when you diagnose.
```cpp
namespace stan {
  namespace model {

    template <bool propto, bool jacobian_adjust_transform>
    double log_prob_grad(const pred_model_namespace::pred_model& model,
                         Eigen::VectorXd& params_r,
                         Eigen::VectorXd& gradient,
                         std::ostream* msgs = 0) {
      double lp = model.template log_prob_grad<propto, jacobian_adjust_transform>(params_r, gradient, msgs);
      return lp;
    }

    template <bool propto, bool jacobian_adjust_transform>
    double log_prob_grad(const pred_model_namespace::pred_model& model,
                         std::vector<double>& params_r,
                         std::vector<int>& params_i,
                         std::vector<double>& gradient,
                         std::ostream* msgs = 0) {
      Eigen::VectorXd evec_params_r(params_r.size());
      Eigen::VectorXd evec_gradient(params_r.size());
      for (size_t i = 0; i < params_r.size(); i++) evec_params_r[i] = params_r[i];
      double lp = model.template log_prob_grad<propto, jacobian_adjust_transform>(evec_params_r, evec_gradient, msgs);
      gradient.resize(params_r.size());
      for (size_t i = 0; i < params_r.size(); i++) gradient[i] = evec_gradient[i];
      return lp;
    }

  }
}
```
#####Includes
Usually you would just need
```cpp
#include <stan/model/model_header.hpp>
```
Remove that line and replace it with
```cpp
#include <iostream>
#include <sstream>
#include <utility>
#include <vector>
#include <typeinfo>
#include <boost/exception/all.hpp>
#include <stan/model/prob_grad.hpp>
#include <stan/prob/distributions.hpp>
#include <stan/math/matrix.hpp>
#include <stan/math.hpp>
#include <stan/io/dump.hpp>
#include <stan/io/reader.hpp>
#include <stan/io/writer.hpp>
#include <stan/io/csv_writer.hpp>
```
You will also need to add
```cpp
#include <stan/gm/command.hpp>
```
right before your definition of `main` (and after you specialize your functions, so they are recognized).
#####Stay away from the `agrad` namespace!
You need to avoid calling anything that refers to the `agrad` namespace, which includes the distributions as well as the functions that constrain your variables. Normally the parameters are stored as autodiff variables (duals), but here we want to work with plain doubles, and Stan treats anything that is a double as a constant. Make use of the code already in the Stan library and modify it as needed.
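For example, a parameter with a lower bound of zero that the generated code would read back through the `agrad`-based reader can be handled directly with doubles. A minimal sketch that would sit inside the `log_prob_grad` body (the index is hypothetical, and the added term is the log-Jacobian adjustment that applies when `jacobian` is true):

```cpp
// sigma_unc is the unconstrained value pulled straight out of params_r.
// The constraining transform for a lower bound of zero is exp() (from <cmath>),
// and its log-Jacobian is simply sigma_unc, so both are applied by hand.
const double sigma_unc = params_r[1];   // hypothetical position in params_r
const double sigma = std::exp(sigma_unc);
if (jacobian)
  lp += sigma_unc;                      // log |d sigma / d sigma_unc|
```

Remember that the gradient entries are with respect to the unconstrained parameters, so the chain-rule factor `d sigma / d sigma_unc = sigma` (plus the derivative of the Jacobian term) has to be folded in by hand.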
#####Test thoroughly.
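One quick sanity check is to compare the hand-coded gradient against central finite differences of the log density. Here is a small, hypothetical helper (not part of Stan); the callable `f` would be a thin wrapper around your `log_prob_grad`, and the step size and tolerance are arbitrary:

```cpp
#include <cmath>
#include <iostream>
#include <Eigen/Dense>

// f(params, grad) must return lp and fill grad, e.g. by forwarding to the
// model's hand-coded log_prob_grad. This is a sketch, not Stan API.
template <typename F>
bool gradient_matches(F f, Eigen::VectorXd params,
                      double eps = 1e-6, double tol = 1e-4) {
  Eigen::VectorXd grad(params.size());
  f(params, grad);                              // analytic gradient
  Eigen::VectorXd dummy(params.size());
  bool ok = true;
  for (int i = 0; i < params.size(); ++i) {
    Eigen::VectorXd lo = params, hi = params;
    lo[i] -= eps;
    hi[i] += eps;
    double fd = (f(hi, dummy) - f(lo, dummy)) / (2.0 * eps);  // central difference
    if (std::fabs(fd - grad(i)) > tol) {
      std::cout << "gradient mismatch at index " << i
                << ": analytic " << grad(i) << " vs finite-diff " << fd << "\n";
      ok = false;
    }
  }
  return ok;
}
```

Running it at a handful of parameter values, including some near the constraint boundaries, catches most sign and indexing mistakes.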
My first implementations of my `log_prob_grad` functions are never faster than autodiff. Make it work, then make it fast. Profile, modify, repeat. Once the single-threaded version is as fast as you can make it, you may be able to squeeze out more with the OpenMP parallel pragma directive. If you do, make sure you tell Eigen not to parallelize, to avoid conflicts (unless you really know what you are doing).
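As a rough sketch of that combination (the function, data, and parameter names here are made up; the point is the reduction clause plus the call that disables Eigen's own threading):

```cpp
#include <Eigen/Core>
#include <cmath>
#include <vector>

// Sketch: parallelize a per-observation loop with OpenMP while keeping Eigen
// single-threaded, so the two threading layers do not fight over the cores.
double accumulate_lp(const std::vector<double>& y, double mu, double sigma) {
  Eigen::setNbThreads(1);                  // let OpenMP own the parallelism
  double lp = 0.0;
#pragma omp parallel for reduction(+:lp)
  for (int n = 0; n < static_cast<int>(y.size()); ++n) {
    const double z = (y[n] - mu) / sigma;
    lp += -0.5 * z * z - std::log(sigma);  // per-observation normal log density kernel
  }
  return lp;
}
```

Compile with your compiler's OpenMP flag (e.g. `-fopenmp` for gcc); without it the pragma is simply ignored and the loop runs serially.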
I need to make a simple example to demonstrate the above set of steps, but for now, here is an example of a C++ file generated by Stan that I modified to hard-code my gradient function. Just look at the framework, not the bulk of the log_prob_grad function.