Here's a summary of how to implement a gradient function manually in Stan 2.10 in order to override autodiff. I wanted to do this because my run-times were long, and I knew I could save some time by hard-coding my gradient function. That being said, autodiff is great, and is almost always fast enough.
Two reasons to start by writing and compiling your model in Stan as usual:
- Stan will generate a .cpp file for you that contains the basic structure you need (see the example command after this list).
- It gives you a reference for testing: your hand-coded version should produce output identical to what you get from Stan's autodiff.
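To get that generated file, CmdStan's translator will emit the C++ for you; something like `bin/stanc --o=my_model.cpp my_model.stan` should do it, though the exact flags differ a bit between versions (check `bin/stanc --help`).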
I found it easiest to write and test my gradient function in R first, especially since you will probably need to pass in data to do the testing.
Define your own `log_prob_grad` function. This function will return the log probability density `lp` and will fill the gradient vector.
```cpp
template <bool propto, bool jacobian>
double log_prob_grad(Eigen::VectorXd& params_r,
                     Eigen::VectorXd& gradient,
                     std::ostream* pstream = 0) const {
  ...
  return lp;
}
```
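To make the shape concrete, here is a minimal sketch of what the body might look like for a toy model: a single unconstrained location parameter and iid unit-scale normal data. The data members `y_` and `N_` are placeholders for whatever stanc generated for your model, and constants of the log density are dropped as they would be when `propto` is true.

```cpp
template <bool propto, bool jacobian>
double log_prob_grad(Eigen::VectorXd& params_r,
                     Eigen::VectorXd& gradient,
                     std::ostream* pstream = 0) const {
  const double mu = params_r[0];      // single unconstrained parameter
  double lp = 0.0;
  double dlp_dmu = 0.0;
  for (int n = 0; n < N_; ++n) {      // N_ and y_ are hypothetical data members
    const double z = y_[n] - mu;
    lp += -0.5 * z * z;               // normal log kernel, constants dropped
    dlp_dmu += z;                     // d/dmu of -0.5 * (y - mu)^2
  }
  gradient.resize(1);
  gradient[0] = dlp_dmu;
  return lp;
}
```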
####Step 5: Specialize, twice
You need to specialize the `log_prob_grad` function twice: once so that it gets called when you sample, and once so that it gets called when you diagnose.
```cpp
namespace stan {
  namespace model {

    template <bool propto, bool jacobian_adjust_transform>
    double log_prob_grad(const pred_model_namespace::pred_model& model,
                         Eigen::VectorXd& params_r,
                         Eigen::VectorXd& gradient,
                         std::ostream* msgs = 0) {
      double lp = model.template log_prob_grad<propto, jacobian_adjust_transform>(params_r, gradient, msgs);
      return lp;
    }

    template <bool propto, bool jacobian_adjust_transform>
    double log_prob_grad(const pred_model_namespace::pred_model& model,
                         std::vector<double>& params_r,
                         std::vector<int>& params_i,
                         std::vector<double>& gradient,
                         std::ostream* msgs = 0) {
      Eigen::VectorXd evec_params_r(params_r.size());
      Eigen::VectorXd evec_gradient(params_r.size());
      for (size_t i = 0; i < params_r.size(); i++) evec_params_r[i] = params_r[i];
      double lp = model.template log_prob_grad<propto, jacobian_adjust_transform>(evec_params_r, evec_gradient, msgs);
      gradient.resize(params_r.size());
      for (size_t i = 0; i < params_r.size(); i++) gradient[i] = evec_gradient[i];
      return lp;
    }

  }
}
```
#####Includes
Usually you would just need
```cpp
#include <stan/model/model_header.hpp>
```
Remove that line and replace it with
```cpp
#include <iostream>
#include <sstream>
#include <utility>
#include <vector>
#include <typeinfo>
#include <boost/exception/all.hpp>
#include <stan/model/prob_grad.hpp>
#include <stan/prob/distributions.hpp>
#include <stan/math/matrix.hpp>
#include <stan/math.hpp>
#include <stan/io/dump.hpp>
#include <stan/io/reader.hpp>
#include <stan/io/writer.hpp>
#include <stan/io/csv_writer.hpp>
```
You will also need to add
```cpp
#include <stan/gm/command.hpp>
```
right before your definition of `main` (and after you specialize your functions, so they are recognized).
#####Stay away from the `agrad` namespace!
You need to avoid calling anything that refers to the `agrad` namespace, which includes the distributions as well as the functions that constrain your variables. Normally the parameters are stored as autodiff variables (duals), but here we want to work with plain doubles, and Stan treats anything that is a double as a constant. Make use of the code already in the Stan library and modify it as needed.
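For example, a parameter with a lower bound of zero that the generated code would read back through the `agrad`-based reader can be handled directly with doubles. A minimal sketch that would sit inside the `log_prob_grad` body (the index is hypothetical, and the added term is the log-Jacobian adjustment that applies when `jacobian` is true):

```cpp
// sigma_unc is the unconstrained value pulled straight out of params_r.
// The constraining transform for a lower bound of zero is exp() (from <cmath>),
// and its log-Jacobian is simply sigma_unc, so both are applied by hand.
const double sigma_unc = params_r[1];   // hypothetical position in params_r
const double sigma = std::exp(sigma_unc);
if (jacobian)
  lp += sigma_unc;                      // log |d sigma / d sigma_unc|
```

Remember that the gradient entries are with respect to the unconstrained parameters, so the chain-rule factor `d sigma / d sigma_unc = sigma` (plus the derivative of the Jacobian term) has to be folded in by hand.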
#####Test thoroughly.
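One quick sanity check is to compare the hand-coded gradient against central finite differences of the log density. Here is a small, hypothetical helper (not part of Stan); the callable `f` would be a thin wrapper around your `log_prob_grad`, and the step size and tolerance are arbitrary:

```cpp
#include <cmath>
#include <iostream>
#include <Eigen/Dense>

// f(params, grad) must return lp and fill grad, e.g. by forwarding to the
// model's hand-coded log_prob_grad. This is a sketch, not Stan API.
template <typename F>
bool gradient_matches(F f, Eigen::VectorXd params,
                      double eps = 1e-6, double tol = 1e-4) {
  Eigen::VectorXd grad(params.size());
  f(params, grad);                              // analytic gradient
  Eigen::VectorXd dummy(params.size());
  bool ok = true;
  for (int i = 0; i < params.size(); ++i) {
    Eigen::VectorXd lo = params, hi = params;
    lo[i] -= eps;
    hi[i] += eps;
    double fd = (f(hi, dummy) - f(lo, dummy)) / (2.0 * eps);  // central difference
    if (std::fabs(fd - grad(i)) > tol) {
      std::cout << "gradient mismatch at index " << i
                << ": analytic " << grad(i) << " vs finite-diff " << fd << "\n";
      ok = false;
    }
  }
  return ok;
}
```

Running it at a handful of parameter values, including some near the constraint boundaries, catches most sign and indexing mistakes.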
My first implementations of my `log_prob_grad` functions are never faster than autodiff. Make it work, then make it fast. Profile, modify, repeat. Once the single-threaded version is as fast as you can make it, you may be able to squeeze out more with the OpenMP parallel pragma directive. If you do, make sure you tell Eigen not to parallelize, to avoid conflicts (unless you really know what you are doing).
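As a rough sketch of that combination (the function, data, and parameter names here are made up; the point is the reduction clause plus the call that disables Eigen's own threading):

```cpp
#include <Eigen/Core>
#include <cmath>
#include <vector>

// Sketch: parallelize a per-observation loop with OpenMP while keeping Eigen
// single-threaded, so the two threading layers do not fight over the cores.
double accumulate_lp(const std::vector<double>& y, double mu, double sigma) {
  Eigen::setNbThreads(1);                  // let OpenMP own the parallelism
  double lp = 0.0;
#pragma omp parallel for reduction(+:lp)
  for (int n = 0; n < static_cast<int>(y.size()); ++n) {
    const double z = (y[n] - mu) / sigma;
    lp += -0.5 * z * z - std::log(sigma);  // per-observation normal log density kernel
  }
  return lp;
}
```

Compile with your compiler's OpenMP flag (e.g. `-fopenmp` for gcc); without it the pragma is simply ignored and the loop runs serially.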
I need to make a simple example to demonstrate the above set of steps, but for now, here is an example of a C++ file generated by Stan that I modified to hard-code my gradient function. Just look at the framework, not the bulk of the log_prob_grad function.