Skip to content

Instantly share code, notes, and snippets.

@harrism
Last active December 1, 2021 11:46
Show Gist options
  • Save harrism/118609d98e0ed185846ce8688802ecab to your computer and use it in GitHub Desktop.
Save harrism/118609d98e0ed185846ce8688802ecab to your computer and use it in GitHub Desktop.
Using Thrust for the Taylor series approximation of 1 / (1-x). Demonstrates evaluating recurrences in parallel with a scan using Thrust running on CUDA.

Evaluating Recurrences with Thrust

Using Thrust for the Taylor series approximation of 1 / (1-x). Demonstrates evaluating recurrences in parallel with a scan using Thrust running on CUDA.

When the Taylor series expansion can be factored into a linear recurrence (not sure if this is always the case or not), you can use a scan to evaluate it. See Section 1.4 of Guy Blelloch's Prefix Sums and Their Applications (required reading for parallel programmers!).

The example used here is 1/(1-x). It's expansion is 1 + x + x^2 + ..., which can be truncated and factored into x(x(x+1)+1)+1 (4 terms shown). Blelloch shows how to create the operator for such a recurrence and explains the associativity requirements for the operator.

If you only need the final value and not the intermediate values you can replace the scan with a reduce (also pointed out by Blelloch!).

On its own, this example does not utilize a GPU very well. (64 terms is plenty for this example, and 64 threads is not a lot of work for the GPU!) But if you need to compute a LOT of small expansions, it could be a good approach. Expanding this example to evaluate the recurrence (say 32 terms) around many different values of x in parallel is straightforward. Simply change the initialization of the input to include a different value of x at each set of 32 inputs, and then use thrust::inclusive_scan_by_key() (or thrust::reduce_by_key()) instead of thrust::inclusive_scan. This will better utilize your GPU. :) (Note you will also need to add an operator to evaluate key equality.)

In any case, it's a cool trick.

Compilation

Tested with CUDA 9.

nvcc --std=c++11 --expt-extended-lambda Recurrence.cu -o recurrence
[mharris@ivb123 test]$ ./recurrence .10101010
(0.1010101000, 1.0000000000)
(0.0102030403, 1.1010101000)
(0.0010306101, 1.1112131403)
(0.0001041020, 1.1122437504)
(0.0000105154, 1.1123478525)
(0.0000010622, 1.1123583678)
(0.0000001073, 1.1123594300)
(0.0000000108, 1.1123595373)
(0.0000000011, 1.1123595481)
(0.0000000001, 1.1123595492)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
(0.0000000000, 1.1123595493)
[mharris@ivb123 test]$ ./recurrence 0.55
(0.5500000000, 1.0000000000)
(0.3025000000, 1.5500000000)
(0.1663750000, 1.8525000000)
(0.0915062500, 2.0188750000)
(0.0503284375, 2.1103812500)
(0.0276806406, 2.1607096875)
(0.0152243523, 2.1883903281)
(0.0083733938, 2.2036146805)
(0.0046053666, 2.2119880743)
(0.0025329516, 2.2165934408)
(0.0013931234, 2.2191263925)
(0.0007662179, 2.2205195159)
(0.0004214198, 2.2212857337)
(0.0002317809, 2.2217071535)
(0.0001274795, 2.2219389345)
(0.0000701137, 2.2220664139)
(0.0000385625, 2.2221365277)
(0.0000212094, 2.2221750902)
(0.0000116652, 2.2221962996)
(0.0000064158, 2.2222079648)
(0.0000035287, 2.2222143806)
(0.0000019408, 2.2222179093)
(0.0000010674, 2.2222198501)
(0.0000005871, 2.2222209176)
(0.0000003229, 2.2222215047)
(0.0000001776, 2.2222218276)
(0.0000000977, 2.2222220052)
(0.0000000537, 2.2222221028)
(0.0000000295, 2.2222221566)
(0.0000000163, 2.2222221861)
(0.0000000089, 2.2222222024)
(0.0000000049, 2.2222222113)
(0.0000000027, 2.2222222162)
(0.0000000015, 2.2222222189)
(0.0000000008, 2.2222222204)
(0.0000000004, 2.2222222212)
(0.0000000002, 2.2222222217)
(0.0000000001, 2.2222222219)
(0.0000000001, 2.2222222221)
(0.0000000000, 2.2222222221)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
(0.0000000000, 2.2222222222)
#include <thrust/scan.h>
#include <thrust/device_vector.h>
#include <thrust/fill.h>
#include <vector>
#include <algorithm>
#include <cstdlib>
int main(int argc, char **argv)
{
using T = double;
using T2 = double2;
T x = .0123456789;
int numTerms = 64;
if (argc > 1) x = atof(argv[1]);
thrust::device_vector<T2> input(numTerms);
thrust::device_vector<T2> output(numTerms);
T2 init { x, 1 };
thrust::fill(input.begin(), input.end(), init);
// This custom operator is what does the magic of evaluating the recurrence.
// Different recurrences require different operators.
auto op = [] __device__ (T2 l, T2 r) { return T2 { l.x * r.x,
l.y * r.x + r.y }; };
thrust::inclusive_scan(thrust::device, input.begin(), input.end(),
output.begin(), op);
// copy to STL vector
std::vector<T2> result(numTerms);
thrust::copy(output.begin(), output.end(), result.begin());
std::for_each(result.begin(), result.end(), [] (T2 x) {
printf("(%0.10f, %0.10f)\n", x.x, x.y);
});
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment