
@Houstonwp
Last active January 22, 2021 03:39
library(data.table)

q <- c(0.001,
       0.002,
       0.003,
       0.003,
       0.004,
       0.004,
       0.005,
       0.007,
       0.009,
       0.011)

w <- c(0.05,
       0.07,
       0.08,
       0.10,
       0.14,
       0.20,
       0.20,
       0.20,
       0.10,
       0.04)

P <- 100
S <- 25000
r <- 0.02

dt <- data.table(q, w)

npv <- function(cf, r, S, P) {
  cf[, inforce := shift(cumprod(1 - q - w), fill = 1)  # in-force at the start of each year
  ][, lapses := inforce * w
  ][, deaths := inforce * q
  ][, claims := deaths * S
  ][, premiums := inforce * P
  ][, ncf := premiums - claims                         # net cash flow per year
  ][, d := (1 / (1 + r))^(.I)                          # end-of-year discount factors
  ][, sum(ncf * d)]                                    # present value of net cash flows
}

npv(dt,r,S,P)
#> [1] 50.32483
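
In formula form, the chain above computes (my reading of the code, writing $l_t$ for the in-force fraction at the start of year $t$):

$$l_1 = 1, \qquad l_t = l_{t-1}\,(1 - q_{t-1} - w_{t-1}), \quad t = 2, \ldots, 10$$

$$\mathrm{NPV} = \sum_{t=1}^{10} \left(P - S\,q_t\right) l_t\, v^{t}, \qquad v = \frac{1}{1+r}$$

The exponent t (rather than t - 1) matches the end-of-period discounting in d := (1/(1+r))^(.I).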

microbenchmark::microbenchmark(npv(dt,r,S,P))
#> Unit: milliseconds
#>              expr    min      lq     mean  median      uq    max neval
#>  npv(dt, r, S, P) 2.5791 2.71035 2.964293 2.85625 3.10385 6.0357   100

Created on 2021-01-15 by the reprex package (v0.3.0)

@lewisfogden

"I don't understand why the mean is so much higher in our benchmarks than the median"

I've been running them in an Ubuntu VM on Windows 10 (using WSL). The 10 manual runs I did for Nim indicated that there was a spike every so often (e.g. some background operation running). The mean is skewed by that high 5989 maximum: it contributes ~5989/100 ≈ 60 to the mean, which would otherwise be around 8 microseconds.

If you are benchmarking homogeneous operations, then the median is probably the more useful indicator. If it were at scale and you wanted to test a variety of policies, I agree that the mean (or a truncated mean) would better indicate your production run time.
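
To make that skew concrete, here is a quick sketch in R with hypothetical timings shaped like the runs described above (99 quiet runs of 8 microseconds plus one 5989 microsecond spike; the numbers are illustrative, not measured):

timings <- c(rep(8, 99), 5989)  # hypothetical: 99 quiet runs plus one spike
mean(timings)                   # 67.81, dominated by the single outlier
median(timings)                 # 8, unaffected by it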

I had a look at the Julia benchmarks; they roughly line up with your table. LuaJIT (which I'd never heard of) gives Julia a good run for its money on the JIT front. The downside of these is that it's probably harder to interface them with Python if you are building a production pipeline.

@paddyhoran commented Jan 20, 2021

Here is the Rust implementation for reference:

pub fn npv(mortality_rates: &Vec<f64>, lapse_rates: &Vec<f64>, interest_rate: f64, sum_assured: f64, premium: f64, init_pols: f64, term: Option<usize>) -> f64 {

    // Default the projection term to the length of the mortality table.
    let term = term.unwrap_or_else(|| mortality_rates.len());
    let mut result = 0.0;
    let mut inforce = init_pols;
    let v: f64 = 1.0 / (1.0 + interest_rate); // per-period discount factor

    for (t, (q, w)) in mortality_rates.iter().zip(lapse_rates).enumerate() {
        let no_deaths = if t < term { inforce * q } else { 0.0 };
        let no_lapses = if t < term { inforce * w } else { 0.0 };
        let premiums = inforce * premium;
        let claims = no_deaths * sum_assured;
        let net_cashflow = premiums - claims;
        result += net_cashflow * v.powi(t as i32); // discount back to time zero
        inforce = inforce - no_deaths - no_lapses; // roll the in-force count forward
    }

    result
}

And the benchmark:

npv                     time:   [362.23 ns 364.30 ns 366.35 ns]
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

The reason there are three measurements above is that they are the lower bound, point estimate, and upper bound of the timing: the Rust benchmarking harness is statistical and reports a confidence interval rather than a single number.

I wrote this quickly, so I may have made a mistake. I also did not try to optimize it, so there is some performance gain to be had for sure. Even though it looks serial, I'm sure the compiler is auto-vectorizing it.

@alecloudenback commented Jan 21, 2021

Nice, @paddyhoran. I have not used Rust, so I played with it here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=255a33e84c70e3bcd43258ddbaa66f27. Looks like the answer matches the others above.

Your algorithm for the problem also looks more efficient overall. It's over 2x faster than the previous fastest Julia version above. Code and benchmark:

function npv3(q, w, P, S, r, term=nothing)
    # Default the term to the full table; +1 because t below is 1-based.
    term = isnothing(term) ? length(q) + 1 : term + 1
    inforce = 1.0
    result = 0.0

    for (t, (q, w)) in enumerate(zip(q, w))
        deaths = t < term ? inforce * q : 0.0
        lapses = t < term ? inforce * w : 0.0
        premiums = inforce * P
        claims = deaths * S
        ncf = premiums - claims
        result += ncf * (1 / (1 + r))^(t - 1)  # discount back to time zero
        inforce = inforce - deaths - lapses
    end

    return result
end
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     135.023 ns (0.00% GC)
  median time:      135.262 ns (0.00% GC)
  mean time:        137.513 ns (0.00% GC)
  maximum time:     1.327 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     873

@alecloudenback commented Jan 22, 2021

Updated Rankings for the "Life Modeling Problem":

code          algorithm              relative   absolute timing (mean)
Julia         accumulating for loop  1x         135 nanoseconds
Rust          accumulating for loop  3x         360 nanoseconds
Julia         vector                 4x         500 nanoseconds
Python numpy  vector                 155x       21 microseconds
R vectorized  vector                 520x       70 microseconds
R data.table  data.table vector      22000x     3 milliseconds
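
For reference, the "R vectorized" entry above presumably corresponds to a plain-vector implementation along the lines of this sketch (an assumption on my part; it is not the exact code that was timed):

npv_vec <- function(q, w, P, S, r) {
  # In-force fraction at the start of each year (first element is 1).
  inforce <- c(1, head(cumprod(1 - q - w), -1))
  ncf <- inforce * P - inforce * q * S   # premiums minus claims
  v <- (1 / (1 + r))^seq_along(q)        # end-of-period discount factors
  sum(ncf * v)
}

It performs the same arithmetic as the data.table version, so it returns the same 50.32483; the gap between the two R rows in the table is the per-call overhead of building intermediate columns on a ten-row table.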
