Checking for exact equality of floating-point numbers
require(dplyr)
DF = data.frame(a=seq(0, 1, by=0.2), b=1:2)
merge(data.frame(a=0.6), DF, all.x=TRUE)
#     a  b
# 1 0.6 NA
left_join(data.frame(a=0.6), DF)
# Joining by: "a"
#     a  b
# 1 0.6 NA
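The NA is not a bug in merge() or left_join(). seq() computes the fourth element as 0 + 3 * 0.2, which lands one representable double above the literal 0.6, so an exact-equality join finds no match. A minimal base-R check, using only seq(), sprintf() and all.equal():

```r
# Rebuild the key column used in DF above
a <- seq(0, 1, by = 0.2)

# The fourth element is not bit-for-bit equal to the literal 0.6
a[4] == 0.6              # FALSE
sprintf("%.17g", a[4])   # "0.60000000000000009"
sprintf("%.17g", 0.6)    # "0.59999999999999998"

# all.equal() compares with a relative tolerance, so it reports a match
isTRUE(all.equal(a[4], 0.6))  # TRUE
```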
Yes, floating-point matching is hard! But that's not really an answer.
This post (in fact, the entire series on that blog) is an excellent read about the ways one can overcome such surprises. It also discusses why using a tolerance is rubbish. There is no single perfect answer to this issue (including the one provided in that blog), which also becomes obvious from reading the comments under this link.
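To see why a single tolerance is hard to choose, here is a minimal sketch with two hypothetical helpers, near_abs() (absolute tolerance) and near_rel() (relative tolerance). The tolerance 1e-8 is an arbitrary choice for illustration, not something taken from that blog or from data.table.

```r
# Hypothetical helpers, for illustration only
near_abs <- function(x, y, tol = 1e-8) abs(x - y) <= tol
near_rel <- function(x, y, tol = 1e-8) abs(x - y) <= tol * pmax(abs(x), abs(y))

# Both handle the 0.6 case from the question
near_abs(3 * 0.2, 0.6)          # TRUE
near_rel(3 * 0.2, 0.6)          # TRUE

# An absolute tolerance is too strict for large magnitudes...
near_abs(1e10 + 1e-5, 1e10)     # FALSE, although the relative difference is ~1e-15
# ...and too loose for tiny ones
near_abs(1e-10, 2e-10)          # TRUE, although one value is twice the other

# A relative tolerance fixes both cases, but breaks down around zero
near_rel(1e10 + 1e-5, 1e10)     # TRUE
near_rel(1e-10, 2e-10)          # FALSE
near_rel(1e-20, 0)              # FALSE, no matter how small the difference
```

Each fix moves the failure somewhere else, which is the point made in the comments linked above.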
What we do in data.table is round off the last 2 bytes of the significand by default for numeric comparisons (you can turn this off, if you really wish, with setNumericRounding(0L)). This is just another way to tackle the problem, and it is plenty sufficient unless we deal with really large numbers. Personally, I've not seen a floating-point number that huge, with decimal places, that is of any use, e.g. 123456789987654321.12345. We recommend using bit64::integer64 for really large numerics.
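A rough base-R sketch of the idea behind rounding the last 2 bytes: a double has a 52-bit fraction, so dropping 16 bits keeps 36 fraction bits (37 significant bits, roughly 11 significant decimal digits). round_bits() is a hypothetical helper that rounds to that many bits by scaling with a power of two; it is not data.table's actual C implementation, and its tie-breaking may differ. The default rounding may also differ between data.table versions, so getNumericRounding() is worth checking before relying on it.

```r
# Hypothetical helper: round x to `keep` fraction bits
# (36 = 52 - 16, i.e. "drop the last 2 bytes" of the significand)
round_bits <- function(x, keep = 36L) {
  out <- x
  ok <- is.finite(x) & x != 0              # leave 0, NA, NaN and Inf untouched
  e <- floor(log2(abs(x[ok])))             # binary exponent of each value
  scale <- 2^(e - keep)                    # value of the last bit we keep
  out[ok] <- round(x[ok] / scale) * scale  # scaling by a power of 2 is exact
  out
}

a <- seq(0, 1, by = 0.2)
a[4] == 0.6                              # FALSE: exact comparison
round_bits(a[4]) == round_bits(0.6)      # TRUE: they differ only in the last bit

# The price: values agreeing to ~11 significant digits can become indistinguishable
round_bits(1.000000000001) == round_bits(1.000000000002)  # TRUE

# Above 2^53 a double cannot represent every integer, which is where
# bit64::integer64 helps for large integer-valued keys
2^53 + 1 == 2^53                         # TRUE
```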
As I said, this is just another way of trying to avoid surprises like the case above. But IMHO it's essential not to let it slide by just saying floating-point math is hard.
require(data.table)
getNumericRounding() # [1] 2
DT = data.table(DF, key="a")
DT[.(0.6)]
#      a b
# 1: 0.6 2
setNumericRounding(0L) # no rounding
DT[.(0.6)]
#      a  b
# 1: 0.6 NA
I am having an issue in this regard: I have to use signif() on one of the two floating-point columns I merge on; otherwise merge misses some matches.
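The signif() workaround from that comment can be sketched in base R on the example from the question. Here it is applied to both key columns so that both sides are rounded the same way, and the choice of 10 significant digits is an assumption. Two values sitting on opposite sides of a rounding boundary will still not match, so this narrows the problem rather than removing it.

```r
DF <- data.frame(a = seq(0, 1, by = 0.2), b = 1:2)
lookup <- data.frame(a = 0.6)

# Exact-match merge misses the row, as shown in the question
merge(lookup, DF, all.x = TRUE)$b        # NA

# Round both key columns to the same number of significant digits first
digits <- 10                             # assumption: pick to suit your data
DF$a <- signif(DF$a, digits)
lookup$a <- signif(lookup$a, digits)
merge(lookup, DF, all.x = TRUE)$b        # 2
```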