@mchav
Created January 17, 2025 15:34
Non-allocating Pearson's correlation
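{-# LANGUAGE ScopedTypeVariables, TypeApplications #-}
import qualified Data.Text as T
import qualified Data.Vector.Generic as VG
import qualified Data.Vector.Unboxed as VU
import Data.Type.Equality (testEquality, (:~:) (Refl))
import Type.Reflection (typeRep)
-- DataFrame, getColumn and UnboxedColumn come from the dataframe library (imports not shown in the original).

correlation :: T.Text -> T.Text -> DataFrame -> Maybe Double
correlation first second df = do
    -- Both columns must be unboxed vectors of Double; otherwise the result is Nothing.
    (UnboxedColumn (f :: VU.Vector a)) <- getColumn first df
    (UnboxedColumn (s :: VU.Vector b)) <- getColumn second df
    Refl <- testEquality (typeRep @a) (typeRep @Double)
    Refl <- testEquality (typeRep @b) (typeRep @Double)
    let n = VG.length f
    -- First pass: sum each column by index, without building intermediate vectors.
    let
        go (-1) acc = acc
        go i (sX :: Double, sY :: Double) = go (i - 1) (sX + f VU.! i, sY + s VU.! i)
    let (sumX, sumY) = go (n - 1) (0, 0)
        -- Divide the sums by n to get the means (the original subtracted the raw sums).
        mX = sumX / fromIntegral n
        mY = sumY / fromIntegral n
    -- Second pass: accumulate the covariance and both variances around the means.
    let
        go' (-1) acc = acc
        go' i (cov, varX, varY) = go' (i - 1) (cov + (x' * y'), varX + (x' * x'), varY + (y' * y'))
          where
            x' = f VU.! i - mX
            y' = s VU.! i - mY
    let (cov, varX, varY) = go' (n - 1) (0, 0, 0)
    return $ if n == 0 then 0 else (cov / fromIntegral n) / sqrt ((varX / fromIntegral n) * (varY / fromIntegral n))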
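
A quick way to sanity-check the function above is to compare it against a plain-list reference. The sketch below is not part of the gist: `pearson` is a hypothetical helper that uses only `base`, allocates freely, and returns `Nothing` for empty or mismatched inputs (where the gist returns 0 for an empty column). It uses the same two-pass scheme, and drops the division by n because that factor cancels between the covariance and the square root of the variance product.

```haskell
import Data.List (foldl')

-- Hypothetical reference helper (not from the gist): Pearson's r over plain lists.
pearson :: [Double] -> [Double] -> Maybe Double
pearson xs ys
  | n == 0 || length ys /= n = Nothing
  | otherwise = Just (cov / sqrt (varX * varY))
  where
    n = length xs
    mean zs = sum zs / fromIntegral n
    mX = mean xs
    mY = mean ys
    -- Second pass: strict fold over the paired values, centred on the means.
    (cov, varX, varY) = foldl' step (0, 0, 0) (zip xs ys)
    step (c, vx, vy) (x, y) =
      let dx = x - mX
          dy = y - mY
          c' = c + dx * dy
          vx' = vx + dx * dx
          vy' = vy + dy * dy
      in c' `seq` vx' `seq` vy' `seq` (c', vx', vy')

main :: IO ()
main = do
  print (pearson [1, 2, 3, 4] [2, 4, 6, 8]) -- perfectly correlated
  print (pearson [1, 2, 3, 4] [8, 6, 4, 2]) -- perfectly anti-correlated
```

A column with zero variance makes the denominator 0, so both this sketch and the function above yield NaN in that case.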