Skip to content

Instantly share code, notes, and snippets.

View mchav's full-sized avatar

Michael Chavinda mchav

View GitHub Profile

Dealing with nulls as the schema evolves

Often times your data is split into a training set and test set. Columns with missing values in the training set might not have missing values in the test set (and vice versa). So, if you write a function to clean your training data it might crash on your test data. This is a problem. The software is naive to the fact that missingness is ubiquitous and should be dealt with gracefully. There are a number of ways to solve this problem.

You can see the code smell in the Synthesis.hs example. We create an uber dataframe with train and test just to approximate the final schema, then split them up again.

Read everything as Maybe a

When users read CSVs (or any file format for that matter) we can default to assuming there is missingness everywhere. The user can then just work with Maybe a as is (never unwrapping concrete types) or always deal with missingness before.

@mchav
mchav / Warning.hs
Last active February 3, 2026 16:09
RULE left-hand side too complicated to desugar
Optimised lhs: letrec {
$dCTuple9_aUpZa
:: (Columnable' Double, ColumnifyRep RUnboxed Double,
VUM.Unbox Double, () :: Constraint,
(Real Double, Fractional Double), SBoolI True, SBoolI True,
SBoolI False, SBoolI True)
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=False,
Value=True, ConLike=True, WorkFree=True, Expandable=True,
@mchav
mchav / EqSatDynamic.hs
Last active January 23, 2026 20:36
Now dynamic
#!/usr/bin/env cabal
{- cabal:
build-depends: base >= 4, dataframe, hegg, text
-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE TupleSections #-}
{-# LANGUAGE MultiWayIf #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE TypeSynonymInstances, FlexibleInstances #-}
{-# LANGUAGE TypeApplications #-}
module EGGP where
@mchav
mchav / Titanic.hs
Last active January 17, 2026 03:42
{-# LANGUAGE NumericUnderscores #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeApplications #-}
import qualified Data.Text as T
import qualified DataFrame as D
import qualified DataFrame.Functions as F
import Data.Char
( ifThenElse
(geq (lit (1.6)) (col @Double "petal.length"))
( ifThenElse
(leq 0.4 (col @Double "petal.width"))
(lit ("Setosa"))
( ifThenElse
(leq 0.3 (col @Double "petal.width"))
(lit ("Setosa"))
( ifThenElse
(leq 0.2 (col @Double "petal.width"))
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE BangPatterns #-}
module Main where
import qualified Data.ByteString.Char8 as B
import Data.Char (isSpace)
import System.Environment (getArgs)
-- Run commands:
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE NumericUnderscores #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
import Control.Monad (when)
import qualified Data.ByteString.Lazy as L
import Data.List (foldl')
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE ExplicitNamespaces #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
-- Sort of like Java's step builder pattern.
-- https://medium.com/@castigliego/step-builder-pattern-3bcac4eaf9e8
-- A type that encodes that Y doesn't exist.
data NeedY
-- A type that encodes that a chart is ready.
data Ready
data Plot (s :: *) where