Your data is often split into a training set and a test set. Columns with missing values in the training set might not have missing values in the test set (and vice versa), so a cleaning function written against the training data may crash on the test data. This is a problem: the software is naive to the fact that missingness is ubiquitous and should be handled gracefully. There are a number of ways to solve this.
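As a minimal sketch of the failure mode (hypothetical column names and helper, not the library's API): a cleaning function written against a training column that happens to be complete will assume every cell parses, and then blow up on the test set.

```haskell
-- Hypothetical cleaning step written against training data in which the
-- "age" column had no missing values, so every cell is assumed to parse:
cleanAges :: [String] -> [Int]
cleanAges = map read

-- On the training column this is fine:
--   cleanAges ["25", "33", "41"]  -- [25, 33, 41]
-- But the test set has a blank cell, so this throws
-- `Prelude.read: no parse` at runtime:
--   cleanAges ["25", "", "41"]
```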
You can see the code smell in the Synthesis.hs example: we create an uber dataframe containing both train and test just to approximate the final schema, then split them apart again.
When users read CSVs (or any file format, for that matter) we can default to assuming missingness everywhere. The user then either works with Maybe a as-is (never unwrapping to concrete types) or deals with missingness up front.
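A sketch of what this default could look like (illustrative helpers, not the library's actual parsing code): every cell parses to a Maybe, so downstream code is forced to confront missingness, either by staying in Maybe or by imputing before unwrapping.

```haskell
import Data.Maybe (fromMaybe)
import Text.Read (readMaybe)

-- Parse every cell as possibly missing: empty strings become Nothing,
-- everything else is parsed with readMaybe (which also yields Nothing
-- on malformed cells rather than crashing).
parseColumn :: Read a => [String] -> [Maybe a]
parseColumn = map parseCell
  where
    parseCell "" = Nothing
    parseCell s  = readMaybe s

-- One way to deal with missingness up front: impute a default value
-- before unwrapping to a concrete column type.
impute :: a -> [Maybe a] -> [a]
impute def = map (fromMaybe def)
```

With this default the same code works on both train and test regardless of where the blanks are, e.g. `impute 0 (parseColumn ["25", "", "41"]) :: [Int]` gives `[25, 0, 41]`.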