Data lakes built on Parquet files are increasingly popular, but the data in them often lacks detailed statistics, so estimating those statistics from file metadata alone has become an interesting problem. The diversity of data types, encodings, and compression methods in these files offers several possible signals, but here we will focus on min/max values.
Parquet files split their rows into row groups and record metadata for each one. Per-column minimum and maximum values are included to enable data skipping, but they also offer some insight into the number of distinct values (NDV).
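To make that insight concrete, here is a minimal Python sketch. It is not a specific library's method. It assumes an integer column and assumes the per-row-group statistics have already been pulled from the footer into a hypothetical list of `(min, max, non_null_count)` tuples. Every recorded min and max is a value that actually occurs, so the distinct endpoints give a lower bound. No value can fall outside the union of the `[min, max]` intervals, and the column cannot have more distinct values than non-null rows, so those two quantities cap the upper bound. The sketch also assumes the statistics are exact rather than truncated.

```python
from typing import List, Tuple

# Hypothetical input format: one (min, max, non_null_count) tuple per row
# group, for an integer column whose min/max statistics are exact.
RowGroupStats = Tuple[int, int, int]


def ndv_bounds(stats: List[RowGroupStats]) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the column's number of distinct values."""
    if not stats:
        return 0, 0

    # Each recorded min and max occurs in the data, so the count of
    # distinct endpoint values is a lower bound on the NDV.
    lower = len({v for lo, hi, _ in stats for v in (lo, hi)})

    # Merge overlapping or adjacent [min, max] intervals; every value in
    # the column lies inside one of the merged intervals.
    intervals = sorted((lo, hi) for lo, hi, _ in stats)
    merged = [list(intervals[0])]
    for lo, hi in intervals[1:]:
        if lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    span = sum(hi - lo + 1 for lo, hi in merged)

    # The NDV can exceed neither the number of integers covered by the
    # intervals nor the total number of non-null values.
    rows = sum(n for _, _, n in stats)
    return lower, min(span, rows)


if __name__ == "__main__":
    groups = [(1, 10, 100), (5, 20, 100), (100, 100, 50)]
    print(ndv_bounds(groups))  # (5, 21)
```

With the example input, the endpoints `{1, 5, 10, 20, 100}` give a lower bound of 5, and the merged intervals `[1, 20]` and `[100, 100]` cover 21 integers, well under the 250 rows. When row groups have tight, non-overlapping ranges, the upper bound is informative. When every row group spans the whole domain, the estimate degrades toward the row count.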