Count empty values in a CSV file, grouped by column, with pandas
Reference: Data Cleaning with Python and Pandas: Detecting Missing Values
- Python 3.6
- pip
- Place the CSV file on your local computer
- Ensure that the CSV file contains a header row
- Install the requirements:
$ pip3 install -r requirements.txt
$ cat data/sample.csv
PID,ST_NUM,ST_NAME,OWN_OCCUPIED,NUM_BEDROOMS,NUM_BATH,SQ_FT
100001000,104,PUTNAM,Y,3,1,1000
100002000,197,LEXINGTON,N,3,1.5,
100003000,,LEXINGTON,N,,1,850
100004000,,BERKELEY,12,1,,700
100005000,203,BERKELEY,Y,3,2,1600
100006000,207,BERKELEY,Y,,1,800
100007000,100,WASHINGTON,,2,HURLEY,950
100008000,213,TREMONT,Y,1,1,
100009000,215,TREMONT,Y,,2,1800
$ python3 app.py --path data/sample.csv --type=csv
File type is [csv]
Sample data:
         PID  ST_NUM    ST_NAME OWN_OCCUPIED  NUM_BEDROOMS NUM_BATH   SQ_FT
0  100001000   104.0     PUTNAM            Y           3.0        1  1000.0
1  100002000   197.0  LEXINGTON            N           3.0      1.5     NaN
2  100003000     NaN  LEXINGTON            N           NaN        1   850.0
3  100004000     NaN   BERKELEY           12           1.0      NaN   700.0
4  100005000   203.0   BERKELEY            Y           3.0        2  1600.0
Rate of empty values, grouped by column.
Left is column name, right is rate of empty values (range 0.0 ~ 1.0):
ST_NUM          0.222222
OWN_OCCUPIED    0.111111
NUM_BEDROOMS    0.333333
NUM_BATH        0.111111
SQ_FT           0.222222
dtype: float64
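The rates shown above can be reproduced with a few lines of pandas. The following is a minimal sketch, not the actual contents of app.py: the helper name `empty_value_rates` is hypothetical, and it assumes that columns without any empty values are dropped from the report (which matches the output above, where PID and ST_NAME do not appear).

```python
import io

import pandas as pd

# Same rows as data/sample.csv, inlined so the sketch is self-contained.
SAMPLE_CSV = """PID,ST_NUM,ST_NAME,OWN_OCCUPIED,NUM_BEDROOMS,NUM_BATH,SQ_FT
100001000,104,PUTNAM,Y,3,1,1000
100002000,197,LEXINGTON,N,3,1.5,
100003000,,LEXINGTON,N,,1,850
100004000,,BERKELEY,12,1,,700
100005000,203,BERKELEY,Y,3,2,1600
100006000,207,BERKELEY,Y,,1,800
100007000,100,WASHINGTON,,2,HURLEY,950
100008000,213,TREMONT,Y,1,1,
100009000,215,TREMONT,Y,,2,1800
"""


def empty_value_rates(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of empty (NaN) cells per column.

    Hypothetical helper: columns with no empty cells are filtered out
    (an assumption based on the sample output).
    """
    # isnull() yields a boolean frame; the mean of booleans is the rate of True.
    rates = df.isnull().mean()
    return rates[rates > 0]


# read_csv turns empty fields into NaN by default.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
rates = empty_value_rates(df)
print(rates)
```

Note that only truly empty fields count here. Unexpected values such as `12` in OWN_OCCUPIED or `HURLEY` in NUM_BATH are not treated as empty.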
$ pip3 install pyinstaller
$ pip3 install -r requirements.txt
$ pyinstaller --onefile \
--hiddenimport "fsspec.implementations" \
--hiddenimport "fsspec.implementations.local" \
app.py
...
# Execute program
$ ./dist/app --path <your-csv-file> --type <csv-or-tsv>
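The `--path` and `--type` options used in the commands above could be handled along these lines. This is only a sketch under assumptions: the real argument handling in app.py is not shown in this document, and the names `SEPARATORS` and `parse_args` are hypothetical.

```python
import argparse

# Assumed mapping from the --type value to the field separator.
SEPARATORS = {"csv": ",", "tsv": "\t"}


def parse_args(argv=None):
    """Parse --path and --type (hypothetical re-creation of the CLI)."""
    parser = argparse.ArgumentParser(
        description="Count empty values per column"
    )
    parser.add_argument("--path", required=True, help="path to the input file")
    parser.add_argument(
        "--type",
        choices=sorted(SEPARATORS),
        default="csv",
        help="input file type",
    )
    return parser.parse_args(argv)


# Passing an explicit list keeps the example independent of sys.argv.
args = parse_args(["--path", "data/sample.csv", "--type", "tsv"])
sep = SEPARATORS[args.type]
print(f"File type is [{args.type}]")
```

The resulting `sep` can then be passed to `pandas.read_csv(args.path, sep=sep)`, so CSV and TSV files share the same code path.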