There has been more interest in this project than I first anticipated, so I've moved this notebook to a full repository. The notebook in this gist will not be updated - all future changes will take place on the repo. You can fork the repo, submit pull requests, or open issues.
Last active March 26, 2020 11:54
FERC714_exploration.ipynb
name: ferc-data
channels:
  - conda-forge
dependencies:
  - python=3.7
  - numpy
  - pandas=0.25.*
  - pip
  # - matplotlib=3.*
  - joblib
  - xlrd
  # GIS dependencies from conda-forge in case ppl start using shapefiles
  - conda-forge::fiona
  - conda-forge::geopandas
  - conda-forge::shapely
  - conda-forge::descartes
This is great work, Greg! I think the summary stats from the cleaning look reasonable and in line with what we saw from EIA-930. You find that 97.3% of the demand values appear good (looking at the output from summary_df.describe()). We find that 2.2% of values are missing in the EIA-930 database and 0.5% are anomalous, which leaves 97.3% good values. In your summary stats, you have 0.0% missing, which is impressive.
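For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch of how the missing, anomalous, and good shares could be tallied. The function name `summarize_quality`, the `lower` and `upper` bounds, and the toy demand values are all assumptions made for illustration; the actual screening rules used for FERC 714 or EIA-930 are not shown in this thread and are likely more involved than a simple range check.

```python
import math


def summarize_quality(values, lower, upper):
    """Return percent shares of missing, anomalous, and good values.

    `lower` and `upper` are hypothetical plausibility bounds, not the
    screening rules used in the notebook or in the EIA-930 work.
    """
    total = len(values)
    # A value counts as missing if it is None or NaN.
    missing = sum(1 for v in values if v is None or math.isnan(v))
    # A value counts as anomalous if it is present but outside the bounds.
    anomalous = sum(
        1
        for v in values
        if v is not None and not math.isnan(v) and not (lower <= v <= upper)
    )
    good = total - missing - anomalous
    return {
        "missing": 100 * missing / total,
        "anomalous": 100 * anomalous / total,
        "good": 100 * good / total,
    }


# Toy hourly demand values in MW (made up for illustration).
demand = [510.0, 495.0, None, 530.0, -20.0, 505.0, 498.0, 520.0, 515.0, 500.0]
shares = summarize_quality(demand, lower=0.0, upper=10_000.0)
print(shares)
```

The three shares always sum to 100, which is the same bookkeeping as 2.2% missing plus 0.5% anomalous leaving 97.3% good.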
If you are using the values for creating average profiles, I think this should be fine. We imputed in our work because we need continuous time series for use in models.
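To illustrate what filling gaps for a continuous time series can look like, here is a minimal sketch using plain linear interpolation between the nearest known neighbors. This is a generic technique chosen for brevity, not the imputation method referred to above, which is not described in this thread. The function name `interpolate_gaps` and the sample values are hypothetical.

```python
def interpolate_gaps(series):
    """Fill interior None gaps by linear interpolation between neighbors.

    Leading and trailing gaps are left as None because they have only one
    known neighbor. This is a generic sketch, not a production imputer.
    """
    filled = list(series)
    # Indices of values that are actually present.
    known = [i for i, v in enumerate(filled) if v is not None]
    # Walk over each pair of consecutive known points and fill between them.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


# Toy hourly demand values in MW with a two-hour gap and a trailing gap.
hourly = [500.0, None, None, 530.0, 520.0, None]
result = interpolate_gaps(hourly)
print(result)
```

For average profiles, leaving the gaps unfilled and averaging over the available hours avoids introducing interpolated values into the statistics; a sketch like this only matters when a model needs every hour populated.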