Gist amotl/9fc67b696cbab9f0667be60de4dcf2be, last active December 20, 2020 22:12
Investigate problems when scanning huge directory tree of DWD CDC HTTP server
```python
"""
Investigate problems when scanning the huge directory
tree of the DWD CDC HTTP server.

This repro reveals that fsspec seems to randomly
include a single folder in its results list, although
``find()`` is expected to return files only by default::

    python fsspec-dwd.py | grep -v zip

Across runs, the output is a single line that varies, e.g.:

    https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/1_minute/precipitation/historical/1996
    https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/1_minute/precipitation/historical/2010
    https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/1_minute/precipitation/historical/2004
    https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/1_minute/precipitation/historical/1996
    https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/1_minute/precipitation/historical/2006
    https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/1_minute/precipitation/historical/2000
    https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/1_minute/precipitation/historical/1999
"""
import os

from fsspec.implementations.http import HTTPFileSystem


def process(url):
    """Recursively list everything below `url` and print one entry per line."""
    fs = HTTPFileSystem()
    # HTTPFileSystem discovers entries by scraping links from the
    # server's HTML index pages, then recurses into subdirectories.
    files = fs.find(url)
    for name in files:
        print(name)


if __name__ == "__main__":
    large_folder = "https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/1_minute/precipitation/historical/"
    # The start URL can be overridden via the DWD_URL environment variable.
    url = os.environ.get("DWD_URL", large_folder)
    process(url)
```