Skip to content

Instantly share code, notes, and snippets.

@mahmoud
Last active November 13, 2017 04:01
Show Gist options
  • Save mahmoud/7ce4ea6557dd52ff28f2f31ccab6b086 to your computer and use it in GitHub Desktop.
Save mahmoud/7ce4ea6557dd52ff28f2f31ccab6b086 to your computer and use it in GitHub Desktop.

Code review notes

Mahmoud Hashemi, October 2017

git ref 03c67b7f015c7f939a2f8122583dee1b38e64a56

download_epa.sh

  • could be ported python and perhaps parameterized (i.e., not hardcoding 2016 data)

construct_database.py

  • expects a specific file, might be better to merge this functionality and fetching the data into a single CLI frontend
  • Do we want to add a bit more logic for format testing the input files? Like basically, do the column names and number of columns match up?
  • I might move some local variables like chunk/chunksize values to globals or even arguments.
  • only adds a single index, something to keep an eye on perhaps
  • line 140 has an assert with a try/except, better to use a simple if statement, especially since the error is not fatal.
  • line 262 is another instance of catching an AssertionError. Another reason to avoid asserts for production requirements is that supposedly some folks run python with the -O parameter, "optimizing" it, and making it skip assertions.
  • Nice messaging, especially the note about how messages from the program will begin with arrows, helps disambiguate streams a bit.
  • Have you tried wrapping the whole operation in a transaction? I see right now there's no automatic rollback on cancellation or failure, meaning that the database can be left in a bad state. At the very least we'll need a drop_schema.py or something to reset the db. Sometimes having a transaction can speed things up, because autocommit is slow. Don't forget that flush() is a thing, too.
  • This tool could be put into the package and exposed as a separate entrypoint, as well.

setup.py

  • The _about.py loading approach is a fine one, I think, but maybe make the path less relative. Right now it relies on setup.py being run with the project directory as the working directory. Simply make it __file__ relative.
  • setuptools might also be good to add to setup_requires and/or install_requires
  • Definitely want to add at least some loose version requirements to the requirements.

cmgroup.py

  • For __repr__ and friends, it's good practice for refactorability and inheritability to reference self.__class__.__name__ instead of hardcoding the type name. (e.g., line 63)
  • It's interesting how the str and repr differ. Remember __repr__ is meant to be round-trippable, and right now the env isn't in there, so the signatures don't match.
  • For to_json() and to_html(), my expectation would be that these would return JSON and HTML text strings, respectively, similar to how to_dict() returns a dict. Right now they output files. If that's the desired behavior, consider changing the names to export_*() or save_*().

hypertext.py

  • get_notes() could simply be return cmg.info.get('notes', '')
  • line 64 in info_to_context doesn't need to be an .update(). simply do ret[key] = info[key]

logconf.py

  • If this program is meant for python3 and python3 only, as the setup.py suggests, then no future import unicode_literals is necessary.
  • Design-wise, it's a good idea to write some stuff to stdout to keep the user informed on program execution, but to keep a much more complete record in a single file you can kick over when a crash or bug occurs. This means redirecting sys.stdout on subprocesses, and writing arguments and so forth, as well.

ops.py

  • Maybe want to optionally enable sqlalchemy query logs in debug mode.
  • Would avoid using __call__ unless necessary. Even when necessary, better to have a normal method name (e.g., get_results()) and do __call__ = get_results in the class definition.

Tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment