Mahmoud Hashemi, October 2017
git ref 03c67b7f015c7f939a2f8122583dee1b38e64a56
- could be ported python and perhaps parameterized (i.e., not hardcoding 2016 data)
- expects a specific file, might be better to merge this functionality and fetching the data into a single CLI frontend
- Do we want to add a bit more logic for format testing the input files? Like basically, do the column names and number of columns match up?
- I might move some local variables like chunk/chunksize values to globals or even arguments.
- only adds a single index, something to keep an eye on perhaps
- line 140 has an assert with a try/except, better to use a simple if statement, especially since the error is not fatal.
- line 262 is another instance of catching an AssertionError. Another
reason to avoid asserts for production requirements is that
supposedly some folks run python with the
-O
parameter, "optimizing" it, and making it skip assertions. - Nice messaging, especially the note about how messages from the program will begin with arrows, helps disambiguate streams a bit.
- Have you tried wrapping the whole operation in a transaction? I see right now there's no automatic rollback on cancellation or failure, meaning that the database can be left in a bad state. At the very least we'll need a drop_schema.py or something to reset the db. Sometimes having a transaction can speed things up, because autocommit is slow. Don't forget that flush() is a thing, too.
- This tool could be put into the package and exposed as a separate entrypoint, as well.
- The
_about.py
loading approach is a fine one, I think, but maybe make the path less relative. Right now it relies on setup.py being run with the project directory as the working directory. Simply make it__file__
relative. - setuptools might also be good to add to setup_requires and/or install_requires
- Definitely want to add at least some loose version requirements to the requirements.
- For
__repr__
and friends, it's good practice for refactorability and inheritability to referenceself.__class__.__name__
instead of hardcoding the type name. (e.g., line 63) - It's interesting how the str and repr differ. Remember
__repr__
is meant to be round-trippable, and right now theenv
isn't in there, so the signatures don't match. - For
to_json()
andto_html()
, my expectation would be that these would return JSON and HTML text strings, respectively, similar to howto_dict()
returns a dict. Right now they output files. If that's the desired behavior, consider changing the names toexport_*()
orsave_*()
.
- get_notes() could simply be return cmg.info.get('notes', '')
- line 64 in info_to_context doesn't need to be an .update(). simply do
ret[key] = info[key]
- If this program is meant for python3 and python3 only, as the setup.py suggests, then no future import unicode_literals is necessary.
- Design-wise, it's a good idea to write some stuff to stdout to keep the user informed on program execution, but to keep a much more complete record in a single file you can kick over when a crash or bug occurs. This means redirecting sys.stdout on subprocesses, and writing arguments and so forth, as well.
- Maybe want to optionally enable sqlalchemy query logs in debug mode.
- Would avoid using
__call__
unless necessary. Even when necessary, better to have a normal method name (e.g.,get_results()
) and do__call__ = get_results
in the class definition.
- Automated integration test would be ideal.
- Once APIs settle down a bit, would be good to get coverage in here.
- Look into tox, add some packaging testing a la https://github.com/python-hyper/hyperlink/blob/2bc8dff95e8b1394c35bde1b5d40ee78e42c92df/tox.ini#L19
- Get some CI going on (if only for unit tests and packaging tests)