Jason Vertrees inchoate

@inchoate
inchoate / realmassive_ingest.md
Last active July 9, 2020 14:52
JSON Structure to Easily Import Data into RealMassive

RealMassive Simplified Data Ingestion Format

  • Version: v0.0.1-pre. This is a proposal. We encourage feedback.
  • Last Edited: 2017-02-28

Here we document our file format for importing CRE data into RealMassive. Third-party data providers will be interested in this because, if they:

  • export their data into this format;
  • provide RealMassive with a publicly accessible file via URI;
  • keep that file up to date
@inchoate
inchoate / zeep_failed_parse_response.log
Created February 27, 2017 15:12
Zeep v1.1.10 fails to parse this response
In [7]: %paste
"""
Testing with Berkshire-Hathaway
"""
"""
Colliers API Testing
"""
import zeep
@inchoate
inchoate / zeep_parsing_failure.log
Created February 22, 2017 23:12
Zeep WSDL Parsing Error
```
17:09 $ python -mzeep 'https://listings.colliershub.com/Services/API/Currencies.svc?wsdl'
Traceback (most recent call last):
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/Users/vertrees/.virtualenvs/colliers_testing/lib/python2.7/site-packages/zeep/__main__.py", line 86, in <module>
main(args)
File "/Users/vertrees/.virtualenvs/colliers_testing/lib/python2.7/site-packages/zeep/__main__.py", line 75, in main
@inchoate
inchoate / including_external_package_in_dataflow.md
Last active February 2, 2024 11:40
Adding an extra package to a Python Dataflow project to run on GCP

The Problem

The documentation for how to deploy a pipeline with extra, non-PyPI, pure-Python packages on GCP is missing some detail. This gist shows how to package and deploy an external pure-Python, non-PyPI dependency to a managed Dataflow pipeline on GCP.

TL;DR: Your external package needs to be a Python (source/binary) distribution, properly packaged and shipped alongside your pipeline. It is not enough to only specify a tar file with a setup.py.

Preparing the External Package

Your external package must have a proper setup.py. What follows is an example setup.py for our ETL package. This is used to package version 1.1.1 of the etl library. The library requires three PyPI packages to run. These are specified in the install_requires field. This package also ships with custom external JSON data, declared in the package_data section. Last, the setuptools.find_packages function searches for all available packages and returns that
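As a rough illustration of the fields described above, here is a minimal sketch of such a setup.py. It is not the gist's original file: the three dependency names and the `data/*.json` path are placeholders chosen for the example, and only the package name `etl` and version `1.1.1` come from the text.

```python
# setup.py -- illustrative sketch only; dependency names and data paths are assumptions.
import sys

import setuptools

SETUP_KWARGS = dict(
    name="etl",
    version="1.1.1",
    # Hypothetical PyPI dependencies; substitute the three your library actually needs.
    install_requires=["requests", "six", "python-dateutil"],
    # Discover every package (a directory containing __init__.py) under this folder.
    packages=setuptools.find_packages(),
    # Ship non-Python data files (here, JSON) inside the built distribution.
    # The "data/*.json" layout is an assumed example, not taken from the gist.
    package_data={"etl": ["data/*.json"]},
)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Only invoke setuptools when a command such as `sdist` is supplied.
    setuptools.setup(**SETUP_KWARGS)
```

Running `python setup.py sdist` with a file like this should produce a source tarball under `dist/`, which is the kind of properly packaged distribution the TL;DR calls for. One way to ship it with the pipeline is Apache Beam's `--extra_package` option, though the exact invocation depends on your project.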

@inchoate
inchoate / service-checklist.md
Last active November 3, 2017 15:37 — forked from acolyer/service-checklist.md
Internet Scale Services Checklist

Internet Scale Services Checklist

A checklist for designing and developing internet scale services, inspired by James Hamilton's 2007 paper "On Designing and Deploying Internet-Scale Services."

Basic tenets

  • Does the design expect failures to happen regularly and handle them gracefully?
  • Have we kept things as simple as possible?
@inchoate
inchoate / git patterns.md
Created June 14, 2016 18:38 — forked from wayspurrchen/git patterns.md
Useful Git Techniques

History

Show file at certain commit

git show <hash>:<file>

Show history of a file

git log -p <filename>