
Michael Herman mjhea0

@dannguyen
dannguyen / README.md
Last active July 29, 2025 14:26
Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which is what you'd need to calculate the data delimiters.
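For reference, here is a minimal sketch of what such a test might look like. It is not the gist's actual script: the helper names (`build_ocr_request`, `extract_full_text`, `ocr_image`) are made up for illustration, and it assumes the public `images:annotate` REST endpoint with an API key passed as a query parameter.

```python
import base64

# Public REST endpoint for Cloud Vision image annotation.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes):
    """Wrap raw image bytes in the JSON body that images:annotate expects."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_full_text(response_json):
    """Return the full detected text, or "" if nothing was detected."""
    annotations = response_json["responses"][0].get("textAnnotations", [])
    # The first annotation holds the entire detected text; later ones are pieces.
    return annotations[0]["description"] if annotations else ""


def ocr_image(path, api_key):
    """Send one image file to Cloud Vision and return the detected text."""
    # Third-party dependency, imported here so the helpers above work without it.
    import requests

    with open(path, "rb") as f:
        body = build_ocr_request(f.read())
    resp = requests.post(VISION_URL, params={"key": api_key}, json=body)
    resp.raise_for_status()
    return extract_full_text(resp.json())
```

Calling `ocr_image("scan.jpg", api_key)` (both arguments are placeholders) would return the detected text as one string, which is the "text anywhere in the image" use case rather than table reconstruction.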

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

####### 1. A low-resolution photo of road signs

@dmvaldman
dmvaldman / promisesEM.md
Last active June 1, 2024 00:20
Promises as EventEmitters

Promises as EventEmitters

I was trying to understand JavaScript Promises by using various libraries (bluebird, when, Q) and other async approaches.

I read the spec, some blog posts, and looked through some code. I learned how to

@dannguyen
dannguyen / EXAMPLE_WATSON_API_README.md
Last active November 23, 2020 13:32
Transcribing ProPublica podcast with Python and Watson Speech to Text API

Using IBM Watson Speech to Text API to transcribe a ProPublica podcast

An example of using the Watson Speech to Text API to transcribe a podcast from ProPublica: How a Reporter Pierced the Hype Behind Theranos

This is just a simpler demo of the same technique I demonstrate to make automated video supercuts in this repo: https://github.com/dannguyen/watson-word-watcher

The transcription takes just a few minutes (less if you parallelize the requests to IBM) and is free...but it isn't perfect by any means. It doesn't fare super well on proper nouns:

  • Charles Ornstein's last name is transcribed as Orenstein
  • John Carreyrou's last name becomes John Kerry Roo
@cludden
cludden / howto-installing-vault-on-aws-linux.md
Created February 3, 2016 00:30
HOWTO: Installing Vault on AWS Linux

HOWTO: Installing Vault On AWS Linux

This is a quick howto for installing vault on AWS Linux, mostly to remind myself. At the end of this tutorial, you'll have a working vault server, using s3 for the backend, self-signed certificates for tls, and supervisord to ensure that the vault server is always running and starts on reboot.

Setting up S3

First things first, let's set up an s3 bucket to use as the storage backend for our vault instance.

  1. From the AWS Management Console, go to the S3 console.

  2. Click on the Create Bucket button

@nepsilon
nepsilon / python-better-flow-control.md
Last active July 8, 2016 06:20
Better Flow Control with Python — First published in fullweb.io issue #32

Better Flow Control with Python

I recently interviewed 4 developers for a Python programming position. They all knew how to use requests and call APIs, and had worked with either Django or Flask, but all of them ignored most of Python’s own control-flow constructs.

Here are two of them, try/except/else/finally and for/else:

try: 
    # What you want to do, which might
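Since the preview above is cut off, here is a separate, minimal sketch of both constructs. It is not the gist's own example; the function names `parse_port` and `first_negative` are invented for illustration.

```python
def parse_port(text):
    """Parse a port number, recording which try-statement branches ran."""
    steps = []
    try:
        port = int(text)           # may raise ValueError
    except ValueError:
        steps.append("except")     # runs only if the try body raised
        port = None
    else:
        steps.append("else")       # runs only if the try body did not raise
    finally:
        steps.append("finally")    # always runs, error or not
    return port, steps


def first_negative(numbers):
    """Return the first negative number, or None, using for/else."""
    for n in numbers:
        if n < 0:
            found = n
            break
    else:
        # Runs only when the loop finished without hitting break.
        found = None
    return found
```

`parse_port("8080")` goes through the `else` and `finally` branches, while `parse_port("http")` goes through `except` and `finally`. In `first_negative`, the loop's `else` replaces the usual "found" flag variable.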
@ossanna16
ossanna16 / Beginner-friendly Python Open Source Projects
Last active October 25, 2024 09:57
This is a list of beginner-friendly Python open source projects. I'm always looking for new projects to add to my list; if you have an idea, please tweet me at @ossanna16 :)
* OpenHatch - https://openhatch.org/search/?q=&language=Python
* PyLadies - https://github.com/pyladies
* New Coder - https://github.com/econchick/new-coder
* Django Girls - https://github.com/DjangoGirls
* Matplotlib - https://github.com/matplotlib/matplotlib
* Hylang - http://docs.hylang.org/en/latest/, https://github.com/hylang/hy
* Open Slides (Django) - http://openslides.org/
* Zeeguu - https://zeeguu.unibe.ch
* Project Jupyter - https://github.com/jupyter
* nbgrader - https://github.com/jupyter/nbgrader
@anqxyr
anqxyr / archived
Last active July 5, 2018 15:08
Create EPUB files with Python
The gist that used to be here has since been implemented as a complete pip-installable package: https://github.com/anqxyr/mkepub
This notice is left here as a courtesy to the people who starred/bookmarked this gist in the past.
@nepsilon
nepsilon / postgres-import-export-csv.md
Last active July 29, 2024 01:26
Importing and Exporting CSV files with PostgreSQL — First published in fullweb.io issue #19

Importing and exporting CSV files with PostgreSQL

Let’s see how to use PostgreSQL to import and export CSV files painlessly with the COPY command.

Import CSV into table t_words:

COPY t_words FROM '/path/to/file.csv' DELIMITER ',' CSV;

You can specify the quote character with QUOTE and change the delimiter with DELIMITER.
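As a rough illustration of how those options combine, the sketch below builds COPY statements for both import and export as plain strings. The helper names (`sql_literal`, `copy_csv_sql`) are hypothetical, the output uses the parenthesized option syntax (PostgreSQL 9.0 and later) rather than the older spelling shown above, and the table name is assumed to be a trusted identifier.

```python
def sql_literal(value):
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def copy_csv_sql(table, path, direction="FROM", delimiter=",", quote='"'):
    """Build a COPY statement that imports (FROM) or exports (TO) a CSV file.

    `table` is inserted as-is, so it must come from trusted code, not user input.
    """
    if direction not in ("FROM", "TO"):
        raise ValueError("direction must be 'FROM' or 'TO'")
    options = "FORMAT csv, DELIMITER {}, QUOTE {}".format(
        sql_literal(delimiter), sql_literal(quote)
    )
    return "COPY {} {} {} WITH ({});".format(
        table, direction, sql_literal(path), options
    )


# Import with the defaults, then export with a semicolon delimiter.
print(copy_csv_sql("t_words", "/path/to/file.csv"))
print(copy_csv_sql("t_words", "/tmp/out.csv", direction="TO", delimiter=";"))
```

Note that COPY with a file path reads and writes files on the database server, so the path has to be visible to the PostgreSQL server process.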

@ourmaninamsterdam
ourmaninamsterdam / LICENSE
Last active October 15, 2025 12:17
Arrayzing - The JavaScript array cheatsheet
The MIT License (MIT)
Copyright (c) 2015 Justin Perry
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
@dannguyen
dannguyen / _README.md
Last active January 20, 2016 07:16
Scripts to autodownload and organize the California kindergarten immunization data files

Fetching and collating the California Kindergarten immunization data in Python and Bash

by Dan Nguyen @dancow

tl;dr: a quick example of practicing reproducible data journalism, and somewhat timely given the recent school vaccination law signed by California Gov. Jerry Brown

These are scripts that are part of the mundaneprogramming.github.io repo for SRCCON 2015 and will soon have their own entry/explanation on that site. They aren't meant to be best/canonical practices (e.g. I felt like using csv.DictWriter so there it is), nor do I guarantee that they work. But you're free to run them to see what happens. All they currently do is download the relevant spreadsheets and compile them into a file, which ends up being one of the most tedious parts of the entire investigation due to how the [files are organized on the home