Chris Shenton shentonfreude

Tesseract notes

Output to stdout:

tesseract doc.png stdout
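The same invocation can be driven from a script. A minimal Python sketch, assuming the `tesseract` binary is on `PATH`; the helper names `tesseract_stdout_command` and `ocr_to_text` are hypothetical and not part of these notes:

```python
import shutil
import subprocess


def tesseract_stdout_command(image_path):
    """Build the argv for `tesseract <image> stdout` (hypothetical helper)."""
    return ["tesseract", image_path, "stdout"]


def ocr_to_text(image_path):
    """Run tesseract and return whatever it writes to stdout."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        tesseract_stdout_command(image_path),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise if tesseract exits non-zero
    )
    return result.stdout
```

Calling `ocr_to_text("doc.png")` would return the recognized text as a string.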

Output to stdout, and assume only digits:

shentonfreude / cac-uninstall-mac.sh
Created April 4, 2018 12:56
Uninstall CAC PIV card software per MilitaryCAC.org: useful when reinstalling
#!/bin/bash
# Automate the CAC Uninstall doc on MilitaryCAC.org for Macs
# http://militarycac.org/macuninstall.htm
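RM="echo rm "  # dry run: print the rm commands instead of executing them
RM="rm"        # real run: comment this line out to keep the dry run above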
declare -a OPENS=(
"/Applications/Utilities/Centrify/SmartCardTool"
"/Applications/Utilities/Centrify/SmartCardAssist"
shentonfreude / boto_debug_log_config.py
Created September 20, 2017 19:24
AWS boto debug logging config suggested by AWS
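import logging
from time import gmtime
import boto3
import botocore
FORMAT = "[%(levelname)s %(asctime)s %(filename)s:%(lineno)s - %(funcName)21s() ] %(message)s"
# datefmt must be a strftime pattern, not an already-formatted timestamp
DATEFMT = "%Y%m%dT%H%M%SZ"
# Format log times in UTC so the trailing "Z" is accurate
logging.Formatter.converter = gmtime
logging.basicConfig(level=logging.DEBUG, format=FORMAT, datefmt=DATEFMT)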
logger = logging.getLogger(__name__)
boto3_log = logging.getLogger("boto3")
boto3_log.setLevel(logging.DEBUG)
shentonfreude / s3upload.js
Last active July 14, 2022 19:43
Use NodeJS AWS SDK to upload a file to S3 with server-side encryption; uses environment for AWS creds
var AWS = require('aws-sdk');
var fs = require('fs');
var bucketName = 'my-bucket-name';
var bucketRegion = 'us-gov-west-1';
var file = 'PDFSCANS/AGILE_5_page.pdf';
var key = 'doc_pdf/CHRISJSnocreds.pdf';
AWS.config.update({
region: bucketRegion
shentonfreude / elasticsearch_delete_all_docs.py
Created August 15, 2017 21:02
Delete all docs from AWS Elasticsearch Service -- useful for development
"""Get all doc ids and delete them to provide a clean index for demos, etc."""
import os
import sys
from aws_requests_auth.aws_auth import AWSRequestsAuth
from elasticsearch import Elasticsearch, RequestsHttpConnection
try:
    es_index = sys.argv[1]
shentonfreude / perl-http-get-range.pl
Created March 16, 2017 19:29
Verify we can read an HTTP Range of bytes with Perl's LWP library
#!/usr/bin/env perl
# Read a range of bytes from HTTP file
use LWP;
my $url = 'http://example.com/alex.jpg';
my $browser = LWP::UserAgent->new;
my @ns_headers = ('Range' => 'bytes=0-80');
my $response = $browser->get($url, @ns_headers);
print "Resp content:", $response->content;
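For comparison, the same Range check can be sketched with Python's standard library. This is an illustration only, not part of the gist; the helper names `build_range_request` and `fetch_range` are hypothetical, and the URL is the placeholder from the Perl script:

```python
import urllib.request


def build_range_request(url, first_byte, last_byte):
    """Return a Request asking for bytes first_byte..last_byte inclusive."""
    headers = {"Range": f"bytes={first_byte}-{last_byte}"}
    return urllib.request.Request(url, headers=headers)


def fetch_range(url, first_byte, last_byte):
    """Fetch the byte range and return (status, body).

    A server that honors Range answers 206 Partial Content; one that
    ignores it answers 200 with the whole file.
    """
    request = build_range_request(url, first_byte, last_byte)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read()
```

Checking that the returned status is 206 and that the body is at most 81 bytes confirms the server honored the `bytes=0-80` request.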
shentonfreude / s3_copy_cachecontrol_size_matters.py
Created January 23, 2017 19:23
Setting CacheControl in S3 copy requires MetadataDirective=REPLACE .... or not, depending on size
#!/usr/bin/env python
#
# If you do not set MetadataDirective=REPLACE the managed copy will NOT set the
# CacheControl on the copy, nor will it warn you that your attributes are being
# ignored. That is, unless the source is large enough to trigger the
# MultipartUpload, which will in fact set the CacheControl. This behavior
# difference based on size is -- to say the least -- surprising.
# See kyleknap's comment: https://github.com/aws/aws-cli/pull/1188
#
# Note that when using REPLACE (on "small" files) it will replace all metadata
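The workaround described in the comments can be sketched as follows. This is a minimal illustration, assuming `s3_client` is a `boto3.client("s3")` and using its managed `copy()` call; the function name `copy_with_cache_control` and the bucket and key values in the usage note are hypothetical:

```python
def copy_with_cache_control(s3_client, src_bucket, src_key,
                            dst_bucket, dst_key, cache_control):
    """Copy an object and set CacheControl on the destination.

    MetadataDirective=REPLACE is passed explicitly so the CacheControl
    value is applied regardless of the source object's size.
    """
    extra_args = {
        "CacheControl": cache_control,
        "MetadataDirective": "REPLACE",
    }
    s3_client.copy(
        CopySource={"Bucket": src_bucket, "Key": src_key},
        Bucket=dst_bucket,
        Key=dst_key,
        ExtraArgs=extra_args,
    )
    return extra_args
```

A call might look like `copy_with_cache_control(boto3.client("s3"), "src-bucket", "a.jpg", "dst-bucket", "a.jpg", "max-age=3600")`. Because REPLACE discards the source object's other metadata, any values that should survive (ContentType, for example) would need to be passed in `ExtraArgs` as well.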
shentonfreude / create_et_preset.py
Created November 3, 2016 16:33
Create AWS ElasticTranscoder preset that preserves frames per second based on their 1080p preset.
# Create AWS ElasticTranscoder preset based on AWS 1080p with 'auto' framerate.
# This uses same output FPS as incoming, so we don't up/down sample.
import boto3
et = boto3.client('elastictranscoder')
res = et.read_preset(Id='1351620000001-000001') # Generic 1080p
preset = res['Preset']
del preset['Type']
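One way the remaining steps might look, as a sketch rather than the gist's actual code: strip the fields that `read_preset` returns but that describe the existing preset, then set the video frame rate to `'auto'`. The helper name `auto_fps_preset_args`, the removal of `Id` and `Arn`, and the sample preset name are assumptions; the preset is assumed to contain a `Video` section.

```python
import copy


def auto_fps_preset_args(preset, name, description):
    """Return arguments for a new preset that keeps the input frame rate."""
    args = copy.deepcopy(preset)  # leave the caller's dict untouched
    # Assumption: these fields identify the existing preset and are not
    # accepted when creating a new one.
    for read_only in ("Id", "Arn", "Type"):
        args.pop(read_only, None)
    args["Name"] = name
    args["Description"] = description
    # 'auto' asks the transcoder to use the source frame rate
    args["Video"]["FrameRate"] = "auto"
    return args
```

The result could then be passed on with something like `et.create_preset(**auto_fps_preset_args(preset, 'my-1080p-auto-fps', 'Generic 1080p, auto FPS'))`.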
shentonfreude / dynamodb_update_item.py
Created July 19, 2016 14:58
DynamoDB update_item() is convoluted, requiring placeholders and name maps to avoid 572 reserved keywords; this seems a pretty concise approach to mapping a dict to names it will accept.
from datetime import datetime
def asset_update(asset_id, state, extra=None):
    # Copy so the caller's dict is never mutated; None avoids a shared mutable default.
    asset = dict(extra or {})
    asset.update({'dt': datetime.now().isoformat(),
                  'state': state})
    # We need update_item placeholder and values like:
    # UpdateExpression='SET dt = :dt, #state = :state'  # 'state' is a reserved word
    # ExpressionAttributeNames={'#state': 'state'}      # proxy for reserved word 'state'
    # ExpressionAttributeValues={':dt': datetime.now().isoformat(), ':state': 'uploaded'}
    # We can't use "reserved words" like 'state', so use EAN for *all* names,
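The mapping described in these comments can be sketched as a small pure function. This is an illustration under assumptions, not the gist's own code: the name `build_update_kwargs` is hypothetical, and attribute names are assumed to be simple identifiers (letters, digits, underscores) so they can be reused inside the `#name` and `:name` placeholders.

```python
def build_update_kwargs(attrs):
    """Turn a dict of attributes into update_item keyword arguments.

    Every attribute name gets a '#name' placeholder, so reserved words
    such as 'state' never appear bare in the UpdateExpression.
    """
    names = {}
    values = {}
    assignments = []
    for key in sorted(attrs):  # sorted for a deterministic expression
        names["#" + key] = key
        values[":" + key] = attrs[key]
        assignments.append("#{0} = :{0}".format(key))
    return {
        "UpdateExpression": "SET " + ", ".join(assignments),
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }
```

The result could be splatted into a call such as `table.update_item(Key={'id': asset_id}, **build_update_kwargs(asset))`, where `table` and the key name `'id'` are placeholders for whatever the real table uses.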
shentonfreude / aws_cloudsearch_field.py
Created April 19, 2016 17:54
Get the endpoints by the human name, then search on a single field with structured query
aws_region = config['AWS']['region']
search_domain_name = config['AWS']['cloudsearch_domain_name'] # e.g., avail-search-dev
cs = boto3.client("cloudsearch")
cs_domain = cs.describe_domains(DomainNames=[search_domain_name])['DomainStatusList'][0]
cs_doc_ep = cs_domain['DocService']['Endpoint']
cs_doc = boto3.client('cloudsearchdomain',
endpoint_url='https://' + cs_doc_ep,
region_name=aws_region)
cs_search_ep = cs_domain['SearchService']['Endpoint']
cs_search = boto3.client('cloudsearchdomain',