Graham Duncan duncangh
@duncangh
duncangh / tfpdf.py
Created December 5, 2018 02:39 — forked from bllchmbrs/tfpdf.py
TF IDF Explained in Python Along with Scikit-Learn Implementation
from __future__ import division
import string
import math
tokenize = lambda doc: doc.lower().split(" ")
document_0 = "China has a strong economy that is growing at a rapid pace. However politically it differs greatly from the US Economy."
document_1 = "At last, China seems serious about confronting an endemic problem: domestic violence and corruption."
document_2 = "Japan's prime minister, Shinzo Abe, is working towards healing the economic turmoil in his own country for his view on the future of his people."
document_3 = "Vladimir Putin is working hard to fix the economy in Russia as the Ruble has tumbled."
@duncangh
duncangh / gmailstats.gs
Created February 23, 2019 05:02 — forked from dalehamel/gmailstats.gs
Basic Gmail statistics
// To load this script, open a Google Sheet (yeah, weird I know), then select "Tools -> Script Editor".
// From there, paste this content into a script. You can set up triggers to run this script every day at a certain time
// by selecting Resources -> Triggers.
// I recommend you set the trigger to every 5-10 minutes. This will let the batches complete. If the trigger is too infrequent, it won't have time to finish.
// https://developers.google.com/apps-script/reference/gmail/
// For large inboxes (the ones you want to analyze), Gmail will rate limit you.
// They recommend adding a sleep so you call it less often, but then your execution time will be too long.
// To solve this, we run in batches. This is the batch size. Decrease it if execution time is too long.
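// Hedged sketch (not the original gist code): roughly how the batching described above could be
// wired up. BATCH_SIZE, the property key, and the stats logic are placeholder assumptions.
var BATCH_SIZE = 100;

function runBatch() {
  var props = PropertiesService.getScriptProperties();
  var offset = Number(props.getProperty('gmailStatsOffset') || 0);
  // Process one slice of inbox threads; the time-driven trigger calls this again for the next slice.
  var threads = GmailApp.getInboxThreads(offset, BATCH_SIZE);
  // ... tally senders, labels, message counts, etc. into the spreadsheet here ...
  props.setProperty('gmailStatsOffset', String(offset + threads.length));
}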

Preface

This article walks you through an example of deploying a Python 3.6 application that uses Pandas and AWS S3 to AWS Lambda with Boto3, as of 2018. No shell, no Bash, no web console: everything is automated in Python. The previous article, a Hello World example, can be found here.

Again, the reasons to use Boto3 in Python to interact with AWS are:

  1. I'm more familiar with Python than Bash, so for me a Python script ends up more flexible and powerful than a Bash script.
  2. I'm not a fan of the AWS web console. It can make certain tasks easier, but it is definitely not automated.
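
To give a concrete flavor of "automated in Python", here is a minimal Boto3 sketch of creating the Lambda function. The function name, handler, role ARN, and the pre-built lambda_package.zip (which would bundle the code plus Pandas) are placeholder assumptions, not the article's exact values.

import boto3

lambda_client = boto3.client('lambda')

# the deployment package is assumed to have been zipped beforehand, also from Python
with open('lambda_package.zip', 'rb') as f:
    zipped_code = f.read()

response = lambda_client.create_function(
    FunctionName='pandas-s3-demo',                          # hypothetical name
    Runtime='python3.6',
    Role='arn:aws:iam::123456789012:role/lambda-s3-role',   # placeholder role ARN
    Handler='handler.main',                                  # module.function inside the zip
    Code={'ZipFile': zipped_code},
    Timeout=60,
    MemorySize=512,
)
print(response['FunctionArn'])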

Introduction

Real-time Grid Component with Laravel, Vue.js, Vuex & Socket.io (Google Docs-like Functionality)

Motivation

The exercise of writing this tutorial -- as well as recording it as a screencast -- has helped me better understand the concepts behind a couple of my favorite open source tools. Both the tutorial and the screencast will serve me as references in the future. If they help others, that's great too.

I love Google Docs' real-time, multi-user interactive capability, and I've always been a fan of spreadsheets. I wanted to see if I could replicate that type of functionality. What I've done is take the basic Vue.js Grid Component example and alter it a bit so that when a user clicks on a cell, that cell becomes highlighted or "active", not just in the user's browser but in any browser instance currently viewing the grid.
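
The gist itself wires this up with Laravel, Vuex, and Socket.io; purely as an illustration of the broadcast pattern (and not the gist's actual code), a minimal Socket.io-style relay server, sketched here with the python-socketio library, could look like this. The event name and payload shape are assumptions.

import socketio

# Relay server: whatever cell one client activates gets broadcast to every other client.
sio = socketio.Server(cors_allowed_origins='*')
app = socketio.WSGIApp(sio)

@sio.on('cell-activated')
def cell_activated(sid, data):
    # data is assumed to look like {'row': 2, 'col': 'B'}
    sio.emit('cell-activated', data, skip_sid=sid)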

@duncangh
duncangh / 2019_race_results.json
Last active January 5, 2021 00:18
Catch the Leprechaun Race Results Data
{"headings":[{"key":"race_placement","name":"Place","style":"place"},{"key":"bib_num","name":"Bib","align":"right","style":"bib"},{"key":"name","name":"Name"},{"key":"gender","name":"Gender"},{"key":"city","name":"City"},{"key":"age","name":"Age","align":"right"},{"key":"age_performance_percentage","tooltip":"This shows how well you performed based on your age. Higher numbers are better, with 100% being the best.","name":"Age\nPercentage","align":"right"},{"key":"division_place","nonSortable":true,"name":"Division\nPlace","align":"right","style":"place"},{"key":"division","nonSortable":true,"name":"Division"}],"resultSet":{"setInfo":{"individual_result_set_id":146666,"race_category_id":228991,"individual_result_set_name":"5K","public_results":"T","results_source_name":"GO Race Productions","results_source_url":null,"preliminary_results":"F","individual_result_set_deleted":"F","disable_division_placement_calc":"T","pace_type":"T","result_questions_url":null,"hide_splits_in_results":"F","hide_event_names":"F",
import token
import tokenize
from six.moves import cStringIO as StringIO
import json
from pandas.io.json import json_normalize
import requests
import pandas as pd
def __fix_lazy_json(in_text):
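# Hedged sketch of how the pieces above presumably fit together: fetch the (lazy) race-results
# JSON, repair it with __fix_lazy_json, and flatten it into a DataFrame. The function name and
# URL parameter are placeholders for illustration, not the gist's actual code.
def results_to_dataframe(url):
    raw = requests.get(url).text
    data = json.loads(__fix_lazy_json(raw))
    return json_normalize(data)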
@duncangh
duncangh / readme.md
Last active September 12, 2024 13:26
Scroll to bottom of *infinite* timeline Javascript

Autoscroll Twitter Timeline

Want to quickly find your regrettable 10-year-old tweets without having to learn the Twitter API or scroll manually through the infinite ether of bad takes you've tweeted throughout the ages?

Then this is for you, ya lazy human. Just paste this in the console and watch those hot takes go by.

function autoScrolling() {
   window.scrollTo(0,document.body.scrollHeight);
}
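// Hedged usage sketch: to keep the timeline loading, call the function repeatedly.
// The 1-second interval is an assumption; tune it to how fast your timeline loads.
var scroller = setInterval(autoScrolling, 1000);
// stop it later with: clearInterval(scroller);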
@duncangh
duncangh / s3_to_pandas.py
Created April 7, 2019 03:01 — forked from jaklinger/s3_to_pandas.py
Read CSV (or JSON etc) from AWS S3 to a Pandas dataframe
import boto3
import pandas as pd
from io import BytesIO
bucket, filename = "bucket_name", "filename.csv"
s3 = boto3.resource('s3')
obj = s3.Object(bucket, filename)
with BytesIO(obj.get()['Body'].read()) as bio:
    # read the object's bytes into memory and let pandas parse the CSV
    df = pd.read_csv(bio)
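
# Hedged note: for the "(or JSON etc)" case in the gist title, the same buffering pattern should
# work with the matching pandas reader, e.g. df = pd.read_json(bio)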
@duncangh
duncangh / pandas_s3_streaming.py
Created April 7, 2019 03:03
Streaming pandas DataFrame to/from S3 with on-the-fly processing and GZIP compression
import gzip
import pandas as pd

def s3_to_pandas(client, bucket, key, header=None):
    # get key using boto3 client
    obj = client.get_object(Bucket=bucket, Key=key)
    # wrap the streaming body so gzip can decompress on the fly
    gz = gzip.GzipFile(fileobj=obj['Body'])
    # load stream directly to DF
    return pd.read_csv(gz, header=header, dtype=str)
def s3_to_pandas_with_processing(client, bucket, key, header=None):
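
# Hedged companion sketch (not part of the gist preview): the "to S3 with GZIP compression"
# direction named in the gist title. The function name and CSV/encoding choices are assumptions.
from io import BytesIO

def pandas_to_s3(df, client, bucket, key):
    # serialize the DataFrame to CSV, gzip it in memory, then upload the compressed bytes
    buf = BytesIO()
    with gzip.GzipFile(mode='w', fileobj=buf) as gz:
        gz.write(df.to_csv(index=False).encode('utf-8'))
    client.put_object(Bucket=bucket, Key=key, Body=buf.getvalue())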
@duncangh
duncangh / ec2_instance_connections.py
Last active May 12, 2019 00:22
RDS SQLAlchemy Connection Strings (if all of your databases are postgreSQL)
# I'm a one trick pony, but I can do ec2 conns too
import boto3
ec2 = boto3.client('ec2')
instances: list = ec2.describe_instances()['Reservations']

def _make_string_from_instance(instance):
    key_name = instance['KeyName']
    public_dns_name = instance['NetworkInterfaces'][0]['Association']['PublicDnsName']
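
# Hedged sketch (not shown in the preview): the RDS / SQLAlchemy side the gist description
# mentions. Builds PostgreSQL connection URLs from RDS metadata; the credential placeholders
# and function name are assumptions for illustration.
rds = boto3.client('rds')
databases: list = rds.describe_db_instances()['DBInstances']

def _make_conn_string_from_db(db, user='USER', password='PASSWORD'):
    endpoint = db['Endpoint']
    return f"postgresql://{user}:{password}@{endpoint['Address']}:{endpoint['Port']}/{db.get('DBName', 'postgres')}"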