Skip to content

Instantly share code, notes, and snippets.

View alecbw's full-sized avatar

Alec Barrett-Wilsdon alecbw

View GitHub Profile
How to search in all countries *but* the US (or any other for that matter)?
Linkedin Country codes: https://developer.linkedin.com/docs/reference/country-codes#
Linkedin faceted search url format: %5B"ca%3A0"%2C"au%3A0"%2C"es%3A0"%5D
Decoded URL: ["ca:0","au:0","es:0"]
=> Complete list for injection in url (remove the country you want to exclude):
["ae:0","ar:0","at:0","au:0","be:0","br:0","ca:0","ch:0","cl:0","cn:0","co:0","cz:0","de:0","dk:0","es:0","fi:0","fr:0","fx:0","gb:0","gr:0","hk:0","hr:0","hu:0","id:0","ie:0","il:0","in:0","is:0","it:0","jp:0","lb:0","lu:0","lv:0","ma:0","mc:0","mx:0","my:0","nl:0","no:0","nz:0","oo:0","pe:0","ph:0","pk:0","pl:0","pr:0","pt:0","py:0","qa:0","ro:0","ru:0","sa:0","se:0","sg:0","sk:0","th:0","tr:0","tw:0","ua:0","us:0","uy:0","ve:0","vn:0","yu:0","za:0"]
@ian-whitestone
ian-whitestone / notes.md
Last active March 1, 2023 01:45
Best practices for presto sql

Presto Specific

  • Don’t SELECT *, Specify explicit column names (columnar store)
  • Avoid large JOINs (filter each table first)
    • In PRESTO tables are joined in the order they are listed!!
    • Join small tables earlier in the plan and leave larger fact tables to the end
    • Avoid cross joins or 1 to many joins as these can degrade performance
  • Order by and group by take time
    • only use order by in subqueries if it is really necessary
  • When using GROUP BY, order the columns by the highest cardinality (that is, most number of unique values) to the lowest.
@alphapapa
alphapapa / fitness.org
Last active February 19, 2025 04:32
An Emacs food/weight/workout tracker self-contained in a single Org file

Plots

/home/me/org/double-plot.png

Tasks

@steinwaywhw
steinwaywhw / AWS Lambda: Hello World.md
Last active March 8, 2022 10:40
An extremely simple AWS Lambda example in Python 3.

Preface

In general, AWS services can be accessed using

  1. AWS web interface,
  2. API libraries in a programming language, such as boto3 for Python 3,
  3. AWS command-line interface, i.e. awscli.

I opted for the API library since it is

from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.chrome.options import Options
import zipfile,os
def proxy_chrome(PROXY_HOST,PROXY_PORT,PROXY_USER,PROXY_PASS):
manifest_json = """
{
@a-hisame
a-hisame / gzip_s3_and_json_py3.py
Created July 3, 2018 10:41
To use gzip file between python application and S3 directly for Python3
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''To use gzip file between python application and S3 directly for Python3.
Python 2 version - https://gist.github.com/a-hisame/f90815f4fae695ad3f16cb48a81ec06e
'''
import io
import gzip
import json
@michael-erasmus
michael-erasmus / README.md
Last active October 19, 2021 17:59
Speeding up the deletion of an S3 bucket with millions of nested files

I had a really interesting journey today with a thorny little challenge I had while trying to delete all the files in a s3 bucket with tons of nested files. The bucket path (s3://buffer-data/emr/logs/) contained log files created by ElasticMapReduce jobs that ran every day over a couple of years (from early 2015 to early 2018).

Each EMR job would run hourly every day, firing up a cluster of machines and each machine would output it's logs. That resulted thousands of nested paths (one for each job) that contained thousands of other files. I estimated that the total number of nested files would be between 5-10 million.

I had to estimate this number by looking at samples counts of some of the nested directories, because getting the true count would mean having to recurse through the whole s3 tree which was just too slow. This is also exactly why it was challenging to delete all the files.

Deleting all the files in a s3 object like this is pretty challenging, since s3 doesn't really work like a true f

@brandedoutcast
brandedoutcast / spam-domains
Last active July 1, 2023 02:28
Spam domains that plague my email
jmails.info
sacustomerdelight.co.in
extrobuzzapp.com
ixigo.info
offer4uhub.com
netecart.com
101coupon.in
freedealcode.in
bankmarket.in
hotoffers.co.in
@DavidWells
DavidWells / serverless.yml
Created September 15, 2017 05:39
DynamoDB custom index serverless.yml example
service: service-name
provider:
name: aws
runtime: nodejs6.10
functions:
myfunc:
handler: handler.myfunc
@siliconvallaeys
siliconvallaeys / Populate Sheets With AdWords Data.js
Last active August 4, 2022 09:14
Populate Google Sheet With Custom AdWords Data
/*
// AdWords Script: Put Data From AdWords Report In Google Sheets
// --------------------------------------------------------------
// Copyright 2017 Optmyzr Inc., All Rights Reserved
//
// This script takes a Google spreadsheet as input. Based on the column headers, data filters, and date range specified
// on this sheet, it will generate different reports.
//
// The goal is to let users create custom automatic reports with AdWords data that they can then include in an automated reporting
// tool like the one offered by Optmyzr.