Theo Tolv

theo@iconara.net | linkedin.com/in/theotolv | github.com/iconara

About me

I've spent 15+ years at the intersection of distributed systems, big data, and analytics: building, breaking, and scaling things on AWS as both a customer and an employee. I'm at my best when bridging the gap between deep technical work and strategic product thinking, whether that's partnering with customers to solve gnarly analytics problems or helping teams ship the right thing, the right way. I'm a strong believer in test-driven development and humane code reviews.

Experience

@iconara
iconara / run-query.sh
Created November 10, 2020 13:37
Run Athena queries with aws-cli
#!/usr/bin/env bash
region=us-east-1
query='SELECT NOW()'
output_location="s3://aws-athena-query-results-1234567890-$region/"
query_execution_id=$(aws athena start-query-execution \
  --region "$region" \
  --query-string "$query" \
  --result-configuration "OutputLocation=$output_location" \
@iconara
iconara / InputStreamResponseTransformer.java
Last active November 5, 2020 16:51
S3 GetObject InputStreamResponseTransformer using AWS SDK for Java v2
// this is an attempt to create a synchronous InputStream from a call to
// S3AsyncClient#getObject using a blocking queue.
//
// the purpose is to be able to make many S3 operations asynchronously, but
// at the same time be able to pass off some results to threads and into
// code that expects InputStream or Reader, like Commons CSV.
public class InputStreamResponseTransformer extends InputStream implements AsyncResponseTransformer<GetObjectResponse, InputStream>, Subscriber<ByteBuffer> {
private static final ByteBuffer END_MARKER = ByteBuffer.allocate(0);
@iconara
iconara / athena-metadata-parser.rb
Last active August 21, 2025 06:54
Athena metadata file parser
#!/usr/bin/env ruby
# This code parses the .csv.metadata files written by Athena and produces a
# structure similar to what you get from the GetQueryResults API call.
#
# I have reverse engineered the format and I'm not sure about all the details,
# but it seems to correspond to the GetQueryResults API call well. Some things,
# like nullability, the difference between name and label, and the schema_name
# and table_name fields, I haven't been able to figure out because they seem
# not to be used, or never take any other values.
@iconara
iconara / pg.sql
Last active October 1, 2018 14:48
Useful PostgreSQL queries
-- Connection limits by role
SELECT rolname, rolconnlimit
  FROM pg_roles
 WHERE rolconnlimit <> -1;
-- Change connection limit for a role
ALTER USER $role WITH CONNECTION LIMIT 64;
-- Current activity
SELECT *
@iconara
iconara / validate-table.rb
Last active June 14, 2018 13:23
Quick and dirty script to find spurious files in the prefix of a Glue table
require 'aws-sdk-glue'
require 'aws-sdk-s3'
def split_s3_uri(s3_uri)
  s3_uri.match(%r{\As3://(.+?)/(.+)\z}).to_a.drop(1)
end
database, table_name = ARGV.take(2)
glue = Aws::Glue::Client.new
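The `split_s3_uri` helper in the preview above can be tried on its own, without any AWS calls. The sketch below copies the helper as shown and runs it on two made-up URIs (`example-bucket` and the key are illustrative values, not from the original script). It suggests that a URI with both a bucket and a key yields a two-element array, while a URI without a key yields an empty array, because `nil.to_a` is `[]`.

```ruby
# Helper copied from the gist preview: splits an S3 URI into bucket and key.
def split_s3_uri(s3_uri)
  s3_uri.match(%r{\As3://(.+?)/(.+)\z}).to_a.drop(1)
end

# Hypothetical example URIs, used only for illustration.
bucket, key = split_s3_uri('s3://example-bucket/path/to/table/')
puts "bucket=#{bucket} key=#{key}"

# No key after the bucket: the regex does not match, so both values are nil.
missing_bucket, missing_key = split_s3_uri('s3://example-bucket')
puts "bucket=#{missing_bucket.inspect} key=#{missing_key.inspect}"
```

A caller would probably want to check for the empty result before passing the values on to an S3 client.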
@iconara
iconara / create-external-schema.sql
Last active September 7, 2017 14:57
Redshift Spectrum cheat sheet
-- this creates a schema called "name_of_schema_in_redshift" in Redshift,
-- that works as an alias for the Athena/Glue database "name_of_database_in_glue".
CREATE EXTERNAL SCHEMA name_of_schema_in_redshift
FROM DATA CATALOG
DATABASE 'name_of_database_in_glue'
REGION 'us-east-1'
IAM_ROLE 'arn:aws:iam::456064453472:role/xyz';
@iconara
iconara / dot-generator.rb
Last active September 14, 2017 07:13
Visualize EC2 security group dependencies
require 'aws-sdk-ec2'
ec2 = Aws::EC2::Client.new
response = ec2.describe_security_groups
puts('digraph securitygroups {')
loop do
response.security_groups.each do |security_group|
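The preview above is cut off, so the following is only a rough sketch of the general idea of turning security group relationships into Graphviz DOT, not the gist's actual logic. It works on in-memory data instead of calling EC2. The `SecurityGroup` struct, its `allowed_source_group_ids` field, the `security_groups_to_dot` function, and the edge direction (source group pointing at the group that allows it) are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for the data returned by describe_security_groups.
SecurityGroup = Struct.new(:group_id, :group_name, :allowed_source_group_ids)

# Builds a DOT digraph with one node per group and one edge per
# "source group is allowed into this group" relationship (assumed direction).
# Names are not escaped, so a double quote in a group name would break the output.
def security_groups_to_dot(groups)
  lines = ['digraph securitygroups {']
  groups.each do |group|
    lines << %(  "#{group.group_id}" [label="#{group.group_name}"];)
    group.allowed_source_group_ids.each do |source_id|
      lines << %(  "#{source_id}" -> "#{group.group_id}";)
    end
  end
  lines << '}'
  lines.join("\n")
end

# Made-up example: the "db" group accepts traffic from the "web" group.
example_groups = [
  SecurityGroup.new('sg-aaa', 'web', []),
  SecurityGroup.new('sg-bbb', 'db', ['sg-aaa'])
]
puts security_groups_to_dot(example_groups)
```

The printed text can be piped into the Graphviz `dot` command to render an image.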
@iconara
iconara / auto-add-partitions.sql
Created September 6, 2017 13:59
Athena cheat sheet
-- Discovers all partitions of a table if they use Hive's partitioning format (e.g. partition0=abc/partition1=def)
MSCK REPAIR TABLE tablename;
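To illustrate the Hive partitioning format mentioned above, here is a minimal Ruby sketch that pulls `name=value` pairs out of an object key. The `parse_hive_partitions` function and the example key are hypothetical. The sketch does not decode escaped characters in partition values and says nothing about how Athena itself discovers partitions.

```ruby
# Extracts Hive-style partition pairs (name=value path segments) from a key.
# Segments without "=" (such as the file name) are ignored.
def parse_hive_partitions(key)
  key.split('/').each_with_object({}) do |segment, partitions|
    name, value = segment.split('=', 2)
    next if value.nil? || name.empty?
    partitions[name] = value
  end
end

# Made-up key following the pattern from the comment above.
partitions = parse_hive_partitions('partition0=abc/partition1=def/data.parquet')
partitions.each { |name, value| puts "#{name} => #{value}" }
```

A key laid out as `abc/def/data.parquet` yields an empty hash here, which corresponds to the layout that `MSCK REPAIR TABLE` is described above as not discovering.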
@iconara
iconara / cbck.sh
Last active December 6, 2016 09:49
Check Cassandra backup integrity
#!/bin/bash
function log() {
  logger -st "cbck[$$]" "$@"
}
function check_failed() {
  log -p user.err "Check failed: $1"
  exit 1
}