Skip to content

Instantly share code, notes, and snippets.

View mattgillard's full-sized avatar

Matt Gillard mattgillard

View GitHub Profile
@dusenberrymw
dusenberrymw / spark_tips_and_tricks.md
Last active October 23, 2025 02:15
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically compress to use 1 byte unsigned integers, thus decreasing the size of saved DataFrame by a factor of 8.
  • Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (i.e. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
@lukeplausin
lukeplausin / bash_aws_jq_cheatsheet.sh
Last active January 4, 2025 17:42
AWS, JQ and bash command cheat sheet. How to query, cut and munge things in JSON generally.
#ย Count total EBS based storage in AWS
aws ec2 describe-volumes | jq "[.Volumes[].Size] | add"
#ย Count total EBS storage with a tag filter
aws ec2 describe-volumes --filters "Name=tag:Name,Values=CloudEndure Volume qjenc" | jq "[.Volumes[].Size] | add"
# Describe instances concisely
aws ec2 describe-instances | jq '[.Reservations | .[] | .Instances | .[] | {InstanceId: .InstanceId, State: .State, SubnetId: .SubnetId, VpcId: .VpcId, Name: (.Tags[]|select(.Key=="Name")|.Value)}]'
# Wait until $instance_id is running and then immediately stop it again
aws ec2 wait instance-running --instance-id $instance_id && aws ec2 stop-instances --instance-id $instance_id
#ย Get 10th instance in the account
@crawles
crawles / Spark Dataframe Cheat Sheet.py
Last active April 26, 2022 03:09 — forked from evenv/Spark Dataframe Cheat Sheet.py
Cheat sheet for Spark Dataframes (using Python)
# A simple cheat sheet of Spark Dataframe syntax
# Current for Spark 1.6.1
# import statements
from pyspark.sql import SQLContext
from pyspark.sql.types import *
from pyspark.sql.functions import *
#creating dataframes
df = sqlContext.createDataFrame([(1, 4), (2, 5), (3, 6)], ["A", "B"]) # from manual data
@xenithorb
xenithorb / kmod-signer.sh
Last active November 16, 2021 07:49
Akmod kernel module auto-signer for kernel upgrades on Fedora
#!/bin/bash
#
# /etc/kernel/postinst.d script to sign akmods kmods after kernel upgrade
#
# Author: Michael Goodwin Date: 2016-09-21
# 1. Copy this script to /etc/kernel/postinst.d/ and `chmod +x` it
#
# 2. Create signing keys (store these somewhere useful and safe):
# $ mkdir -p /etc/pki/tls/private/mok
@ddgenome
ddgenome / aws-creds.bash
Last active February 5, 2024 19:32
Fetch AWS STS keys and set environment variables
#!/bin/bash
# Fetch 24-hour AWS STS session token and set appropriate environment variables.
# See http://docs.aws.amazon.com/cli/latest/reference/sts/get-session-token.html .
# You must have jq installed and in your PATH https://stedolan.github.io/jq/ .
# Add this function to your .bashrc or save it to a file and source that file from .bashrc .
# https://gist.github.com/ddgenome/f13f15dd01fb88538dd6fac8c7e73f8c
#
# usage: aws-creds MFA_TOKEN [OTHER_AWS_STS_GET-SESSION-TOKEN_OPTIONS...]
function aws-creds () {
local pkg=aws-creds
@leonardofed
leonardofed / README.md
Last active November 13, 2025 21:53
A curated list of AWS resources to prepare for the AWS Certifications


A curated list of AWS resources to prepare for the AWS Certifications

A curated list of awesome AWS resources you need to prepare for the all 5 AWS Certifications. This gist will include: open source repos, blogs & blogposts, ebooks, PDF, whitepapers, video courses, free lecture, slides, sample test and many other resources.


module.exports = function (context, data) {
context.res = {
body: parseQuery(data.form)
};
context.done();
};
function parseQuery(qstr) {
@p3t3r67x0
p3t3r67x0 / openssl_commands.md
Last active May 15, 2025 17:31
Some list of openssl commands for check and verify your keys

openssl

Install

Install the OpenSSL on Debian based systems

sudo apt-get install openssl
@ololobus
ololobus / Spark+ipython_on_MacOS.md
Last active October 3, 2025 16:28
Apache Spark installation + ipython/jupyter notebook integration guide for macOS

Apache Spark installation + ipython/jupyter notebook integration guide for macOS

Tested with Apache Spark 2.1.0, Python 2.7.13 and Java 1.8.0_112

For older versions of Spark and ipython, please, see also previous version of text.

Install Java Development Kit

@roachhd
roachhd / README.md
Last active November 8, 2025 17:17
EMOJI cheatsheet ๐Ÿ˜›๐Ÿ˜ณ๐Ÿ˜—๐Ÿ˜“๐Ÿ™‰๐Ÿ˜ธ๐Ÿ™ˆ๐Ÿ™Š๐Ÿ˜ฝ๐Ÿ’€๐Ÿ’ข๐Ÿ’ฅโœจ๐Ÿ’๐Ÿ‘ซ๐Ÿ‘„๐Ÿ‘ƒ๐Ÿ‘€๐Ÿ‘›๐Ÿ‘›๐Ÿ—ผ๐Ÿ”ฎ๐Ÿ”ฎ๐ŸŽ„๐ŸŽ…๐Ÿ‘ป

EMOJI CHEAT SHEET

Emoji emoticons listed on this page are supported on Campfire, GitHub, Basecamp, Redbooth, Trac, Flowdock, Sprint.ly, Kandan, Textbox.io, Kippt, Redmine, JabbR, Trello, Hall, plug.dj, Qiita, Zendesk, Ruby China, Grove, Idobata, NodeBB Forums, Slack, Streamup, OrganisedMinds, Hackpad, Cryptbin, Kato, Reportedly, Cheerful Ghost, IRCCloud, Dashcube, MyVideoGameList, Subrosa, Sococo, Quip, And Bang, Bonusly, Discourse, Ello, and Twemoji Awesome. However some of the emoji codes are not super easy to remember, so here is a little cheat sheet. โœˆ Got flash enabled? Click the emoji code and it will be copied to your clipboard.

People

:bowtie: ๐Ÿ˜„