@rampage644
rampage644 / spark_etl_resume.md
Created September 15, 2015 18:02
Spark ETL resume

Introduction

This document describes a sample implementation of part of the existing Dim_Instance ETL.

I used only the Cloud Block Storage source to simplify and speed up the process. I also ignored the creation of extended tables (specific to this particular ETL process). Below are the code and some final thoughts about possible use of Spark as the primary ETL tool.

TL;DR

Implementation

The basic ETL implementation is really straightforward. The only real problem (and it really is a problem) is finding a correct and comprehensive Mapping document (a description of which source fields go where).
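To illustrate what a Mapping document provides, here is a minimal sketch in plain Python (not Spark, and not the code from this write-up). All field names, the `MAPPING` table, and both helper functions are hypothetical; none of them come from the real Dim_Instance mapping.

```python
# Hypothetical mapping: target column -> (source field, converter).
# These names are placeholders, not the real Dim_Instance mapping.
MAPPING = {
    "instance_id": ("id", str),
    "size_gb": ("size", int),
    "created_at": ("created", str),
}


def apply_mapping(record, mapping=MAPPING):
    """Build one target row from one source record.

    Source fields that are absent (or None) become None in the target row.
    """
    row = {}
    for target, (source, convert) in mapping.items():
        value = record.get(source)
        row[target] = convert(value) if value is not None else None
    return row


def unmapped_fields(record, mapping=MAPPING):
    """Return the source fields that the mapping does not cover."""
    used = {source for source, _ in mapping.values()}
    return sorted(set(record) - used)
```

Running `unmapped_fields` over a sample of source records is one cheap way to spot gaps in an incomplete mapping before writing the actual ETL job.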

@kleem
kleem / README.md
Last active February 22, 2023 09:52
WordNet noun graph

This experiment converts an SQL version of WordNet 3.0 into a graph, using the python library graph-tool. In order to create a taxonomical structure, only noun synsets, hyponym links and hypernym links are considered.

The result of the conversion is saved as GraphML, then rendered as the following hairball:

[Figure: WordNet 3.0 taxonomy as a graph]

Since the graph can be considered a tangled tree, i.e. a tree in which some nodes have multiple parents, two untangled versions (using longest and shortest paths) are also provided as GraphML. Only a few links are lost (about 2%), making the tree a good approximation of the noun taxonomy graph.
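The untangling step can be sketched in plain Python as follows. This is not the graph-tool code used in the experiment; it assumes a single root, an acyclic graph, and a hypothetical `children` dictionary whose edges point from hypernym to hyponym. Each function keeps exactly one parent per node, so every other incoming link is dropped.

```python
from collections import deque


def untangle_shortest(children, root):
    """Keep, for each node, the parent on a shortest path from the root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in parent:  # first visit in BFS = shortest path
                parent[child] = node
                queue.append(child)
    return parent


def untangle_longest(children, root):
    """Keep, for each node, the parent on a longest path from the root.

    Assumes the graph is acyclic. Uses recursion, so it is only suitable
    for shallow hierarchies.
    """
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order; reversed, it is a topological order.
        if node in seen:
            return
        seen.add(node)
        for child in children.get(node, []):
            visit(child)
        order.append(node)

    visit(root)
    depth = {root: 0}
    parent = {root: None}
    for node in reversed(order):
        for child in children.get(node, []):
            # Relax the edge: prefer the parent that gives the deeper path.
            if depth[node] + 1 > depth.get(child, -1):
                depth[child] = depth[node] + 1
                parent[child] = node
    return parent
```

For a toy graph such as `{"entity": ["a", "b"], "a": ["b", "c"], "b": ["c"]}`, the shortest-path version attaches `c` to `a`, while the longest-path version attaches it to `b`.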

@jboner
jboner / latency.txt
Last active August 6, 2025 09:53
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD