RamGhadiyaram

@mark05e
mark05e / apache-superset-on-windows10.md
Last active September 28, 2024 10:02
Installing Apache Superset on Windows 10

⚠️ WARN: This doc might be outdated. Use with caution. Only tested with Python v3.7

🙋‍♂️ INFO: If you have fixes or suggestions for this doc, please comment below.

🌟 STAR: Star this doc if you found it helpful.


@rampage644
rampage644 / spark_etl_resume.md
Created September 15, 2015 18:02
Spark ETL resume

Introduction

This document describes a sample process of implementing part of the existing Dim_Instance ETL.

I took only the Cloud Block Storage source to simplify and speed up the process. I also ignored the creation of extended tables (specific to this particular ETL process). Below are the code and some final thoughts about using Spark as the primary ETL tool.

TL;DR

Implementation

A basic ETL implementation is really straightforward. The only real problem (and it really is a problem) is finding a correct and comprehensive mapping document, that is, a description of which source fields go where.
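To make the idea of a mapping document concrete, here is a minimal sketch that expresses one as data and applies it to a single record. It uses plain Java collections rather than the Spark API so that it stays self-contained. Every field and column name in it (`volume_id`, `instance_key`, and so on) is invented for illustration and is not taken from the actual Dim_Instance mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a mapping document expressed as data.
// All field and column names below are assumptions made for illustration.
class FieldMappingSketch {

    // Source field name -> target column name (assumed names, not the real mapping).
    static final Map<String, String> MAPPING = new LinkedHashMap<>();
    static {
        MAPPING.put("volume_id", "instance_key");
        MAPPING.put("display_name", "instance_name");
        MAPPING.put("created_at", "created_date");
    }

    // Builds a target row by copying each mapped source field into its target column.
    // Source fields absent from the mapping are dropped; missing source values become null.
    static Map<String, Object> applyMapping(Map<String, Object> sourceRow) {
        Map<String, Object> targetRow = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : MAPPING.entrySet()) {
            targetRow.put(entry.getValue(), sourceRow.get(entry.getKey()));
        }
        return targetRow;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("volume_id", "vol-1");
        source.put("display_name", "data-disk");
        source.put("unmapped_field", 42);
        System.out.println(applyMapping(source));
    }
}
```

Keeping the mapping as data rather than hard-coding it in the transformation makes it easier to review against whatever mapping document is eventually found.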

@stauntmaster
stauntmaster / HDFSWriting
Last active July 4, 2016 09:35
Benchmark comparing three approaches: writing directly to HDFS, appending to an existing file in HDFS, and writing the file locally and then uploading it to HDFS.
package org.shiftehfar.reza.benchmark;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
@ryanlecompte
ryanlecompte / gist:5746241
Last active October 31, 2019 05:05
Bounded priority queue in Scala
import scala.collection.mutable
/**
* Bounded priority queue trait that is intended to be mixed into instances of
* scala.collection.mutable.PriorityQueue. By default PriorityQueue instances in
* Scala are unbounded. This trait modifies the original PriorityQueue's
* enqueue methods such that we only retain the top K elements.
* The top K elements are defined by an implicit Ordering[A].
* @author Ryan LeCompte ([email protected])
*/
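
The "retain only the top K elements" idea described in the comment above can be sketched roughly as follows. This is a hypothetical Java analogue written for illustration, not the gist's Scala trait: the class name `BoundedTopK` and its methods are assumptions, and it wraps `java.util.PriorityQueue` instead of being mixed into a queue instance.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical Java analogue of a bounded priority queue; not the gist's Scala trait.
class BoundedTopK<A> {
    private final int maxSize;
    private final Comparator<A> ordering;
    // Min-heap under `ordering`: the head is the smallest element currently retained.
    private final PriorityQueue<A> heap;

    BoundedTopK(int maxSize, Comparator<A> ordering) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
        this.ordering = ordering;
        this.heap = new PriorityQueue<>(maxSize, ordering);
    }

    // Adds the element if there is room, or if it is larger than the smallest retained one.
    void enqueue(A elem) {
        if (heap.size() < maxSize) {
            heap.add(elem);
        } else if (ordering.compare(elem, heap.peek()) > 0) {
            heap.poll();      // evict the current smallest
            heap.add(elem);
        }
    }

    // Returns a copy of the retained elements, largest first.
    List<A> topK() {
        List<A> result = new ArrayList<>(heap);
        result.sort(ordering.reversed());
        return result;
    }

    public static void main(String[] args) {
        BoundedTopK<Integer> queue = new BoundedTopK<>(3, Comparator.naturalOrder());
        for (int x : new int[] {5, 1, 9, 3, 7}) {
            queue.enqueue(x);
        }
        System.out.println(queue.topK());
    }
}
```

Holding the smallest retained element at the head of the heap means each insertion only has to compare against one value, so the queue never grows beyond K elements.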