
Bryan Bischof BBischof

  • current: Theory Ventures | prev: Hex, Weights and Biases, Stitch Fix, Blue Bottle, QuasiCoherent Labs, IBM
  • Berkeley, California
  • X @bebischof
# Asim Quotes
- "We keep saying that laziness is happening, but we don't really have any proof" -Asim, "No, I can prove that my laziness is happening..."
- "If there is any motivation, money is one" -Asim
- "Ideas are worthless" -Asim
- "Capstone projects are harder than starting a company[sic]" -Asim
- "Any time you see penalties, that is a signal that there is a business there." -Asim
- "Your cat will never turn into a toaster." -Asim
- "You can't lick a volume." -Asim
BBischof / .block
Last active July 31, 2016 00:36
Matrix of Grouped Barcharts with Aggregate Scatterplots
license: gpl-3.0

Data Engineering Capstone Project -- Bryan Bischof

Dec. 22, 2015

Project Description

Aspera's ASCP is a transfer protocol that is especially useful for large data transfers over suboptimal networks. In particular, ASCP is UDP-based but guarantees delivery. FASPstream is a version of ASCP designed specifically for streaming data transfers. During either type of transfer, a log file is produced that contains time-series data for

  • bandwidth
  • retransmission rate

Rough project steps

  • used wget to download all the articles from 2015 into a directory
  • used find | grep | awk to build a list of file paths and saved the list to a variable
  • looped over the list with cat | grep | sed to parse each file, writing the output to new files
  • looped over the new files and used cat to concatenate multi-part files into single transcripts
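The steps above were done with shell tools. As a rough illustration only, the same find/grep/sed/cat stages might look like this in Python. The `_part<N>` file-naming scheme and the example keep/strip patterns are hypothetical, since the actual commands and file names aren't shown here.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical naming scheme: parts of one transcript are "<base>_part<N>".
PART_RE = re.compile(r"^(?P<base>.+?)_part(?P<num>\d+)$")


def list_files(root, pattern):
    """Roughly `find root | grep pattern`: every file under root whose path matches."""
    regex = re.compile(pattern)
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and regex.search(str(p)))


def parse_lines(lines, keep, strip):
    """Roughly `grep keep | sed 's/strip//g'`: keep matching lines, delete `strip`."""
    keep_re, strip_re = re.compile(keep), re.compile(strip)
    return [strip_re.sub("", line) for line in lines if keep_re.search(line)]


def group_parts(stems):
    """Group file stems by base name, with parts in numeric order (the input to `cat`)."""
    groups = defaultdict(list)
    for stem in stems:
        m = PART_RE.match(stem)
        if m:
            groups[m.group("base")].append((int(m.group("num")), stem))
        else:
            groups[stem].append((0, stem))  # single-part file
    return {base: [stem for _, stem in sorted(parts)] for base, parts in groups.items()}


def concatenate(paths, out_path):
    """Roughly `cat part1 part2 ... > out`: join part files into one transcript."""
    Path(out_path).write_text("".join(Path(p).read_text() for p in paths))
```

For example, `parse_lines(["<p>Hello</p>", "junk"], keep=r"<p>", strip=r"</?p>")` returns `["Hello"]`. Sorting parts by their integer suffix keeps `_part10` after `_part2`, which a plain lexical sort would not.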

Bash Commands Run

Data Engineering Capstone Project -- Bryan Bischof

Dec. 17, 2015

Project Description

Given unstructured log data from Aspera's ASCP transfers, one needs to parse these logs and store them in a large key-value store (currently Redis). The current solution is a Python script that runs a series of regexes and is deployed on Spark to a Mesos cluster for analysis. However, this script is highly inefficient and isn't designed to work within a lambda architecture. First, it doesn't connect to a permanent data store; second, it doesn't accept incoming streams, only batch uploads and processing.
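A minimal sketch of the parse-then-store idea, assuming a hypothetical log line of the form `<timestamp> session=<id> rate=<n> retransmit=<x>%`. The real ASCP log format, the script's regexes, and the key layout are not shown in this description, so every pattern and key name below is invented for illustration, and a plain dict stands in for Redis.

```python
import re

# Hypothetical patterns: the real ~100 regexes and the ASCP log format are not shown.
PATTERNS = {
    "rate_kbps": re.compile(r"rate=(\d+)"),
    "retransmit_pct": re.compile(r"retransmit=([\d.]+)%"),
    "session": re.compile(r"session=(\w+)"),
}


def parse_line(line):
    """Apply every regex once and keep only the fields that matched."""
    fields = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(line)
        if m:
            fields[name] = m.group(1)
    return fields


def store(kv, timestamp, fields):
    """Write one parsed record under hypothetical "<session>:<field>:<timestamp>" keys.
    A plain dict stands in for the Redis store here."""
    session = fields.get("session", "unknown")
    for name, value in fields.items():
        if name != "session":
            kv[f"{session}:{name}:{timestamp}"] = value


def process_stream(lines, kv):
    """Consume lines one at a time, so the same code handles a batch file
    or an incoming stream (any iterable of strings)."""
    for line in lines:
        timestamp, _, rest = line.partition(" ")
        fields = parse_line(rest)
        if fields:
            store(kv, timestamp, fields)
```

Because `process_stream` only needs an iterable, a file object and a generator reading from a socket are interchangeable. Pointing it at Redis would mean replacing `kv[key] = value` with a client call such as redis-py's `redis.Redis().set(key, value)`.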

This project rewrites the script to do three things:

  • a pure Scala implementation of these hundred-or-so regexes
BBischof / numeralDecoderPuzzle
Created December 6, 2013 00:41
A small programming puzzle I found; apparently Facebook asked it at some point. A message containing letters A-Z is encoded to numbers using the mapping A -> 1, B -> 2, ..., Z -> 26. Given an encoded number, count the number of ways it can be decoded. Takes a string of digit characters as input.
import sys
### A small programming puzzle I found. Apparently Facebook asked it at some point.
###
### A message containing letters A-Z is encoded to numbers using the mapping A -> 1, B -> 2, ..., Z -> 26. Count the number of ways a given number can be decoded.
###
### Takes a string of digit characters as input (e.g. "226").
encoded = sys.argv[1]  # avoid shadowing the built-in input()
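The preview stops after reading the input. One standard way to finish it is dynamic programming over prefixes; this is a sketch of that technique, not necessarily the author's solution. `ways[i]` counts decodings of the first `i` digits: the last letter uses either one digit (1-9) or two digits (10-26).

```python
import sys


def count_decodings(digits: str) -> int:
    """Count decodings of a digit string under A -> 1, B -> 2, ..., Z -> 26."""
    if not digits or not set(digits) <= set("0123456789"):
        raise ValueError(f"expected a non-empty string of digits, got {digits!r}")
    # ways[i] = number of decodings of the prefix digits[:i]; the empty prefix has one.
    ways = [1] + [0] * len(digits)
    for i in range(1, len(digits) + 1):
        # Last letter from one digit: must be 1-9, since '0' maps to no letter.
        if digits[i - 1] != "0":
            ways[i] += ways[i - 1]
        # Last letter from two digits: must be 10-26 (this also rules out a leading '0').
        if i >= 2 and 10 <= int(digits[i - 2:i]) <= 26:
            ways[i] += ways[i - 2]
    return ways[-1]


if __name__ == "__main__" and len(sys.argv) > 1:
    print(count_decodings(sys.argv[1]))
```

For "226" the three decodings are BBF (2-2-6), BZ (2-26) and VF (22-6). A string such as "06" has none, because no letter maps to 0 or to 06. Only the last two entries of `ways` are ever read, so two rolling variables would reduce memory to O(1).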