Yannick Koechlin (yannick)
The Tamedia Data Analytics Team (TDA) works with all these companies to build advanced data products and services. We work with cutting-edge technologies to go from research prototypes to production systems at a rapid pace. Our systems serve more than 40% of the attributable Swiss internet traffic and reach more than 85% of the Swiss population on a monthly basis. If you are as much of a data nerd as we are, you realize that this is a unique opportunity to join us as a

Data Engineer

You will work closely with the Zurich-based part of the team (Data Scientists, Engineers, and Product Managers) to build a proprietary high-performance data analytics platform. Our platform is based on the concept of real-time stream processing and built to scale. We use Kubernetes and a variety of languages (Scala, Go, D, C, Python, Ruby, Lua, etc.).

{"startTime":null,"endTime":null,"long":8.562987,"lat":47.2958735,"radius":500,"id":"geo-test-thalwil-1","campaign":"geo-test-thalwil"}
{"startTime":1504802700,"endTime":1504817100,"long":6.1342113,"lat":46.1931206,"radius":500,"id":"geo-hc-genf-2","campaign":"geo-hc-genf"}
{"startTime":1504889100,"endTime":1504903500,"long":9.8241635,"lat":46.7985466,"radius":500,"id":"geo-hc-davos-3","campaign":"geo-hc-davos"}
{"startTime":1504889100,"endTime":1504903500,"long":7.1533186,"lat":46.8173482,"radius":500,"id":"geo-hc-fribourg-4","campaign":"geo-hc-fribourg"}
{"startTime":1504889100,"endTime":1504903500,"long":8.5816299,"lat":47.4418433,"radius":500,"id":"geo-hc-kloten-5","campaign":"geo-hc-kloten"}
{"startTime":1504889100,"endTime":1504903500,"long":8.9613706,"lat":46.0267925,"radius":500,"id":"geo-hc-lugano-6","campaign":"geo-hc-lugano"}
{"startTime":1504889100,"endTime":1504903500,"long":7.7839049,"lat":46.9360929,"radius":500,"id":"geo-hc-langnau-7","campaign":"geo-hc-langnau"}
{"startTime":1504975500,"endTi
import sys
import shutil
import struct
import random
import time
import datetime
import zlib
from io import BytesIO
import io
#!/usr/bin/env bash
brew install [email protected]
export build="h2o-kafka"
git clone https://github.com/yannick/h2o.git $build
cd $build
git checkout --track -b features/log_to_kafka origin/features/log_to_kafka
mkdir build
mkdir installed
cd build
# assumed continuation (the gist preview is truncated here): h2o builds with
# cmake (see makedepends in the PKGBUILD below), installing into ./installed
cmake -DCMAKE_INSTALL_PREFIX="$PWD/../installed" ..
make && make install
listen: 8080
hosts:
  "127.0.0.1.xip.io:8080":
    paths:
      /:
        mruby.handler: |
          html = "hello world, i'm running h2o on #{OS.sysname}"
          Proc.new do |env|
            # assumed completion (preview truncated): Rack-style [status, headers, body] triple
            [200, {"content-type" => "text/plain"}, [html]]
          end
yannick / kiss.py
Created May 4, 2017 12:21
keep it simple stream counting
interval = 60          # flush window in seconds
unique_ids = {}
last_flush = 0
# `stream` is assumed to be an iterable of dicts with "uid" and "ts" (epoch seconds)
for msg in stream:
    unique_ids[msg["uid"]] = True  # KISS: dict keys deduplicate for us
    if msg["ts"] - interval > last_flush:
        uniques = len(unique_ids)  # ouch... but works
        print("\t".join((str(msg["ts"]), str(uniques))))
        last_flush = msg["ts"]
        unique_ids = {}
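For a quick test, a synthetic stream is enough; the generator below is illustrative and not part of the gist:

import random

def synthetic_stream(n=10000, users=500, start_ts=1504802700):
    # one fake message per second from a random user
    for i in range(n):
        yield {"uid": "user-%d" % random.randrange(users), "ts": start_ts + i}

stream = synthetic_stream()
# with `stream` bound this way, the loop above prints one "<ts>\t<uniques>" line
# per 60 seconds of stream time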
yannick / PKBUILD
Created April 27, 2017 05:46
h2o pkgbuild
pkgname=h2o-future
pkgver=r4027.137ebf9f
pkgrel=1
pkgdesc="Optimized HTTP server with support for HTTP/1.x and HTTP/2. git version with extra gems"
arch=('i686' 'x86_64')
depends=('libuv' 'libyaml' 'wslay' 'zlib' 'sqlite3' 'librdkafka-git')
makedepends=('cmake' 'libtool' 'make' 'pkg-config' 'ruby')
url="https://github.com/h2o/h2o"
license=('MIT')
source=("$pkgname"::'git+https://github.com/tamediadigital/h2o.git#branch=distribution'
import std.stdio;
import mruby;
import mruby.compile;
import mruby.value;
import mruby.data;
import mruby.mrb_class;
import std.string;
import std.conv;
# mruby ghetto PORT from https://ruby-doc.org/stdlib-1.9.3/libdoc/cgi/rdoc/CGI.html#method-c-parse
module CGI
  def self.parse(query)
    params = {}
    query.split(/[&;]/).each do |pairs|
      key, value = pairs.split('=', 2).collect { |v| CGI::unescape(v) }
      if key && value
        params.has_key?(key) ? params[key].push(value) : params[key] = [value]
      elsif key
        params[key] = []
      end
    end
    params
  end
end
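The semantics being ported match CPython's urllib.parse.parse_qs, which is handy for cross-checking outputs. One caveat: modern Python splits only on "&" by default, while this port also honors ";" as a separator:

from urllib.parse import parse_qs

# repeated keys accumulate into lists, just like the port above;
# keys without a value are dropped unless keep_blank_values=True
print(parse_qs("a=1&a=2&b=x%20y"))
# {'a': ['1', '2'], 'b': ['x y']}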

Dynamic configuration

As discussed in Slack, I've written down my thoughts on a more dynamic config and what it could be good for, to open the discussion. Depending on the input, I'd be happy to research more.

Rationale

As more and more setups use cluster orchestration frameworks (such as Kubernetes), h2o makes an excellent candidate for a load balancer / ingress controller. However, in this type of setup, whenever a new service is deployed, the load balancer needs to be updated quickly.
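To make the idea concrete, here is a rough sketch of the loop such a controller could run, using the official Kubernetes Python client. The config path, the rendered snippet, and the SIGHUP-based reload are illustrative assumptions, not an existing h2o feature:

import os
import signal

from kubernetes import client, config, watch

H2O_CONF = "/etc/h2o/h2o.conf"            # hypothetical paths
H2O_PIDFILE = "/var/run/h2o/master.pid"

def render_config(endpoints_list):
    # render a minimal proxy config from service endpoints (illustrative only)
    lines = ["hosts:"]
    for ep in endpoints_list:
        name = "%s.%s" % (ep.metadata.name, ep.metadata.namespace)
        for subset in ep.subsets or []:
            for addr in subset.addresses or []:
                for port in subset.ports or []:
                    lines += [
                        '  "%s":' % name,
                        "    paths:",
                        "      /:",
                        "        proxy.reverse.url: http://%s:%d/" % (addr.ip, port.port),
                    ]
    return "\n".join(lines) + "\n"

def reload_h2o():
    # assumption: the master process does a graceful restart on SIGHUP
    with open(H2O_PIDFILE) as f:
        os.kill(int(f.read().strip()), signal.SIGHUP)

config.load_incluster_config()
v1 = client.CoreV1Api()
for _event in watch.Watch().stream(v1.list_endpoints_for_all_namespaces):
    # any endpoint change triggers a full re-render and reload (crude but simple)
    eps = v1.list_endpoints_for_all_namespaces().items
    with open(H2O_CONF, "w") as f:
        f.write(render_config(eps))
    reload_h2o()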