Migrate from Facebook scribe to Apache Flume (Part II)

In the last article we covered how to set up Flume and write files to HDFS. In this article, we start changing Flume to write files organized by category, the way Scribe does.

Multiplexing Way?

Our first thought was to use a multiplexing source to distribute logs to different destinations. Flume routes log events by event header, so we searched for which header field corresponds to the Scribe category.

https://apache.googlesource.com/flume/+/d66bf94b1dd059bc7e4b1ff332be59a280498077/flume-ng-sources/flume-scribe-source/src/main/java/org/apache/flume/source/scribe/ScribeSource.java

The category field in the event header holds the Scribe category, so we tried a multiplexing channel selector on the source.
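To make the routing idea concrete, here is a minimal plain-Python model of how a multiplexing selector picks a channel from the `category` header. It is a sketch of the logic only, not Flume code: the function names, channel names, and sample categories are all hypothetical.

```python
from collections import defaultdict


def select_channel(headers, mapping, default):
    """Pick a channel name from the event's `category` header.

    Events whose category is missing or unmapped fall back to `default`.
    """
    return mapping.get(headers.get("category"), default)


def route(events, mapping, default):
    """Group event bodies by the channel selected for each event."""
    channels = defaultdict(list)
    for headers, body in events:
        channels[select_channel(headers, mapping, default)].append(body)
    return dict(channels)


# Hypothetical categories and channel names, for illustration only.
mapping = {"web": "c_web", "app": "c_app"}
events = [
    ({"category": "web"}, b"GET /index"),
    ({"category": "app"}, b"job started"),
    ({"category": "misc"}, b"unmapped category"),
    ({}, b"no category header"),
]

routed = route(events, mapping, default="c_default")
for channel, bodies in sorted(routed.items()):
    print(channel, bodies)
```

The point to notice is that every category needs an explicit mapping entry. Anything not listed lands in the default channel.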

@dqtweb
dqtweb / gist:30a0598875703e73431a85bc7a5c9267
Created October 25, 2019 07:15 — forked from mikeyk/gist:1329319
Testing storage of millions of keys in Redis
#! /usr/bin/env python
import redis
import random
import pylibmc
import sys
r = redis.Redis(host = 'localhost', port = 6389)
mc = pylibmc.Client(['localhost:11222'])
@dqtweb
dqtweb / export-pyspark-schema-to-json.py
Created January 16, 2020 07:54 — forked from stefanthoss/export-pyspark-schema-to-json.py
Export/import a PySpark schema to/from a JSON file
import json
from pyspark.sql.types import *
# Define the schema
schema = StructType(
[StructField("name", StringType(), True), StructField("age", IntegerType(), True)]
)
# Write the schema
with open("schema.json", "w") as f:
@dqtweb
dqtweb / ajax.js
Created January 26, 2020 12:09 — forked from xeoncross/ajax.js
Simple, cross-browser Javascript POST/GET xhr request object. Supports request data and proper AJAX headers.
/**
* IE 5.5+, Firefox, Opera, Chrome, Safari XHR object
*
* @param string url
* @param object callback
* @param mixed data
* @param null x
*/
function ajax(url, callback, data, x) {
try {
SELECT
type
, count(*)
, count(DISTINCT u)
, count(CASE WHEN plat=1 THEN u ELSE NULL END)
, count(DISTINCT CASE WHEN plat=1 THEN u ELSE NULL END)
, count(CASE WHEN (type=2 OR type=6) THEN u ELSE NULL END)
, count(DISTINCT CASE WHEN (type=2 OR type=6) THEN u ELSE NULL END)
FROM
t
@dqtweb
dqtweb / gist:dffc0d141b6efe4cdc2760ff435866a6
Created April 7, 2020 03:57
javascript: sort array of objects
// example, we want to sort list of objects based on an object's attribute. ex: "age"
var peopleList = [{"name": "Laura", "age":5}, {"name": "Peppa", "age": 40}, {"name": "Josh", "age":22}];
var comparision = function(a, b) { return a.age > b.age ? 1 : -1; };
@dqtweb
dqtweb / utf8csv.py
Created April 10, 2020 02:31
Add UTF-8 support to csv content file
# so that you can open with excel
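import codecs

def addUTF8Bom(filename):
    # Read the existing content as UTF-8.
    f = codecs.open(filename, 'r', 'utf-8')
    content = f.read()
    f.close()
    # Rewrite the file with a leading byte order mark so Excel detects UTF-8.
    f2 = codecs.open(filename, 'w', 'utf-8')
    f2.write(u'\ufeff')
    f2.write(content)
    f2.close()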
@dqtweb
dqtweb / jupyter.service
Created April 26, 2020 10:30 — forked from r-darwish/jupyter.service
Jupyterlab as a systemd service
[Unit]
Description=Jupyter Notebook
[Service]
Type=simple
ExecStart=/home/roeyd/Notebook/.env/bin/jupyter lab --port 9090
WorkingDirectory=/home/roeyd/Notebook
[Install]
WantedBy=default.target
@dqtweb
dqtweb / client.js
Created March 3, 2023 22:48 — forked from naoki-sawada/client.js
Simple socket.io room and auth example
const io = require('socket.io-client');
const socket = io('http://localhost:3000', {
  transportOptions: {
    polling: {
      extraHeaders: {
        'Authorization': 'Bearer abc',
      },
    },
  },
@dqtweb
dqtweb / http-benchmark.md
Created January 2, 2024 04:47 — forked from denji/http-benchmark.md
HTTP(S) Benchmark Tools / Toolkit for testing/debugging HTTP(S) and restAPI (RESTful)