
Dreamsome onesuper

@onesuper
onesuper / KafkaProducer.java
Created February 24, 2017 02:34 — forked from yaroncon/KafkaProducer.java
Kafka producer, with Kafka-Client and Avro
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class RedisClient {
    private JedisPool pool;

    @Inject
    public RedisClient(Settings settings) {
        try {
            pool = new JedisPool(new JedisPoolConfig(), settings.get("redis.host"), settings.getAsInt("redis.port", 6379));
        } catch (SettingsException e) {
            // deliberately ignored: the pool stays null when Redis settings are missing
        }
    }
}
@onesuper
onesuper / spark-env.sh
Created October 11, 2016 06:16 — forked from berngp/spark-env.sh
Spark env shell for YARN - Vagrant Hadoop 2.3.0 cluster, pseudo-distributed mode.
#!/usr/bin/env bash
# This file contains environment variables required to run Spark. Copy it as
# spark-env.sh and edit that to configure Spark for your site.
#
# The following variables can be set in this file:
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
# we recommend setting app-wide options in the application's driver program.
@onesuper
onesuper / backup.sh
Created October 9, 2016 03:22 — forked from nherment/backup.sh
Backup and restore an Elasticsearch index (shamelessly copied from http://tech.superhappykittymeow.com/?p=296)
#!/bin/bash
# Herein we back up our indexes! Run this around 6pm or so, after logstash
# rotates to a new ES index and there's no new data coming in to the old one. We grab the metadata,
# compress the data files, create a restore script, and push it all up to S3.
TODAY=$(date +"%Y.%m.%d")
INDEXNAME="logstash-$TODAY" # this had better match the index name in ES
INDEXDIR="/usr/local/elasticsearch/data/logstash/nodes/0/indices/"
BACKUPCMD="/usr/local/backupTools/s3cmd --config=/usr/local/backupTools/s3cfg put"
BACKUPDIR="/mnt/es-backups/"
YEARMONTH=$(date +"%Y-%m")
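
The script's date-based index naming (`INDEXNAME="logstash-$TODAY"`, with `TODAY` in `%Y.%m.%d` format) can be sketched in Python; the helper name below is illustrative, not part of the original script:

```python
from datetime import date

# Mirror the script's naming: INDEXNAME="logstash-$TODAY" with TODAY=$(date +"%Y.%m.%d")
def index_name(day: date) -> str:
    return "logstash-" + day.strftime("%Y.%m.%d")

print(index_name(date(2016, 10, 9)))  # logstash-2016.10.09
```

The zero-padded `%Y.%m.%d` form matters: it must match the index name Logstash actually created, or the backup grabs nothing.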
@onesuper
onesuper / yg-env.sh
Last active September 18, 2016 01:15
yg-env.sh
#!/usr/bin/env sh
#
# $ echo 'source path_to_env.sh' >> ~/.bashrc
# $ yg_install
KAFKA_HOST=bj2-storm03:9092
ES_HOST=bj2-storm03:9200
ZK_HOST=bj2-storm03:2181,bj2-storm04:2181,bj2-storm05:2181
STORM_LOG_HOME='/usr/local/storm-default/logs/'
TODAY=$(date '+%Y-%m-%d')
@onesuper
onesuper / 词性标记.md
Created August 10, 2016 02:22 — forked from luw2007/词性标记.md
POS tagging: covers the ICTPOS3.0 POS tag set, the ICTCLAS Chinese POS tag set, the POS tags appearing in the jieba dictionary, and the POS tags that simhash can ignore

Word classes

  • Content words (实词): nouns, verbs, adjectives, status words, distinguishing words, numerals, measure words, pronouns
  • Function words (虚词): adverbs, prepositions, conjunctions, particles, onomatopoeia, interjections.

ICTPOS3.0 POS tag set

n noun

nr person name
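
The preview shows only the first two tags (`n`, `nr`). In the ICTPOS convention, subcategory tags extend their parent class's one-letter prefix, so a prefix check distinguishes noun subtags; this is an illustrative sketch, not the full tag set:

```python
# ICTPOS-style tags: subcategories extend the one-letter class prefix,
# e.g. "n" (noun) -> "nr" (person name). Sketch only; the real set has many more tags.
def is_noun_tag(tag: str) -> bool:
    return tag.startswith("n")

print([t for t in ["n", "nr", "v", "d"] if is_noun_tag(t)])  # ['n', 'nr']
```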

@onesuper
onesuper / tuning_storm_trident.asciidoc
Created July 5, 2016 07:41 — forked from mrflip/tuning_storm_trident.asciidoc
Notes on Storm+Trident tuning

Tuning Storm+Trident

Tuning a dataflow system is easy:

The First Rule of Dataflow Tuning:
* Ensure each stage is always ready to accept records, and
* Deliver each processed record promptly to its destination
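
The First Rule can be made concrete with a toy bounded-queue pipeline (a pure-Python sketch, not the Storm/Trident API): a stage that is "always ready to accept records" is one whose input queue never stays full, and "delivering promptly" means the downstream stage drains records as fast as they arrive:

```python
import queue
import threading

# Toy two-stage pipeline: a bounded queue provides natural backpressure --
# if the downstream stage stalls, put() blocks and the upstream slows down.
def run_pipeline(records):
    q = queue.Queue(maxsize=4)   # small buffer: both stages must keep up
    out = []

    def consumer():
        while True:
            item = q.get()
            if item is None:      # sentinel: no more records
                break
            out.append(item * 2)  # "process" the record promptly

    t = threading.Thread(target=consumer)
    t.start()
    for r in records:
        q.put(r)                  # blocks if the consumer falls behind
    q.put(None)
    t.join()
    return out

print(run_pipeline([1, 2, 3]))  # [2, 4, 6]
```

The bounded `maxsize` is the point: an unbounded buffer hides a slow stage until memory runs out, while a small one surfaces it immediately as upstream blocking.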
@onesuper
onesuper / kafka.md
Created June 30, 2016 08:46 — forked from ashrithr/kafka.md
kafka introduction

Introduction to Kafka

Kafka acts as a kind of write-ahead log (WAL): it records messages to a persistent store (disk) and lets subscribers read and apply those changes to their own stores at a pace appropriate to their systems.

Terminology:

  • Producers send messages to brokers
  • Consumers read messages from brokers
  • Messages are sent to a topic
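
The WAL analogy and the terminology above can be illustrated with a toy in-memory log (a sketch of the model, not the Kafka API): producers append to a topic, and each consumer tracks its own read offset, so subscribers replay the log at their own pace:

```python
# Toy model of a Kafka topic: an append-only log plus per-consumer offsets.
class TopicLog:
    def __init__(self):
        self.log = []            # ordered; durable (on disk) in real Kafka
        self.offsets = {}        # consumer name -> next index to read

    def produce(self, message):
        self.log.append(message)

    def consume(self, consumer):
        start = self.offsets.get(consumer, 0)
        self.offsets[consumer] = len(self.log)
        return self.log[start:]  # each consumer reads from its own offset

topic = TopicLog()
topic.produce("a")
topic.produce("b")
print(topic.consume("c1"))   # ['a', 'b']
topic.produce("c")
print(topic.consume("c1"))   # ['c'] -- only the new messages
print(topic.consume("c2"))   # ['a', 'b', 'c'] -- a new consumer replays everything
```

Because the broker only stores the log and offsets, a slow consumer never blocks a fast one; each reads from where it left off.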
@onesuper
onesuper / add_del_blank.rb
Created December 7, 2015 06:19
Add or delete blanks between Chinese and English words.
#!/usr/bin/env ruby
# Usage: deal_blanks.rb input.txt >out.txt
def isEn(char)
  /\w/.match(char) != nil
end

File.open(ARGV[0], "r") do |file|
  blanks_del = 0
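
The Ruby preview above is cut off; as a hedged Python sketch of the same idea (the exact behavior of the original gist is unknown), a pair of regex substitutions can insert a space wherever a CJK character and an ASCII word character are adjacent:

```python
import re

# Insert a space between adjacent CJK and ASCII word characters, in both directions.
# \u4e00-\u9fff covers the common CJK Unified Ideographs block.
def add_blanks(text: str) -> str:
    text = re.sub(r'([\u4e00-\u9fff])([A-Za-z0-9])', r'\1 \2', text)
    text = re.sub(r'([A-Za-z0-9])([\u4e00-\u9fff])', r'\1 \2', text)
    return text

print(add_blanks("用Python写脚本"))  # 用 Python 写脚本
```

Two passes are needed because a single pattern only catches one ordering of the CJK/ASCII boundary.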
import random

class Pool:
    def __init__(self, names):
        self.names = names

    def pick(self):
        # Guarded random pick: empty pool yields None
        if len(self.names) == 0:
            return None
        return random.choice(self.names)