Philip (flip) Kromer mrflip

sudo -u hdfs hadoop fs -mkdir -p \
/tmp /tmp/mapred/system \
/user/root /user/chimpy \
$HADOOP_LOG_DIR/yarn-apps \
$HADOOP_BULK_DIR/yarn-staging/history/done_intermediate \
/var/lib/hadoop-hdfs/cache/mapred/mapred/staging \
/user/hive/warehouse
sudo -u hdfs hadoop fs -chmod -R 1777 \
/tmp /tmp/mapred/system \
mrflip / gist:e3f3999af39bb9986475
Created November 19, 2014 12:19
smeared_lipstick.txt
14/11/19 10:08:12 INFO listeners.LipstickPPNL: --- Init TBPPNL ---
2014-11-19 10:08:12,865 [main] INFO com.netflix.lipstick.Main - build version: 0.6-SNAPSHOT, ts: 2014-11-19T09:02Z, git: 6324e130b4df844930f33cc7fb08aed94d07e76e
2014-11-19 10:08:12,865 [main] INFO com.netflix.lipstick.Main - Logging error messages to: /home/chimpy/book/code/pig_1416391692863.log
2014-11-19 10:08:13,245 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/chimpy/.pigbootup not found
2014-11-19 10:08:13,740 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-11-19 10:08:13,741 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://nn:8020
2014-11-19 10:08:14,215 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: jt:8021
2014-11-19 10:08:14,273 [main] INFO org.apache.hadoop.conf.Configuration.deprecati
mrflip / 2014 TED w Friday.md
Last active December 24, 2016 03:10
Notes from the 2014 TED conference

TED 2014 Friday

Friday mid-Morning: Onward (final session)

Andrew Solomon, author

  • Reports on experience of people in extreme circumstances
  • Avoidance and Endurance
  • Take traumas and make them part of who you'll be
  • Mother of a child due to rape: I think of him (rapist) with pity -- he has a beautiful daughter he doesn't know, and I do, and so I’m the lucky one
mrflip / tuning_storm_trident.asciidoc
Last active October 8, 2024 15:18
Notes on Storm+Trident tuning

Tuning Storm+Trident

Tuning a dataflow system is easy:

The First Rule of Dataflow Tuning:
* Ensure each stage is always ready to accept records, and
* Deliver each processed record promptly to its destination
mrflip / KafkaState.java
Created June 29, 2013 20:06
Trident Kafka State
package com.infochimps.storm.trident;
import kafka.javaapi.producer.Producer;
import kafka.javaapi.producer.ProducerData;
import kafka.message.Message;
import kafka.serializer.Encoder;
import kafka.producer.ProducerConfig;
import backtype.storm.task.IMetricsContext;
import storm.trident.operation.TridentCollector;
mrflip / 20130416-todo.md
Last active March 18, 2025 10:38
Elasticsearch Tuning Plan

Next Steps

  • Measure time spent on index, flush, refresh, merge, query, etc. (TD - done)
  • Take hot threads snapshots under read+write, read-only, write-only (TD - done)
  • Adjust refresh time to 10s (from 1s) and see how load changes (TD)
  • Measure time of a rolling restart doing disable_flush and disable_recovery (TD)
  • Specify routing on query -- make it choose same node for each shard each time (MD)
  • GC new generation size (TD)
  • Warmers
  • measure before/after of client query time with and without warmers (MD)
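
The refresh-interval step in the list above could be scripted roughly as follows. This is a minimal sketch, not the plan's actual tooling: it assumes a node reachable at `http://localhost:9200` and an index named `events` (both hypothetical), and it relies on Elasticsearch's update-index-settings endpoint (`PUT /<index>/_settings`) with the `index.refresh_interval` setting. The class and method names are illustrative only. By default it only prints the request it would make; pass `--send` to actually issue it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RefreshIntervalSketch {
    // Build the JSON body that sets index.refresh_interval (e.g. "10s").
    static String settingsBody(String interval) {
        return "{\"index\":{\"refresh_interval\":\"" + interval + "\"}}";
    }

    // Build the settings URI; baseUrl and index are assumptions, not values from the plan.
    static URI settingsUri(String baseUrl, String index) {
        return URI.create(baseUrl + "/" + index + "/_settings");
    }

    public static void main(String[] args) throws Exception {
        URI uri = settingsUri("http://localhost:9200", "events"); // hypothetical host and index
        String body = settingsBody("10s");                        // 10s, up from the 1s default

        // Dry run unless --send is given, so the sketch is safe to run without a cluster.
        if (args.length == 0 || !args[0].equals("--send")) {
            System.out.println("PUT " + uri + " " + body);
            return;
        }

        HttpRequest request = HttpRequest.newBuilder(uri)
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Measuring load before and after the change would still need to be done separately, for example with the hot-threads snapshots mentioned earlier in the list.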

Performance Qualification

Identify all reasons why (e.g.) Elasticsearch cannot provide acceptable performance for standard requests under the qualifying load. The "qualifying load" for each performance bound is roughly double the worst-case scenario for that bound across all our current clients.

  • Performance
    • bandwidth (rec/s, MB/s) and latency (s) for 100B, 1kB, 100kB records
    • under read, write, read/write
    • in degraded state: a) loss of one/two servers and recovering; b) elevated packet latency + drop rate between "regions"
    • High concurrency
  • keepalive
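
The "qualifying load" definition above is simple arithmetic, and a small sketch may make it concrete. The safety factor of 2 comes from the text ("double (or so)"); the per-client figures, the class name, and the method name are hypothetical and exist only for illustration.

```java
public class QualifyingLoadSketch {
    // Qualifying load = safety factor x the worst case seen across all clients
    // for a single performance bound (e.g. write rate in rec/s).
    static double qualifyingLoad(double safetyFactor, double... worstCasePerClient) {
        if (worstCasePerClient.length == 0) {
            throw new IllegalArgumentException("need at least one client measurement");
        }
        double worst = worstCasePerClient[0];
        for (double value : worstCasePerClient) {
            worst = Math.max(worst, value);
        }
        return safetyFactor * worst;
    }

    public static void main(String[] args) {
        // Hypothetical worst-case write rates (rec/s) for three clients.
        double load = qualifyingLoad(2.0, 1200.0, 4500.0, 800.0);
        System.out.println("Qualifying write load: " + load + " rec/s");
    }
}
```

Each performance bound (bandwidth, latency, concurrency) would get its own qualifying load computed the same way from its own worst-case measurements.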
mrflip / Solid.md
Last active December 15, 2015 17:09
Notes for 2013 spec

Things

"Big Five" == Elasticsearch, Storm, Kafka, HBase, wukong decorators

  • Faster chef convergence (custom packages; local physical cluster)
  • Centralized log archiving
  • Performance qualification of
  • Visibility and request manipulation
  • Metarepo (deb/rpm, gem, egg, maven)
mrflip / bdb_2012-fixes.sql
Last active December 15, 2015 01:09
correct a couple errors in the 2012 Baseball Databank (the 'January 9, 3:00 pm' release)
-- correct a couple errors in the 2012 Baseball Databank (the 'January 9, 3:00 pm' release)
UPDATE `master` SET `playerID` = 'baezjo01' WHERE `lahmanID` = 460 AND `playerID` = 'baezda01';
UPDATE `master` SET `bbrefID` = 'snydech03' WHERE `lahmanID` = 19419 AND `playerID` = 'snydech03';
UPDATE `master` SET `bbrefID` = 'gilgahu01' WHERE `lahmanID` = 19417 AND `playerID` = 'gilgahu01';
UPDATE `AwardsPlayers` SET `playerID` = 'braunry02' WHERE `playerID` = 'braunry01' AND `awardID` = 'Silver Slugger' AND yearID = 2012 AND `lgID` = 'NL';
UPDATE `AwardsPlayers` SET `playerID` = 'brechha01' WHERE `playerID` = 'Brecheen' AND `awardID` = 'Baseball Magazine All-Star';
package com.infochimps.kafka.consumers;
import java.io.IOException;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;