
#!/bin/sh
# A shell script you can give to customers to check if all their repair sessions are complete.
# Set SYSTEM_LOG_PATH to the directory containing system.log (defaults to the current directory)
if [ "x$SYSTEM_LOG_PATH" = "x" ]; then
SYSTEM_LOG_PATH=.
fi
LOG_FILE_PATTERN='*.log*'
NEW_SESSION_PATH=/tmp/new-session
COMPLETED_SESSION_PATH=/tmp/completed-successfully
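The script is cut off after its variable definitions, but the two /tmp paths suggest the approach: collect the IDs of repair sessions that were started, collect the IDs that completed successfully, and report whatever is left over. Below is a minimal Java sketch of that set difference. It assumes repair log lines contain `[repair #<session id>] new session` and `[repair #<session id>] session completed successfully`; those message formats are an assumption to check against your own system.log, and the class name `RepairSessionCheck` is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepairSessionCheck {
    // Assumed log message formats; verify them against your own system.log.
    private static final Pattern NEW_SESSION =
            Pattern.compile("\\[repair #([0-9a-fA-F-]+)\\] new session");
    private static final Pattern COMPLETED =
            Pattern.compile("\\[repair #([0-9a-fA-F-]+)\\] session completed successfully");

    /** Returns IDs of sessions that were started but never logged as completed, in start order. */
    static Set<String> incompleteSessions(List<String> logLines) {
        Set<String> started = new LinkedHashSet<>();
        Set<String> completed = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher startMatch = NEW_SESSION.matcher(line);
            if (startMatch.find()) {
                started.add(startMatch.group(1));
            }
            Matcher doneMatch = COMPLETED.matcher(line);
            if (doneMatch.find()) {
                completed.add(doneMatch.group(1));
            }
        }
        started.removeAll(completed); // what remains never reported success
        return started;
    }

    public static void main(String[] args) {
        // Illustrative, made-up log lines.
        List<String> lines = Arrays.asList(
                "INFO ... [repair #aaaa-1111] new session: will sync ...",
                "INFO ... [repair #bbbb-2222] new session: will sync ...",
                "INFO ... [repair #aaaa-1111] session completed successfully");
        System.out.println(incompleteSessions(lines)); // prints [bbbb-2222]
    }
}
```

Feeding it the lines of every file matching LOG_FILE_PATTERN (for example via `java.nio.file.Files.readAllLines`) would mirror what the shell script appears to do.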
= Using Cassandra for large data sets (lots of data per node) =
This page aims to give some advice on the issues one may need to consider when using Cassandra for large data sets (meaning hundreds of gigabytes or terabytes per node). The intent is not to make original claims, but to collect in one place some issues that are operationally relevant. Other parts of the wiki are highly recommended reading for a full understanding of the issues involved.
This is a work in progress. If you find information out of date (e.g., a JIRA ticket referenced here has been resolved but this document has not been updated), please help by editing the page or e-mailing the cassandra-user mailing list.
Note that not all of these issues are specific to Cassandra. For example, any storage system is subject to the trade-offs of cache sizes relative to active set size, and IOPS will always be strongly correlated with the percentage of requests that penetrate caching layers. Also of note, the more data stored per node, the more data will have to be streamed in
wget http://mirror.ox.ac.uk/sites/rsync.apache.org/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
sudo tar xzf apache-maven-3.0.5-bin.tar.gz -C /usr/local
cd /usr/local
sudo ln -s apache-maven-3.0.5 maven
mkdir libext
cd libext
wget http://extjs.com/deploy/ext-2.2.zip
export DSE_LIB=/usr/share/dse
Part 1: The Data Model is Dead, Long Live the Data Model:
- http://www.youtube.com/watch?v=px6U2n74q3g
- http://www.slideshare.net/patrickmcfadin/the-data-model-is-dead-long-live-the-data-model
Part 2: Become a Super Modeler:
- http://www.youtube.com/watch?v=qphhxujn5Es
- http://www.slideshare.net/patrickmcfadin/become-a-super-modeler
Part 3: The World's Next Top Data Model:
- http://www.youtube.com/watch?v=HdJlsOZVGwM
@jeromatron
jeromatron / SimpleClient.java
Created May 30, 2013 17:26
Basic encode/decode of composite values with DataStax Java driver
package org.mostlyharmless;
import com.datastax.driver.core.*;
import org.apache.cassandra.db.marshal.*;
import java.util.ArrayList;
import java.util.List;
public class SimpleClient {
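The SimpleClient listing stops at its class declaration. For background on what a composite value looks like as bytes, here is a hand-rolled sketch of the layout Cassandra's CompositeType is assumed here to use: each component is a 2-byte big-endian length, the component's bytes, and a one-byte end-of-component marker (0 for an ordinary value). It deliberately does not call the DataStax driver or `org.apache.cassandra.db.marshal`, handles only UTF-8 string components, and `CompositeCodec` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CompositeCodec {
    /** Encodes each string as <2-byte length><UTF-8 bytes><end-of-component byte 0>. */
    static ByteBuffer encode(List<String> components) {
        List<byte[]> parts = new ArrayList<>();
        int size = 0;
        for (String c : components) {
            byte[] bytes = c.getBytes(StandardCharsets.UTF_8);
            if (bytes.length > 0xFFFF) {
                throw new IllegalArgumentException("component longer than 65535 bytes");
            }
            parts.add(bytes);
            size += 2 + bytes.length + 1;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] bytes : parts) {
            out.putShort((short) bytes.length); // unsigned 16-bit length, big-endian
            out.put(bytes);
            out.put((byte) 0);                  // end-of-component marker
        }
        out.flip(); // make the buffer readable from the start
        return out;
    }

    /** Decodes a buffer produced by encode() back into its string components. */
    static List<String> decode(ByteBuffer in) {
        ByteBuffer buf = in.duplicate(); // leave the caller's position untouched
        List<String> components = new ArrayList<>();
        while (buf.hasRemaining()) {
            int length = buf.getShort() & 0xFFFF;
            byte[] bytes = new byte[length];
            buf.get(bytes);
            buf.get(); // skip the end-of-component byte
            components.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return components;
    }

    public static void main(String[] args) {
        ByteBuffer encoded = encode(Arrays.asList("user42", "2013-05-30"));
        System.out.println(encoded.remaining()); // prints 22 (2+6+1 + 2+10+1)
        System.out.println(decode(encoded));     // prints [user42, 2013-05-30]
    }
}
```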
Class Name | Shallow Heap | Retained Heap | Percentage
-----------------------------------------------------------------------------------------
java.lang.Object[340186] @ 0x639a368b0 | 1,360,760 | 7,754,588,120 | 98.24%
|- org.apache.cassandra.db.Row @ 0x5fae884a8 | 24 | 34,704 | 0.00%
|- org.apache.cassandra.db.Row @ 0x625f9eb88 | 24 | 34,704 | 0.00%
|- org.apache.cassandra.db.Row @ 0x605d82328 | 24 | 34,704 | 0.00%
|- org.apache.cassandra.db.Row @ 0x6888426f0 | 24 | 34,704 | 0.00%
|- org.apache.cassandra.db.Row @ 0x689d73a00 | 24 | 34,704 | 0.00%
|- org.apache.cassandra.db.Row @ 0x68c49c360 | 24 | 34,704 | 0.00%
|- org.apache.cassandra.db.Row @ 0x68d5ff600 | 24 | 34,704 | 0.00%
//
// Old Hadoop API
//
public org.apache.hadoop.mapred.InputSplit[] getSplits(JobConf jobConf, int numSplits) throws IOException
{
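// Hadoop 1.x, where TaskAttemptContext was a concrete class:
// TaskAttemptContext tac = new TaskAttemptContext(jobConf, new TaskAttemptID());
// Hadoop 2.x, where TaskAttemptContext is an interface implemented by TaskAttemptContextImpl:
TaskAttemptContext tac = new org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl(jobConf, new TaskAttemptID());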
List<org.apache.hadoop.mapreduce.InputSplit> newInputSplits = this.getSplits(tac);
org.apache.hadoop.mapred.InputSplit[] oldInputSplits = new org.apache.hadoop.mapred.InputSplit[newInputSplits.size()];
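The getSplits bridge above is truncated right after allocating the old-API array. One way to fill it, assumed here rather than taken from the original, is to have the concrete split class extend the new-API split type and also implement the old-API interface, so each element only needs a cast. The sketch uses hypothetical stand-in types (`NewApiSplit`, `OldApiSplit`, `DualSplit`, `SplitBridge`) instead of the real Hadoop classes, so it runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for org.apache.hadoop.mapreduce.InputSplit (an abstract class)
// and org.apache.hadoop.mapred.InputSplit (an interface); not the real Hadoop types.
abstract class NewApiSplit {
    abstract long getLength();
}

interface OldApiSplit {
    long getLength();
}

// A split type that satisfies both APIs, so converting between them is just a cast.
class DualSplit extends NewApiSplit implements OldApiSplit {
    private final long length;

    DualSplit(long length) {
        this.length = length;
    }

    @Override
    public long getLength() {
        return length;
    }
}

class SplitBridge {
    /** Copies new-API splits into an old-API array; fails fast if a split is not dual-typed. */
    static OldApiSplit[] toOldSplits(List<NewApiSplit> newSplits) {
        OldApiSplit[] oldSplits = new OldApiSplit[newSplits.size()];
        for (int i = 0; i < newSplits.size(); i++) {
            NewApiSplit split = newSplits.get(i);
            if (!(split instanceof OldApiSplit)) {
                throw new IllegalArgumentException("split " + i + " does not implement the old API");
            }
            oldSplits[i] = (OldApiSplit) split;
        }
        return oldSplits;
    }

    public static void main(String[] args) {
        List<NewApiSplit> splits = new ArrayList<>();
        splits.add(new DualSplit(64));
        splits.add(new DualSplit(128));
        OldApiSplit[] old = toOldSplits(splits);
        System.out.println(old.length + " splits, second has length " + old[1].getLength());
    }
}
```

The explicit `instanceof` check turns a silent `ClassCastException` deep in a job into an error that names the offending split.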
@jeromatron
jeromatron / gist:1128625
Created August 5, 2011 21:57
jmx/jconsole shell function
# jmx/jconsole tunneling shell function - add to .profile
#
# usage: jmx [keypair] [remote host] [remote port]
# example: jmx /keys/keypair.pem [email protected] 8080
jmx() {
keypair=$1
remote_host=$2
remote_port=$3
proxy_port=10999
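The jmx function is cut off after setting proxy_port=10999. A common pattern, assumed here, is to open an SSH SOCKS proxy on that local port and point a JMX client at the remote host through it. The Java sketch below builds the standard RMI-based JMX service URL and connects with `javax.management.remote`; routing through the tunnel relies on the JVM's `socksProxyHost`/`socksProxyPort` system properties, which is an assumption about how the original function was meant to be used. `JmxClient` is a hypothetical name.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class JmxClient {
    /** Builds the standard RMI-based JMX service URL for host:port. */
    static JMXServiceURL serviceUrl(String host, int port) {
        String url = "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
        try {
            return new JMXServiceURL(url);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad JMX URL: " + url, e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JmxClient <remote host> <remote port>");
            return;
        }
        JMXServiceURL url = serviceUrl(args[0], Integer.parseInt(args[1]));
        // try-with-resources closes the connector (and its RMI connection) when done.
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            System.out.println("Connected; MBean count = " + mbeans.getMBeanCount());
        }
    }
}
```

For example, after starting a tunnel with something like `ssh -i <keypair> -N -D 10999 <remote host>`, run `java -DsocksProxyHost=localhost -DsocksProxyPort=10999 JmxClient <remote host> <remote port>`.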
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
2011-03-29 18:07:39,968 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201103080244_1300 for user-log deletion with retainTimeStamp:1301508459323
2011-03-29 18:07:39,968 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201103080244_1277 for user-log deletion with retainTimeStamp:1301508459323
2011-03-29 18:07:39,968 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201103080244_1144 for user-log deletion with retainTimeStamp:1301508459323
2011-03-29 18:07:39,968 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201103080244_2777 for user-log deletion with retainTimeStamp:1301508459323
2011-03-29 18:07:39,969 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201103080244_1298 for user-log deletion with retainTimeStamp:1301508459323
2011-03-29 18:07:39,969 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201103080244_2771 for user-log deletion with retainTimeStamp:1301508459323
2011-03-29 18:07:39,969 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_2011030
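The UserLogCleaner lines share one shape: a job ID and a retainTimeStamp. Reading that value as milliseconds since the Unix epoch (an assumption consistent with its magnitude), 1301508459323 is 2011-03-30T18:07:39.323Z. Below is a small Java sketch that pulls both fields out of a line and skips lines that don't match, such as the truncated last one; `UserLogCleanerLine` is a hypothetical name.

```java
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UserLogCleanerLine {
    private static final Pattern ENTRY = Pattern.compile(
            "Adding (job_\\d+_\\d+) for user-log deletion with retainTimeStamp:(\\d+)");

    final String jobId;
    final Instant retainUntil;

    private UserLogCleanerLine(String jobId, Instant retainUntil) {
        this.jobId = jobId;
        this.retainUntil = retainUntil;
    }

    /** Returns null for lines that are not complete UserLogCleaner entries (e.g. truncated ones). */
    static UserLogCleanerLine parse(String line) {
        Matcher m = ENTRY.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Assumes retainTimeStamp is epoch milliseconds.
        return new UserLogCleanerLine(m.group(1), Instant.ofEpochMilli(Long.parseLong(m.group(2))));
    }

    public static void main(String[] args) {
        String line = "2011-03-29 18:07:39,968 INFO org.apache.hadoop.mapred.UserLogCleaner: "
                + "Adding job_201103080244_1300 for user-log deletion with retainTimeStamp:1301508459323";
        UserLogCleanerLine entry = parse(line);
        // prints job_201103080244_1300 retained until 2011-03-30T18:07:39.323Z
        System.out.println(entry.jobId + " retained until " + entry.retainUntil);
    }
}
```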