wget http://mirror.ox.ac.uk/sites/rsync.apache.org/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
sudo tar xzf apache-maven-3.0.5-bin.tar.gz -C /usr/local
cd /usr/local
sudo ln -s apache-maven-3.0.5 maven
mkdir libext
cd libext
wget http://extjs.com/deploy/ext-2.2.zip
export DSE_LIB=/usr/share/dse
= Using Cassandra for large data sets (lots of data per node) =
This page offers advice on the issues to consider when using Cassandra for large data sets (meaning hundreds of gigabytes or terabytes per node). The intent is not to make original claims, but to collect in one place the issues that are operationally relevant. Reading other parts of the wiki is highly recommended in order to fully understand the issues involved.
This is a work in progress. If you find information out of date (e.g., a JIRA ticket referenced has been resolved but this document has not been updated), please help by editing or e-mailing cassandra-user.
Note that not all of these issues are specific to Cassandra. For example, any storage system is subject to the trade-offs of cache sizes relative to active set size, and IOPS will always be strongly correlated with the percentage of requests that penetrate caching layers. Also of note, the more data stored per node, the more data will have to be streamed in
#!/bin/sh
# A shell script you can give to customers to check if all their repair sessions are complete.
# Change SYSTEM_LOG_PATH to the directory containing system.log
if [ "x$SYSTEM_LOG_PATH" = "x" ]; then
    SYSTEM_LOG_PATH=.
fi
LOG_FILE_PATTERN='*.log*'
NEW_SESSION_PATH=/tmp/new-session
COMPLETED_SESSION_PATH=/tmp/completed-successfully
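The script preview above stops before its main logic. As a rough sketch of the general idea only (not the author's implementation), one way to find repair sessions that started but never completed is to compare session IDs from two kinds of log lines. The function name `pending_repairs` is hypothetical, and the log message formats in the comments are assumptions that should be checked against your own system.log.

```shell
#!/bin/sh
# Sketch: list repair session IDs that started but never logged completion.
# ASSUMPTION: log lines look like "[repair #<id>] new session: ..." and
# "[repair #<id>] session completed successfully". Verify this against your logs.
pending_repairs() {
    log_dir=${1:-.}
    started=$(mktemp) && finished=$(mktemp) || return 1
    # Extract the session id from each matching line; sort -u so comm can compare.
    cat "$log_dir"/*.log* 2>/dev/null | grep 'new session' \
        | sed -n 's/.*\[repair #\([^]]*\)\].*/\1/p' | sort -u > "$started"
    cat "$log_dir"/*.log* 2>/dev/null | grep 'session completed successfully' \
        | sed -n 's/.*\[repair #\([^]]*\)\].*/\1/p' | sort -u > "$finished"
    # comm -23 prints lines found only in the first file: started, not completed.
    comm -23 "$started" "$finished"
    rm -f "$started" "$finished"
}
```

Calling `pending_repairs "$SYSTEM_LOG_PATH"` prints one session ID per line. Empty output means every session that logged a start also logged a completion.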
@jeromatron
jeromatron / user-defined-compaction.md
Last active June 17, 2024 16:45
Using jmxsh to compact specific sstables

Sometimes you'll want to compact specific sstables without running a major compaction. You may not be able to run jconsole and may want a command-line tool instead. There's a ticket (CASSANDRA-10660) to be able to do this with nodetool, but in the meantime, you can use something like jmxsh (or my mirror for when Google pulls the plug on Google Code).

The JMX MBean for user-defined compaction is CompactionManager in the org.apache.cassandra.db package, and the method is called forceUserDefinedCompaction. You'll need to get a list of the sstables that you want to compact.
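To gather that list, a small helper along these lines may be handy. It is only a sketch: the function name `list_sstables` is hypothetical, it assumes sstable data files end in `-Data.db`, and it assumes a comma-separated list is what you want to pass along. Whether bare file names or full paths are expected depends on your Cassandra version, so check before using the output.

```shell
#!/bin/sh
# Sketch: print a comma-separated list of the *-Data.db files in one directory.
# ASSUMPTION: the argument is a table's data directory; the name list_sstables
# is made up for this example.
list_sstables() {
    dir=$1
    # One file per line, sorted for a stable order, then joined with commas.
    find "$dir" -maxdepth 1 -type f -name '*-Data.db' | sort | paste -sd, -
}
```

For example, `list_sstables /path/to/table/dir` prints the matching data files on a single line, which can then be trimmed to the subset you actually want to compact.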

Start jmxsh with java -jar jmxsh-R5.jar -h localhost -p 7199:

jeremy$ java -jar jmxsh-R5.jar -h localhost -p 7199
jmxsh v1.0, Tue Jan 22 10:23:12 CST 2008
jeromatron / CHANGES.txt
Created November 27, 2015 03:41
2.1.5.469 CHANGES.txt
2.1.6
* Warn on misuse of unlogged batches (CASSANDRA-9282)
* Failure detector detects and ignores local pauses (CASSANDRA-9183)
* Add utility class to support rate limiting a given log statement (CASSANDRA-9029)
* Add missing consistency levels to cassandra-stress (CASSANDRA-9361)
* Fix commitlog getCompletedTasks to not increment (CASSANDRA-9339)
* Fix for harmless exceptions logged as ERROR (CASSANDRA-8564)
* Delete processed sstables in sstablesplit/sstableupgrade (CASSANDRA-8606)
* Improve sstable exclusion from partition tombstones (CASSANDRA-9298)
* Validate the indexed column rather than the cell's contents for 2i (CASSANDRA-9057)
jeromatron / dependency-tree.txt
Created January 6, 2016 19:11
tinkerpop dependency tree
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Apache TinkerPop
[INFO] Apache TinkerPop :: Gremlin Shaded
[INFO] Apache TinkerPop :: Gremlin Core
[INFO] Apache TinkerPop :: Gremlin Test
[INFO] Apache TinkerPop :: Gremlin Groovy
[INFO] Apache TinkerPop :: Gremlin Groovy Test
jeromatron / multidc-best-practices.md
Last active September 6, 2017 19:19
multi datacenter best practices
  • application considerations
    • application servers usually in each DC talking to local C* nodes
    • DCAwareRoundRobin load balancing policy (Java, but each language driver has an analog)
      • along with LOCAL_* consistency levels, keeps you within a DC
      • can enable using remote host(s) for fault tolerance
  • when adding a dc
    • don't forget
      • open relevant ports across data centers
      • C* nodes in cluster fully connected
  • make sure inter-DC bandwidth can handle the ingest rate
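
The "open relevant ports across data centers" item can be spot-checked from a node in one DC against nodes in another. The sketch below assumes Cassandra's default ports (7000 for inter-node storage, 7001 for SSL storage, 9042 for the native transport) and that `nc` supports `-z` and `-w`. The function names `probe` and `check_ports` are made up for this example.

```shell
#!/bin/sh
# Sketch: report which Cassandra ports on remote-DC nodes are unreachable from here.
# ASSUMPTIONS: default ports; adjust PORTS to match cassandra.yaml, and drop 7001
# if inter-node encryption is not in use.
PORTS="7000 7001 9042"

# probe HOST PORT: succeed if a TCP connection opens within 3 seconds.
probe() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# check_ports HOST...: print "host:port" for each unreachable pair.
# Returns 1 if any pair was unreachable, 0 otherwise.
check_ports() {
    status=0
    for host in "$@"; do
        for port in $PORTS; do
            if ! probe "$host" "$port"; then
                echo "$host:$port"
                status=1
            fi
        done
    done
    return $status
}
```

Run `check_ports` with the addresses of the remote DC's nodes from each local node. No output means every listed port accepted a connection.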

graph-examples

This repo contains a collection of graph examples. The intent is to provide more complete and extensive examples than what is reasonable to include in DataStax documentation or blogposts. This will include DataStax Graph Loader mapping scripts, schemas, example traversals, things to try in DataStax Studio, and application code examples. Feel free to use and modify any of these for your own purposes. There is no warranty or implied official support, but hopefully the examples will be useful as a starting point to show various ways of loading and experimenting with graph data. And if you see anything that could be improved or added, issue reports and pull requests are always welcome!

Download DSE (includes DSE Graph), DataStax Studio, and the DSE Graph Loader at https://academy.datastax.com/downloads.

Additional datasets

For additional interesting datasets, you might consider the following resources: