find /var/lib/cassandra/data/ -type f | grep -v -- -ib- | grep -v "/snapshots"
The version numbers, to date, are:
-149.8935557,61.21759217,Starbucks - AK - Anchorage 00001,"601 West Street_601 West 5th Avenue_Anchorage, Alaska 99501_907-277-2477"
-149.9054948,61.19533942,Starbucks - AK - Anchorage 00002,"Carrs-Anchorage #1805_1650 W Northern Lights Blvd_Anchorage, Alaska 99503_907-339-0500"
-149.7522,61.2297,Starbucks - AK - Anchorage 00003,"Elmendorf AFB_Bldg 5800 Westover Avenue_Anchorage, Alaska 99506"
-149.8643361,61.19525062,Starbucks - AK - Anchorage 00004,"Fred Meyer - Anchorage #11_1000 E Northern Lights Blvd_Anchorage, Alaska 995084283_907-264-9600"
-149.8379726,61.13751355,Starbucks - AK - Anchorage 00005,"Fred Meyer - Anchorage #656_2300 Abbott Road_Anchorage, Alaska 99507_907-365-2000"
-149.9092788,61.13994658,Starbucks - AK - Anchorage 00006,"Fred Meyer - Anchorage (Dimond) #71_2000 W Dimond Blvd_Anchorage, Alaska 995151400_907-267-6700"
-149.7364877,61.19533265,Starbucks - AK - Anchorage 00007,"Safeway-Anchorage #1817_7731 E Northern Lights Blvd_Anchorage, Alaska 99504_907-331-1700"
-149.8211,61.2156
'''
This script performs efficient concatenation of files stored in S3. Given a
folder, output location, and optional suffix, all files with the given suffix
will be concatenated into one file stored in the output location.
Concatenation is performed within S3 when possible, falling back to local
operations when necessary.
Run `python combineS3Files.py -h` for more info.
'''
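The "within S3" concatenation mentioned above can be done with S3's multipart copy-part API, which stitches existing objects into a new object server-side; every part except the last must be at least 5 MB, which is why a local fallback is needed for small pieces. A minimal sketch of that idea in Scala with the AWS Java SDK (the script itself is Python; bucket and key names here are placeholder assumptions):

import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model._
import scala.collection.JavaConverters._

val s3 = AmazonS3ClientBuilder.defaultClient()

// Server-side concatenation: each source object becomes one part of a
// multipart upload, copied within S3 without downloading any data.
def concatenate(bucket: String, sources: Seq[String], destKey: String): Unit = {
  val uploadId = s3.initiateMultipartUpload(
    new InitiateMultipartUploadRequest(bucket, destKey)).getUploadId

  val partETags = sources.zipWithIndex.map { case (srcKey, i) =>
    // Part numbers are 1-based; every part except the last must be >= 5 MB.
    val result = s3.copyPart(new CopyPartRequest()
      .withSourceBucketName(bucket).withSourceKey(srcKey)
      .withDestinationBucketName(bucket).withDestinationKey(destKey)
      .withUploadId(uploadId)
      .withPartNumber(i + 1))
    new PartETag(i + 1, result.getETag)
  }

  s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
    bucket, destKey, uploadId, partETags.asJava))
}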
curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20151008145126-0000
Go to https://console.aws.amazon.com/s3/home?region=us-west-2#
Create a bucket: awscostice
Add a policy more or less like this:
{
"Version": "2008-10-17",
"Id": "Policy1335892530063",
"Statement": [
Request Rate and Performance Considerations
AWS S3 Developer Guide (API Version 2006-03-01)
How do I ingest a large number of small files from S3? My job looks like it's stalling.
Databricks Cloud support forum thread
What is the best way to ingest and analyze a large S3 dataset?
Databricks Cloud support forum thread
import com.amazonaws.services.s3._, model._
import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.s3.model.ObjectListing
import scala.collection.JavaConverters._
import scala.io.Source

// Comma-separated list of S3 files and/or directories to read.
val s3Paths = "s3://yourbucket/path/to/file1.txt,s3://yourbucket/path/to/directory"
// Maximum number of keys to fetch per listing request.
val pageLength = 100
// AWS credentials used to build the S3 client.
val key = "YOURKEY"
val secret = "YOUR_SECRET"
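A sketch of how this setup typically continues (an assumed continuation using the vals and imports above, not the original notebook's code): build a client from the credentials and page through the bucket listing pageLength keys at a time.

val s3 = new AmazonS3Client(new BasicAWSCredentials(key, secret))

// List every key under a prefix, following truncated listings page by page.
def listKeys(bucket: String, prefix: String): Seq[String] = {
  var listing: ObjectListing = s3.listObjects(
    new ListObjectsRequest().withBucketName(bucket).withPrefix(prefix).withMaxKeys(pageLength))
  val keys = scala.collection.mutable.Buffer[String]()
  keys ++= listing.getObjectSummaries.asScala.map(_.getKey)
  while (listing.isTruncated) {
    listing = s3.listNextBatchOfObjects(listing)
    keys ++= listing.getObjectSummaries.asScala.map(_.getKey)
  }
  keys
}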
/**
 * This file contains the core idea of wrapping an underlying OutputFormat in an OutputFormat
 * with an augmented key that writes to partitions using MultipleOutputs (or something similar)
 */
package model.hadoop

import model.hadoop.HadoopIO.MultipleOutputer
import model.hadoop.HadoopIO.MultipleOutputer._

import org.apache.hadoop.io.{DataInputBuffer, NullWritable}
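The rest of that file defines the wrapper itself; as a rough illustration of the routing idea from the header comment (a sketch using plain Hadoop MultipleOutputs rather than the MultipleOutputer helper imported above), a reducer can write each value under a path taken from the augmented key:

import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapreduce.Reducer
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs
import scala.collection.JavaConverters._

// Hypothetical reducer: the key carries the partition name, and each value is
// routed to that partition instead of the single default output file.
class PartitionedReducer extends Reducer[Text, Text, NullWritable, Text] {
  private var outputs: MultipleOutputs[NullWritable, Text] = _

  override def setup(context: Reducer[Text, Text, NullWritable, Text]#Context): Unit =
    outputs = new MultipleOutputs[NullWritable, Text](context)

  override def reduce(key: Text, values: java.lang.Iterable[Text],
                      context: Reducer[Text, Text, NullWritable, Text]#Context): Unit =
    // The third argument (baseOutputPath) becomes the file prefix for this partition.
    values.asScala.foreach(v => outputs.write(NullWritable.get, v, key.toString))

  override def cleanup(context: Reducer[Text, Text, NullWritable, Text]#Context): Unit =
    outputs.close()
}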
#!/bin/bash
# Checks active, non-spot EC2 instances against the list of reserved instances
# in order to determine which availability-zone/instance-type combinations
# are paying on-demand prices (and could benefit from a reserved-instance
# purchase), or which current reservations aren't being used.
#
# Dependencies:
# * awscli (http://aws.amazon.com/cli/)
# * jq (http://stedolan.github.io/jq/)
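For comparison, the same check sketched with the AWS Java SDK in Scala rather than awscli and jq (an illustration of the comparison, not part of the script): count running, non-spot instances and active reservations per availability-zone/instance-type pair, then diff the two maps.

import com.amazonaws.services.ec2.AmazonEC2ClientBuilder
import scala.collection.JavaConverters._

val ec2 = AmazonEC2ClientBuilder.defaultClient()

// Running on-demand instances by (availability zone, instance type).
// Spot instances report a non-null instance lifecycle and are skipped.
val running: Map[(String, String), Int] =
  ec2.describeInstances().getReservations.asScala
    .flatMap(_.getInstances.asScala)
    .filter(i => i.getState.getName == "running" && i.getInstanceLifecycle == null)
    .groupBy(i => (i.getPlacement.getAvailabilityZone, i.getInstanceType))
    .mapValues(_.size).toMap

// Active reservations, keyed the same way (AZ may be null for regional reservations).
val reserved: Map[(String, String), Int] =
  ec2.describeReservedInstances().getReservedInstances.asScala
    .filter(_.getState == "active")
    .groupBy(r => (r.getAvailabilityZone, r.getInstanceType))
    .mapValues(_.map(_.getInstanceCount.intValue).sum).toMap

// Positive difference: instances paying on-demand prices; negative: idle reservations.
for (k <- running.keySet ++ reserved.keySet) {
  val diff = running.getOrElse(k, 0) - reserved.getOrElse(k, 0)
  println(s"$k: $diff")
}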