Mike Atlas mikeatlas

Keybase proof

I hereby claim:

I am mikeatlas on github.
I am mikeatlas (https://keybase.io/mikeatlas) on keybase.
I have a public key whose fingerprint is 083F C7DB EE6C 4D77 D06A 30CF 4FAA AD15 C512 DC33

To claim this, I am signing this object:

Original idea from "Transfer files from an FTP server to S3" by "Hack N Cheese".

I moved roughly a terabyte in less than an hour. Granted, I couldn't take advantage of lftp's --parallel=30 switch because my FTP source limited me to one connection at a time, but --use-pget-n=N did seem to help.
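
As a rough sanity check on that figure, the sketch below works out the sustained throughput it implies. It assumes "a terabyte" means 10^12 bytes and "an hour" means 3600 seconds; both are assumptions for illustration, not measurements from the transfer, and the function name is hypothetical.

```javascript
// Assumptions (not from the original transfer logs):
// 1 TB = 10^12 bytes, elapsed time = 3600 seconds.
const bytesMoved = 1e12;
const elapsedSeconds = 3600;

// Convert a byte count and a duration into MB/s and Gbit/s.
function impliedThroughput(bytes, seconds) {
  const bytesPerSecond = bytes / seconds;
  return {
    megabytesPerSec: bytesPerSecond / 1e6,
    gigabitsPerSec: (bytesPerSecond * 8) / 1e9,
  };
}

const t = impliedThroughput(bytesMoved, elapsedSeconds);
// prints: 278 MB/s, 2.22 Gbit/s
console.log(`${t.megabytesPerSec.toFixed(0)} MB/s, ${t.gigabitsPerSec.toFixed(2)} Gbit/s`);
```

Since the transfer took less than an hour, these numbers are a lower bound on the average rate under the stated assumptions.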

Get a fast Ubuntu 14.04 EC2 box on Amazon for temporary use (I went with m1.xlarge) so that data transfers at least aren't limited by your local bandwidth. I also attached a fat 2TB EBS volume and symlinked it to /bigdisk, and made sure the EBS volume was deleted after I terminated this EC2 box. I hope lftp 2.6.4 is available as a stable package by the next time I attempt this.
Build lftp 2.6.4+ (Not easy to compile, so read the INSTALL file and plow through all your missing dependencies - you'll also need to re-run `sudo ./configure && su

Getting GeoMesa 1.0 to work on Cloudera CDH 5.3 with Accumulo 1.6

by @mikeatlas

Thanks go out to @manasdebashiskar for helping me work through all these steps!

Getting GeoMesa to work on Accumulo 1.6 using Cloudera's CDH 5.3 is not as easy as getting it to work on the officially supported Accumulo 1.5.x, but here are the steps you can take to make it happen.

First, you will need to set up an Accumulo 1.6 cluster in CDH. This requires that you create a Zookeeper cluster, an HDFS cluster, and a Hadoop MapReduce cluster. For my purposes, I have the following setup (yours may differ as needed):

3-host Zookeeper cluster, each running Ubuntu 12.02 (ami-018dd631 EC2 image) on t2.medium instances

	Latency Comparison Numbers
	--------------------------
	L1 cache reference                     0.5 ns
	Branch mispredict                        5 ns
	L2 cache reference                       7 ns           14x L1 cache
	Mutex lock/unlock                       25 ns
	Main memory reference                  100 ns           20x L2 cache, 200x L1 cache
	Compress 1K bytes with Zippy         3,000 ns
	Send 1K bytes over 1 Gbps network   10,000 ns  0.01 ms
	Read 4K randomly from SSD*         150,000 ns  0.15 ms
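
The relative costs in the table can be recomputed from the raw nanosecond values. The following is a minimal sketch that copies the figures listed above into an object and derives a couple of the ratios and millisecond conversions; the object keys and function names are made up for illustration.

```javascript
// Latencies in nanoseconds, copied from the table above.
// Key names are hypothetical, chosen only for this sketch.
const latencyNs = {
  l1CacheReference: 0.5,
  branchMispredict: 5,
  l2CacheReference: 7,
  mutexLockUnlock: 25,
  mainMemoryReference: 100,
  compress1KWithZippy: 3000,
  send1KOver1Gbps: 10000,
  read4KRandomSsd: 150000,
};

// How many times slower operation `slow` is than operation `fast`.
function ratio(slow, fast) {
  return latencyNs[slow] / latencyNs[fast];
}

// Convert nanoseconds to milliseconds (1 ms = 1,000,000 ns).
function nsToMs(ns) {
  return ns / 1e6;
}

console.log(ratio('l2CacheReference', 'l1CacheReference'));    // 14
console.log(ratio('mainMemoryReference', 'l1CacheReference')); // 200
console.log(nsToMs(latencyNs.send1KOver1Gbps));                // 0.01
console.log(nsToMs(latencyNs.read4KRandomSsd));                // 0.15
```

Keeping everything in one unit (nanoseconds) and converting only for display makes it easy to compare any two rows without worrying about mixed units.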

	<html lang="en">
	<head>
	<meta charset="utf-8">

	<title>2D Picking with canvas</title>
	<meta name="description" content="">
	<meta name="author" content="Yannick Assogba">
	<script src="//rawgit.com/mrdoob/stats.js/master/build/stats.min.js"></script>
	<script src="https://cdnjs.cloudflare.com/ajax/libs/dat-gui/0.5/dat.gui.js"></script>


	// Fix up our CSV data
	var fs = require('fs');
	var csv = require('csv');
	var allWPIsInputFs = fs.createReadStream('./input-data/all-wpi-lat-long.csv');
	var mappedWPIsInputFs = fs.createReadStream('./input-data/mapped-wpi-lat-lons.csv');

	var output = fs.createWriteStream('./input-data/all-wpi-lat-long_trim.csv');

	var parser = csv.parse({delimiter: ','});

	<?xml version="1.0" encoding="UTF-8"?>
	<project xmlns="http://maven.apache.org/POM/4.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>geomesa-gdelt</groupId>
	<artifactId>geomesa-gdelt-accumulo1.5</artifactId>
	<name>GeoMesa GDELT</name>
	<version>1.0-SNAPSHOT</version>

	<?xml version="1.0" encoding="UTF-8"?>
	<project xmlns="http://maven.apache.org/POM/4.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<repositories>
	<repository>
	<id>maven2-repository.dev.java.net</id>
	<name>Java.net repository</name>

	<?xml version="1.0" encoding="UTF-8"?>
	<!--
	~ Copyright 2014 Commonwealth Computer Research, Inc.
	~
	~ Licensed under the Apache License, Version 2.0 (the "License");
	~ you may not use this file except in compliance with the License.
	~ You may obtain a copy of the License at
	~
	~ http://www.apache.org/licenses/LICENSE-2.0
	~