Eugene Cheipesh echeipesh

S3 Catalog Refactor Notes and Benchmarks

This summary is the result of work done to migrate the S3 catalog away from Kryo for record serialization, because of the issues described here: locationtech/geotrellis#1138

We have chosen Apache Avro as the alternative. In parallel, ongoing work has motivated a closer look at S3 catalog performance and benchmarking. The main question is: can S3 be used as a data source within a web request-response cycle? The original tests were not encouraging, with a representative query taking ~20s to complete.

Serialization Story

There has been some understandable confusion over the phrase "Kryo is out!", so it is worthwhile to go over the different serializations going on in GeoTrellis and Spark:

	// an attempt to switch on both type parameter and some value
	abstract class Location {
	  def apply[K: TypeTag](layer: LayerId): String
	}

	object Location {
	  def apply(f: PartialFunction[(Type, LayerId), String]) =
	    new Location {
	      def apply[K: TypeTag](id: LayerId): String = {
	        f((typeOf[K], id))

	package org.example

	import java.util.concurrent.Executors
	import scala.concurrent.duration._
	import scala.concurrent.ExecutionContext.Implicits.global
	import scalaz._
	import scalaz.concurrent._
	import scalaz.stream._
	import scalaz.stream.async._

	package org.example

	import shapeless._
	import shapeless.ops.hlist._


	object Min extends Poly1 {
	  implicit def tuple[C] = at[((C, C), Ordering[C])] { case ((c1, c2), ord) => ord.min(c1, c2) }
	}

	trait Baz[T] {
	  def apply(): String
	}

	object Baz {
	  implicit def forInt: Baz[Int] = new Baz[Int] { def apply() = "Default Baz for Int" }
	}

	trait BarMagnet
	object BarMagnet{

	<!DOCTYPE html>
	<html>
	<head>
	<title>Leaflet Layers Control Example</title>
	<meta charset="utf-8" />

	<meta name="viewport" content="width=device-width, initial-scale=1.0">
	<link rel="stylesheet" href="//cdn.leafletjs.com/leaflet-0.7.3/leaflet.css" />
	<script src="//cdn.leafletjs.com/leaflet-0.7.3/leaflet.js"></script>
	<style>

	trait Magic {
	  type FlagType
	  val flag: FlagType
	}

	case class Box(m: Magic) {
	  def get: m.FlagType = m.flag
	}

	object Main {

	import com.scalakata._

	@instrument class Playground {
	  trait Constructor[T] {
	    type MetadataType
	    type ResultType
	    val stubMeta: MetadataType

	    def make(thing: T, meta: MetadataType): ResultType
	  }

	// Ask Java to serialize suspectObject in order to force a serialization error.
	// This is even more useful in conjunction with
	// javaOptions += "-Dsun.io.serialization.extendedDebugInfo=true"
	def ser(suspectObject: Any): Array[Byte] = {
	  val os = new java.io.ByteArrayOutputStream
	  new java.io.ObjectOutputStream(os).writeObject(suspectObject)
	  os.toByteArray()
	}
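
A minimal usage sketch for the helper above, to show the kind of failure it is meant to surface. The `SerCheck` object and the `Handle` and `Job` classes are hypothetical names made up for this illustration; they do not come from GeoTrellis or Spark. The helper is repeated so the block compiles on its own, with the stream closed explicitly.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerCheck {
  // Same idea as ser above: force Java serialization of an arbitrary object.
  def ser(suspectObject: Any): Array[Byte] = {
    val os = new ByteArrayOutputStream
    val oos = new ObjectOutputStream(os)
    try oos.writeObject(suspectObject) finally oos.close()
    os.toByteArray
  }

  // Hypothetical classes, for illustration only.
  class Handle                                 // does not extend Serializable
  case class Job(name: String, handle: Handle) // Serializable, but holds a Handle

  def main(args: Array[String]): Unit = {
    // A String serializes cleanly and yields a non-empty byte array.
    val ok = ser("hello").nonEmpty
    println(s"string ok: $ok")

    // Job itself is Serializable, yet writing it fails on the Handle field.
    try {
      ser(Job("tile-ingest", new Handle))
      println("unexpectedly serialized")
    } catch {
      case e: NotSerializableException =>
        println(s"not serializable: ${e.getMessage}")
    }
  }
}
```

With the `extendedDebugInfo` flag set, the exception should also describe the field path that led to the offending object, which is usually the quickest way to find what a closure or record has captured.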