mavencode01’s gists

mavencode01 / build.sbt

Created August 26, 2016 08:22 — forked from HeartSaVioR/build.sbt

build.sbt for Spark 1.4.0 + HBase CDH 5.2.0 + elasticsearch-spark 2.1.0, and so on

	import AssemblyKeys._

	name := "elasticsearch-spark-project"

	version := "1.0-SNAPSHOT"

	scalaVersion := "2.10.5"

	libraryDependencies ++= Seq(
	"org.apache.spark" %% "spark-core" % "1.4.0" % "provided",

mavencode01 / elasticsearch.yml

Created August 26, 2016 11:15

	##################### ElasticSearch Configuration Example #####################

	# This file contains an overview of various configuration settings,
	# targeted at operations staff. Application developers should
	# consult the guide at <http://elasticsearch.org/guide>.
	#
	# The installation procedure is covered at
	# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html>.
	#
	# ElasticSearch comes with reasonable defaults for most settings,

mavencode01 / SparkCopyPostgres.scala

Created September 8, 2016 15:24 — forked from longcao/SparkCopyPostgres.scala

COPY Spark DataFrame rows to PostgreSQL (via JDBC)

	import java.io.InputStream

	import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
	import org.apache.spark.sql.{ DataFrame, Row }

	import org.postgresql.copy.CopyManager
	import org.postgresql.core.BaseConnection

	val jdbcUrl = s"jdbc:postgresql://..." // db credentials elided
	val connectionProperties = {
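
A minimal sketch of the streaming half of this technique. It assumes PostgreSQL's default COPY text format: tab-delimited, `\N` for NULL, backslash escapes. The object `CopyTextFormat` and its helpers are hypothetical and not taken from the gist. Rows are modeled as `Seq[Any]` so the sketch runs without Spark. In a Spark job you would presumably convert each partition's `Row`s with `row.toSeq` and pass the resulting stream to `CopyManager.copyIn("COPY my_table FROM STDIN", stream)` inside `foreachPartition`; `my_table` is a placeholder.

```scala
import java.io.{ByteArrayInputStream, InputStream, SequenceInputStream}
import java.nio.charset.StandardCharsets

// Hypothetical helper: encodes rows in PostgreSQL's COPY text format.
object CopyTextFormat {

  // Escape one field. NULL becomes \N; backslash, tab, newline and carriage
  // return are backslash-escaped so they cannot break the row/column layout.
  def escape(value: Any): String = value match {
    case null => "\\N"
    case v =>
      val sb = new StringBuilder
      v.toString.foreach {
        case '\\' => sb.append("\\\\")
        case '\t' => sb.append("\\t")
        case '\n' => sb.append("\\n")
        case '\r' => sb.append("\\r")
        case c    => sb.append(c)
      }
      sb.toString
  }

  // One row = escaped fields joined by tabs, terminated by a newline.
  def toLine(row: Seq[Any]): String = row.map(escape).mkString("\t") + "\n"

  // Lazily turn an iterator of rows into one InputStream, so a whole
  // partition never has to be materialized as a single big string.
  def rowsToInputStream(rows: Iterator[Seq[Any]]): InputStream = {
    val perRow = new java.util.Enumeration[InputStream] {
      def hasMoreElements(): Boolean = rows.hasNext
      def nextElement(): InputStream =
        new ByteArrayInputStream(toLine(rows.next()).getBytes(StandardCharsets.UTF_8))
    }
    new SequenceInputStream(perRow)
  }
}
```

The text format is used here because its escaping rules are short enough to show in full; COPY also accepts `FORMAT csv` if CSV quoting is preferred.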

mavencode01 / Spark-tips

Last active October 13, 2016 12:46

Spark tips

	1. Issue: Spark scratch space keeps growing until the disk runs out of space, which eventually causes the job to fail.

	Solution:
	Removing the "org.apache.spark.serializer.KryoSerializer" setting (falling back to the default Java serializer) seems to solve the problem:
	//.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")


	## Your executor needs memory
	One reason you get an OOM (OutOfMemoryError) exception is that the partition data your executor needs to process is
	larger than the memory you have provided.
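
As a rough illustration of the sizing arithmetic, here is a hypothetical heuristic (not a Spark API): pick the smallest partition count such that each partition's share of the data fits in a per-task memory budget. It assumes the data is spread evenly across partitions and that each executor core runs one task at a time. `usableFraction` is an assumed safety factor standing in for the share of executor memory Spark actually leaves for task data.

```scala
// Hypothetical sizing heuristic, not part of Spark.
object PartitionSizing {

  /** Smallest partition count such that an evenly split dataset gives each
    * concurrently running task no more data than its memory budget.
    *
    * @param totalDataBytes      estimated size of the data to process
    * @param executorMemoryBytes memory given to one executor (e.g. spark.executor.memory)
    * @param coresPerExecutor    tasks that run concurrently in one executor
    * @param usableFraction      assumed share of executor memory available to tasks, in (0, 1]
    */
  def partitionsNeeded(
      totalDataBytes: Long,
      executorMemoryBytes: Long,
      coresPerExecutor: Int,
      usableFraction: Double): Int = {
    require(totalDataBytes >= 0, "data size must be non-negative")
    require(executorMemoryBytes > 0 && coresPerExecutor > 0, "memory and cores must be positive")
    require(usableFraction > 0 && usableFraction <= 1, "usableFraction must be in (0, 1]")

    // Memory one task can use when all cores of the executor are busy.
    val perTaskBudget =
      math.max(1L, (executorMemoryBytes * usableFraction / coresPerExecutor).toLong)

    // Ceiling division: enough partitions that none exceeds the budget.
    val n = (totalDataBytes + perTaskBudget - 1) / perTaskBudget
    math.max(1L, n).toInt
  }
}
```

With 100 GiB of input, 8 GiB executors with 4 cores and half the memory assumed usable, each task gets 1 GiB and the heuristic returns 100, which you could then pass to `repartition(100)`.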

mavencode01 / SparkTest.scala

Last active September 19, 2016 15:22

Spark running out of memory without caching

	package com.mavencode.clustering


	import java.util.Properties

	import com.typesafe.config.ConfigFactory
	import org.apache.log4j.{Level, LogManager}
	import org.apache.spark.SparkConf
	import org.apache.spark.rdd.RDD
	import org.apache.spark.sql.{SaveMode, SparkSession}

mavencode01 / SparkSQLJira.scala

Created February 9, 2017 19:56 — forked from marmbrus/SparkSQLJira.scala

	package com.databricks.spark.jira

	import scala.io.Source

	import org.apache.spark.rdd.RDD

	import org.apache.spark.sql._
	import org.apache.spark.sql.functions._
	import org.apache.spark.sql.sources.{TableScan, BaseRelation, RelationProvider}

mavencode01 / start_docker_registry.bash

Created March 30, 2017 22:00 — forked from PieterScheffers/start_docker_registry.bash

Start docker registry with letsencrypt certificates (Linux Ubuntu)

	#!/usr/bin/env bash

	# install docker
	# https://docs.docker.com/engine/installation/linux/ubuntulinux/

	# install docker-compose
	# https://docs.docker.com/compose/install/

	# install letsencrypt
	# https://www.digitalocean.com/community/tutorials/how-to-secure-nginx-with-let-s-encrypt-on-ubuntu-16-04

mavencode01 / docker-deploy.sh

Created November 1, 2017 16:31 — forked from ajbrown/docker-deploy.sh

EC2 Deployment ServiceUpdate

	#!/usr/bin/env bash

	# Deploy a new image to an ECS service by creating a new task revision
	# specifying a container repository tag, and updating the service to use the new revision.
	#
	# Note: Your application's container MUST be the first container in the task revision.

	# The tag to deploy. Specify it as the first CLI argument.
	TAG=$1
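
The revision step can be modeled in plain Scala to make the "first container" rule concrete. `TaskDef`, `ContainerDef` and `EcsDeploySketch` are hypothetical stand-ins for an ECS task definition, not the AWS SDK or CLI. The sketch only computes the container list for the new revision (ECS itself assigns the revision number when the definition is registered) and assumes image references of the form `[registry/]repository[:tag]`, dropping any `@digest`.

```scala
// Hypothetical model of the parts of an ECS task definition the deploy step touches.
final case class ContainerDef(name: String, image: String)
final case class TaskDef(family: String, containers: List[ContainerDef])

object EcsDeploySketch {

  // Replace (or add) the tag on an image reference such as
  // "host:5000/team/app:v1". A colon only marks a tag when it comes after
  // the last '/'; otherwise it belongs to a registry host:port.
  def replaceTag(image: String, tag: String): String = {
    val base = image.takeWhile(_ != '@') // drop any @sha256:... digest
    val lastSlash = base.lastIndexOf('/')
    val lastColon = base.lastIndexOf(':')
    val repository = if (lastColon > lastSlash) base.substring(0, lastColon) else base
    s"$repository:$tag"
  }

  // Following the note that the application container must come first:
  // only the FIRST container gets the new tag, the rest are carried over unchanged.
  def withFirstContainerTag(task: TaskDef, tag: String): TaskDef = task.containers match {
    case first :: rest =>
      task.copy(containers = first.copy(image = replaceTag(first.image, tag)) :: rest)
    case Nil =>
      throw new IllegalArgumentException(s"task family ${task.family} has no containers")
  }
}
```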

Philip K. Adetiloye mavencode01