Philip K. Adetiloye mavencode01

@mavencode01
mavencode01 / build.sbt
Created August 26, 2016 08:22 — forked from HeartSaVioR/build.sbt
build.sbt for Spark 1.4.0 + HBase CDH 5.2.0 + elasticsearch-spark 2.1.0 + and so on
import AssemblyKeys._
name := "elasticsearch-spark-project"
version := "1.0-SNAPSHOT"
scalaVersion := "2.10.5"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.4.0" % "provided",
##################### ElasticSearch Configuration Example #####################
# This file contains an overview of various configuration settings,
# targeted at operations staff. Application developers should
# consult the guide at <http://elasticsearch.org/guide>.
#
# The installation procedure is covered at
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html>.
#
# ElasticSearch comes with reasonable defaults for most settings,
@mavencode01
mavencode01 / SparkCopyPostgres.scala
Created September 8, 2016 15:24 — forked from longcao/SparkCopyPostgres.scala
COPY Spark DataFrame rows to PostgreSQL (via JDBC)
import java.io.InputStream
import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
import org.apache.spark.sql.{ DataFrame, Row }
import org.postgresql.copy.CopyManager
import org.postgresql.core.BaseConnection
val jdbcUrl = s"jdbc:postgresql://..." // db credentials elided
val connectionProperties = {
@mavencode01
mavencode01 / Spark-tips
Last active October 13, 2016 12:46
Spark tips
1. Issue: Spark scratch space keeps growing until the disk runs out of space, which eventually causes the job to fail
Solution:
Removing the "org.apache.spark.serializer.KryoSerializer" setting seems to solve the problem
//.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
## Your executor needs memory
One reason you get an OOM exception is that the partition data your executor needs to process is
larger than the memory you have provided.
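A rough way to act on this tip is to estimate how many partitions keep each one under a per-task memory budget. The sketch below is plain Scala with no Spark dependency; `PartitionSizing`, `partitionsNeeded`, and the sizes used in `main` are hypothetical names and figures chosen for illustration, not part of Spark or of the original notes.

```scala
// Hypothetical helper, not part of Spark: estimate how many partitions are
// needed so that each partition stays under a per-task memory budget.
object PartitionSizing {
  def partitionsNeeded(datasetBytes: Long, perTaskBudgetBytes: Long): Int = {
    require(datasetBytes >= 0, "datasetBytes must be non-negative")
    require(perTaskBudgetBytes > 0, "perTaskBudgetBytes must be positive")
    // Ceiling division, with a floor of one partition.
    val n = (datasetBytes + perTaskBudgetBytes - 1) / perTaskBudgetBytes
    math.max(1L, n).toInt
  }

  def main(args: Array[String]): Unit = {
    val mib = 1024L * 1024L
    val gib = 1024L * mib
    // Assumed figures for illustration only: 100 GiB of input, 256 MiB per task.
    val n = partitionsNeeded(100L * gib, 256L * mib)
    println(s"repartition to at least $n partitions")
  }
}
```

The result could then be passed to `repartition` on an RDD or DataFrame, or the budget could be raised instead through the `spark.executor.memory` setting. Which of the two fits depends on the cluster, so treat the numbers as a starting point rather than a rule.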
@mavencode01
mavencode01 / SparkTest.scala
Last active September 19, 2016 15:22
Spark running out of Memory without caching
package com.mavencode.clustering
import java.util.Properties
import com.typesafe.config.ConfigFactory
import org.apache.log4j.{Level, LogManager}
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{SaveMode, SparkSession}
package com.databricks.spark.jira
import scala.io.Source
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.sources.{TableScan, BaseRelation, RelationProvider}
@mavencode01
mavencode01 / start_docker_registry.bash
Created March 30, 2017 22:00 — forked from PieterScheffers/start_docker_registry.bash
Start docker registry with letsencrypt certificates (Linux Ubuntu)
#!/usr/bin/env bash
# install docker
# https://docs.docker.com/engine/installation/linux/ubuntulinux/
# install docker-compose
# https://docs.docker.com/compose/install/
# install letsencrypt
# https://www.digitalocean.com/community/tutorials/how-to-secure-nginx-with-let-s-encrypt-on-ubuntu-16-04
@mavencode01
mavencode01 / docker-deploy.sh
Created November 1, 2017 16:31 — forked from ajbrown/docker-deploy.sh
EC2 Deployment ServiceUpdate
#!/usr/bin/env bash
# Deploy a new image to an ECS service by creating a new task revision
# specifying a container repository tag, and updating the service to use the new revision.
#
# Note: Your application's container MUST be the first container in the task revision.
# The tag to deploy. Specify as the first CLI argument.
TAG=$1