Last active
May 7, 2023 15:45
Feed elasticsearch with almost 20 years of chicago crimes (using spark). / published by https://github.com/dacr/code-examples-manager #385ba213-e769-499c-92ae-3f63cfb72d15/ebc1b9c37aaafb3304faed44448e29616a08d1e3
// summary : Feed elasticsearch with almost 20 years of chicago crimes (using spark).
// keywords : scala, elasticsearch, feed, chicago, crimes, bigdata, spark
// publish : gist
// authors : David Crosson
// license : Apache NON-AI License Version 2.0 (https://raw.githubusercontent.com/non-ai-licenses/non-ai-licenses/main/NON-AI-APACHE2)
// id : 385ba213-e769-499c-92ae-3f63cfb72d15
// created-on : 2019-11-02T21:23:37Z
// managed-by : https://github.com/dacr/code-examples-manager
// execution : scala 2.12 ammonite script (http://ammonite.io/) - run as follows 'amm scriptname.sc'
// spark 2.4.4 is published for scala 2.11 and 2.12 only; scala 2.13 support arrived later, with spark 3.2
import $ivy.`org.apache.spark::spark-sql:2.4.4`
//import $ivy.`org.elasticsearch::elasticsearch-spark-20:7.3.2` // not yet available for scala 2.12 !!!
import org.apache.spark.sql._
/*
Fill elasticsearch with ~19 years of chicago crimes data.
Download the dataset first (the URL is quoted so the shell does not treat `?` as a glob character) :
`curl -L "https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD" -o crimes.csv`
*/
val spark =
  SparkSession.builder()
    .master("local[*]")
    .getOrCreate()
spark.conf.set("spark.sql.session.timeZone", "America/Chicago")
def sc = spark.sparkContext
val crimesCSV =
  spark.read.format("csv")
    .option("sep", ",")
    .option("inferSchema", "true")
    .option("header", "true")
    .option("timestampFormat", "MM/d/yyyy hh:mm:ss a")
    .load("crimes.csv")
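// Sanity check for the "timestampFormat" option above, as a minimal sketch.
// Assumption: spark 2.4 parses CSV timestamps with SimpleDateFormat-style
// patterns, so java.text.SimpleDateFormat from the JDK is used here as a
// stand-in for spark's own parser. The sample value and the names
// chicagoZone, crimeDateFormat, isoFormat and sampleDate are illustrative
// and not part of the original script.
import java.text.SimpleDateFormat
import java.util.{Locale, TimeZone}

val chicagoZone = TimeZone.getTimeZone("America/Chicago")
val crimeDateFormat = new SimpleDateFormat("MM/d/yyyy hh:mm:ss a", Locale.US)
crimeDateFormat.setTimeZone(chicagoZone)
val isoFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.US)
isoFormat.setTimeZone(chicagoZone)

// "d" accepts one- and two-digit days; "hh" combined with "a" is a 12-hour clock
val sampleDate = crimeDateFormat.parse("09/05/2015 01:30:00 PM")
println(isoFormat.format(sampleDate)) // prints 2015-09-05T13:30:00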
println(crimesCSV.count())
crimesCSV.printSchema()
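// Preview of the documents that could be sent to elasticsearch, as a minimal
// sketch: the script stops before any indexing step (the connector import is
// commented out), so this only prints the JSON form of a few rows.
// Assumptions: one JSON document per CSV row is the intended shape, and the
// sample size of 3 is arbitrary. Dataset.toJSON is part of spark-sql itself.
crimesCSV.toJSON.take(3).foreach(println)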