Last active February 3, 2026 20:21
Feed Elasticsearch with almost 20 years of Chicago crimes (using Spark). / published by https://github.com/dacr/code-examples-manager #385ba213-e769-499c-92ae-3f63cfb72d15/841edc39f2365b7f1aed8727140c66b29452622a
| // summary : Feed elasticsearch with almost 20 years of chicago crimes (using spark). | |
| // keywords : scala, elasticsearch, feed, chicago, crimes, bigdata, spark | |
| // publish : gist | |
| // authors : David Crosson | |
| // license : Apache License Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0.txt) | |
| // id : 385ba213-e769-499c-92ae-3f63cfb72d15 | |
| // created-on : 2019-11-02T21:23:37Z | |
| // managed-by : https://github.com/dacr/code-examples-manager | |
| // execution : scala 2.12 ammonite script (http://ammonite.io/) - run as follows 'amm scriptname.sc' | |
| // spark 2.4.4 is published for scala 2.11 and 2.12 only; scala 2.13 support arrived with spark 3.2 | |
| import $ivy.`org.apache.spark::spark-sql:2.4.4` | |
| //import $ivy.`org.elasticsearch::elasticsearch-spark-20:7.3.2` // not yet available for scala 2.12 !!! | |
| import org.apache.spark.sql._ | |
| /* | |
| Fill elasticsearch with ~19 years of chicago crimes data : | |
| `curl -L "https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD" -o crimes.csv` | |
| */ | |
| val spark = | |
|   SparkSession.builder() | |
|     .master("local[*]") | |
|     .getOrCreate() | |
| spark.conf.set("spark.sql.session.timeZone", "America/Chicago") | |
| def sc = spark.sparkContext | |
| val crimesCSV = | |
|   spark.read.format("csv") | |
|     .option("sep", ",") | |
|     .option("inferSchema", "true") | |
|     .option("header", "true") | |
|     .option("timestampFormat", "MM/d/yyyy hh:mm:ss a") | |
|     .load("crimes.csv") | |
| println(crimesCSV.count()) | |
| crimesCSV.printSchema() |
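The `timestampFormat` option in the script tells the CSV reader how to recognise timestamp columns during schema inference. As a rough illustration of what that pattern matches, here is a minimal sketch that applies the same pattern string with plain `java.time`, outside of Spark. The sample value `09/05/2015 01:30:00 PM` and the helper name `parseCrimeTimestamp` are assumptions made for the example, and Spark's own parser may differ from `java.time` in edge cases.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

// Same pattern string as the script's "timestampFormat" option.
// Locale.US is an assumption so that the "AM"/"PM" marker is recognised.
val crimeTimestampFormat =
  DateTimeFormatter.ofPattern("MM/d/yyyy hh:mm:ss a", Locale.US)

// Hypothetical helper: parse one raw timestamp string into a LocalDateTime.
def parseCrimeTimestamp(raw: String): LocalDateTime =
  LocalDateTime.parse(raw, crimeTimestampFormat)

// Assumed sample value, written in the 12-hour layout the pattern describes.
val sample = parseCrimeTimestamp("09/05/2015 01:30:00 PM")
println(sample)
```

The parsed value carries no time zone of its own; in the script, the zone comes from the `spark.sql.session.timeZone` setting.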