#!/usr/bin/env bash
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
function usage() {
echo "Usage:
${0##*/} [-h][-n=APP][-r=SECONDS][-b=DATETIME][-s=STATES][-o=ORDER]
Options:
FROM jupyter/minimal-notebook:1386e2046833
# -----------------------------------------------------------------------------
# --- Constants
# -----------------------------------------------------------------------------
USER $NB_USER
WORKDIR /home/$NB_USER
# Configure the Kubernetes Pod in which the Airflow Worker will run.
executorConfig = ExecutorBuilder(
    image = "dask-py38",
    resource = { "memory": "80960Mi", "cpu": "32" },
    resourceCapacityType = "SPOT",
    resourceNodeSelector = { "compute-type": "airflow-cpu-intensive", ... },
    notebookCustomPackages = ["pandas==1.2.3", "pyarrow==3.0.0"],
    notebookKernel = "python38",
    ...
)
from pyspark.sql.functions import *
from pyspark.sql.types import *
# Download the CSV file into the current directory, then run the code below.
# The file has a `.csv` extension, but the actual field delimiter is `\t` (tab).
# If you are working on Databricks, change the path to "/FileStore/tables/marketing_campaign.csv".
df = spark.read.load("./marketing_campaign.csv",
                     format="csv",
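Why the tab delimiter matters can be seen without Spark. The sketch below uses only Python's standard `csv` module on a small made-up sample (the `sample` string and the `parse` helper are assumptions for illustration, not part of the original tutorial). Parsed with the default comma, each row collapses into a single field; parsed with `\t`, the columns separate correctly. In Spark's CSV reader the corresponding option is `sep`; the original call is truncated here, so the options it actually passes are not shown.

```python
import csv
import io

# Hypothetical two-line sample standing in for marketing_campaign.csv.
sample = "ID\tYear_Birth\tEducation\n5524\t1957\tGraduation\n"


def parse(text, delimiter):
    """Parse delimited text into a list of rows using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_rows = parse(sample, ",")   # wrong delimiter: one field per row
tab_rows = parse(sample, "\t")    # correct delimiter: three fields per row

print(len(comma_rows[0]), len(tab_rows[0]))
```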
df.printSchema()  # Shows the schema, i.e. the structure of the data.
root
 |-- ID: integer (nullable = true)
 |-- Year_Birth: integer (nullable = true)
 |-- Education: string (nullable = true)
 |-- Marital_Status: string (nullable = true)
 |-- Income: integer (nullable = true)
 |-- Kidhome: integer (nullable = true)
 |-- Teenhome: integer (nullable = true)
df.count()     # Counts the rows of the loaded data and returns the number.
df.show()      # Prints part of the data to the console.
df.toPandas()  # A function available in PySpark; convenient for viewing data in Jupyter.
# Result of df.count()
2240
# Result of df.toPandas() (some rows and columns omitted)
     ID  Year_Birth   Education  Marital_Status   Income  Kidhome  Teenhome  Dt_Customer  Recency  MntWines  ...  NumWebVisitsMonth  AcceptedCmp3  AcceptedCmp4  AcceptedCmp5  AcceptedCmp1  AcceptedCmp2  Complain  Z_CostContact  Z_Revenue  Response
0  5524        1957  Graduation          Single  58138.0        0         0   04-09-2012       58       635  ...                  7             0             0             0             0             0         0              3         11         1
1  2174        1954  Graduation          Single  46344.0        1         1   08-03-2014       38        11  ...                  5             0             0             0             0             0         0              3         11         0
2  4141        1965  Graduation        Together  71613.0        0         0   21-08-2013       26       426  ...                  4             0             0             0             0             0         0              3         11         0
3  6182        1984  Graduation        Together  26646.0        1         0   10-02-2014       26        11  ...                  6             0             0             0             0             0         0              3         11         0
# Select columns and rename them.
# Equivalent to SQL's SELECT ID AS id, Year_Birth AS year_birth, ...
dfSelected = df.select(
    col("ID").alias("id"),
    col("Year_Birth").alias("year_birth"),
    col("Education").alias("education"),
    col("Kidhome").alias("count_kid"),
    col("Teenhome").alias("count_teen"),
    col("Dt_Customer").alias("date_customer"),
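The SQL comparison in the comment can be tried directly with Python's built-in `sqlite3` module. This is only an illustrative sketch: the `customers` table and its single row are made up, and SQLite stands in for whichever SQL engine is meant. It shows that `SELECT <column> AS <alias>` renames a column, and that wrapping the column name in single quotes selects a string literal instead of the column.

```python
import sqlite3

# Hypothetical in-memory table with two of the tutorial's columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (ID INTEGER, Year_Birth INTEGER)")
conn.execute("INSERT INTO customers VALUES (5524, 1957)")

# Select and rename columns, analogous to col("ID").alias("id").
renamed = conn.execute(
    "SELECT ID AS id, Year_Birth AS year_birth FROM customers"
)
column_names = [desc[0] for desc in renamed.description]
renamed_rows = renamed.fetchall()

# With single quotes, 'ID' is a string literal, not a column reference.
literal_rows = conn.execute("SELECT 'ID' AS id FROM customers").fetchall()
conn.close()

print(column_names, renamed_rows, literal_rows)
```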
# Result of dfSelected.count()
2240
# Result of dfSelected.printSchema()
root
 |-- id: integer (nullable = true)
 |-- year_birth: integer (nullable = true)
 |-- education: string (nullable = true)
 |-- count_kid: integer (nullable = true)
 |-- count_teen: integer (nullable = true)
# Output of df.rdd.id (the method is referenced without being called, so its repr is shown)
<bound method RDD.id of MapPartitionsRDD[25] at javaToPython at NativeMethodAccessorImpl.java:0>
# Output of dfSelected.rdd.id (likewise not called)
<bound method RDD.id of MapPartitionsRDD[31] at javaToPython at NativeMethodAccessorImpl.java:0>
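The `<bound method ...>` text above is what Python displays when a method is referenced without parentheses; the two outputs still show that `df` and `dfSelected` are backed by different RDDs (`[25]` and `[31]`). In PySpark, actually calling `RDD.id()` returns an integer id. The plain-Python sketch below illustrates the difference using a hypothetical `FakeRDD` class, which is an assumption for illustration and not part of PySpark.

```python
class FakeRDD:
    """Hypothetical stand-in for an RDD; not a PySpark class."""

    def __init__(self, rdd_id):
        self._id = rdd_id

    def id(self):
        """Return this object's numeric id."""
        return self._id


rdd = FakeRDD(25)

uncalled = rdd.id   # a bound method object; its repr starts with "<bound method"
called = rdd.id()   # the actual id value

print(repr(uncalled))
print(called)
```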