dfConverted.select("education").limit(5).explain("extended")

== Analyzed Logical Plan ==
education: string
GlobalLimit 5
+- LocalLimit 5
   +- Project [education#238]
      +- Project [id#236, year_birth#237, education#238, count_kid#239, count_teen#240, date_customer#241, days_last_login#242, add_months(to_date('date_customer, Some(d-M-yyyy)), 72) AS date_joined#257]
         +- Project [ID#16 AS id#236, Year_Birth#17 AS year_birth#237, Education#18 AS education#238, Kidhome#21 AS count_kid#239, Teenhome#22 AS count_teen#240, Dt_Customer#23 AS date_customer#241, Recency#24 AS days_last_login#242]
            +- Relation[ID#16,Year_Birth#17,Education#18,Marital_Status#19,Income#20,Kidhome#21,Teenhome#22,Dt_Customer#23,Recency#24,MntWines#25,MntFruits#26,MntMeatProducts#27,MntFishProducts#28,MntSweetProducts#29,MntGoldProds#30,NumDealsPurchases#31,NumWebPurchases#32,NumCatalogPurchases#33,NumStorePurchases#34,NumWebVisitsMonth#35,AcceptedCmp3#36,AcceptedCmp4#37,AcceptedCmp5#38,AcceptedCmp1#39,... 5 more fields]
df.explain("cost")     # logical plan annotated with statistics (row count, size) where the optimizer has them
df.explain("codegen")  # physical plan plus the Java code produced by whole-stage code generation
spark.sparkContext._conf.get('spark.default.parallelism')

200  # sample output; the value can differ depending on your Spark configuration
print(f"Partition Count of Dataframe df:\t\t{df.rdd.getNumPartitions()}")
print(f"Partition Count of Dataframe dfSelected:\t{dfSelected.rdd.getNumPartitions()}")
print(f"Partition Count of Dataframe dfConverted:\t{dfConverted.rdd.getNumPartitions()}")

# Output
Partition Count of Dataframe df:                1
Partition Count of Dataframe dfSelected:        1
Partition Count of Dataframe dfConverted:       1
# Increase the partition count from 1 to 5 with repartition()
dfPartitioned = dfConverted.repartition(5)
print(f"Partition Count of Dataframe dfPartitioned:\t{dfPartitioned.rdd.getNumPartitions()}")

# Output
Partition Count of Dataframe dfPartitioned:     5
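A plain repartition(n) triggers a full shuffle that spreads rows across n partitions roughly evenly, using a round-robin scheme for this no-key case. A minimal pure-Python sketch of the idea (an illustration, not Spark's actual shuffle machinery):

```python
from itertools import cycle

def round_robin_partition(rows, num_partitions):
    """Sketch of repartition(n): rows are dealt out to the
    n partitions in round-robin order."""
    partitions = [[] for _ in range(num_partitions)]
    targets = cycle(range(num_partitions))
    for row in rows:
        partitions[next(targets)].append(row)
    return partitions

parts = round_robin_partition(list(range(10)), 5)
# each of the 5 partitions receives 2 of the 10 rows
```

This evenness is the main reason to use repartition(n) when upstream partitions are skewed or too few for the available cores.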
# Returns a new DataFrame hash-partitioned on id; DataFrames are immutable,
# so assign the result if you want to use it.
dfConverted.repartition(col("id"))
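Passing a column instead of a number hash-partitions the data, so every row with the same id lands in the same partition. Spark's DataFrame shuffle uses Murmur3 hashing; the sketch below substitutes Python's built-in hash() purely to show the key-to-partition mapping:

```python
def partition_for_key(key, num_partitions):
    # Spark actually uses Murmur3 for DataFrame shuffles; Python's
    # built-in hash() stands in here only for illustration.
    return hash(key) % num_partitions

# rows sharing an id always map to the same partition
same = partition_for_key(7196, 5) == partition_for_key(7196, 5)
```

Co-locating rows by key this way is what makes subsequent groupBy or join operations on that column avoid another shuffle.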
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql import Row

# If you are following along on Databricks, change the path to "/FileStore/tables/marketing_campaign.csv"
df = spark.read.load("./marketing_campaign.csv",
                     format="csv",
                     sep="\t",
                     inferSchema="true",
                     header="true")
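header="true" takes the column names from the first line, sep="\t" matches the tab-separated file, and inferSchema triggers an extra pass over the data to guess column types (without it, every column is read as a string). A tiny pure-Python look at the same parsing, on a hypothetical two-line excerpt of the file:

```python
import csv
import io

# hypothetical excerpt of the tab-separated file: header line + one record
sample = "ID\tYear_Birth\tEducation\n7196\t1950\tPhD\n"
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
# note every value is a string -- this is what inferSchema fixes in Spark
```
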
# collect() is an Action: it reads the data held on the Executors and ships it to the Driver.
# If the DataFrame has been cached with cache(), the data can be served from memory instead.
collected = dfPartitioned.collect()

# Result of type(collected)
list

# Result of collected[0]
Row(id=7196, year_birth=1950, education='PhD', count_kid=1, count_teen=1, date_customer='08-02-2014', days_last_login=20, date_joined=datetime.date(2020, 2, 8))
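collect() hands back ordinary Python objects on the Driver: a list of Row values. A Row behaves much like an immutable named tuple, so a namedtuple makes a reasonable stand-in for experimenting without pyspark installed (field names taken from the output above, shortened to a hypothetical subset):

```python
from collections import namedtuple

# pure-Python stand-in for pyspark.sql.Row, not the real class
Customer = namedtuple("Customer", ["id", "year_birth", "education", "days_last_login"])
row = Customer(id=7196, year_birth=1950, education="PhD", days_last_login=20)

row.education            # attribute access, just like a Row
as_dict = row._asdict()  # analogous to Row.asDict(): a mutable copy
as_dict["days_last_login"] += 10  # the copy changes; row itself stays intact
```
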
from pyspark.sql import Row

missing_days = 10

# Spark's Row is read-only, so to modify it from Python we convert it to a dict,
# change the value, and build a new Row from the result.
# This is not an efficient approach; it exists only to illustrate how Row works internally.
def updateDaysLastLogin(row):
    parsed = row.asDict()
    parsed['days_last_login'] = parsed['days_last_login'] + missing_days
    return Row(**parsed)
spark.driver.cores   # number of CPU cores the Driver uses
spark.driver.memory  # amount of memory (GiB) the Driver uses