Add file name as Spark DataFrame column
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DataFrameWithFileNameApp extends App {

  val spark: SparkSession =
    SparkSession
      .builder()
      .appName("DataFrameApp")
      .config("spark.master", "local[*]")
      .getOrCreate()

  // Read the CSV; the header row supplies the column names.
  val csvFileDF = spark.read.format("csv")
    .option("header", "true")
    .option("delimiter", ",")
    .load("""src/test/resources/csv/emp.csv""")

  // UDF that strips the directory part and the extension from a full file path.
  spark.udf.register("get_file_name", (path: String) => path.split("/").last.split("\\.").head)

  // input_file_name() returns the path of the file the current row was read from.
  csvFileDF.withColumn("fileName", callUDF("get_file_name", input_file_name())).show()

  spark.close()
}
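
// Alternative without registering a UDF (sketch, not part of the original gist):
// the built-in regexp_extract can pull the base name out of input_file_name()
// directly. The object name and the regex below are illustrative assumptions.
object DataFrameWithFileNameNoUdfApp extends App {

  val spark: SparkSession =
    SparkSession
      .builder()
      .appName("DataFrameApp")
      .config("spark.master", "local[*]")
      .getOrCreate()

  val csvFileDF = spark.read.format("csv")
    .option("header", "true")
    .load("""src/test/resources/csv/emp.csv""")

  // Group 1 of the regex captures the file's base name without its extension.
  csvFileDF
    .withColumn("fileName", regexp_extract(input_file_name(), """([^/\\]+)\.[^.]+$""", 1))
    .show()

  spark.close()
}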
//emp.csv
id,name,age,salary
1,jon,26,12.2
2,sam,29,24.4
3,rom,21,2.5
// output
+---+----+---+------+--------+
| id|name|age|salary|fileName|
+---+----+---+------+--------+
| 1| jon| 26| 12.2| emp|
| 2| sam| 29| 24.4| emp|
| 3| rom| 21| 2.5| emp|
+---+----+---+------+--------+
// sbt dependencies
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.1.0"
)
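
// A fuller build.sbt for running the app with sbt (sketch; the project name and
// Scala version are assumptions — Spark 2.1.0 is cross-built for Scala 2.11):
name := "dataframe-with-filename"

scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.1.0"
)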