The following is a list of Hadoop properties for Spark to use HDFS more effectively.
spark.hadoop.-prefixed Spark properties are used to configure a Hadoop Configuration that Spark broadcasts to tasks. Use spark.sparkContext.hadoopConfiguration to review the properties.
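As a minimal sketch of this mechanism: any property set with the spark.hadoop. prefix has the prefix stripped and is copied into the Hadoop Configuration, where it can be read back on the driver. The property name demo.prop and the local master below are made up for illustration.

import org.apache.spark.sql.SparkSession

object HadoopConfReview extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("hadoop-conf-review")
    .config("spark.hadoop.demo.prop", "demo-value") // prefix is stripped...
    .getOrCreate()

  // ...so the value appears in the Hadoop Configuration under demo.prop.
  val hadoopConf = spark.sparkContext.hadoopConfiguration
  println(hadoopConf.get("demo.prop")) // prints: demo-value

  spark.stop()
}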
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2

With algorithm version 2, task output files are moved to the final output directory as part of task commit, which makes job commit considerably faster (at the cost of a non-atomic job commit).
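As an illustration, one way to set it for a whole application is through the SparkSession builder (a sketch; the app name is arbitrary):

import org.apache.spark.sql.SparkSession

// Sketch: ask Hadoop's FileOutputCommitter for the v2 commit algorithm.
val spark = SparkSession.builder()
  .appName("committer-v2")
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
  .getOrCreate()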
Read Google Cloud Storage Connector for Spark and Hadoop
Read Hadoop-AWS module for details on the following fs.s3a.* properties:
fs.s3a.impl = org.apache.hadoop.fs.s3a.S3AFileSystem
fs.s3a.multiobjectdelete.enable = false
fs.s3a.fast.upload = true
fs.s3a.endpoint
fs.s3a.access.key
fs.s3a.secret.key
fs.s3a.path.style.access = true
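A sketch of wiring these properties into a SparkSession through the spark.hadoop. prefix, assuming hadoop-aws (and its AWS SDK dependency) is on the classpath; the endpoint, bucket name, and environment-variable fallbacks are placeholders for illustration:

import org.apache.spark.sql.SparkSession

object S3AExample extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("s3a-example")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("spark.hadoop.fs.s3a.multiobjectdelete.enable", "false")
    .config("spark.hadoop.fs.s3a.fast.upload", "true")
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:9000") // placeholder, e.g. a local MinIO endpoint
    .config("spark.hadoop.fs.s3a.access.key", sys.env.getOrElse("AWS_ACCESS_KEY_ID", ""))
    .config("spark.hadoop.fs.s3a.secret.key", sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", ""))
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()

  // With the properties above, s3a:// paths resolve through S3AFileSystem.
  spark.read.text("s3a://my-bucket/some/file.txt").show(5, truncate = false)

  spark.stop()
}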