@saswata-dutta
Created August 26, 2020 15:04
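# Print the full path of the entry under `path` whose name sorts last lexicographically.
# Assumes an active SparkSession named `spark`.
path = "s3://path-to-file/"
sc = spark.sparkContext
# Reach the JVM-side Hadoop classes through the py4j gateway.
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
# Reuse Spark's Hadoop configuration so spark.hadoop.* settings (e.g. S3 credentials) apply.
fs = FileSystem.get(URI(path), sc._jsc.hadoopConfiguration())
files = fs.listStatus(Path(path))
# Pair each full path with its final component (the file name).
file_status = [(f.getPath().toString(), f.getPath().getName()) for f in files]
# Sort by name, descending; raises IndexError below if the directory is empty.
file_status.sort(key=lambda tup: tup[1], reverse=True)
print(file_status[0][0])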
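
The selection step above can be tried without a Spark cluster. The following is a minimal sketch, assuming the listing has already been collected into plain path strings; the helper name `latest_by_name` and the `dt=...` entries are hypothetical and not part of the original snippet. Note that the comparison is lexicographic, so it only picks the "latest" entry when names are zero-padded or use ISO-style dates.

```python
from posixpath import basename


def latest_by_name(paths):
    """Return the path whose final component sorts last lexicographically.

    Returns None for an empty listing instead of raising IndexError.
    """
    # Strip a trailing slash so directory-style paths still yield a name.
    return max(paths, key=lambda p: basename(p.rstrip("/")), default=None)


# Hypothetical listing, standing in for the result of fs.listStatus(...).
paths = [
    "s3://path-to-file/dt=2020-08-24",
    "s3://path-to-file/dt=2020-08-26",
    "s3://path-to-file/dt=2020-08-25",
]
print(latest_by_name(paths))
```

Because the ordering is by string comparison, a name such as `part-9` sorts after `part-10`; if the names carry unpadded numbers, a numeric sort key would be needed instead.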