@kumar-de
Last active September 2, 2020 21:45
access HDFS from Spark
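// Assumes `sc` is an existing SparkContext (e.g. the one provided by spark-shell).
import org.apache.hadoop.fs.{FileSystem, Path}
val conf = sc.hadoopConfiguration
val fs = FileSystem.get(conf)
val exists = fs.exists(new Path("/path/on/hdfs")) // true for a file or a directory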
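// List the files one level below each subdirectory of a directory.
// Assumes `htu` (presumably an HBaseTestingUtility with a running mini DFS cluster)
// and `outputFileDirforCrashes` are defined elsewhere; neither is shown in this gist.
import org.apache.hadoop.fs.Path
val dfs = htu.getDFSCluster.getFileSystem
val sequenceFiles1 = dfs.listStatus(new Path(outputFileDirforCrashes)).filter(_.isDirectory).map(_.getPath.toString)
sequenceFiles1.foreach { dirPath =>
  val files = dfs.listStatus(new Path(dirPath)).filter(_.isFile).map(_.getPath.toString)
  files.foreach(println) // println rather than print, so each path lands on its own line
}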
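
The nested `listStatus` loops only descend one directory level. If the whole tree is needed, Hadoop's `FileSystem.listFiles(path, recursive)` can walk it instead. The following is a minimal sketch, not part of the original gist: the helper name `listAllFiles` is hypothetical, and the `FileSystem` is built from a default `Configuration`, which resolves to the local file system unless a cluster configuration is on the classpath.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper (not in the gist): collect every file under `root`, recursively.
def listAllFiles(fs: FileSystem, root: Path): Seq[String] = {
  val it = fs.listFiles(root, true) // true = recurse into subdirectories
  val out = ArrayBuffer.empty[String]
  while (it.hasNext) out += it.next().getPath.toString
  out.toSeq
}

// Assumption: default configuration. In spark-shell, use
// FileSystem.get(sc.hadoopConfiguration) to reach the cluster's HDFS instead.
val defaultFs = FileSystem.get(new Configuration())
val root = new Path("/path/on/hdfs")
// listFiles throws FileNotFoundException for a missing path, so check first.
if (defaultFs.exists(root)) listAllFiles(defaultFs, root).foreach(println)
```

`listFiles` returns only files (never directories), so no `isFile` filter is needed here.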