@RaMSFT
Created October 26, 2021 11:26
## Provide the mount path of the directory where the CSV files exist
mount_path = '/mnt/<mount name>/<directory>'
## Register a table over every CSV file in that directory (header row present, schema inferred, comma-separated)
spark.sql(f"create table flights_data_2 using csv location '{mount_path}/*.csv' options(header 'true', inferSchema 'true', sep ',')")
## Run a group by on the registered table to count the records in each source file
resultdf = spark.sql("select input_file_name() as filename, count(*) as record_count from flights_data_2 group by filename")
resultdf.display()
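
To sanity-check the per-file counts returned by the query above without Spark, the same tally can be sketched in plain Python. This is only an illustrative cross-check, not part of the original snippet: `count_records_per_file` is a hypothetical helper name, and it assumes each file has one header line and that the directory is readable through ordinary file APIs.

```python
import csv
import glob
import os


def count_records_per_file(directory):
    """Return {file path: number of data rows} for every *.csv in directory.

    Hypothetical helper for illustration only. The first row of each file
    is treated as a header and not counted; empty lines are ignored.
    """
    counts = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        with open(path, newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row, if the file has one
            counts[path] = sum(1 for row in reader if row)
    return counts
```

On Databricks, mounted storage is usually visible to local file APIs under a `/dbfs` prefix, so a call might look like `count_records_per_file('/dbfs/mnt/<mount name>/<directory>')`; treat that path as an assumption about the workspace setup. This approach reads every file on the driver, so it only suits small directories.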