
@amalgjose
Created September 11, 2024 20:59
Python program to download files from S3 to DBFS in Databricks
import os

import boto3

# AWS credentials and bucket details
aws_access_key = ""
aws_secret_key = ""
bucket_name = ""
region_name = ""

# Prefix of the files in the S3 bucket
prefix_path = ""

# DBFS destination directory (must end with "/", since the file name is appended to it)
destination_path_prefix = ""

s3_client = boto3.client(
    's3',
    aws_access_key_id=aws_access_key,
    aws_secret_access_key=aws_secret_key,
    region_name=region_name,
)

response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=prefix_path)
print(response)

# 'Contents' is absent from the response when no objects match the prefix
for obj in response.get('Contents', []):
    key = obj['Key']
    print(key)
    file_name = os.path.basename(key)
    # Download to the local disk of the driver node first
    local_path = f"/tmp/{file_name}"
    s3_client.download_file(bucket_name, key, local_path)
    # Then move the file from the driver's local disk into DBFS.
    # dbutils is available by default in Databricks notebooks.
    destination_path = f"{destination_path_prefix}{file_name}"
    move_status = dbutils.fs.mv(f"file://{local_path}", destination_path, True)
    print("Source --> %s, Destination --> %s, Status=%s" % (key, destination_path, move_status))
@bluekaf commented Dec 1, 2024

Thanks so much. Testing purposes.
