Last active
April 17, 2023 16:45
List S3 objects (ChatGPT generated)
import boto3


def list_s3_objects(bucket_name, prefix='', page_size=100, start_after='', max_pages=10):
    """
    Lists objects in an S3 bucket, optionally filtered by a prefix, and paginates results.
    Retrieves at most max_pages pages and returns a list of dictionaries containing object metadata.

    Args:
        bucket_name (str): Name of the S3 bucket.
        prefix (str): Prefix to filter objects by (default '').
        page_size (int): Maximum number of objects to return per page (default 100).
        start_after (str): Object key to start listing after (default '').
        max_pages (int): Maximum number of pages to retrieve (default 10).

    Returns:
        list: List of dictionaries containing object metadata.
    """
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')

    # Operation parameters are passed through to list_objects_v2.
    params = {'Bucket': bucket_name, 'Prefix': prefix}
    if start_after:
        # StartAfter takes an object key. StartingToken (in PaginationConfig) is
        # different: it expects a token from a previous paginated response.
        params['StartAfter'] = start_after

    page_iterator = paginator.paginate(
        PaginationConfig={'PageSize': page_size},
        **params
    )

    objects = []
    page_count = 0
    for page in page_iterator:
        # 'Contents' is absent when a page has no matching objects.
        objects.extend(page.get('Contents', []))
        page_count += 1
        if page_count >= max_pages:
            break
    return objects
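Note on start_after: StartAfter and StartingToken are different parameters. StartAfter is the object key after which listing should begin; StartingToken is a point from which to resume pagination, based on a NextToken sent back by a previous call. Passing a key as StartingToken is a bug, so start_after belongs in StartAfter.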
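The difference between the two can be illustrated with a toy in-memory model of a key listing. This is only a sketch: `list_page` and `KEYS` are hypothetical stand-ins, not boto3 calls, and the token here is simply the last key returned, whereas real S3 continuation tokens are opaque strings.

```python
from bisect import bisect_right

# Toy model only: keys are kept sorted, as in an S3 listing.
KEYS = sorted(["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"])


def list_page(page_size, start_after="", token=None):
    """Return (keys, next_token) for one page of the toy listing.

    start_after: caller-chosen key; listing begins strictly after it.
    token: value handed back by a previous call; resumes that listing.
    """
    # In this toy model, a token (when present) replaces start_after
    # as the resume point.
    resume_from = token if token is not None else start_after
    begin = bisect_right(KEYS, resume_from)
    page = KEYS[begin:begin + page_size]
    more = begin + page_size < len(KEYS)
    next_token = page[-1] if (page and more) else None
    return page, next_token


# First call: the caller picks a key to start after.
first, token = list_page(2, start_after="a.txt")
# Second call: the caller passes back the token it was given.
second, token = list_page(2, token=token)
print(first)
print(second)
```

The caller chooses the `start_after` value freely, but only ever passes back a token that a previous call produced, which is why the two cannot be swapped.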
Example use:
Generated by ChatGPT. Here's its explanation:
"In this example, list_s3_objects lists all objects in the S3 bucket whose name is specified by the bucket_name parameter, filtered by the prefix specified by the prefix parameter. It retrieves up to page_size objects per page, starting with the object whose key is specified by the start_after parameter (if any), and returns up to max_pages pages of results. The function returns a list of dictionaries, where each dictionary contains metadata about an S3 object (such as its key, size, and last modified date)."
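A minimal sketch of consuming the returned list, assuming each dictionary carries the `Key`, `Size`, and `LastModified` fields mentioned in the explanation. The `summarize` helper and the sample entries are hypothetical; they stand in for the result of a real call such as `list_s3_objects('my-example-bucket', prefix='logs/')`, where the bucket name and prefix are placeholders.

```python
from datetime import datetime, timezone


def summarize(objects):
    """Return (count, total_bytes, newest_key) for a list of object dicts."""
    if not objects:
        return 0, 0, None
    total_bytes = sum(obj["Size"] for obj in objects)
    # LastModified is a datetime, so max() finds the most recent object.
    newest = max(objects, key=lambda obj: obj["LastModified"])
    return len(objects), total_bytes, newest["Key"]


# Hypothetical sample shaped like entries of a 'Contents' list.
sample = [
    {"Key": "logs/a.log", "Size": 120,
     "LastModified": datetime(2023, 4, 1, tzinfo=timezone.utc)},
    {"Key": "logs/b.log", "Size": 80,
     "LastModified": datetime(2023, 4, 17, tzinfo=timezone.utc)},
]

count, total_bytes, newest_key = summarize(sample)
print(count, total_bytes, newest_key)
```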