
@mark-cooper
Last active July 30, 2025 04:31
// Let's make `parseManifest` accept a callback that takes a `ManifestEntry` (very javascript =).
// It just processes lines into manifest entries and hands them off.
func parseManifest(ctx context.Context, manifestBody string, processEntry func(ManifestEntry)) error {
	dec := json.NewDecoder(strings.NewReader(manifestBody))
	for {
		var e ManifestEntry
		if err := dec.Decode(&e); err == io.EOF {
			break
		} else if err != nil {
			return fmt.Errorf("failed to decode manifest entry: %w", err)
		}
		processEntry(e)
	}
	return nil
}
// In handler, update parseManifest
// Use a waitgroup and goroutines to process each file -> csv conversion and upload separately
// Something like:
var wg sync.WaitGroup
semaphore := make(chan struct{}, 10) // Limit to 10 goroutines at a time
err := parseManifest(ctx, manifest, func(entry ManifestEntry) {
	wg.Add(1)
	go func(e ManifestEntry) {
		defer wg.Done()
		semaphore <- struct{}{}
		defer func() { <-semaphore }()
		// download file and convert to csv (could potentially be separate operations ...?)
		csv, err := getExportDataFile(ctx, e.DataFileS3Key)
		if err != nil {
			log.Printf("Failed to get export data for %s: %v", e.DataFileS3Key, err)
			return // probs need to handle an issue better
		}
		// do something for filename ...
		filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
		// preferably upload without writing to disk first ...
		if err := writeCSVFile(filename, csv); err != nil {
			log.Printf("Failed to write CSV for %s: %v", e.DataFileS3Key, err)
			return // probs need to handle an issue better
		}
		log.Printf("Successfully processed %s", e.DataFileS3Key)
	}(entry)
})
if err != nil {
	log.Printf("Failed to parse manifest: %v", err)
}
wg.Wait()

I don't dislike the export ARN stuff. It's likely ok. Maybe it could use some extra validation around the date (to be sure we're getting it right) and the id (does it have a fixed format?).

The only alternative that springs to mind would be to do something with the checksum table export function, since it already has the ARN:

exportArn, err := exportTable(ctx, dynamodbClient, tableArn, exportBucket, prefix)

So it could do something with it, like push it to S3, and the CSV function reads the export ARN from a file at a key we're both expecting. And presumably that exportArn is usable for what you're doing ...

Questions re: CSV function.

How many files, and how large are they in the worst case? If there's a lot of copying into memory, we may need to process the export files line by line, append to a tmp file, and upload when done ...

Template

Missing the conditions stuff for IMAGE_URI. Also, the CSV function doesn't need the DDB env var.
