TIL: this script helped me figure out that
- Given a previewable attachment (video, PDF) with 2 preprocessed variants
- After it is attached, these jobs are enqueued: [ActiveStorage::AnalyzeJob, ActiveStorage::TransformJob, ActiveStorage::TransformJob]
- All 3 jobs will (I assume?) separately download the video from storage to a tmp file
- Because the jobs run asynchronously, a race condition results: both transform jobs see that no blob.preview_image exists yet, and BOTH create one
- Since a blob only has one preview_image attached, one preview image (and any variants associated with it) will be orphaned
- As a result, when the attachment is purged, those orphaned blobs and variant records will fail to be purged
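
The check-then-create race described above can be reproduced without Rails. This is only a simulation under stated assumptions: `FakeBlob`, `ensure_preview_unsafe`, `ensure_preview_locked`, and both runner methods are hypothetical names, not Active Storage APIs, and the two queues exist only to force the unlucky interleaving so that it happens every time.

```ruby
# Plain-Ruby simulation of the race; FakeBlob is a hypothetical stand-in
# for a blob and does not use any Active Storage code.
class FakeBlob
  attr_reader :preview_image, :created_previews

  def initialize
    @preview_image = nil
    @created_previews = []
    @lock = Mutex.new
  end

  # Unsafe: the check and the create are separate steps. The optional block
  # runs in between, standing in for time spent downloading and rendering.
  def ensure_preview_unsafe(job_name)
    return @preview_image if @preview_image
    yield if block_given?
    create_preview(job_name)
  end

  # Safe within one process: the check and the create share one lock.
  def ensure_preview_locked(job_name)
    @lock.synchronize { @preview_image ||= create_preview(job_name) }
  end

  private

  def create_preview(job_name)
    image = "preview-from-#{job_name}"
    @created_previews << image
    @preview_image = image # the blob keeps a single preview; last writer wins
  end
end

def run_two_transform_jobs_unsafe
  blob = FakeBlob.new
  arrived = Queue.new
  go = Queue.new
  threads = %w[transform-1 transform-2].map do |name|
    Thread.new do
      blob.ensure_preview_unsafe(name) do
        arrived << name # this job has seen that no preview exists
        go.pop          # wait until the other job has checked too
      end
    end
  end
  2.times { arrived.pop }     # both jobs are now past the check
  2.times { go << :continue } # let both of them create a preview
  threads.each(&:join)
  blob
end

def run_two_transform_jobs_locked
  blob = FakeBlob.new
  %w[transform-1 transform-2]
    .map { |name| Thread.new { blob.ensure_preview_locked(name) } }
    .each(&:join)
  blob
end

unsafe = run_two_transform_jobs_unsafe
orphans = unsafe.created_previews - [unsafe.preview_image]
puts "unsafe: created=#{unsafe.created_previews.size} orphaned=#{orphans.size}"

locked = run_two_transform_jobs_locked
puts "locked: created=#{locked.created_previews.size}"
```

The in-process Mutex only serves the simulation. Real jobs may run in separate worker processes, where it would not help.
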
Not awesome things that happen in this case:
- If a video is large, downloading it 3 times is wasteful
- Transforming each variant in a separate job creates a race condition as both jobs will see no preview_image exists yet and will both create one
- Orphaned blobs and variant records are left behind in the database and storage
Ideas for improvement:
- Test: We could write a test for this by performing the jobs asynchronously, e.g. by adding an async: option (and thread executor) to the TestQueueAdapter (https://github.com/rails/rails/blob/main/activejob/lib/active_job/test_helper.rb#L606)
- Fix: We could batch the AnalyzeJob and the 1…n TransformJob jobs into a single MultiJob that downloads the blob once and processes all variants synchronously
- Workaround: if a video attachment needs multiple variants, the user could set preprocessed: false, write a custom job that manually processes them all in sequence, and enqueue that job after create
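
The workaround could look roughly like the sketch below. It is untested against a real app and rests on assumptions: `SequentialRepresentations`, `FakeAttachment`, and `FakeRepresentation` are hypothetical names, the variant names `:thumb` and `:poster` are made up, and the only calls assumed on the real attachment are `representation(name)` followed by `processed`. The fakes exist only so the sketch runs without Rails.

```ruby
# Hypothetical helper: processes named representations one after another,
# so the first call creates the preview image and later calls can reuse it.
class SequentialRepresentations
  def initialize(attachment, names)
    @attachment = attachment
    @names = names
  end

  def call
    @names.each { |name| @attachment.representation(name).processed }
  end
end

# Stand-ins for illustration only; they record the order of processing.
FakeRepresentation = Struct.new(:name, :log) do
  def processed
    log << name
    self
  end
end

class FakeAttachment
  attr_reader :log

  def initialize
    @log = []
  end

  def representation(name)
    FakeRepresentation.new(name, @log)
  end
end

attachment = FakeAttachment.new
SequentialRepresentations.new(attachment, [:thumb, :poster]).call
puts attachment.log.inspect
```

In an app, the body of `call` would presumably sit inside the `perform` method of the custom job that gets enqueued after create.
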