Skip to content

Instantly share code, notes, and snippets.

@cablehead
Created March 25, 2026 13:17
Show Gist options
  • Select an option

  • Save cablehead/68c7f638e86d747cd1f5255120052c88 to your computer and use it in GitHub Desktop.

Select an option

Save cablehead/68c7f638e86d747cd1f5255120052c88 to your computer and use it in GitHub Desktop.

In the for_collect case, my understanding is the main goal of FilePath is to prevent clobbering files with save. If we've collected, it's now safe to overwrite our original file, so FilePath is cleared. The code has always been doing this, but the previous version tied that decision to content_type — it only cleared FilePath when content_type was also None. That coupling didn't make sense, so when I added the custom metadata field, I moved the collect-time cleanup into its own for_collect method on PipelineMetadata and made it always clear FilePath regardless of other fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment