
@judell
Created March 25, 2026 04:59
Plan: Stop Committing Generated ICS/JSON Files to the Repo (issue #41)

Current State

  • CI workflow commits cities/*/events.json, cities.json, and version.txt to git after every build
  • Individual .ics files from scrapers/feeds are also committed
  • The app does not serve these files to users — it queries Supabase directly
  • The load-events edge function fetches events.json from raw.githubusercontent.com to upsert into Supabase
  • cities.json and version.txt ARE used by the frontend at load time (via GitHub Pages)

The Core Tension

The issue proposes GitHub Releases as a data archive, but there's a critical dependency: load-events fetches events.json from the raw git content. If we stop committing those files to main, we need an alternative path for getting data into Supabase.


Step-by-Step Plan

Phase 1: Change the data ingestion path (must happen FIRST)

Option A — Direct upload from CI to Supabase (recommended, simplest)

  • Instead of: CI commits → git push → edge function fetches from GitHub → upserts to Supabase
  • Do: CI generates events.json → CI calls load-events with the data inline (POST body) or CI directly calls the Supabase REST API
  • This eliminates the round-trip through git entirely
  • The load-events edge function already accepts a city list in the POST body; we'd extend it to accept event data too, OR we bypass it and use the Supabase REST API directly from CI
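A minimal sketch of what the Option A step could look like. The {city, events} body shape and the SUPABASE_URL / SUPABASE_SERVICE_ROLE_KEY secret names are assumptions, not the current load-events contract:

```shell
# Hypothetical Option A payload: build a POST body containing one city's
# events, ready to send to load-events (sample data, throwaway directory)
cd "$(mktemp -d)"
mkdir -p cities/santarosa
cat > cities/santarosa/events.json <<'EOF'
[{"title": "Farmers Market", "start": "2026-04-01T09:00:00-07:00"}]
EOF

# Build the request body: one city plus its events array
payload=$(jq -n --arg city santarosa \
  --slurpfile events cities/santarosa/events.json \
  '{city: $city, events: $events[0]}')
echo "$payload"

# In CI this payload would be POSTed directly, e.g.:
#   curl -sS -X POST "$SUPABASE_URL/functions/v1/load-events" \
#     -H "Authorization: Bearer $SUPABASE_SERVICE_ROLE_KEY" \
#     -H "Content-Type: application/json" \
#     -d "$payload"
```

This keeps Supabase as the single sink for event data; git never sees events.json in transit.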

Option B — Upload to GitHub Release, fetch from there

  • CI creates a release tarball; load-events fetches from the latest release instead of from raw git
  • More complex (release API, finding latest asset URL) with no real benefit over Option A

Recommendation: Option A. Modify the workflow to POST event data directly to the edge function or Supabase API, removing the git-as-transport pattern entirely.

Phase 2: Stop tracking generated data files on main

  1. Add to .gitignore:

    cities/*/*.ics
    cities/*/events.json
    cities/*/combined.ics
    
  2. git rm --cached to untrack existing files (keeps them locally):

    git rm --cached cities/*/*.ics cities/*/events.json
  3. Keep committing cities.json and version.txt — these are small, stable, and consumed by the frontend from GitHub Pages
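A quick self-contained sanity check for Phase 2, run in a throwaway repo (paths follow the plan; the repo itself is a toy):

```shell
# After the .gitignore rules are in place, generated files should be
# both ignored and absent from git status
cd "$(mktemp -d)"
git init -q .
mkdir -p cities/santarosa
echo '[]' > cities/santarosa/events.json
printf '%s\n' 'cities/*/*.ics' 'cities/*/events.json' 'cities/*/combined.ics' > .gitignore
git add .gitignore

git check-ignore cities/santarosa/events.json   # exits 0 and prints the path: ignored
git status --porcelain -- cities                # prints nothing: untracked and ignored
```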

Phase 3: Archive data to an orphan data branch

Create an orphan data branch (no shared history with main) for analysis, debugging, and auditing:

- name: Archive data to data branch
  run: |
    # Save generated files, preserving the cities/*/ directory layout
    # (a flat copy would collide on the per-city events.json filenames)
    tar cf /tmp/build-data.tar cities/*/events.json cities/*/*.ics report.json 2>/dev/null || true

    # Check out the data branch in a separate worktree so the main
    # checkout is never disturbed; create it as an orphan on first run
    if git fetch origin data; then
      git worktree add -B data /tmp/data-branch origin/data
    else
      git worktree add --orphan -b data /tmp/data-branch
    fi

    # Unpack the new data into the worktree and commit
    tar xf /tmp/build-data.tar -C /tmp/data-branch
    cd /tmp/data-branch
    git add cities/*/events.json cities/*/*.ics report.json
    git commit -m "Build $(date +%Y%m%d-%H%M%S)"
    git push origin data

    # Clean up the worktree
    cd - && git worktree remove /tmp/data-branch

Why a data branch:

  • Diffing is trivial: git diff data~1..data -- cities/santarosa/events.json
  • No noise on main: the data branch never merges into main
  • Full git history of generated data, searchable with git log
  • Timezone tests / report.json auditing can check out data branch or fetch individual files via raw.githubusercontent.com/.../data/...
  • Per-source event count regressions are visible in git diff
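The auditing workflow above can be exercised end to end in a throwaway repo. Branch and file names follow the plan; the event payloads are made up:

```shell
# Toy data branch with two builds, then a per-city event-count comparison
# across them using git show + jq
cd "$(mktemp -d)"
git init -q .
git config user.email ci@example.com
git config user.name "CI"
mkdir -p cities/santarosa
echo '[{"title":"a"},{"title":"b"}]' > cities/santarosa/events.json
git add cities
git commit -qm "Build 1"
git branch -m data          # stand-in for the orphan data branch

echo '[{"title":"a"}]' > cities/santarosa/events.json
git add cities
git commit -qm "Build 2"

# Per-source event-count regression check
old=$(git show data~1:cities/santarosa/events.json | jq 'length')
new=$(git show data:cities/santarosa/events.json | jq 'length')
echo "santarosa: $old -> $new"   # → santarosa: 2 -> 1
```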

Phase 4: Optional — GitHub Releases for long-term snapshots

For periodic archival with pruning (complements the data branch):

- name: Create data release
  run: |
    TAG="build-$(date +%Y%m%d-%H%M%S)"
    tar czf data.tar.gz cities/*/events.json cities/*/*.ics
    gh release create "$TAG" data.tar.gz \
      --title "Build $TAG" \
      --notes "Auto-generated calendar data"

- name: Prune old releases (keep last 30)
  run: |
    gh release list --limit 100 --json tagName,createdAt \
      | jq -r '.[30:][].tagName' \
      | xargs -I{} gh release delete {} --yes --cleanup-tag

Phase 5: Update load-events edge function

Modify supabase/functions/load-events/index.ts to accept event data directly in the POST body instead of fetching from GitHub:

// Current: fetches from raw.githubusercontent.com
// New: accepts events array in request body per city
// Fallback: can still fetch from data branch URL if needed

The workflow step changes from triggering a fetch to posting data directly.

Phase 6: Update tests

  • tests/test_timezone_pipeline.py already uses synthetic ICS data — no changes needed
  • Browser tests (test.html) query Supabase — no changes needed
  • Regression tests use the live app — no changes needed
  • If any future test needs real build data, it can check out the data branch

Phase 7: Clean up workflow

Remove from generate-calendar.yml:

  • The git add/commit/push block (lines ~441-478) that commits generated files to main
  • The rebase-conflict retry logic (no longer needed)
  • Keep the git push for cities.json and version.txt only

What Gets Simpler

  • git pull stops conflicting — no more "always expect push to fail" pattern
  • Git history becomes meaningful — only real code changes on main
  • Repo size stops growing on main — no more ~800 ICS files per build
  • CI is faster — no rebase/retry loop

What Needs Care

  • load-events edge function must be updated before we stop committing (Phase 1 before Phase 2)
  • Edge function JWT gotcha still applies if we redeploy load-events
  • Rollback path: if Supabase is down, we can't recover from main anymore — but we CAN recover from the data branch or latest GitHub Release
  • cities.json and version.txt should still be committed to main (small, used by frontend)
  • data branch size will grow over time — consider periodic squashing or starting a fresh orphan if it gets unwieldy
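The periodic squashing mentioned above can be done by replacing the data branch with a fresh one-commit orphan. A self-contained sketch in a toy repo (run manually; it rewrites the branch, so anything pinned to old SHAs must refetch):

```shell
# Set up a toy data branch with two builds
cd "$(mktemp -d)"
git init -q .
git config user.email ci@example.com
git config user.name "CI"
echo one > events.json && git add . && git commit -qm "Build 1"
git branch -m data
echo two > events.json && git add . && git commit -qm "Build 2"

# Squash: a new orphan branch inherits the current index, so content is
# preserved while history collapses to a single commit
git checkout -q --orphan data-squashed
git commit -qm "Squashed data history"
git branch -M data-squashed data
count=$(git rev-list --count data)
echo "commits on data: $count"   # → commits on data: 1

# On the real repo this would be followed by: git push --force origin data
```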

Execution Order

  1. Modify load-events to accept inline data (or switch to direct Supabase REST calls from CI)
  2. Test the new ingestion path on one city
  3. Create the orphan data branch
  4. Add the data-branch archive step to the workflow
  5. Update .gitignore and git rm --cached
  6. Remove the old commit/push logic from the workflow
  7. Optionally add GitHub Releases for long-term snapshots
  8. Test a full CI run end-to-end