@andrew
Last active February 20, 2026 08:15
Adding proxy cache to forgejo using git-pkgs/proxy internals

Context

Forgejo has a package registry that handles 23 ecosystems. Packages are uploaded directly -- there's no pull-through cache from upstream registries like npmjs.org or crates.io. git-pkgs/proxy is a standalone caching proxy for 16 ecosystems that already solves this problem. The question is what can be reused.

Prior art / related issues

This has been requested multiple times in the Gitea tracker (forgejo inherits these issues); none of them has been implemented.

What forgejo already has

What git-pkgs/proxy has that forgejo needs

Directly reusable as libraries

  1. internal/upstream/fetcher.go -- HTTP fetcher with retry, exponential backoff, jitter, DNS caching, pluggable auth. Clean interface (FetcherInterface), no coupling to the rest of the proxy. Could be extracted to github.com/git-pkgs/fetch or similar.

  2. internal/upstream/circuit_breaker.go -- per-registry circuit breaker wrapping the fetcher. Trips after 5 consecutive failures, exponential backoff recovery. Also implements FetcherInterface so it's a transparent wrapper.

  3. internal/upstream/resolver.go -- maps (ecosystem, name, version) to download URLs. Knows URL patterns for npm, cargo, gem, go, hex, pub, maven, nuget. Falls back to registry metadata lookup for ecosystems with dynamic URLs (PyPI). Could be extracted as-is.

Reusable as reference/patterns (not direct code reuse)

  1. Protocol handlers (internal/handler/npm.go, cargo.go, etc.) -- each one knows how to fetch upstream metadata, rewrite URLs to point at the proxy, parse download paths. These are tightly coupled to the proxy's own Proxy struct and HTTP routing (chi). The logic is reusable but the code would need to be rewritten to work within forgejo's handler pattern (forgejo uses its own context/router framework, not chi).

  2. Metadata rewriting -- the npm handler fetches registry.npmjs.org/{package} and rewrites all tarball URLs to point at the proxy. Each ecosystem has its own variant of this. This rewriting logic is where the real value lies, and it is specific to each ecosystem.

What cannot be reused

  • internal/database/ -- forgejo has xorm, the proxy uses raw SQL with sqlx. Completely different data models. Forgejo's existing Package/Version/File/Blob models are sufficient.
  • internal/storage/ -- forgejo already has modules/storage with the same backends. No reason to use the proxy's wrapper.
  • internal/metrics/, internal/server/ -- forgejo has its own instrumentation and routing.

Related git-pkgs libraries

Several pieces have already been extracted from the proxy into standalone libraries:

  • git-pkgs/registries -- registry API clients for fetching package metadata (versions, URLs, license, etc.) across ecosystems. Already used by the proxy's resolver for ecosystems with dynamic download URLs.
  • git-pkgs/archives -- archive format handling (tar, zip, etc.)
  • git-pkgs/purl -- Package URL parsing and construction
  • git-pkgs/vers -- cross-ecosystem version comparison
  • git-pkgs/vulns -- OSV vulnerability lookups
  • git-pkgs/spdx -- SPDX license normalization
  • git-pkgs/enrichment -- package metadata enrichment (license, vuln, version info)

Integration approach

Step 1: Extract fetcher + circuit breaker + resolver into a library

Pull internal/upstream/ out of the proxy into something like github.com/git-pkgs/fetch. It's already cleanly separated with a good interface. The proxy itself would then import this library too.

Files to extract:

  • fetcher.go (Fetcher, FetcherInterface, Artifact)
  • circuit_breaker.go (CircuitBreakerFetcher)
  • resolver.go (Resolver, ArtifactInfo)

Step 2: Add proxy source tracking to forgejo's package model

Forgejo already has a PackageProperty system (key-value pairs on packages and versions). A proxied package could be marked with properties like:

  • upstream_registry_url = https://registry.npmjs.org
  • upstream_source = proxy

Alternatively, a new package_proxy_source table with upstream URL, last synced timestamp, and whether the package is proxy-only or mixed.
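If the dedicated table is preferred, the xorm model could look roughly like this. Every table, column, and tag choice here is hypothetical, not an existing forgejo schema:

```go
package main

import (
	"fmt"
	"time"
)

// PackageProxySource is a hypothetical xorm model for the alternative
// tracking table described above.
type PackageProxySource struct {
	ID          int64     `xorm:"pk autoincr"`
	PackageID   int64     `xorm:"UNIQUE NOT NULL"`
	UpstreamURL string    `xorm:"NOT NULL"` // e.g. https://registry.npmjs.org
	LastSynced  time.Time `xorm:"updated"`  // xorm refreshes this on update
	ProxyOnly   bool      `xorm:"NOT NULL DEFAULT true"` // false = mixed with local uploads
}

func main() {
	s := PackageProxySource{
		PackageID:   42,
		UpstreamURL: "https://registry.npmjs.org",
		ProxyOnly:   true,
	}
	fmt.Printf("%+v\n", s)
}
```

The property-based approach avoids this migration entirely, which is why the effort estimate below treats step 2 as small.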

Step 3: Add proxy service in forgejo

New file: services/packages/proxy.go

Core function: "if a package download request misses locally, check if this owner/org has proxy caching enabled, resolve the upstream URL, fetch it, store it through the existing CreatePackageAndAddFile flow, serve it."

This slots in naturally. The existing DownloadPackageFile handler in each ecosystem router currently calls GetFileStreamByPackageNameAndVersion and returns 404 on miss. The proxy version would catch that 404 and try upstream.

Step 4: Add metadata proxying per ecosystem

This is the bulk of the work. Each ecosystem handler needs a "proxy metadata" path that:

  1. Fetches metadata from upstream (package index, version list, etc.)
  2. Rewrites download URLs to point at the forgejo instance
  3. Merges with any locally-published versions

The metadata rewriting logic from git-pkgs/proxy's handlers is the reference here. It can't be copy-pasted because it's wired to the proxy's routing, but the rewrite functions themselves (like rewriteMetadata in npm.go) are straightforward to adapt.
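The merge in step 3 above can be sketched as a simple overlay, local versions winning on conflict. This is an illustrative assumption about the merge semantics (real npm metadata values are version objects, not strings):

```go
package main

import "fmt"

// mergeVersions overlays locally-published versions onto the upstream
// version map; a locally-published version shadows the upstream one.
func mergeVersions(upstream, local map[string]any) map[string]any {
	out := make(map[string]any, len(upstream)+len(local))
	for v, m := range upstream {
		out[v] = m
	}
	for v, m := range local {
		out[v] = m // local wins on conflict
	}
	return out
}

func main() {
	merged := mergeVersions(
		map[string]any{"1.0.0": "upstream", "1.1.0": "upstream"},
		map[string]any{"1.1.0": "local", "2.0.0-internal": "local"},
	)
	fmt.Println(len(merged), merged["1.1.0"]) // 3 local
}
```

Whether local should win on conflict (versus rejecting the collision outright, as some registries do to prevent dependency confusion) is itself a design decision worth flagging.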

Extending git-pkgs/registries to help here: The git-pkgs/registries library currently parses upstream responses into normalized Go structs and discards the raw response. For metadata proxying, we need the raw upstream response (the actual JSON npm returns, the PyPI simple index, etc.) so we can rewrite URLs and forward it.

Adding a FetchRawMetadata(ctx, name) ([]byte, string, error) method to the Registry interface would let each ecosystem return the raw bytes + content type from whatever endpoint a package manager client would normally hit. The internal HTTP client already has GetBody() -- it just needs to be exposed. This keeps all the per-ecosystem URL construction quirks (npm scopes, Go module case encoding, Maven group/artifact paths) inside the registries library rather than duplicating them in both the proxy and forgejo.
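The proposed addition, and two of the per-ecosystem quirks it would encapsulate, might look like this. The interface is a guess at the library's shape, not its real API; the escaping rules themselves (npm scope encoding, Go module case escaping) are documented protocol behavior:

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Registry sketches the proposed extension to git-pkgs/registries:
// FetchRawMetadata returns the exact bytes (plus content type) a
// package-manager client would receive from the upstream endpoint.
type Registry interface {
	FetchRawMetadata(ctx context.Context, name string) ([]byte, string, error)
}

// npmMetadataURL shows one quirk: scoped packages keep the "@" but
// percent-encode the slash in the metadata endpoint path.
func npmMetadataURL(base, name string) string {
	if strings.HasPrefix(name, "@") {
		name = strings.Replace(name, "/", "%2f", 1)
	}
	return base + "/" + name
}

// escapeGoModulePath shows another: the Go module proxy protocol escapes
// each uppercase letter as '!' followed by the lowercase letter.
func escapeGoModulePath(p string) string {
	var b strings.Builder
	for _, r := range p {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(npmMetadataURL("https://registry.npmjs.org", "@types/node"))
	fmt.Println(escapeGoModulePath("github.com/Azure/azure-sdk-for-go"))
}
```

Centralizing these quirks in the registries library is what keeps them from being duplicated in both the proxy and forgejo.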

Step 5: Configuration

Add to forgejo's app.ini:

[packages.proxy]
ENABLED = false
UPSTREAM_NPM = https://registry.npmjs.org
UPSTREAM_CARGO = https://crates.io
; ... per-ecosystem upstream URLs
; auth tokens for private registries
NPM_TOKEN = ${FORGEJO_NPM_TOKEN}

Ecosystem overlap

git-pkgs/proxy supports 16 ecosystems, forgejo supports 23. The overlap where proxy logic already exists:

| Ecosystem | Proxy handler | Forgejo handler | Proxy reuse |
|-----------|---------------|-----------------|-------------|
| npm | yes | yes | URL rewriting, version extraction |
| cargo | yes | yes | Sparse index proxying |
| gem | yes | yes | Specs proxying, download |
| go | yes | yes | Module proxy protocol |
| hex | yes | yes | Metadata + tarball |
| pub | yes | yes | API proxying |
| pypi | yes | yes | Simple index rewriting |
| maven | yes | yes | POM + JAR proxying |
| nuget | yes | yes | Service index rewriting |
| composer | yes | yes | packages.json proxying |
| conan | yes | yes | Recipe proxying |
| conda | yes | yes | Channel data rewriting |
| cran | yes | yes | PACKAGES index proxying |
| container | yes | yes | OCI distribution spec |
| debian | yes | yes | Packages/Release proxying |
| rpm | yes | yes | Repodata proxying |
| alpine | no | yes | would need new handler |
| arch | no | yes | would need new handler |
| chef | no | yes | would need new handler |
| helm | no | yes | would need new handler |
| swift | no | yes | would need new handler |
| vagrant | no | yes | would need new handler |
| generic | no | yes | N/A (no upstream) |

Rough effort estimate by phase

  • Step 1 (extract library): small, mostly moving files and updating imports
  • Step 2 (model changes): small, property-based approach needs no migrations
  • Step 3 (proxy service): medium, the core fetch-and-store loop
  • Step 4 (metadata proxying): large, per-ecosystem work -- probably start with npm + go + cargo as the highest-value targets
  • Step 5 (configuration): small

Open questions

  • Should proxy caching be per-user/org or instance-wide? Forgejo's package registry is owner-scoped, so there'd need to be a "system" or "global" proxy owner, or each org opts in individually.
  • Cache invalidation strategy? The proxy currently has no TTL or invalidation. For forgejo, you'd probably want configurable TTL on metadata (say 5 minutes) while artifacts are immutable once cached.
  • Should proxied packages appear in the forgejo UI alongside uploaded packages, or be visually distinguished?
  • How to handle auth for private upstream registries within forgejo's settings model?