Forgejo has a package registry that handles 23 ecosystems. Packages are uploaded directly -- there's no pull-through cache from upstream registries like npmjs.org or crates.io. git-pkgs/proxy is a standalone caching proxy for 16 ecosystems that already solves this problem. The question is what can be reused.
This has been requested multiple times in the Gitea tracker (forgejo inherits these):
- Support proxy registries for each package type (gitea#21223) -- the main feature request, references JFrog Artifactory as the canonical example
- Gitea as Package Registry Reverse Proxy (gitea#23619) -- similar request framed as reverse proxy
- Container Registry Pull through cache (gitea#26756) -- container-specific pull-through cache request
None of these have been implemented.
Forgejo's existing package infrastructure:
- Storage: content-addressed blob store with deduplication (local/S3/Minio), at `modules/packages/content_store.go`
- Database models: `Package`, `PackageVersion`, `PackageFile`, `PackageBlob` with the xorm ORM, at `models/packages/`
- Service layer: `services/packages/packages.go` with `CreatePackageAndAddFile()`, `GetFileStreamByPackageNameAndVersion()`, etc.
- Per-ecosystem routers: `routers/api/packages/{type}/` with metadata endpoints, download, upload, search
- Access control: owner-scoped permissions, quota enforcement
- Settings: `modules/setting/packages.go` -- per-ecosystem size limits, storage config
- No proxy/upstream/cache concept at all -- every package is locally uploaded
Reusable pieces of git-pkgs/proxy:
- `internal/upstream/fetcher.go` -- HTTP fetcher with retry, exponential backoff, jitter, DNS caching, and pluggable auth. Clean interface (`FetcherInterface`), no coupling to the rest of the proxy. Could be extracted to `github.com/git-pkgs/fetch` or similar.
- `internal/upstream/circuit_breaker.go` -- per-registry circuit breaker wrapping the fetcher. Trips after 5 consecutive failures, recovers with exponential backoff. Also implements `FetcherInterface`, so it's a transparent wrapper.
- `internal/upstream/resolver.go` -- maps (ecosystem, name, version) to download URLs. Knows URL patterns for npm, cargo, gem, go, hex, pub, maven, and nuget. Falls back to registry metadata lookup for ecosystems with dynamic URLs (PyPI). Could be extracted as-is.
- Protocol handlers (`internal/handler/npm.go`, `cargo.go`, etc.) -- each one knows how to fetch upstream metadata, rewrite URLs to point at the proxy, and parse download paths. These are tightly coupled to the proxy's own `Proxy` struct and HTTP routing (chi). The logic is reusable, but the code would need to be rewritten to fit forgejo's handler pattern (forgejo uses its own context/router framework, not chi).
- Metadata rewriting -- the npm handler fetches `registry.npmjs.org/{package}` and rewrites all tarball URLs to point at the proxy. Each ecosystem has its own variant of this. This logic is the real value, and it's specific per ecosystem.
Not worth reusing:
- `internal/database/` -- forgejo has xorm, the proxy uses raw SQL with sqlx. Completely different data models; forgejo's existing Package/Version/File/Blob models are sufficient.
- `internal/storage/` -- forgejo already has `modules/storage` with the same backends. No reason to use the proxy's wrapper.
- `internal/metrics/`, `internal/server/` -- forgejo has its own instrumentation and routing.
Several pieces have already been extracted from the proxy into standalone libraries:
- `git-pkgs/registries` -- registry API clients for fetching package metadata (versions, URLs, license, etc.) across ecosystems. Already used by the proxy's resolver for ecosystems with dynamic download URLs.
- `git-pkgs/archives` -- archive format handling (tar, zip, etc.)
- `git-pkgs/purl` -- Package URL parsing and construction
- `git-pkgs/vers` -- cross-ecosystem version comparison
- `git-pkgs/vulns` -- OSV vulnerability lookups
- `git-pkgs/spdx` -- SPDX license normalization
- `git-pkgs/enrichment` -- package metadata enrichment (license, vuln, version info)
Pull `internal/upstream/` out of the proxy into something like `github.com/git-pkgs/fetch`. It's already cleanly separated with a good interface. The proxy itself would then import this library too.
Files to extract:
- `fetcher.go` (`Fetcher`, `FetcherInterface`, `Artifact`)
- `circuit_breaker.go` (`CircuitBreakerFetcher`)
- `resolver.go` (`Resolver`, `ArtifactInfo`)
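To illustrate what the extracted resolver does, here is a sketch of the (ecosystem, name, version) → URL mapping. The `ArtifactInfo` name follows the source's description of `resolver.go`; the fields and function shape are assumptions, though the URL patterns themselves are the real upstream conventions:

```go
package main

import "fmt"

// ArtifactInfo mirrors the name used in resolver.go; the exact
// fields there may differ.
type ArtifactInfo struct {
	Ecosystem, Name, Version string
}

// Resolve maps an artifact to its upstream download URL for
// ecosystems with static URL patterns. Ecosystems with dynamic
// URLs (e.g. PyPI) fall back to a registry metadata lookup instead.
func Resolve(a ArtifactInfo) (string, bool) {
	switch a.Ecosystem {
	case "npm":
		return fmt.Sprintf("https://registry.npmjs.org/%s/-/%s-%s.tgz", a.Name, a.Name, a.Version), true
	case "cargo":
		return fmt.Sprintf("https://crates.io/api/v1/crates/%s/%s/download", a.Name, a.Version), true
	case "gem":
		return fmt.Sprintf("https://rubygems.org/gems/%s-%s.gem", a.Name, a.Version), true
	default:
		return "", false // dynamic-URL ecosystems need a metadata lookup
	}
}

func main() {
	url, _ := Resolve(ArtifactInfo{"npm", "left-pad", "1.3.0"})
	fmt.Println(url)
}
```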
Forgejo already has a PackageProperty system (key-value pairs on packages and versions). A proxied package could be marked with properties like:
- `upstream_registry_url = https://registry.npmjs.org`
- `upstream_source = proxy`
Alternatively, a new package_proxy_source table with upstream URL, last synced timestamp, and whether the package is proxy-only or mixed.
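For the table-based alternative, a hypothetical xorm model could look like this. Field names and tags are assumptions in forgejo's model style, not an existing schema:

```go
package main

import (
	"fmt"
	"time"
)

// PackageProxySource is a hypothetical model for a dedicated
// package_proxy_source table, instead of PackageProperty entries.
type PackageProxySource struct {
	ID          int64     `xorm:"pk autoincr"`
	PackageID   int64     `xorm:"UNIQUE NOT NULL"`
	UpstreamURL string    `xorm:"NOT NULL"`
	LastSynced  time.Time `xorm:"updated"`
	ProxyOnly   bool      // false means mixed with locally-uploaded versions
}

func main() {
	s := PackageProxySource{PackageID: 42, UpstreamURL: "https://registry.npmjs.org", ProxyOnly: true}
	fmt.Printf("%+v\n", s)
}
```

The property-based approach avoids a migration; the table buys queryability (e.g. "all proxy-only packages older than their sync TTL") at the cost of one.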
New file: services/packages/proxy.go
Core function: "if a package download request misses locally, check if this owner/org has proxy caching enabled, resolve the upstream URL, fetch it, store it through the existing CreatePackageAndAddFile flow, serve it."
This slots in naturally. The existing DownloadPackageFile handler in each ecosystem router currently calls GetFileStreamByPackageNameAndVersion and returns 404 on miss. The proxy version would catch that 404 and try upstream.
This is the bulk of the work. Each ecosystem handler needs a "proxy metadata" path that:
- Fetches metadata from upstream (package index, version list, etc.)
- Rewrites download URLs to point at the forgejo instance
- Merges with any locally-published versions
The metadata rewriting logic from git-pkgs/proxy's handlers is the reference here. It can't be copy-pasted because it's wired to the proxy's routing, but the rewrite functions themselves (like rewriteMetadata in npm.go) are straightforward to adapt.
Extending git-pkgs/registries to help here: The git-pkgs/registries library currently parses upstream responses into normalized Go structs and discards the raw response. For metadata proxying, we need the raw upstream response (the actual JSON npm returns, the PyPI simple index, etc.) so we can rewrite URLs and forward it.
Adding a FetchRawMetadata(ctx, name) ([]byte, string, error) method to the Registry interface would let each ecosystem return the raw bytes + content type from whatever endpoint a package manager client would normally hit. The internal HTTP client already has GetBody() -- it just needs to be exposed. This keeps all the per-ecosystem URL construction quirks (npm scopes, Go module case encoding, Maven group/artifact paths) inside the registries library rather than duplicating them in both the proxy and forgejo.
Add to forgejo's `app.ini`:

```ini
[packages.proxy]
ENABLED = false
UPSTREAM_NPM = https://registry.npmjs.org
UPSTREAM_CARGO = https://crates.io
; ... per-ecosystem upstream URLs
; auth tokens for private registries
NPM_TOKEN = ${FORGEJO_NPM_TOKEN}
```

git-pkgs/proxy supports 16 ecosystems; forgejo supports 23. The overlap where proxy logic already exists:
| Ecosystem | Proxy handler | Forgejo handler | Proxy reuse |
|---|---|---|---|
| npm | yes | yes | URL rewriting, version extraction |
| cargo | yes | yes | Sparse index proxying |
| gem | yes | yes | Specs proxying, download |
| go | yes | yes | Module proxy protocol |
| hex | yes | yes | Metadata + tarball |
| pub | yes | yes | API proxying |
| pypi | yes | yes | Simple index rewriting |
| maven | yes | yes | POM + JAR proxying |
| nuget | yes | yes | Service index rewriting |
| composer | yes | yes | packages.json proxying |
| conan | yes | yes | Recipe proxying |
| conda | yes | yes | Channel data rewriting |
| cran | yes | yes | PACKAGES index proxying |
| container | yes | yes | OCI distribution spec |
| debian | yes | yes | Packages/Release proxying |
| rpm | yes | yes | Repodata proxying |
| alpine | no | yes | would need new handler |
| arch | no | yes | would need new handler |
| chef | no | yes | would need new handler |
| helm | no | yes | would need new handler |
| swift | no | yes | would need new handler |
| vagrant | no | yes | would need new handler |
| generic | no | yes | N/A (no upstream) |
- Step 1 (extract library): small, mostly moving files and updating imports
- Step 2 (model changes): small, property-based approach needs no migrations
- Step 3 (proxy service): medium, the core fetch-and-store loop
- Step 4 (metadata proxying): large, per-ecosystem work -- probably start with npm + go + cargo as the highest-value targets
- Step 5 (configuration): small
- Should proxy caching be per-user/org or instance-wide? Forgejo's package registry is owner-scoped, so there'd need to be a "system" or "global" proxy owner, or each org opts in individually.
- Cache invalidation strategy? The proxy currently has no TTL or invalidation. For forgejo, you'd probably want configurable TTL on metadata (say 5 minutes) while artifacts are immutable once cached.
- Should proxied packages appear in the forgejo UI alongside uploaded packages, or be visually distinguished?
- How to handle auth for private upstream registries within forgejo's settings model?