
@dhellmann
Created October 21, 2025 22:27

Fromager Improvement Suggestions

This document contains suggestions for reducing repetition in package plugins by adding new helper functions and configuration options to fromager.

Executive Summary

After analyzing the package plugins in this repository, we identified several recurring patterns that could be simplified through fromager enhancements:

  • 19 common patterns across 40+ package plugins
  • 15 suggested helper functions to reduce code duplication
  • 12 suggested configuration options to move logic from code to YAML

Table of Contents

  1. Analysis Methodology
  2. Suggested Helper Functions
  3. Suggested Configuration Options
  4. Pattern Analysis Details

Analysis Methodology

We analyzed package plugins including:

  • torch.py, torchao.py, torchaudio.py, torchvision.py
  • vllm.py, aotriton.py, triton.py
  • nvidia_cudnn_frontend.py, cmake.py
  • bitsandbytes.py, pyarrow.py, llvmlite.py
  • faiss_cpu.py, tilelang.py, outlines_core.py
  • Plus 25+ additional plugins

Common themes identified:

  • Git source resolution and downloading
  • Version control via environment variables
  • Variant-specific configuration (CPU, CUDA, ROCm, etc.)
  • External dependency management
  • CMake-based builds
  • File patching and modification

Suggested Helper Functions

1. Version Environment Variable Setter

Pattern: Many plugins set BUILD_VERSION, SETUPTOOLS_SCM_PRETEND_VERSION, PYTORCH_BUILD_VERSION, etc.

Current Implementation:

# torchaudio.py
def build_wheel(...):
    extra_environ["BUILD_VERSION"] = str(version)

# torchao.py
def update_extra_environ(...):
    if version is not None:
        extra_environ["SETUPTOOLS_SCM_PRETEND_VERSION_FOR_TORCHAO"] = version.base_version
        extra_environ["BUILD_VERSION"] = version.base_version

# torch.py
def update_extra_environ(...):
    if version is not None:
        extra_environ["PYTORCH_BUILD_VERSION"] = version.base_version
        extra_environ["PYTORCH_BUILD_NUMBER"] = post

Suggested Helper:

# In fromager
def set_version_environment_variables(
    extra_environ: dict[str, str],
    version: Version | None,
    *,
    package_name: str | None = None,
    include_build_number: bool = False,
    include_setuptools_scm: bool = False,
    use_base_version: bool = True,
) -> None:
    """Set common version-related environment variables.

    Args:
        extra_environ: Dictionary to update with version variables
        version: Package version
        package_name: Package name for scm variable (e.g., 'torchao')
        include_build_number: Set BUILD_NUMBER from version.post
        include_setuptools_scm: Set SETUPTOOLS_SCM_PRETEND_VERSION*
        use_base_version: Use version.base_version instead of str(version)
    """

Benefits:

  • Eliminates repetitive version control code in 15+ plugins
  • Standardizes version handling across packages
  • Reduces errors from inconsistent version variable naming

Affected Plugins: torch.py, torchaudio.py, torchvision.py, torchao.py, vllm.py, and 10+ others


2. Git Source Resolver Provider Builder

Pattern: Nearly identical get_resolver_provider implementations for GitLab/GitHub sources.

Current Implementation:

# torchao.py
def get_resolver_provider(...):
    if include_sdists:
        return resolver.GitLabTagProvider(
            project_path=PROJECT_PATH,
            constraints=ctx.constraints,
        )
    return resolver.default_resolver_provider(...)

# bitsandbytes.py
def get_resolver_provider(...):
    if include_sdists:
        return resolver.GitHubTagProvider(
            "bitsandbytes-foundation", "bitsandbytes", ctx.constraints
        )
    return resolver.default_resolver_provider(...)

Suggested Helper:

# In fromager
def git_resolver_provider(
    ctx: context.WorkContext,
    req: Requirement,
    sdist_server_url: str,
    include_sdists: bool,
    include_wheels: bool,
    *,
    provider_type: Literal["gitlab", "github"],
    project_path: str | None = None,
    organization: str | None = None,
    repo: str | None = None,
    matcher: MatchFunction | re.Pattern | None = None,
    req_type: resolver.RequirementType | None = None,
    ignore_platform: bool = False,
) -> resolver.PyPIProvider | resolver.GenericProvider:
    """Create a git-based resolver provider with standard fallback.

    Simplifies the common pattern of using GitLab/GitHub providers
    when sdists are available, falling back to default otherwise.
    """

Benefits:

  • Reduces 20+ lines to 1-2 lines per plugin
  • Standardizes git resolver pattern
  • Easier to maintain and test

Affected Plugins: torchao.py, nvidia_cudnn_frontend.py, tilelang.py, vllm.py, bitsandbytes.py, and 10+ others


3. PKG-INFO Ensurer Wrapper

Pattern: Most plugins using git sources need to call sources.ensure_pkg_info() in prepare_source.

Current Implementation:

# torchao.py
def prepare_source(...):
    source_root_dir, is_new = sources.default_prepare_source(...)
    if is_new:
        sources.ensure_pkg_info(
            ctx=ctx,
            req=req,
            version=version,
            sdist_root_dir=source_root_dir,
            build_dir=None,
        )
    return source_root_dir, is_new

Suggested Helper:

# In fromager
def prepare_source_with_pkg_info(
    ctx: context.WorkContext,
    req: Requirement,
    source_filename: pathlib.Path,
    version: Version,
    *,
    ensure_pkg_info: bool = True,
    build_dir: pathlib.Path | None = None,
) -> tuple[pathlib.Path, bool]:
    """Prepare source with automatic PKG-INFO generation.

    Combines default_prepare_source with ensure_pkg_info for the
    common case of git sources that need PKG-INFO metadata.
    """

Benefits:

  • Reduces 10+ lines to 1-2 lines
  • Prevents forgetting to call ensure_pkg_info
  • Standard pattern for git sources

Affected Plugins: torchao.py, vllm.py, tilelang.py, and 8+ others


4. Build Directory Finder for Monorepos

Pattern: Packages in monorepos need custom logic to find the correct build directory.

Current Implementation:

# triton.py
def _get_build_dir(sdist_root_dir: pathlib.Path) -> pathlib.Path:
    if sdist_root_dir.joinpath("setup.py").is_file():
        # Triton >= 3.4.0
        return sdist_root_dir
    build_dir = sdist_root_dir / "python"
    if build_dir.joinpath("setup.py").is_file():
        # Triton < 3.4.0
        return build_dir
    raise ValueError("setup.py not found")

Suggested Helper:

# In fromager
def find_build_directory(
    sdist_root_dir: pathlib.Path,
    *,
    search_paths: tuple[str | pathlib.Path, ...] = (".", "python"),
    marker_files: tuple[str, ...] = ("setup.py", "pyproject.toml"),
) -> pathlib.Path:
    """Find the build directory in a monorepo structure.

    Searches for marker files in potential subdirectories to locate
    the actual Python package within a larger repository.
    """
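A runnable sketch of this search (simplified to string search paths; the demo builds a throwaway monorepo-style layout with the package under python/):

```python
import pathlib
import tempfile


def find_build_directory(
    sdist_root_dir: pathlib.Path,
    *,
    search_paths: tuple[str, ...] = (".", "python"),
    marker_files: tuple[str, ...] = ("setup.py", "pyproject.toml"),
) -> pathlib.Path:
    """Return the first search path containing one of the marker files."""
    for sub in search_paths:
        candidate = sdist_root_dir / sub
        if any(candidate.joinpath(marker).is_file() for marker in marker_files):
            return candidate
    raise ValueError(f"none of {marker_files} found under {sdist_root_dir} in {search_paths}")


# Demo: a repo whose Python package lives in a python/ subdirectory
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "python").mkdir()
    (root / "python" / "setup.py").write_text("# placeholder\n")
    build_dir = find_build_directory(root)
```

The triton.py logic above becomes the default configuration of this helper.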

Benefits:

  • Eliminates custom build directory detection logic
  • Configurable for different monorepo layouts
  • Consistent error messages

Affected Plugins: triton.py (all 5 hook methods use this pattern)


5. CMake Version/Config Parser

Pattern: Multiple plugins parse CMake files to extract version numbers or configuration values.

Current Implementation:

# torch.py
def _clone_six_repo(...):
    cmakefilename = build_dir / "third_party/NNPACK/cmake/DownloadSix.cmake"
    content = cmakefilename.read_text(encoding="utf-8")
    pattern = r"six-(\d+\.\d+\.\d+)\.tar\.gz"
    match = re.search(pattern, content)
    if not match:
        raise RuntimeError(f"Could not determine six version in {cmakefilename}")
    six_version = match.group(1)

# vllm.py
def _clone_cutlass_repo(...):
    cmakefilename = source_root_dir / cmakefile
    content = cmakefilename.read_text(encoding="utf-8")
    pattern = r'set\(CUTLASS_REVISION "v(\d+\.\d+\.\d+)"'
    match = re.search(pattern, content)
    if not match:
        raise RuntimeError(f"Could not determine cutlass version in {cmakefilename}")
    cutlass_version = match.group(1)

Suggested Helper:

# In fromager
def parse_cmake_variable(
    cmake_file: pathlib.Path,
    variable_name: str,
    *,
    pattern: str | re.Pattern | None = None,
    required: bool = True,
) -> str | None:
    """Extract a variable value from a CMake file.

    Supports both set() commands and inline values with custom patterns.
    """

def parse_cmake_version(
    cmake_file: pathlib.Path,
    *,
    version_variable: str | None = None,
    version_pattern: str | re.Pattern | None = None,
) -> Version:
    """Extract a version number from a CMake file."""

Benefits:

  • Reduces 10+ lines to 1-2 lines
  • Standardized error handling
  • Supports multiple CMake patterns

Affected Plugins: torch.py, vllm.py (cutlass, flash-attn), triton.py


6. Variant-Specific Environment Builder

Pattern: Many plugins check variant and set environment variables conditionally.

Current Implementation:

# torch.py
def update_extra_environ(...):
    if ctx.variant.startswith("rocm"):
        platlib = build_env.run([...])
        extra_environ["AOTRITON_INSTALLED_PREFIX"] = os.path.join(platlib, "aotriton")

# bitsandbytes.py
def build_wheel(...):
    backends = ["cpu"]
    if ctx.variant.startswith("cuda"):
        backends.append("cuda")
    if ctx.variant.startswith("rocm"):
        backends.append("hip")

Suggested Helper:

# In fromager
class VariantEnvironBuilder:
    """Builder for variant-specific environment configuration."""

    def __init__(self, ctx: context.WorkContext):
        self.ctx = ctx
        self.environ = {}

    def when_variant(self, pattern: str, **env_vars: str) -> "VariantEnvironBuilder":
        """Set environment variables when variant matches pattern."""
        if self.ctx.variant.startswith(pattern):
            self.environ.update(env_vars)
        return self

    def when_cuda(self, **env_vars: str) -> "VariantEnvironBuilder":
        """Set environment variables for CUDA variant."""
        return self.when_variant("cuda", **env_vars)

    def when_rocm(self, **env_vars: str) -> "VariantEnvironBuilder":
        """Set environment variables for ROCm variant."""
        return self.when_variant("rocm", **env_vars)

    def build(self) -> dict[str, str]:
        """Return the built environment dictionary."""
        return self.environ
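For illustration, a runnable demo of the fluent API (condensed copy of the builder above, with a `_StubContext` standing in for `context.WorkContext`):

```python
class _StubContext:
    """Stand-in for context.WorkContext in this demo."""

    def __init__(self, variant: str):
        self.variant = variant


class VariantEnvironBuilder:
    # Condensed copy of the builder sketched above
    def __init__(self, ctx):
        self.ctx = ctx
        self.environ: dict[str, str] = {}

    def when_variant(self, pattern: str, **env_vars: str) -> "VariantEnvironBuilder":
        if self.ctx.variant.startswith(pattern):
            self.environ.update(env_vars)
        return self

    def when_cuda(self, **env_vars: str) -> "VariantEnvironBuilder":
        return self.when_variant("cuda", **env_vars)

    def when_rocm(self, **env_vars: str) -> "VariantEnvironBuilder":
        return self.when_variant("rocm", **env_vars)

    def build(self) -> dict[str, str]:
        return self.environ


# Only the matching variant's variables end up in the result
env = (
    VariantEnvironBuilder(_StubContext("rocm-ubi9"))
    .when_cuda(CUDA_ENABLED="1")
    .when_rocm(ROCM_ENABLED="1", AOTRITON_INSTALLED_PREFIX="/opt/aotriton")
    .build()
)
```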

Benefits:

  • Fluent API for variant configuration
  • More readable than nested if statements
  • Reusable across plugins

Affected Plugins: torch.py, vllm.py, bitsandbytes.py, tilelang.py, aotriton.py


7. Requirements File Patcher

Pattern: Many plugins patch requirements.txt files with regex replacements.

Current Implementation:

# vllm.py (has this pattern repeated 3+ times)
def _fix_torch_cpu_dependency(...):
    cpu_requirements = source_root_dir / "requirements" / "cpu.txt"
    if cpu_requirements.is_file():
        replace_lines(
            cpu_requirements,
            [
                (r"torch==2\.6\.0\+cpu; platform_machine == \"x86_64\"",
                 'torch==2.7.1; platform_machine == "x86_64"'),
                (r"(torch==.+?)\+cpu(.*)", r"\1\2"),
            ],
        )

Suggested Helper:

# In fromager
def patch_requirements_file(
    requirements_file: pathlib.Path,
    replacements: list[tuple[str | re.Pattern, str]],
    *,
    skip_if_missing: bool = False,
) -> bool:
    """Patch a requirements file with regex replacements.

    Returns True if file was modified, False otherwise.
    """

def patch_requirements_files(
    source_root_dir: pathlib.Path,
    patches: dict[str, list[tuple[str, str]]],
) -> dict[str, bool]:
    """Patch multiple requirements files.

    Args:
        source_root_dir: Root directory of the source
        patches: Mapping of relative file paths to replacement lists

    Returns:
        Mapping of file paths to whether they were modified
    """
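A runnable sketch of patch_requirements_file, demonstrated on the torch +cpu replacement quoted from vllm.py above:

```python
import pathlib
import re
import tempfile


def patch_requirements_file(
    requirements_file: pathlib.Path,
    replacements: list[tuple[str, str]],
    *,
    skip_if_missing: bool = False,
) -> bool:
    """Apply regex replacements; return True when the file changed."""
    if not requirements_file.is_file():
        if skip_if_missing:
            return False
        raise FileNotFoundError(requirements_file)
    original = requirements_file.read_text(encoding="utf-8")
    content = original
    for pattern, replacement in replacements:
        content = re.sub(pattern, replacement, content)
    if content == original:
        return False
    requirements_file.write_text(content, encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    reqs = pathlib.Path(tmp) / "cpu.txt"
    reqs.write_text('torch==2.6.0+cpu; platform_machine == "x86_64"\n')
    changed = patch_requirements_file(reqs, [(r"(torch==.+?)\+cpu(.*)", r"\1\2")])
    patched = reqs.read_text()
```

patch_requirements_files would simply loop this over a mapping of relative paths.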

Benefits:

  • Cleaner syntax for requirement patching
  • Consistent error handling
  • Can be called declaratively

Affected Plugins: vllm.py (used 4+ times), and others


8. External Dependency Downloader

Pattern: Plugins download and extract external dependencies (tarballs, zip files).

Current Implementation:

# cmake.py
def prepare_source(...):
    if is_new:
        build_dir = source_root_dir / f"build/py3-none-{platform_tag}"
        build_dir.mkdir(parents=True, exist_ok=True)
        url = CMAKE_TARBALL_URL_TEMPLATE.format(version=cmake_version)
        downloaded_path = download_url(build_dir, url)

# nvidia_cudnn_frontend.py
def prepare_source(...):
    if is_new:
        dlpack_version = get_dlpack_version(source_root_dir)
        dlpack_url = DLPACK_URL_TEMPLATE.format(version=dlpack_version)
        dlpack_dir = source_root_dir / "dlpack"
        dlpack_dir.mkdir(parents=True, exist_ok=True)
        downloaded_path = download_url(source_root_dir, dlpack_url, "dlpack.tar.gz")
        with tarfile.open(downloaded_path) as tf:
            tf.extractall(dlpack_dir, filter=tarfilter)
        downloaded_path.unlink()

Suggested Helper:

# In fromager
def download_and_extract_dependency(
    destination_dir: pathlib.Path,
    url: str,
    *,
    archive_name: str | None = None,
    extract: bool = True,
    strip_components: int = 0,
    cleanup_archive: bool = True,
) -> pathlib.Path:
    """Download and optionally extract an external dependency.

    Args:
        destination_dir: Where to place the extracted files
        url: URL to download from
        archive_name: Custom name for downloaded archive
        extract: Whether to extract the archive
        strip_components: Number of leading path components to strip
        cleanup_archive: Remove archive after extraction

    Returns:
        Path to extracted directory or downloaded file
    """

Benefits:

  • Reduces 15+ lines to 3-5 lines
  • Handles multiple archive formats
  • Consistent error handling

Affected Plugins: cmake.py, nvidia_cudnn_frontend.py


9. Multi-Backend CMake Builder

Pattern: Some packages build the same code for multiple backends (CPU, CUDA, ROCm).

Current Implementation:

# bitsandbytes.py
def _build_libbitsandbytes(...):
    # 40+ lines of cmake configure + build
    cmake_generate = ["cmake", "-S", str(sdist_root_dir), "-B", str(cmake_build_dir), ...]
    build_env.run(cmake_generate, ...)
    cmake_build = ["cmake", "--build", str(cmake_build_dir), ...]
    build_env.run(cmake_build, ...)

def build_wheel(...):
    backends = ["cpu"]
    if ctx.variant.startswith("cuda"):
        backends.append("cuda")
    if ctx.variant.startswith("rocm"):
        backends.append("hip")
    for compute_backend in backends:
        _build_libbitsandbytes(..., compute_backend=compute_backend)

Suggested Helper:

# In fromager
class CMakeBackendBuilder:
    """Build the same source for multiple backends."""

    def __init__(
        self,
        ctx: context.WorkContext,
        build_env: build_environment.BuildEnvironment,
        source_dir: pathlib.Path,
    ):
        self.ctx = ctx
        self.build_env = build_env
        self.source_dir = source_dir
        self.backends = []

    def add_backend(
        self,
        name: str,
        cmake_options: dict[str, str],
        *,
        condition: bool = True,
    ) -> "CMakeBackendBuilder":
        """Add a backend to build."""
        if condition:
            self.backends.append((name, cmake_options))
        return self

    def build_all(
        self,
        extra_environ: dict[str, str],
        *,
        generator: str = "Ninja",
        build_type: str = "Release",
    ) -> dict[str, pathlib.Path]:
        """Build all registered backends."""

Benefits:

  • Declarative multi-backend builds
  • Reduces code duplication
  • Easier to add new backends

Affected Plugins: bitsandbytes.py, tilelang.py


10. LLVM Path Finder

Pattern: Multiple plugins need to locate and configure LLVM installations.

Current Implementation:

# triton.py
def build_wheel(...):
    llvm_triton_version_file = sdist_root_dir / "cmake" / "llvm-hash.txt"
    llvm_triton_version = llvm_triton_version_file.read_text(encoding="utf-8")[:8]
    llvm_triton_dir = pathlib.Path(f"/usr/lib64/llvm-triton-{llvm_triton_version}")
    if not llvm_triton_dir.is_dir():
        raise FileNotFoundError(f"Cannot find the llvm-triton directory...")
    extra_environ["LLVM_SYSPATH"] = f"/usr/lib64/llvm-triton-{llvm_triton_version}"

# aotriton.py
def update_extra_environ(...):
    llvm_syspath = "/usr/lib64/llvm-triton-" + os.environ["LLVM_AOTRITON_09B0_COMMIT"]
    if ctx.variant.startswith("rocm"):
        if version == Version("0.10b"):
            llvm_syspath = llvm_syspath[:-8] + os.environ["LLVM_AOTRITON_010B0_COMMIT"]
        extra_environ["LLVM_SYSPATH"] = llvm_syspath
        extra_environ["LLVM_INCLUDE_DIRS"] = f"{llvm_syspath}/include"
        extra_environ["LLVM_LIBRARY_DIR"] = f"{llvm_syspath}/lib"

Suggested Helper:

# In fromager
def find_llvm_installation(
    *,
    version_file: pathlib.Path | None = None,
    version_env_var: str | None = None,
    base_path: pathlib.Path = pathlib.Path("/usr/lib64"),
    prefix: str = "llvm-",
    required: bool = True,
) -> pathlib.Path | None:
    """Locate an LLVM installation directory."""

def configure_llvm_environment(
    extra_environ: dict[str, str],
    llvm_dir: pathlib.Path,
    *,
    set_syspath: bool = True,
    set_include_dirs: bool = False,
    set_library_dir: bool = False,
) -> None:
    """Configure environment variables for LLVM installation."""

Benefits:

  • Standardizes LLVM discovery
  • Reduces error-prone path manipulation
  • Consistent error messages

Affected Plugins: triton.py, aotriton.py, llvmlite.py


11. Git Clone with External Project Detection

Pattern: Clone external dependencies referenced in CMake FetchContent declarations.

Current Implementation:

# vllm.py
def _clone_external_project_repo(...):
    cmakefilename = source_root_dir / cmakefile
    # Parse 30+ lines of CMake to find GIT_REPOSITORY and GIT_TAG
    # Look for FetchContent_Declare pattern
    for i in range(len(lines) - 3):
        if (current_lines[0].startswith("FetchContent_Declare(")
            and "GIT_REPOSITORY" in current_lines[2]
            and "GIT_TAG" in current_lines[3]):
            # Extract and parse...
    git_clone(ctx=ctx, req=Requirement(clonedir), ref=commit_hash, ...)

Suggested Helper:

# In fromager
def parse_cmake_fetch_content(
    cmake_file: pathlib.Path,
    project_name: str,
) -> dict[str, str]:
    """Parse CMake FetchContent_Declare to extract git info.

    Returns:
        Dictionary with 'git_repository', 'git_tag', etc.
    """

def clone_cmake_fetch_content_dependency(
    ctx: context.WorkContext,
    source_root_dir: pathlib.Path,
    cmake_file: str | pathlib.Path,
    project_name: str,
    destination: pathlib.Path,
    *,
    submodules: bool = False,
) -> pathlib.Path:
    """Clone a dependency declared in CMake FetchContent."""
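A sketch of the parsing half, assuming the common FetchContent_Declare layout with GIT_REPOSITORY and GIT_TAG on their own lines (nested parentheses inside the block are not handled):

```python
import pathlib
import re
import tempfile


def parse_cmake_fetch_content(cmake_file: pathlib.Path, project_name: str) -> dict[str, str]:
    """Extract GIT_REPOSITORY / GIT_TAG from a FetchContent_Declare block."""
    content = cmake_file.read_text(encoding="utf-8")
    block = re.search(
        rf"FetchContent_Declare\(\s*{re.escape(project_name)}\b(.*?)\)",
        content,
        re.DOTALL,
    )
    if block is None:
        raise RuntimeError(f"no FetchContent_Declare for {project_name} in {cmake_file}")
    info: dict[str, str] = {}
    for key in ("GIT_REPOSITORY", "GIT_TAG"):
        match = re.search(rf"{key}\s+(\S+)", block.group(1))
        if match:
            info[key.lower()] = match.group(1)
    return info


with tempfile.TemporaryDirectory() as tmp:
    cmake_file = pathlib.Path(tmp) / "deps.cmake"
    cmake_file.write_text(
        "FetchContent_Declare(\n"
        "  cutlass\n"
        "  GIT_REPOSITORY https://github.com/NVIDIA/cutlass.git\n"
        "  GIT_TAG v3.5.1\n"
        ")\n"
    )
    info = parse_cmake_fetch_content(cmake_file, "cutlass")
```

clone_cmake_fetch_content_dependency would feed this result to fromager's git_clone.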

Benefits:

  • Eliminates complex CMake parsing
  • Reusable for any FetchContent dependency
  • Reduces 50+ lines to 5 lines

Affected Plugins: vllm.py (used twice for cutlass and flash-attention)


12. Custom Package Infrastructure Generator

Pattern: Some packages need custom pyproject.toml, setup.py, and __init__.py files generated.

Current Implementation:

# aotriton.py (similar in tilelang.py)
PYPROJECT_TOML = """
[build-system]
requires = [...]
build-backend = "setuptools.build_meta"
...
"""

INIT_PY = """
import pathlib
def get_aotriton_include() -> pathlib.Path:
    return HERE / "include"
...
"""

def build_wheel(...):
    wheel_dir.joinpath("pyproject.toml").write_text(PYPROJECT_TOML.format(version=version))
    wheel_dir.joinpath("setup.py").write_text(SETUP_PY)
    install_dir.joinpath("__init__.py").write_text(INIT_PY)

Suggested Helper:

# In fromager
def generate_package_infrastructure(
    package_dir: pathlib.Path,
    package_name: str,
    version: Version,
    *,
    build_requires: list[str] | None = None,
    build_backend: str = "setuptools.build_meta",
    package_data: dict[str, list[str]] | None = None,
    include_paths: list[str] | None = None,
    lib_paths: list[str] | None = None,
) -> None:
    """Generate pyproject.toml, setup.py, and __init__.py for a package.

    Useful for packages that compile native code and need custom
    packaging infrastructure.
    """

Benefits:

  • Reduces 50+ lines of template strings
  • Standardizes package structure
  • Easier to maintain templates

Affected Plugins: aotriton.py, tilelang.py


13. Parallel Job Calculator

Pattern: Calculate optimal number of parallel jobs for builds.

Current Implementation:

# tilelang.py
cores = os.cpu_count() or 1
make_jobs = max(1, (cores * 75) // 100)
ninja_cmd = ["ninja", f"-j{make_jobs}"]

# pyarrow.py
pbi = ctx.package_build_info(req)
jobs = pbi.parallel_jobs()
environ_vars = {"PYARROW_PARALLEL": str(jobs)}

Suggested Helper:

# In fromager
def get_parallel_jobs(
    ctx: context.WorkContext,
    req: Requirement,
    *,
    percentage: int = 100,
    max_jobs: int | None = None,
) -> int:
    """Calculate optimal number of parallel jobs.

    Args:
        ctx: Work context
        req: Package requirement
        percentage: Percentage of cores to use (default 100)
        max_jobs: Maximum number of jobs (default unlimited)

    Returns:
        Number of parallel jobs to use
    """

Benefits:

  • Standardizes job calculation
  • Respects system limits
  • Consistent across builds

Affected Plugins: tilelang.py, pyarrow.py, and others using MAX_JOBS


14. Rust Vendoring with Patching

Pattern: Vendor Rust dependencies and apply patches.

Current Implementation:

# outlines_core.py
def prepare_source(...):
    source_root_dir, is_new = sources.unpack_source(...)
    if is_new:
        vendor_rust.vendor_rust(req, source_root_dir)
        if version in {Version("0.2.10"), Version("0.2.11")}:
            _patch_copy_aws_lc_sys(source_root_dir)
        sources.patch_source(ctx, source_root_dir, req, version)
        pyproject.apply_project_override(...)

Suggested Helper:

# In fromager
def prepare_rust_source(
    ctx: context.WorkContext,
    req: Requirement,
    source_filename: pathlib.Path,
    version: Version,
    *,
    vendor_first: bool = True,
    patch_crates: dict[str, pathlib.Path] | None = None,
) -> tuple[pathlib.Path, bool]:
    """Prepare Rust source with vendoring and patching.

    Args:
        vendor_first: Vendor before applying patches
        patch_crates: Crates to patch-copy (name -> source path)
    """

Benefits:

  • Handles Rust-specific workflow
  • Reduces repetitive unpacking/vendoring/patching
  • Supports cargo patch mechanism

Affected Plugins: outlines_core.py


15. Download URL from Tag Extractor

Pattern: Extract git repository details from download URLs.

Current Implementation:

# torchao.py, tilelang.py
def download_source(...):
    ref = get_tag_from_gitlab_archive_url(download_url)
    download_url = f"https://gitlab.com{PROJECT_PATH}.git"
    return clone_and_make_sdist(..., repo_url=download_url, tag=ref, ...)

Suggested Helper:

# In fromager (enhance existing functionality)
def extract_git_info_from_url(
    url: str,
    *,
    provider: Literal["gitlab", "github"] | None = None,
) -> dict[str, str]:
    """Extract git repository info from archive URL.

    Returns:
        Dictionary with 'provider', 'project_path', 'tag', 'clone_url'
    """

Benefits:

  • Eliminates manual URL parsing
  • Handles both GitLab and GitHub
  • Returns all needed git information

Affected Plugins: torchao.py, tilelang.py


Suggested Configuration Options

1. Git Source Configuration in YAML

Current: Requires Python plugin to specify git sources.

Proposed YAML:

# overrides/settings/package_name.yaml
source:
  type: git
  provider: gitlab  # or github
  project_path: /redhat/rhel-ai/core/mirrors/github/org/repo
  # OR for GitHub:
  # organization: org-name
  # repo: repo-name
  tag_pattern: "v{version}"  # optional, default: "v{version}"
  submodules: true
  matcher: "^midstream-cuda-v(.*)"  # optional regex for custom tag matching

Benefits:

  • Eliminates need for get_resolver_provider plugin for simple cases
  • Declarative git configuration
  • Easier to maintain

Would Replace Code In: 15+ plugins with simple git source resolution


2. Version Environment Variables in YAML

Current: Requires plugin to set version env vars.

Proposed YAML:

# overrides/settings/package_name.yaml
env:
  version_variables:
    BUILD_VERSION: "{version.base_version}"
    SETUPTOOLS_SCM_PRETEND_VERSION_FOR_{PACKAGE_NAME_UPPER}: "{version.base_version}"
    PYTORCH_BUILD_NUMBER: "{version.post}"
  # Variables can use placeholders:
  # {version}, {version.base_version}, {version.major}, {version.minor}, etc.
  # {PACKAGE_NAME}, {PACKAGE_NAME_UPPER}, {PACKAGE_NAME_LOWER}

Benefits:

  • No plugin needed for simple version control
  • Template syntax for version components
  • Clear and declarative

Would Replace Code In: 15+ plugins that only set version env vars


3. Build Directory Override in YAML

Current: Requires plugin with custom build_dir logic.

Proposed YAML:

# overrides/settings/package_name.yaml
build:
  build_directory: python  # relative to sdist root
  # OR
  build_directory_search:
    - "."
    - "python"
    - "src"
  marker_files:
    - setup.py
    - pyproject.toml

Benefits:

  • Handles monorepo layouts declaratively
  • No plugin needed for simple cases
  • Clear documentation of structure

Would Replace Code In: triton.py (and any future monorepo packages)


4. Variant-Specific Environment Variables in YAML

Current: Requires plugin with conditional logic.

Proposed YAML:

# overrides/settings/package_name.yaml
variants:
  cuda-ubi9:
    env:
      CUDA_ENABLED: "1"
      PYARROW_WITH_CUDA: "1"
  rocm-ubi9:
    env:
      ROCM_ENABLED: "1"
      AOTRITON_INSTALLED_PREFIX: "{platlib}/aotriton"
  cpu-ubi9:
    env:
      CPU_ONLY: "1"

Benefits:

  • Declarative variant configuration
  • No plugin needed for simple env vars
  • Easy to add new variants

Would Replace Code In: torch.py, pyarrow.py, bitsandbytes.py, and others


5. External Dependencies List in YAML

Current: Requires plugin to download external files.

Proposed YAML:

# overrides/settings/package_name.yaml
external_dependencies:
  - name: dlpack
    url: "https://gitlab.com/.../dlpack/-/archive/v{dlpack_version}/dlpack-v{dlpack_version}.tar.gz"
    version_file: dlpack_version.txt  # read version from this file
    destination: dlpack/
    extract: true
    strip_components: 1
  - name: cmake-source
    url: "https://github.com/Kitware/CMake/releases/download/v{version}/cmake-{version}.tar.gz"
    destination: "build/py3-none-{platform}/cmake-source.tar.gz"
    extract: false

Benefits:

  • Declarative dependency management
  • No plugin for simple downloads
  • Clear dependency documentation

Would Replace Code In: cmake.py, nvidia_cudnn_frontend.py


6. Line Replacement Rules in YAML

Current: Requires plugin with replace_lines calls.

Proposed YAML:

# overrides/settings/package_name.yaml
source_patches:
  requirements/cpu.txt:
    - pattern: 'torch==2\.6\.0\+cpu; platform_machine == "x86_64"'
      replacement: 'torch==2.7.1; platform_machine == "x86_64"'
    - pattern: '(torch==.+?)\+cpu(.*)'
      replacement: '\1\2'
  requirements/tpu.txt:
    - pattern: '^nixl==.*$'
      replacement: ''  # empty = remove line
  setup.py:
    - pattern: '(\s+)version = (get_version\([^)]+\))'
      replacement: '\1return \2'

Benefits:

  • Declarative patching
  • No plugin for simple replacements
  • Version control friendly

Would Replace Code In: vllm.py (multiple patch functions)


7. Build Backend List in YAML

Current: Requires plugin to build multiple backends.

Proposed YAML:

# overrides/settings/package_name.yaml
build:
  type: cmake_multi_backend
  backends:
    cpu:
      always: true
      cmake_options:
        COMPUTE_BACKEND: cpu
    cuda:
      when_variant: cuda
      cmake_options:
        COMPUTE_BACKEND: cuda
        COMPUTE_CAPABILITY: "{cuda_arch_list}"
    hip:
      when_variant: rocm
      cmake_options:
        COMPUTE_BACKEND: hip
        BNB_ROCM_ARCH: "{rocm_arch}"

Benefits:

  • Declarative multi-backend builds
  • No plugin for standard CMake builds
  • Clear build configuration

Would Replace Code In: bitsandbytes.py, tilelang.py


8. Custom Resolver Provider Configuration in YAML

Current: Requires plugin to specify resolver type.

Proposed YAML:

# overrides/settings/package_name.yaml
resolver:
  type: gitlab_tag  # or github_tag, pypi, custom
  project_path: /redhat/rhel-ai/core/mirrors/github/org/repo
  # For custom matchers:
  matcher:
    type: regex
    pattern: '^midstream-{variant}-v(\d+\.\d+\.\d+)'
    groups:
      version: 1
  # OR
  matcher:
    type: function
    module: package_plugins.vllm
    function: _create_midstream_matcher

Benefits:

  • Simple cases don't need plugins
  • Complex cases can use custom functions
  • Clear resolver documentation

Would Replace Code In: vllm.py, torchao.py, and 10+ others (for simple cases)


9. Pre/Post Build Scripts in YAML

Current: Requires plugin to run custom commands.

Proposed YAML:

# overrides/settings/package_name.yaml
build:
  pre_build_scripts:
    - script: "python generate_self_schema.py"
      working_directory: "{sdist_root_dir}"
      condition: "file_exists('generate_self_schema.py')"
    - script: "python tools/amd_build/build_amd.py"
      working_directory: "{sdist_root_dir}"
      condition: "variant_matches('rocm')"
      env:
        SIX_CLONE_DIR: "{sdist_root_dir}/third_party/six"

Benefits:

  • Simple pre/post build hooks without plugin
  • Conditional execution
  • Clear script documentation

Would Replace Code In: pydantic_core.py, torch.py (ROCm build)


10. CMake Build Configuration in YAML

Current: Requires plugin for CMake builds.

Proposed YAML:

# overrides/settings/package_name.yaml
build:
  type: cmake
  configure:
    source_dir: cpp
    build_dir: cpp/build
    generator: Ninja
    options:
      CMAKE_BUILD_TYPE: Release
      CMAKE_INSTALL_PREFIX: "{dist_dir}"
      ARROW_COMPUTE: ON
      ARROW_CUDA: "ON if variant == 'cuda-ubi9' else OFF"
  build:
    targets: [all]
  install:
    targets: [install]
  env:
    LD_LIBRARY_PATH: "{dist_dir}/lib:{LD_LIBRARY_PATH}"
    CMAKE_PREFIX_PATH: "{dist_dir}:{CMAKE_PREFIX_PATH}"

Benefits:

  • Declarative CMake builds
  • No plugin for standard CMake
  • Supports complex builds

Would Replace Code In: pyarrow.py (partially), and future CMake packages


11. Ensure PKG-INFO Flag in YAML

Current: Requires plugin to call ensure_pkg_info.

Proposed YAML:

# overrides/settings/package_name.yaml
source:
  type: git
  ensure_pkg_info: true  # automatically call ensure_pkg_info in prepare_source
  pkg_info:
    build_dir: null  # or specify a subdirectory

Benefits:

  • Automatic PKG-INFO for git sources
  • No plugin needed
  • Clear metadata requirements

Would Replace Code In: 10+ plugins with just ensure_pkg_info calls


12. LLVM Installation Configuration in YAML

Current: Requires plugin to locate LLVM.

Proposed YAML:

# overrides/settings/package_name.yaml
build:
  llvm:
    version_from_file: cmake/llvm-hash.txt
    version_length: 8  # use first 8 chars
    # OR
    version_from_env: LLVM_TRITON_VERSION
    base_path: /usr/lib64
    prefix: llvm-triton-
    env:
      LLVM_SYSPATH: "{llvm_dir}"
      LLVM_INCLUDE_DIRS: "{llvm_dir}/include"
      LLVM_LIBRARY_DIR: "{llvm_dir}/lib"

Benefits:

  • Declarative LLVM configuration
  • No plugin for standard LLVM lookup
  • Clear dependency documentation

Would Replace Code In: triton.py, aotriton.py, llvmlite.py (partially)


Pattern Analysis Details

Pattern Frequency Analysis

| Pattern | Occurrences | Plugins Affected | Complexity |
| --- | --- | --- | --- |
| Version environment variables | 15+ | torch, torchao, torchaudio, vllm, etc. | Low |
| Git source resolution | 15+ | torchao, nvidia_cudnn_frontend, etc. | Medium |
| PKG-INFO generation | 10+ | torchao, vllm, tilelang, etc. | Low |
| Variant-specific env vars | 10+ | torch, vllm, bitsandbytes, etc. | Medium |
| External dependency download | 5+ | cmake, nvidia_cudnn_frontend, etc. | Medium |
| CMake version parsing | 5+ | torch, vllm, triton, etc. | Medium |
| Requirements file patching | 5+ | vllm (4 times alone) | Low |
| Multi-backend builds | 3+ | bitsandbytes, tilelang, aotriton | High |
| LLVM path configuration | 3+ | triton, aotriton, llvmlite | Medium |
| Monorepo build directory | 2+ | triton (multiple hooks) | Medium |
| Custom package infrastructure | 2+ | aotriton, tilelang | High |
| FetchContent parsing | 2+ | vllm (cutlass, flash-attn) | High |
| Rust vendoring | 1+ | outlines_core | Low |

Code Reduction Estimates

If all suggestions were implemented:

  • Helper Functions: Would reduce plugin code by approximately 30-40%

    • Example: torchao.py could go from ~103 lines to ~60 lines
    • Example: vllm.py could go from ~527 lines to ~300 lines
  • Configuration Options: Would eliminate approximately 40-50% of simple plugins

    • Plugins that only set env vars or resolve git sources could be pure YAML
    • Example: torchaudio.py (32 lines) → pure YAML config
  • Combined: Estimated 50-60% reduction in total plugin code

Maintainability Benefits

  1. Reduced Duplication: Same logic isn't repeated across 15+ files
  2. Easier Testing: Helper functions can be unit tested in fromager
  3. Clearer Intent: YAML configuration is more readable than Python code
  4. Lower Barrier: Adding packages requires less Python knowledge
  5. Consistency: Standard patterns enforced by helpers

Implementation Priority

High Priority (immediate impact, low complexity):

  1. Version environment variable setter
  2. Git resolver provider builder
  3. PKG-INFO ensurer wrapper
  4. Requirements file patcher
  5. Version env vars in YAML
  6. Git source configuration in YAML

Medium Priority (good impact, medium complexity):

  7. Variant-specific environment builder
  8. External dependency downloader
  9. CMake version parser
  10. Variant-specific env vars in YAML
  11. External dependencies in YAML
  12. Line replacement rules in YAML

Lower Priority (specialized use cases, high complexity):

  13. Multi-backend CMake builder
  14. Custom package infrastructure generator
  15. FetchContent parser
  16. Build backend list in YAML
  17. CMake build configuration in YAML


Conclusion

These suggestions would significantly reduce repetition in package plugins while maintaining flexibility for complex cases. The combination of helper functions for reusable logic and YAML configuration for declarative patterns would:

  • Reduce total plugin code by 50-60%
  • Eliminate the need for Python plugins in many simple cases
  • Make the build system more maintainable and accessible
  • Standardize common patterns across the ecosystem

Next Steps

  1. Review and prioritize suggestions with the fromager team
  2. Implement high-priority helpers and config options
  3. Migrate existing plugins to use new features
  4. Document best practices for plugin development
  5. Create templates for common plugin patterns

Questions for Discussion

  1. Which patterns are most valuable to standardize in fromager vs. keeping as local utilities?
  2. Should configuration options support complex expressions or remain simple?
  3. How should we handle backward compatibility during migration?
  4. What's the right balance between flexibility and simplification?