
@dhellmann
Created October 21, 2025 22:27

Fromager Improvement Suggestions

This document contains suggestions for reducing repetition in package plugins by adding new helper functions and configuration options to fromager.

Executive Summary

After analyzing the package plugins in this repository, we identified several recurring patterns that could be simplified through fromager enhancements:

  • 19 common patterns across 40+ package plugins
  • 15 suggested helper functions to reduce code duplication
  • 12 suggested configuration options to move logic from code to YAML

Table of Contents

  1. Analysis Methodology
  2. Suggested Helper Functions
  3. Suggested Configuration Options
  4. Pattern Analysis Details

Analysis Methodology

We analyzed package plugins including:

  • torch.py, torchao.py, torchaudio.py, torchvision.py
  • vllm.py, aotriton.py, triton.py
  • nvidia_cudnn_frontend.py, cmake.py
  • bitsandbytes.py, pyarrow.py, llvmlite.py
  • faiss_cpu.py, tilelang.py, outlines_core.py
  • Plus 25+ additional plugins

Common themes identified:

  • Git source resolution and downloading
  • Version control via environment variables
  • Variant-specific configuration (CPU, CUDA, ROCm, etc.)
  • External dependency management
  • CMake-based builds
  • File patching and modification

Suggested Helper Functions

1. Version Environment Variable Setter

Pattern: Many plugins set BUILD_VERSION, SETUPTOOLS_SCM_PRETEND_VERSION, PYTORCH_BUILD_VERSION, etc.

Current Implementation:

# torchaudio.py
def build_wheel(...):
    extra_environ["BUILD_VERSION"] = str(version)

# torchao.py
def update_extra_environ(...):
    if version is not None:
        extra_environ["SETUPTOOLS_SCM_PRETEND_VERSION_FOR_TORCHAO"] = version.base_version
        extra_environ["BUILD_VERSION"] = version.base_version

# torch.py
def update_extra_environ(...):
    if version is not None:
        extra_environ["PYTORCH_BUILD_VERSION"] = version.base_version
        extra_environ["PYTORCH_BUILD_NUMBER"] = post

Suggested Helper:

# In fromager
def set_version_environment_variables(
    extra_environ: dict[str, str],
    version: Version | None,
    *,
    package_name: str | None = None,
    include_build_number: bool = False,
    include_setuptools_scm: bool = False,
    use_base_version: bool = True,
) -> None:
    """Set common version-related environment variables.

    Args:
        extra_environ: Dictionary to update with version variables
        version: Package version
        package_name: Package name for scm variable (e.g., 'torchao')
        include_build_number: Set BUILD_NUMBER from version.post
        include_setuptools_scm: Set SETUPTOOLS_SCM_PRETEND_VERSION*
        use_base_version: Use version.base_version instead of str(version)
    """

Benefits:

  • Eliminates repetitive version control code in 15+ plugins
  • Standardizes version handling across packages
  • Reduces errors from inconsistent version variable naming

Affected Plugins: torch.py, torchaudio.py, torchvision.py, torchao.py, vllm.py, and 10+ others


2. Git Source Resolver Provider Builder

Pattern: Nearly identical get_resolver_provider implementations for GitLab/GitHub sources.

Current Implementation:

# torchao.py
def get_resolver_provider(...):
    if include_sdists:
        return resolver.GitLabTagProvider(
            project_path=PROJECT_PATH,
            constraints=ctx.constraints,
        )
    return resolver.default_resolver_provider(...)

# bitsandbytes.py
def get_resolver_provider(...):
    if include_sdists:
        return resolver.GitHubTagProvider(
            "bitsandbytes-foundation", "bitsandbytes", ctx.constraints
        )
    return resolver.default_resolver_provider(...)

Suggested Helper:

# In fromager
def git_resolver_provider(
    ctx: context.WorkContext,
    req: Requirement,
    sdist_server_url: str,
    include_sdists: bool,
    include_wheels: bool,
    *,
    provider_type: Literal["gitlab", "github"],
    project_path: str | None = None,
    organization: str | None = None,
    repo: str | None = None,
    matcher: MatchFunction | re.Pattern | None = None,
    req_type: resolver.RequirementType | None = None,
    ignore_platform: bool = False,
) -> resolver.PyPIProvider | resolver.GenericProvider:
    """Create a git-based resolver provider with standard fallback.

    Simplifies the common pattern of using GitLab/GitHub providers
    when sdists are available, falling back to default otherwise.
    """

Benefits:

  • Reduces 20+ lines to 1-2 lines per plugin
  • Standardizes git resolver pattern
  • Easier to maintain and test

Affected Plugins: torchao.py, nvidia_cudnn_frontend.py, tilelang.py, vllm.py, bitsandbytes.py, and 10+ others


3. PKG-INFO Ensurer Wrapper

Pattern: Most plugins using git sources need to call sources.ensure_pkg_info() in prepare_source.

Current Implementation:

# torchao.py
def prepare_source(...):
    source_root_dir, is_new = sources.default_prepare_source(...)
    if is_new:
        sources.ensure_pkg_info(
            ctx=ctx,
            req=req,
            version=version,
            sdist_root_dir=source_root_dir,
            build_dir=None,
        )
    return source_root_dir, is_new

Suggested Helper:

# In fromager
def prepare_source_with_pkg_info(
    ctx: context.WorkContext,
    req: Requirement,
    source_filename: pathlib.Path,
    version: Version,
    *,
    ensure_pkg_info: bool = True,
    build_dir: pathlib.Path | None = None,
) -> tuple[pathlib.Path, bool]:
    """Prepare source with automatic PKG-INFO generation.

    Combines default_prepare_source with ensure_pkg_info for the
    common case of git sources that need PKG-INFO metadata.
    """

Benefits:

  • Reduces 10+ lines to 1-2 lines
  • Prevents forgetting to call ensure_pkg_info
  • Standard pattern for git sources

Affected Plugins: torchao.py, vllm.py, tilelang.py, and 8+ others


4. Build Directory Finder for Monorepos

Pattern: Packages in monorepos need custom logic to find the correct build directory.

Current Implementation:

# triton.py
def _get_build_dir(sdist_root_dir: pathlib.Path) -> pathlib.Path:
    if sdist_root_dir.joinpath("setup.py").is_file():
        # Triton >= 3.4.0
        return sdist_root_dir
    build_dir = sdist_root_dir / "python"
    if build_dir.joinpath("setup.py").is_file():
        # Triton < 3.4.0
        return build_dir
    raise ValueError("setup.py not found")

Suggested Helper:

# In fromager
def find_build_directory(
    sdist_root_dir: pathlib.Path,
    *,
    search_paths: tuple[str | pathlib.Path, ...] = (".", "python"),
    marker_files: tuple[str, ...] = ("setup.py", "pyproject.toml"),
) -> pathlib.Path:
    """Find the build directory in a monorepo structure.

    Searches for marker files in potential subdirectories to locate
    the actual Python package within a larger repository.
    """
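A runnable sketch of this search (simplified to string search paths; the demo builds a throwaway monorepo-style layout with the package under python/):

```python
import pathlib
import tempfile


def find_build_directory(
    sdist_root_dir: pathlib.Path,
    *,
    search_paths: tuple[str, ...] = (".", "python"),
    marker_files: tuple[str, ...] = ("setup.py", "pyproject.toml"),
) -> pathlib.Path:
    """Return the first search path containing one of the marker files."""
    for sub in search_paths:
        candidate = sdist_root_dir / sub
        if any(candidate.joinpath(marker).is_file() for marker in marker_files):
            return candidate
    raise ValueError(f"none of {marker_files} found under {sdist_root_dir} in {search_paths}")


# Demo: a repo whose Python package lives in a python/ subdirectory
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "python").mkdir()
    (root / "python" / "setup.py").write_text("# placeholder\n")
    build_dir = find_build_directory(root)
```

The triton.py logic above becomes the default configuration of this helper.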

Benefits:

  • Eliminates custom build directory detection logic
  • Configurable for different monorepo layouts
  • Consistent error messages

Affected Plugins: triton.py (all 5 hook methods use this pattern)


5. CMake Version/Config Parser

Pattern: Multiple plugins parse CMake files to extract version numbers or configuration values.

Current Implementation:

# torch.py
def _clone_six_repo(...):
    cmakefilename = build_dir / "third_party/NNPACK/cmake/DownloadSix.cmake"
    content = cmakefilename.read_text(encoding="utf-8")
    pattern = r"six-(\d+\.\d+\.\d+)\.tar\.gz"
    match = re.search(pattern, content)
    if not match:
        raise RuntimeError(f"Could not determine six version in {cmakefilename}")
    six_version = match.group(1)

# vllm.py
def _clone_cutlass_repo(...):
    cmakefilename = source_root_dir / cmakefile
    content = cmakefilename.read_text(encoding="utf-8")
    pattern = r'set\(CUTLASS_REVISION "v(\d+\.\d+\.\d+)"'
    match = re.search(pattern, content)
    if not match:
        raise RuntimeError(f"Could not determine cutlass version in {cmakefilename}")
    cutlass_version = match.group(1)

Suggested Helper:

# In fromager
def parse_cmake_variable(
    cmake_file: pathlib.Path,
    variable_name: str,
    *,
    pattern: str | re.Pattern | None = None,
    required: bool = True,
) -> str | None:
    """Extract a variable value from a CMake file.

    Supports both set() commands and inline values with custom patterns.
    """

def parse_cmake_version(
    cmake_file: pathlib.Path,
    *,
    version_variable: str | None = None,
    version_pattern: str | re.Pattern | None = None,
) -> Version:
    """Extract a version number from a CMake file."""

Benefits:

  • Reduces 10+ lines to 1-2 lines
  • Standardized error handling
  • Supports multiple CMake patterns

Affected Plugins: torch.py, vllm.py (cutlass, flash-attn), triton.py


6. Variant-Specific Environment Builder

Pattern: Many plugins check variant and set environment variables conditionally.

Current Implementation:

# torch.py
def update_extra_environ(...):
    if ctx.variant.startswith("rocm"):
        platlib = build_env.run([...])
        extra_environ["AOTRITON_INSTALLED_PREFIX"] = os.path.join(platlib, "aotriton")

# bitsandbytes.py
def build_wheel(...):
    backends = ["cpu"]
    if ctx.variant.startswith("cuda"):
        backends.append("cuda")
    if ctx.variant.startswith("rocm"):
        backends.append("hip")

Suggested Helper:

# In fromager
class VariantEnvironBuilder:
    """Builder for variant-specific environment configuration."""

    def __init__(self, ctx: context.WorkContext):
        self.ctx = ctx
        self.environ = {}

    def when_variant(self, pattern: str, **env_vars: str) -> "VariantEnvironBuilder":
        """Set environment variables when variant matches pattern."""
        if self.ctx.variant.startswith(pattern):
            self.environ.update(env_vars)
        return self

    def when_cuda(self, **env_vars: str) -> "VariantEnvironBuilder":
        """Set environment variables for CUDA variant."""
        return self.when_variant("cuda", **env_vars)

    def when_rocm(self, **env_vars: str) -> "VariantEnvironBuilder":
        """Set environment variables for ROCm variant."""
        return self.when_variant("rocm", **env_vars)

    def build(self) -> dict[str, str]:
        """Return the built environment dictionary."""
        return self.environ
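For illustration, a runnable demo of the fluent API (condensed copy of the builder above, with a `_StubContext` standing in for `context.WorkContext`):

```python
class _StubContext:
    """Stand-in for context.WorkContext in this demo."""

    def __init__(self, variant: str):
        self.variant = variant


class VariantEnvironBuilder:
    # Condensed copy of the builder sketched above
    def __init__(self, ctx):
        self.ctx = ctx
        self.environ: dict[str, str] = {}

    def when_variant(self, pattern: str, **env_vars: str) -> "VariantEnvironBuilder":
        if self.ctx.variant.startswith(pattern):
            self.environ.update(env_vars)
        return self

    def when_cuda(self, **env_vars: str) -> "VariantEnvironBuilder":
        return self.when_variant("cuda", **env_vars)

    def when_rocm(self, **env_vars: str) -> "VariantEnvironBuilder":
        return self.when_variant("rocm", **env_vars)

    def build(self) -> dict[str, str]:
        return self.environ


# Only the matching variant's variables end up in the result
env = (
    VariantEnvironBuilder(_StubContext("rocm-ubi9"))
    .when_cuda(CUDA_ENABLED="1")
    .when_rocm(ROCM_ENABLED="1", AOTRITON_INSTALLED_PREFIX="/opt/aotriton")
    .build()
)
```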

Benefits:

  • Fluent API for variant configuration
  • More readable than nested if statements
  • Reusable across plugins

Affected Plugins: torch.py, vllm.py, bitsandbytes.py, tilelang.py, aotriton.py


7. Requirements File Patcher

Pattern: Many plugins patch requirements.txt files with regex replacements.

Current Implementation:

# vllm.py (has this pattern repeated 3+ times)
def _fix_torch_cpu_dependency(...):
    cpu_requirements = source_root_dir / "requirements" / "cpu.txt"
    if cpu_requirements.is_file():
        replace_lines(
            cpu_requirements,
            [
                (r"torch==2\.6\.0\+cpu; platform_machine == \"x86_64\"",
                 'torch==2.7.1; platform_machine == "x86_64"'),
                (r"(torch==.+?)\+cpu(.*)", r"\1\2"),
            ],
        )

Suggested Helper:

# In fromager
def patch_requirements_file(
    requirements_file: pathlib.Path,
    replacements: list[tuple[str | re.Pattern, str]],
    *,
    skip_if_missing: bool = False,
) -> bool:
    """Patch a requirements file with regex replacements.

    Returns True if file was modified, False otherwise.
    """

def patch_requirements_files(
    source_root_dir: pathlib.Path,
    patches: dict[str, list[tuple[str, str]]],
) -> dict[str, bool]:
    """Patch multiple requirements files.

    Args:
        source_root_dir: Root directory of the source
        patches: Mapping of relative file paths to replacement lists

    Returns:
        Mapping of file paths to whether they were modified
    """
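A runnable sketch of patch_requirements_file, demonstrated on the torch +cpu replacement quoted from vllm.py above:

```python
import pathlib
import re
import tempfile


def patch_requirements_file(
    requirements_file: pathlib.Path,
    replacements: list[tuple[str, str]],
    *,
    skip_if_missing: bool = False,
) -> bool:
    """Apply regex replacements; return True when the file changed."""
    if not requirements_file.is_file():
        if skip_if_missing:
            return False
        raise FileNotFoundError(requirements_file)
    original = requirements_file.read_text(encoding="utf-8")
    content = original
    for pattern, replacement in replacements:
        content = re.sub(pattern, replacement, content)
    if content == original:
        return False
    requirements_file.write_text(content, encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    reqs = pathlib.Path(tmp) / "cpu.txt"
    reqs.write_text('torch==2.6.0+cpu; platform_machine == "x86_64"\n')
    changed = patch_requirements_file(reqs, [(r"(torch==.+?)\+cpu(.*)", r"\1\2")])
    patched = reqs.read_text()
```

patch_requirements_files would simply loop this over a mapping of relative paths.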

Benefits:

  • Cleaner syntax for requirement patching
  • Consistent error handling
  • Can be called declaratively

Affected Plugins: vllm.py (used 4+ times), and others


8. External Dependency Downloader

Pattern: Plugins download and extract external dependencies (tarballs, zip files).

Current Implementation:

# cmake.py
def prepare_source(...):
    if is_new:
        build_dir = source_root_dir / f"build/py3-none-{platform_tag}"
        build_dir.mkdir(parents=True, exist_ok=True)
        url = CMAKE_TARBALL_URL_TEMPLATE.format(version=cmake_version)
        downloaded_path = download_url(build_dir, url)

# nvidia_cudnn_frontend.py
def prepare_source(...):
    if is_new:
        dlpack_version = get_dlpack_version(source_root_dir)
        dlpack_url = DLPACK_URL_TEMPLATE.format(version=dlpack_version)
        dlpack_dir = source_root_dir / "dlpack"
        dlpack_dir.mkdir(parents=True, exist_ok=True)
        downloaded_path = download_url(source_root_dir, dlpack_url, "dlpack.tar.gz")
        with tarfile.open(downloaded_path) as tf:
            tf.extractall(dlpack_dir, filter=tarfilter)
        downloaded_path.unlink()

Suggested Helper:

# In fromager
def download_and_extract_dependency(
    destination_dir: pathlib.Path,
    url: str,
    *,
    archive_name: str | None = None,
    extract: bool = True,
    strip_components: int = 0,
    cleanup_archive: bool = True,
) -> pathlib.Path:
    """Download and optionally extract an external dependency.

    Args:
        destination_dir: Where to place the extracted files
        url: URL to download from
        archive_name: Custom name for downloaded archive
        extract: Whether to extract the archive
        strip_components: Number of leading path components to strip
        cleanup_archive: Remove archive after extraction

    Returns:
        Path to extracted directory or downloaded file
    """

Benefits:

  • Reduces 15+ lines to 3-5 lines
  • Handles multiple archive formats
  • Consistent error handling

Affected Plugins: cmake.py, nvidia_cudnn_frontend.py


9. Multi-Backend CMake Builder

Pattern: Some packages build the same code for multiple backends (CPU, CUDA, ROCm).

Current Implementation:

# bitsandbytes.py
def _build_libbitsandbytes(...):
    # 40+ lines of cmake configure + build
    cmake_generate = ["cmake", "-S", str(sdist_root_dir), "-B", str(cmake_build_dir), ...]
    build_env.run(cmake_generate, ...)
    cmake_build = ["cmake", "--build", str(cmake_build_dir), ...]
    build_env.run(cmake_build, ...)

def build_wheel(...):
    backends = ["cpu"]
    if ctx.variant.startswith("cuda"):
        backends.append("cuda")
    if ctx.variant.startswith("rocm"):
        backends.append("hip")
    for compute_backend in backends:
        _build_libbitsandbytes(..., compute_backend=compute_backend)

Suggested Helper:

# In fromager
class CMakeBackendBuilder:
    """Build the same source for multiple backends."""

    def __init__(
        self,
        ctx: context.WorkContext,
        build_env: build_environment.BuildEnvironment,
        source_dir: pathlib.Path,
    ):
        self.ctx = ctx
        self.build_env = build_env
        self.source_dir = source_dir
        self.backends = []

    def add_backend(
        self,
        name: str,
        cmake_options: dict[str, str],
        *,
        condition: bool = True,
    ) -> "CMakeBackendBuilder":
        """Add a backend to build."""
        if condition:
            self.backends.append((name, cmake_options))
        return self

    def build_all(
        self,
        extra_environ: dict[str, str],
        *,
        generator: str = "Ninja",
        build_type: str = "Release",
    ) -> dict[str, pathlib.Path]:
        """Build all registered backends."""

Benefits:

  • Declarative multi-backend builds
  • Reduces code duplication
  • Easier to add new backends

Affected Plugins: bitsandbytes.py, tilelang.py


10. LLVM Path Finder

Pattern: Multiple plugins need to locate and configure LLVM installations.

Current Implementation:

# triton.py
def build_wheel(...):
    llvm_triton_version_file = sdist_root_dir / "cmake" / "llvm-hash.txt"
    llvm_triton_version = llvm_triton_version_file.read_text(encoding="utf-8")[:8]
    llvm_triton_dir = pathlib.Path(f"/usr/lib64/llvm-triton-{llvm_triton_version}")
    if not llvm_triton_dir.is_dir():
        raise FileNotFoundError(f"Cannot find the llvm-triton directory...")
    extra_environ["LLVM_SYSPATH"] = f"/usr/lib64/llvm-triton-{llvm_triton_version}"

# aotriton.py
def update_extra_environ(...):
    llvm_syspath = "/usr/lib64/llvm-triton-" + os.environ["LLVM_AOTRITON_09B0_COMMIT"]
    if ctx.variant.startswith("rocm"):
        if version == Version("0.10b"):
            llvm_syspath = llvm_syspath[:-8] + os.environ["LLVM_AOTRITON_010B0_COMMIT"]
        extra_environ["LLVM_SYSPATH"] = llvm_syspath
        extra_environ["LLVM_INCLUDE_DIRS"] = f"{llvm_syspath}/include"
        extra_environ["LLVM_LIBRARY_DIR"] = f"{llvm_syspath}/lib"

Suggested Helper:

# In fromager
def find_llvm_installation(
    *,
    version_file: pathlib.Path | None = None,
    version_env_var: str | None = None,
    base_path: pathlib.Path = pathlib.Path("/usr/lib64"),
    prefix: str = "llvm-",
    required: bool = True,
) -> pathlib.Path | None:
    """Locate an LLVM installation directory."""

def configure_llvm_environment(
    extra_environ: dict[str, str],
    llvm_dir: pathlib.Path,
    *,
    set_syspath: bool = True,
    set_include_dirs: bool = False,
    set_library_dir: bool = False,
) -> None:
    """Configure environment variables for LLVM installation."""

Benefits:

  • Standardizes LLVM discovery
  • Reduces error-prone path manipulation
  • Consistent error messages

Affected Plugins: triton.py, aotriton.py, llvmlite.py


11. Git Clone with External Project Detection

Pattern: Clone external dependencies referenced in CMake FetchContent declarations.

Current Implementation:

# vllm.py
def _clone_external_project_repo(...):
    cmakefilename = source_root_dir / cmakefile
    # Parse 30+ lines of CMake to find GIT_REPOSITORY and GIT_TAG
    # Look for FetchContent_Declare pattern
    for i in range(len(lines) - 3):
        if (current_lines[0].startswith("FetchContent_Declare(")
            and "GIT_REPOSITORY" in current_lines[2]
            and "GIT_TAG" in current_lines[3]):
            # Extract and parse...
    git_clone(ctx=ctx, req=Requirement(clonedir), ref=commit_hash, ...)

Suggested Helper:

# In fromager
def parse_cmake_fetch_content(
    cmake_file: pathlib.Path,
    project_name: str,
) -> dict[str, str]:
    """Parse CMake FetchContent_Declare to extract git info.

    Returns:
        Dictionary with 'git_repository', 'git_tag', etc.
    """

def clone_cmake_fetch_content_dependency(
    ctx: context.WorkContext,
    source_root_dir: pathlib.Path,
    cmake_file: str | pathlib.Path,
    project_name: str,
    destination: pathlib.Path,
    *,
    submodules: bool = False,
) -> pathlib.Path:
    """Clone a dependency declared in CMake FetchContent."""
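A sketch of the parsing half, assuming the common FetchContent_Declare layout with GIT_REPOSITORY and GIT_TAG on their own lines (nested parentheses inside the block are not handled):

```python
import pathlib
import re
import tempfile


def parse_cmake_fetch_content(cmake_file: pathlib.Path, project_name: str) -> dict[str, str]:
    """Extract GIT_REPOSITORY / GIT_TAG from a FetchContent_Declare block."""
    content = cmake_file.read_text(encoding="utf-8")
    block = re.search(
        rf"FetchContent_Declare\(\s*{re.escape(project_name)}\b(.*?)\)",
        content,
        re.DOTALL,
    )
    if block is None:
        raise RuntimeError(f"no FetchContent_Declare for {project_name} in {cmake_file}")
    info: dict[str, str] = {}
    for key in ("GIT_REPOSITORY", "GIT_TAG"):
        match = re.search(rf"{key}\s+(\S+)", block.group(1))
        if match:
            info[key.lower()] = match.group(1)
    return info


with tempfile.TemporaryDirectory() as tmp:
    cmake_file = pathlib.Path(tmp) / "deps.cmake"
    cmake_file.write_text(
        "FetchContent_Declare(\n"
        "  cutlass\n"
        "  GIT_REPOSITORY https://github.com/NVIDIA/cutlass.git\n"
        "  GIT_TAG v3.5.1\n"
        ")\n"
    )
    info = parse_cmake_fetch_content(cmake_file, "cutlass")
```

clone_cmake_fetch_content_dependency would feed this result to fromager's git_clone.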

Benefits:

  • Eliminates complex CMake parsing
  • Reusable for any FetchContent dependency
  • Reduces 50+ lines to 5 lines

Affected Plugins: vllm.py (used twice for cutlass and flash-attention)


12. Custom Package Infrastructure Generator

Pattern: Some packages need custom pyproject.toml, setup.py, and __init__.py files generated.

Current Implementation:

# aotriton.py (similar in tilelang.py)
PYPROJECT_TOML = """
[build-system]
requires = [...]
build-backend = "setuptools.build_meta"
...
"""

INIT_PY = """
import pathlib
def get_aotriton_include() -> pathlib.Path:
    return HERE / "include"
...
"""

def build_wheel(...):
    wheel_dir.joinpath("pyproject.toml").write_text(PYPROJECT_TOML.format(version=version))
    wheel_dir.joinpath("setup.py").write_text(SETUP_PY)
    install_dir.joinpath("__init__.py").write_text(INIT_PY)

Suggested Helper:

# In fromager
def generate_package_infrastructure(
    package_dir: pathlib.Path,
    package_name: str,
    version: Version,
    *,
    build_requires: list[str] | None = None,
    build_backend: str = "setuptools.build_meta",
    package_data: dict[str, list[str]] | None = None,
    include_paths: list[str] | None = None,
    lib_paths: list[str] | None = None,
) -> None:
    """Generate pyproject.toml, setup.py, and __init__.py for a package.

    Useful for packages that compile native code and need custom
    packaging infrastructure.
    """

Benefits:

  • Reduces 50+ lines of template strings
  • Standardizes package structure
  • Easier to maintain templates

Affected Plugins: aotriton.py, tilelang.py


13. Parallel Job Calculator

Pattern: Calculate optimal number of parallel jobs for builds.

Current Implementation:

# tilelang.py
cores = os.cpu_count() or 1
make_jobs = max(1, (cores * 75) // 100)
ninja_cmd = ["ninja", f"-j{make_jobs}"]

# pyarrow.py
pbi = ctx.package_build_info(req)
jobs = pbi.parallel_jobs()
environ_vars = {"PYARROW_PARALLEL": str(jobs)}

Suggested Helper:

# In fromager
def get_parallel_jobs(
    ctx: context.WorkContext,
    req: Requirement,
    *,
    percentage: int = 100,
    max_jobs: int | None = None,
) -> int:
    """Calculate optimal number of parallel jobs.

    Args:
        ctx: Work context
        req: Package requirement
        percentage: Percentage of cores to use (default 100)
        max_jobs: Maximum number of jobs (default unlimited)

    Returns:
        Number of parallel jobs to use
    """

Benefits:

  • Standardizes job calculation
  • Respects system limits
  • Consistent across builds

Affected Plugins: tilelang.py, pyarrow.py, and others using MAX_JOBS


14. Rust Vendoring with Patching

Pattern: Vendor Rust dependencies and apply patches.

Current Implementation:

# outlines_core.py
def prepare_source(...):
    source_root_dir, is_new = sources.unpack_source(...)
    if is_new:
        vendor_rust.vendor_rust(req, source_root_dir)
        if version in {Version("0.2.10"), Version("0.2.11")}:
            _patch_copy_aws_lc_sys(source_root_dir)
        sources.patch_source(ctx, source_root_dir, req, version)
        pyproject.apply_project_override(...)

Suggested Helper:

# In fromager
def prepare_rust_source(
    ctx: context.WorkContext,
    req: Requirement,
    source_filename: pathlib.Path,
    version: Version,
    *,
    vendor_first: bool = True,
    patch_crates: dict[str, pathlib.Path] | None = None,
) -> tuple[pathlib.Path, bool]:
    """Prepare Rust source with vendoring and patching.

    Args:
        vendor_first: Vendor before applying patches
        patch_crates: Crates to patch-copy (name -> source path)
    """

Benefits:

  • Handles Rust-specific workflow
  • Reduces repetitive unpacking/vendoring/patching
  • Supports cargo patch mechanism

Affected Plugins: outlines_core.py


15. Download URL from Tag Extractor

Pattern: Extract git repository details from download URLs.

Current Implementation:

# torchao.py, tilelang.py
def download_source(...):
    ref = get_tag_from_gitlab_archive_url(download_url)
    download_url = f"https://gitlab.com{PROJECT_PATH}.git"
    return clone_and_make_sdist(..., repo_url=download_url, tag=ref, ...)

Suggested Helper:

# In fromager (enhance existing functionality)
def extract_git_info_from_url(
    url: str,
    *,
    provider: Literal["gitlab", "github"] | None = None,
) -> dict[str, str]:
    """Extract git repository info from archive URL.

    Returns:
        Dictionary with 'provider', 'project_path', 'tag', 'clone_url'
    """

Benefits:

  • Eliminates manual URL parsing
  • Handles both GitLab and GitHub
  • Returns all needed git information

Affected Plugins: torchao.py, tilelang.py


Suggested Configuration Options

1. Git Source Configuration in YAML

Current: Requires Python plugin to specify git sources.

Proposed YAML:

# overrides/settings/package_name.yaml
source:
  type: git
  provider: gitlab  # or github
  project_path: /redhat/rhel-ai/core/mirrors/github/org/repo
  # OR for GitHub:
  # organization: org-name
  # repo: repo-name
  tag_pattern: "v{version}"  # optional, default: "v{version}"
  submodules: true
  matcher: "^midstream-cuda-v(.*)"  # optional regex for custom tag matching

Benefits:

  • Eliminates need for get_resolver_provider plugin for simple cases
  • Declarative git configuration
  • Easier to maintain

Would Replace Code In: 15+ plugins with simple git source resolution


2. Version Environment Variables in YAML

Current: Requires plugin to set version env vars.

Proposed YAML:

# overrides/settings/package_name.yaml
env:
  version_variables:
    BUILD_VERSION: "{version.base_version}"
    SETUPTOOLS_SCM_PRETEND_VERSION_FOR_{PACKAGE_NAME_UPPER}: "{version.base_version}"
    PYTORCH_BUILD_NUMBER: "{version.post}"
  # Variables can use placeholders:
  # {version}, {version.base_version}, {version.major}, {version.minor}, etc.
  # {PACKAGE_NAME}, {PACKAGE_NAME_UPPER}, {PACKAGE_NAME_LOWER}

Benefits:

  • No plugin needed for simple version control
  • Template syntax for version components
  • Clear and declarative

Would Replace Code In: 15+ plugins that only set version env vars


3. Build Directory Override in YAML

Current: Requires plugin with custom build_dir logic.

Proposed YAML:

# overrides/settings/package_name.yaml
build:
  build_directory: python  # relative to sdist root
  # OR
  build_directory_search:
    - "."
    - "python"
    - "src"
  marker_files:
    - setup.py
    - pyproject.toml

Benefits:

  • Handles monorepo layouts declaratively
  • No plugin needed for simple cases
  • Clear documentation of structure

Would Replace Code In: triton.py (and any future monorepo packages)


4. Variant-Specific Environment Variables in YAML

Current: Requires plugin with conditional logic.

Proposed YAML:

# overrides/settings/package_name.yaml
variants:
  cuda-ubi9:
    env:
      CUDA_ENABLED: "1"
      PYARROW_WITH_CUDA: "1"
  rocm-ubi9:
    env:
      ROCM_ENABLED: "1"
      AOTRITON_INSTALLED_PREFIX: "{platlib}/aotriton"
  cpu-ubi9:
    env:
      CPU_ONLY: "1"

Benefits:

  • Declarative variant configuration
  • No plugin needed for simple env vars
  • Easy to add new variants

Would Replace Code In: torch.py, pyarrow.py, bitsandbytes.py, and others


5. External Dependencies List in YAML

Current: Requires plugin to download external files.

Proposed YAML:

# overrides/settings/package_name.yaml
external_dependencies:
  - name: dlpack
    url: "https://gitlab.com/.../dlpack/-/archive/v{dlpack_version}/dlpack-v{dlpack_version}.tar.gz"
    version_file: dlpack_version.txt  # read version from this file
    destination: dlpack/
    extract: true
    strip_components: 1
  - name: cmake-source
    url: "https://github.com/Kitware/CMake/releases/download/v{version}/cmake-{version}.tar.gz"
    destination: "build/py3-none-{platform}/cmake-source.tar.gz"
    extract: false

Benefits:

  • Declarative dependency management
  • No plugin for simple downloads
  • Clear dependency documentation

Would Replace Code In: cmake.py, nvidia_cudnn_frontend.py


6. Line Replacement Rules in YAML

Current: Requires plugin with replace_lines calls.

Proposed YAML:

# overrides/settings/package_name.yaml
source_patches:
  requirements/cpu.txt:
    - pattern: 'torch==2\.6\.0\+cpu; platform_machine == "x86_64"'
      replacement: 'torch==2.7.1; platform_machine == "x86_64"'
    - pattern: '(torch==.+?)\+cpu(.*)'
      replacement: '\1\2'
  requirements/tpu.txt:
    - pattern: '^nixl==.*$'
      replacement: ''  # empty = remove line
  setup.py:
    - pattern: '(\s+)version = (get_version\([^)]+\))'
      replacement: '\1return \2'

Benefits:

  • Declarative patching
  • No plugin for simple replacements
  • Version control friendly

Would Replace Code In: vllm.py (multiple patch functions)


7. Build Backend List in YAML

Current: Requires plugin to build multiple backends.

Proposed YAML:

# overrides/settings/package_name.yaml
build:
  type: cmake_multi_backend
  backends:
    cpu:
      always: true
      cmake_options:
        COMPUTE_BACKEND: cpu
    cuda:
      when_variant: cuda
      cmake_options:
        COMPUTE_BACKEND: cuda
        COMPUTE_CAPABILITY: "{cuda_arch_list}"
    hip:
      when_variant: rocm
      cmake_options:
        COMPUTE_BACKEND: hip
        BNB_ROCM_ARCH: "{rocm_arch}"

Benefits:

  • Declarative multi-backend builds
  • No plugin for standard CMake builds
  • Clear build configuration

Would Replace Code In: bitsandbytes.py, tilelang.py


8. Custom Resolver Provider Configuration in YAML

Current: Requires plugin to specify resolver type.

Proposed YAML:

# overrides/settings/package_name.yaml
resolver:
  type: gitlab_tag  # or github_tag, pypi, custom
  project_path: /redhat/rhel-ai/core/mirrors/github/org/repo
  # For custom matchers:
  matcher:
    type: regex
    pattern: '^midstream-{variant}-v(\d+\.\d+\.\d+)'
    groups:
      version: 1
  # OR
  matcher:
    type: function
    module: package_plugins.vllm
    function: _create_midstream_matcher

Benefits:

  • Simple cases don't need plugins
  • Complex cases can use custom functions
  • Clear resolver documentation

Would Replace Code In: vllm.py, torchao.py, and 10+ others (for simple cases)


9. Pre/Post Build Scripts in YAML

Current: Requires plugin to run custom commands.

Proposed YAML:

# overrides/settings/package_name.yaml
build:
  pre_build_scripts:
    - script: "python generate_self_schema.py"
      working_directory: "{sdist_root_dir}"
      condition: "file_exists('generate_self_schema.py')"
    - script: "python tools/amd_build/build_amd.py"
      working_directory: "{sdist_root_dir}"
      condition: "variant_matches('rocm')"
      env:
        SIX_CLONE_DIR: "{sdist_root_dir}/third_party/six"

Benefits:

  • Simple pre/post build hooks without plugin
  • Conditional execution
  • Clear script documentation

Would Replace Code In: pydantic_core.py, torch.py (ROCm build)


10. CMake Build Configuration in YAML

Current: Requires plugin for CMake builds.

Proposed YAML:

# overrides/settings/package_name.yaml
build:
  type: cmake
  configure:
    source_dir: cpp
    build_dir: cpp/build
    generator: Ninja
    options:
      CMAKE_BUILD_TYPE: Release
      CMAKE_INSTALL_PREFIX: "{dist_dir}"
      ARROW_COMPUTE: ON
      ARROW_CUDA: "ON if variant == 'cuda-ubi9' else OFF"
  build:
    targets: [all]
  install:
    targets: [install]
  env:
    LD_LIBRARY_PATH: "{dist_dir}/lib:{LD_LIBRARY_PATH}"
    CMAKE_PREFIX_PATH: "{dist_dir}:{CMAKE_PREFIX_PATH}"

Benefits:

  • Declarative CMake builds
  • No plugin for standard CMake
  • Supports complex builds

Would Replace Code In: pyarrow.py (partially), and future CMake packages


11. Ensure PKG-INFO Flag in YAML

Current: Requires plugin to call ensure_pkg_info.

Proposed YAML:

# overrides/settings/package_name.yaml
source:
  type: git
  ensure_pkg_info: true  # automatically call ensure_pkg_info in prepare_source
  pkg_info:
    build_dir: null  # or specify a subdirectory

Benefits:

  • Automatic PKG-INFO for git sources
  • No plugin needed
  • Clear metadata requirements

Would Replace Code In: 10+ plugins with just ensure_pkg_info calls


12. LLVM Installation Configuration in YAML

Current: Requires plugin to locate LLVM.

Proposed YAML:

# overrides/settings/package_name.yaml
build:
  llvm:
    version_from_file: cmake/llvm-hash.txt
    version_length: 8  # use first 8 chars
    # OR
    version_from_env: LLVM_TRITON_VERSION
    base_path: /usr/lib64
    prefix: llvm-triton-
    env:
      LLVM_SYSPATH: "{llvm_dir}"
      LLVM_INCLUDE_DIRS: "{llvm_dir}/include"
      LLVM_LIBRARY_DIR: "{llvm_dir}/lib"

Benefits:

  • Declarative LLVM configuration
  • No plugin for standard LLVM lookup
  • Clear dependency documentation

Would Replace Code In: triton.py, aotriton.py, llvmlite.py (partially)


Pattern Analysis Details

Pattern Frequency Analysis

| Pattern | Occurrences | Plugins Affected | Complexity |
| --- | --- | --- | --- |
| Version environment variables | 15+ | torch, torchao, torchaudio, vllm, etc. | Low |
| Git source resolution | 15+ | torchao, nvidia_cudnn_frontend, etc. | Medium |
| PKG-INFO generation | 10+ | torchao, vllm, tilelang, etc. | Low |
| Variant-specific env vars | 10+ | torch, vllm, bitsandbytes, etc. | Medium |
| External dependency download | 5+ | cmake, nvidia_cudnn_frontend, etc. | Medium |
| CMake version parsing | 5+ | torch, vllm, triton, etc. | Medium |
| Requirements file patching | 5+ | vllm (4 times alone) | Low |
| Multi-backend builds | 3+ | bitsandbytes, tilelang, aotriton | High |
| LLVM path configuration | 3+ | triton, aotriton, llvmlite | Medium |
| Monorepo build directory | 2+ | triton (multiple hooks) | Medium |
| Custom package infrastructure | 2+ | aotriton, tilelang | High |
| FetchContent parsing | 2+ | vllm (cutlass, flash-attn) | High |
| Rust vendoring | 1+ | outlines_core | Low |

Code Reduction Estimates

If all suggestions were implemented:

  • Helper Functions: Would reduce plugin code by approximately 30-40%

    • Example: torchao.py could go from ~103 lines to ~60 lines
    • Example: vllm.py could go from ~527 lines to ~300 lines
  • Configuration Options: Would eliminate approximately 40-50% of simple plugins

    • Plugins that only set env vars or resolve git sources could be pure YAML
    • Example: torchaudio.py (32 lines) → pure YAML config
  • Combined: Estimated 50-60% reduction in total plugin code

Maintainability Benefits

  1. Reduced Duplication: Same logic isn't repeated across 15+ files
  2. Easier Testing: Helper functions can be unit tested in fromager
  3. Clearer Intent: YAML configuration is more readable than Python code
  4. Lower Barrier: Adding packages requires less Python knowledge
  5. Consistency: Standard patterns enforced by helpers

Implementation Priority

High Priority (immediate impact, low complexity):

  1. Version environment variable setter
  2. Git resolver provider builder
  3. PKG-INFO ensurer wrapper
  4. Requirements file patcher
  5. Version env vars in YAML
  6. Git source configuration in YAML

Medium Priority (good impact, medium complexity):

  7. Variant-specific environment builder
  8. External dependency downloader
  9. CMake version parser
  10. Variant-specific env vars in YAML
  11. External dependencies in YAML
  12. Line replacement rules in YAML

Lower Priority (specialized use cases, high complexity):

  13. Multi-backend CMake builder
  14. Custom package infrastructure generator
  15. FetchContent parser
  16. Build backend list in YAML
  17. CMake build configuration in YAML


Conclusion

These suggestions would significantly reduce repetition in package plugins while maintaining flexibility for complex cases. The combination of helper functions for reusable logic and YAML configuration for declarative patterns would:

  • Reduce total plugin code by 50-60%
  • Eliminate the need for Python plugins in many simple cases
  • Make the build system more maintainable and accessible
  • Standardize common patterns across the ecosystem

Next Steps

  1. Review and prioritize suggestions with the fromager team
  2. Implement high-priority helpers and config options
  3. Migrate existing plugins to use new features
  4. Document best practices for plugin development
  5. Create templates for common plugin patterns

Questions for Discussion

  1. Which patterns are most valuable to standardize in fromager vs. keeping as local utilities?
  2. Should configuration options support complex expressions or remain simple?
  3. How should we handle backward compatibility during migration?
  4. What's the right balance between flexibility and simplification?