This document contains suggestions for reducing repetition in package plugins by adding new helper functions and configuration options to fromager.
After analyzing the package plugins in this repository, we identified several recurring patterns that could be simplified through fromager enhancements:
- 19 common patterns across 40+ package plugins
- 15 suggested helper functions to reduce code duplication
- 12 suggested configuration options to move logic from code to YAML
- Analysis Methodology
- Suggested Helper Functions
- Suggested Configuration Options
- Pattern Analysis Details
We analyzed package plugins including:
- torch.py, torchao.py, torchaudio.py, torchvision.py
- vllm.py, aotriton.py, triton.py
- nvidia_cudnn_frontend.py, cmake.py
- bitsandbytes.py, pyarrow.py, llvmlite.py
- faiss_cpu.py, tilelang.py, outlines_core.py
- Plus 25+ additional plugins
Common themes identified:
- Git source resolution and downloading
- Version control via environment variables
- Variant-specific configuration (CPU, CUDA, ROCm, etc.)
- External dependency management
- CMake-based builds
- File patching and modification
Pattern: Many plugins set BUILD_VERSION, SETUPTOOLS_SCM_PRETEND_VERSION, PYTORCH_BUILD_VERSION, etc.
Current Implementation:
```python
# torchaudio.py
def build_wheel(...):
    extra_environ["BUILD_VERSION"] = str(version)

# torchao.py
def update_extra_environ(...):
    if version is not None:
        extra_environ["SETUPTOOLS_SCM_PRETEND_VERSION_FOR_TORCHAO"] = version.base_version
        extra_environ["BUILD_VERSION"] = version.base_version

# torch.py
def update_extra_environ(...):
    if version is not None:
        extra_environ["PYTORCH_BUILD_VERSION"] = version.base_version
        extra_environ["PYTORCH_BUILD_NUMBER"] = post
```
Suggested Helper:
```python
# In fromager
def set_version_environment_variables(
    extra_environ: dict[str, str],
    version: Version | None,
    *,
    package_name: str | None = None,
    include_build_number: bool = False,
    include_setuptools_scm: bool = False,
    use_base_version: bool = True,
) -> None:
    """Set common version-related environment variables.

    Args:
        extra_environ: Dictionary to update with version variables
        version: Package version
        package_name: Package name for scm variable (e.g., 'torchao')
        include_build_number: Set BUILD_NUMBER from version.post
        include_setuptools_scm: Set SETUPTOOLS_SCM_PRETEND_VERSION*
        use_base_version: Use version.base_version instead of str(version)
    """
```
Benefits:
- Eliminates repetitive version control code in 15+ plugins
- Standardizes version handling across packages
- Reduces errors from inconsistent version variable naming
Affected Plugins: torch.py, torchaudio.py, torchvision.py, torchao.py, vllm.py, and 10+ others
Pattern: Nearly identical get_resolver_provider implementations for GitLab/GitHub sources.
Current Implementation:
```python
# torchao.py
def get_resolver_provider(...):
    if include_sdists:
        return resolver.GitLabTagProvider(
            project_path=PROJECT_PATH,
            constraints=ctx.constraints,
        )
    return resolver.default_resolver_provider(...)

# bitsandbytes.py
def get_resolver_provider(...):
    if include_sdists:
        return resolver.GitHubTagProvider(
            "bitsandbytes-foundation", "bitsandbytes", ctx.constraints
        )
    return resolver.default_resolver_provider(...)
```
Suggested Helper:
```python
# In fromager
def git_resolver_provider(
    ctx: context.WorkContext,
    req: Requirement,
    sdist_server_url: str,
    include_sdists: bool,
    include_wheels: bool,
    *,
    provider_type: Literal["gitlab", "github"],
    project_path: str | None = None,
    organization: str | None = None,
    repo: str | None = None,
    matcher: MatchFunction | re.Pattern | None = None,
    req_type: resolver.RequirementType | None = None,
    ignore_platform: bool = False,
) -> resolver.PyPIProvider | resolver.GenericProvider:
    """Create a git-based resolver provider with standard fallback.

    Simplifies the common pattern of using GitLab/GitHub providers
    when sdists are available, falling back to default otherwise.
    """
```
Benefits:
- Reduces 20+ lines to 1-2 lines per plugin
- Standardizes git resolver pattern
- Easier to maintain and test
Affected Plugins: torchao.py, nvidia_cudnn_frontend.py, tilelang.py, vllm.py, bitsandbytes.py, and 10+ others
Pattern: Most plugins using git sources need to call sources.ensure_pkg_info() in prepare_source.
Current Implementation:
```python
# torchao.py
def prepare_source(...):
    source_root_dir, is_new = sources.default_prepare_source(...)
    if is_new:
        sources.ensure_pkg_info(
            ctx=ctx,
            req=req,
            version=version,
            sdist_root_dir=source_root_dir,
            build_dir=None,
        )
    return source_root_dir, is_new
```
Suggested Helper:
```python
# In fromager
def prepare_source_with_pkg_info(
    ctx: context.WorkContext,
    req: Requirement,
    source_filename: pathlib.Path,
    version: Version,
    *,
    ensure_pkg_info: bool = True,
    build_dir: pathlib.Path | None = None,
) -> tuple[pathlib.Path, bool]:
    """Prepare source with automatic PKG-INFO generation.

    Combines default_prepare_source with ensure_pkg_info for the
    common case of git sources that need PKG-INFO metadata.
    """
```
Benefits:
- Reduces 10+ lines to 1-2 lines
- Prevents forgetting to call ensure_pkg_info
- Standard pattern for git sources
Affected Plugins: torchao.py, vllm.py, tilelang.py, and 8+ others
Pattern: Packages in monorepos need custom logic to find the correct build directory.
Current Implementation:
```python
# triton.py
def _get_build_dir(sdist_root_dir: pathlib.Path) -> pathlib.Path:
    if sdist_root_dir.joinpath("setup.py").is_file():
        # Triton >= 3.4.0
        return sdist_root_dir
    build_dir = sdist_root_dir / "python"
    if build_dir.joinpath("setup.py").is_file():
        # Triton < 3.4.0
        return build_dir
    raise ValueError("setup.py not found")
```
Suggested Helper:
```python
# In fromager
def find_build_directory(
    sdist_root_dir: pathlib.Path,
    *,
    search_paths: list[str | pathlib.Path] = [".", "python"],
    marker_files: list[str] = ["setup.py", "pyproject.toml"],
) -> pathlib.Path:
    """Find the build directory in a monorepo structure.

    Searches for marker files in potential subdirectories to locate
    the actual Python package within a larger repository.
    """
```
Benefits:
- Eliminates custom build directory detection logic
- Configurable for different monorepo layouts
- Consistent error messages
Affected Plugins: triton.py (all 5 hook methods use this pattern)
Pattern: Multiple plugins parse CMake files to extract version numbers or configuration values.
Current Implementation:
```python
# torch.py
def _clone_six_repo(...):
    cmakefilename = build_dir / "third_party/NNPACK/cmake/DownloadSix.cmake"
    content = cmakefilename.read_text(encoding="utf-8")
    pattern = r"six-(\d+\.\d+\.\d+)\.tar\.gz"
    match = re.search(pattern, content)
    if not match:
        raise RuntimeError(f"Could not determine six version in {cmakefilename}")
    six_version = match.group(1)

# vllm.py
def _clone_cutlass_repo(...):
    cmakefilename = source_root_dir / cmakefile
    content = cmakefilename.read_text(encoding="utf-8")
    pattern = r'set\(CUTLASS_REVISION "v(\d+\.\d+\.\d+)"'
    match = re.search(pattern, content)
    if not match:
        raise RuntimeError(f"Could not determine cutlass version in {cmakefilename}")
    cutlass_version = match.group(1)
```
Suggested Helper:
```python
# In fromager
def parse_cmake_variable(
    cmake_file: pathlib.Path,
    variable_name: str,
    *,
    pattern: str | re.Pattern | None = None,
    required: bool = True,
) -> str | None:
    """Extract a variable value from a CMake file.

    Supports both set() commands and inline values with custom patterns.
    """


def parse_cmake_version(
    cmake_file: pathlib.Path,
    *,
    version_variable: str | None = None,
    version_pattern: str | re.Pattern | None = None,
) -> Version:
    """Extract a version number from a CMake file."""
```
Benefits:
- Reduces 10+ lines to 1-2 lines
- Standardized error handling
- Supports multiple CMake patterns
Affected Plugins: torch.py, vllm.py (cutlass, flash-attn), triton.py
Pattern: Many plugins check variant and set environment variables conditionally.
Current Implementation:
```python
# torch.py
def update_extra_environ(...):
    if ctx.variant.startswith("rocm"):
        platlib = build_env.run([...])
        extra_environ["AOTRITON_INSTALLED_PREFIX"] = os.path.join(platlib, "aotriton")

# bitsandbytes.py
def build_wheel(...):
    backends = ["cpu"]
    if ctx.variant.startswith("cuda"):
        backends.append("cuda")
    if ctx.variant.startswith("rocm"):
        backends.append("hip")
```
Suggested Helper:
```python
# In fromager
class VariantEnvironBuilder:
    """Builder for variant-specific environment configuration."""

    def __init__(self, ctx: context.WorkContext):
        self.ctx = ctx
        self.environ = {}

    def when_variant(self, pattern: str, **env_vars: str) -> "VariantEnvironBuilder":
        """Set environment variables when variant matches pattern."""
        if self.ctx.variant.startswith(pattern):
            self.environ.update(env_vars)
        return self

    def when_cuda(self, **env_vars: str) -> "VariantEnvironBuilder":
        """Set environment variables for CUDA variant."""
        return self.when_variant("cuda", **env_vars)

    def when_rocm(self, **env_vars: str) -> "VariantEnvironBuilder":
        """Set environment variables for ROCm variant."""
        return self.when_variant("rocm", **env_vars)

    def build(self) -> dict[str, str]:
        """Return the built environment dictionary."""
        return self.environ
```
Benefits:
- Fluent API for variant configuration
- More readable than nested if statements
- Reusable across plugins
Affected Plugins: torch.py, vllm.py, bitsandbytes.py, tilelang.py, aotriton.py
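To show the fluent API in action, here is a self-contained usage sketch. Since the class is only a proposal, the example re-states it compactly and substitutes a stand-in dataclass for fromager's WorkContext, which only needs a `variant` attribute here:

```python
# Self-contained usage sketch of the proposed VariantEnvironBuilder.
from dataclasses import dataclass


@dataclass
class FakeContext:  # stand-in for fromager's WorkContext
    variant: str


class VariantEnvironBuilder:
    """Builder for variant-specific environment configuration (sketch)."""

    def __init__(self, ctx) -> None:
        self.ctx = ctx
        self.environ: dict[str, str] = {}

    def when_variant(self, pattern: str, **env_vars: str) -> "VariantEnvironBuilder":
        if self.ctx.variant.startswith(pattern):
            self.environ.update(env_vars)
        return self

    def when_cuda(self, **env_vars: str) -> "VariantEnvironBuilder":
        return self.when_variant("cuda", **env_vars)

    def when_rocm(self, **env_vars: str) -> "VariantEnvironBuilder":
        return self.when_variant("rocm", **env_vars)

    def build(self) -> dict[str, str]:
        return self.environ


env = (
    VariantEnvironBuilder(FakeContext(variant="cuda-ubi9"))
    .when_cuda(CUDA_ENABLED="1")
    .when_rocm(ROCM_ENABLED="1")
    .build()
)
# For a cuda-ubi9 variant, only the CUDA settings end up in env.
```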
Pattern: Many plugins patch requirements.txt files with regex replacements.
Current Implementation:
```python
# vllm.py (has this pattern repeated 3+ times)
def _fix_torch_cpu_dependency(...):
    cpu_requirements = source_root_dir / "requirements" / "cpu.txt"
    if cpu_requirements.is_file():
        replace_lines(
            cpu_requirements,
            [
                (r"torch==2\.6\.0\+cpu; platform_machine == \"x86_64\"",
                 'torch==2.7.1; platform_machine == "x86_64"'),
                (r"(torch==.+?)\+cpu(.*)", r"\1\2"),
            ],
        )
```
Suggested Helper:
```python
# In fromager
def patch_requirements_file(
    requirements_file: pathlib.Path,
    replacements: list[tuple[str | re.Pattern, str]],
    *,
    skip_if_missing: bool = False,
) -> bool:
    """Patch a requirements file with regex replacements.

    Returns True if file was modified, False otherwise.
    """


def patch_requirements_files(
    source_root_dir: pathlib.Path,
    patches: dict[str, list[tuple[str, str]]],
) -> dict[str, bool]:
    """Patch multiple requirements files.

    Args:
        source_root_dir: Root directory of the source
        patches: Mapping of relative file paths to replacement lists

    Returns:
        Mapping of file paths to whether they were modified
    """
```
Benefits:
- Cleaner syntax for requirement patching
- Consistent error handling
- Can be called declaratively
Affected Plugins: vllm.py (used 4+ times), and others
Pattern: Plugins download and extract external dependencies (tarballs, zip files).
Current Implementation:
```python
# cmake.py
def prepare_source(...):
    if is_new:
        build_dir = source_root_dir / f"build/py3-none-{platform_tag}"
        build_dir.mkdir(parents=True, exist_ok=True)
        url = CMAKE_TARBALL_URL_TEMPLATE.format(version=cmake_version)
        downloaded_path = download_url(build_dir, url)

# nvidia_cudnn_frontend.py
def prepare_source(...):
    if is_new:
        dlpack_version = get_dlpack_version(source_root_dir)
        dlpack_url = DLPACK_URL_TEMPLATE.format(version=dlpack_version)
        dlpack_dir = source_root_dir / "dlpack"
        dlpack_dir.mkdir(parents=True, exist_ok=True)
        downloaded_path = download_url(source_root_dir, dlpack_url, "dlpack.tar.gz")
        with tarfile.open(downloaded_path) as tf:
            tf.extractall(dlpack_dir, filter=tarfilter)
        downloaded_path.unlink()
```
Suggested Helper:
```python
# In fromager
def download_and_extract_dependency(
    destination_dir: pathlib.Path,
    url: str,
    *,
    archive_name: str | None = None,
    extract: bool = True,
    strip_components: int = 0,
    cleanup_archive: bool = True,
) -> pathlib.Path:
    """Download and optionally extract an external dependency.

    Args:
        destination_dir: Where to place the extracted files
        url: URL to download from
        archive_name: Custom name for downloaded archive
        extract: Whether to extract the archive
        strip_components: Number of leading path components to strip
        cleanup_archive: Remove archive after extraction

    Returns:
        Path to extracted directory or downloaded file
    """
```
Benefits:
- Reduces 15+ lines to 3-5 lines
- Handles multiple archive formats
- Consistent error handling
Affected Plugins: cmake.py, nvidia_cudnn_frontend.py
Pattern: Some packages build the same code for multiple backends (CPU, CUDA, ROCm).
Current Implementation:
```python
# bitsandbytes.py
def _build_libbitsandbytes(...):
    # 40+ lines of cmake configure + build
    cmake_generate = ["cmake", "-S", str(sdist_root_dir), "-B", str(cmake_build_dir), ...]
    build_env.run(cmake_generate, ...)
    cmake_build = ["cmake", "--build", str(cmake_build_dir), ...]
    build_env.run(cmake_build, ...)

def build_wheel(...):
    backends = ["cpu"]
    if ctx.variant.startswith("cuda"):
        backends.append("cuda")
    if ctx.variant.startswith("rocm"):
        backends.append("hip")
    for compute_backend in backends:
        _build_libbitsandbytes(..., compute_backend=compute_backend)
```
Suggested Helper:
```python
# In fromager
class CMakeBackendBuilder:
    """Build the same source for multiple backends."""

    def __init__(
        self,
        ctx: context.WorkContext,
        build_env: build_environment.BuildEnvironment,
        source_dir: pathlib.Path,
    ):
        self.ctx = ctx
        self.build_env = build_env
        self.source_dir = source_dir
        self.backends = []

    def add_backend(
        self,
        name: str,
        cmake_options: dict[str, str],
        *,
        condition: bool = True,
    ) -> "CMakeBackendBuilder":
        """Add a backend to build."""
        if condition:
            self.backends.append((name, cmake_options))
        return self

    def build_all(
        self,
        extra_environ: dict[str, str],
        *,
        generator: str = "Ninja",
        build_type: str = "Release",
    ) -> dict[str, pathlib.Path]:
        """Build all registered backends."""
```
Benefits:
- Declarative multi-backend builds
- Reduces code duplication
- Easier to add new backends
Affected Plugins: bitsandbytes.py, tilelang.py
Pattern: Multiple plugins need to locate and configure LLVM installations.
Current Implementation:
```python
# triton.py
def build_wheel(...):
    llvm_triton_version_file = sdist_root_dir / "cmake" / "llvm-hash.txt"
    llvm_triton_version = llvm_triton_version_file.read_text(encoding="utf-8")[:8]
    llvm_triton_dir = pathlib.Path(f"/usr/lib64/llvm-triton-{llvm_triton_version}")
    if not llvm_triton_dir.is_dir():
        raise FileNotFoundError(f"Cannot find the llvm-triton directory...")
    extra_environ["LLVM_SYSPATH"] = f"/usr/lib64/llvm-triton-{llvm_triton_version}"

# aotriton.py
def update_extra_environ(...):
    llvm_syspath = "/usr/lib64/llvm-triton-" + os.environ["LLVM_AOTRITON_09B0_COMMIT"]
    if ctx.variant.startswith("rocm"):
        if version == Version("0.10b"):
            llvm_syspath = llvm_syspath[:-8] + os.environ["LLVM_AOTRITON_010B0_COMMIT"]
        extra_environ["LLVM_SYSPATH"] = llvm_syspath
        extra_environ["LLVM_INCLUDE_DIRS"] = f"/usr/lib64/{llvm_syspath}/include"
        extra_environ["LLVM_LIBRARY_DIR"] = f"/usr/lib64/{llvm_syspath}/lib"
```
Suggested Helper:
```python
# In fromager
def find_llvm_installation(
    *,
    version_file: pathlib.Path | None = None,
    version_env_var: str | None = None,
    base_path: pathlib.Path = pathlib.Path("/usr/lib64"),
    prefix: str = "llvm-",
    required: bool = True,
) -> pathlib.Path | None:
    """Locate an LLVM installation directory."""


def configure_llvm_environment(
    extra_environ: dict[str, str],
    llvm_dir: pathlib.Path,
    *,
    set_syspath: bool = True,
    set_include_dirs: bool = False,
    set_library_dir: bool = False,
) -> None:
    """Configure environment variables for LLVM installation."""
```
Benefits:
- Standardizes LLVM discovery
- Reduces error-prone path manipulation
- Consistent error messages
Affected Plugins: triton.py, aotriton.py, llvmlite.py
Pattern: Clone external dependencies referenced in CMake FetchContent declarations.
Current Implementation:
```python
# vllm.py
def _clone_external_project_repo(...):
    cmakefilename = source_root_dir / cmakefile
    # Parse 30+ lines of CMake to find GIT_REPOSITORY and GIT_TAG
    # Look for FetchContent_Declare pattern
    for i in range(len(lines) - 3):
        if (current_lines[0].startswith("FetchContent_Declare(")
                and "GIT_REPOSITORY" in current_lines[2]
                and "GIT_TAG" in current_lines[3]):
            # Extract and parse...
    git_clone(ctx=ctx, req=Requirement(clonedir), ref=commit_hash, ...)
```
Suggested Helper:
```python
# In fromager
def parse_cmake_fetch_content(
    cmake_file: pathlib.Path,
    project_name: str,
) -> dict[str, str]:
    """Parse CMake FetchContent_Declare to extract git info.

    Returns:
        Dictionary with 'git_repository', 'git_tag', etc.
    """


def clone_cmake_fetch_content_dependency(
    ctx: context.WorkContext,
    source_root_dir: pathlib.Path,
    cmake_file: str | pathlib.Path,
    project_name: str,
    destination: pathlib.Path,
    *,
    submodules: bool = False,
) -> pathlib.Path:
    """Clone a dependency declared in CMake FetchContent."""
```
Benefits:
- Eliminates complex CMake parsing
- Reusable for any FetchContent dependency
- Reduces 50+ lines to 5 lines
Affected Plugins: vllm.py (used twice for cutlass and flash-attention)
Pattern: Some packages need custom pyproject.toml, setup.py, and init.py files generated.
Current Implementation:
```python
# aotriton.py (similar in tilelang.py)
PYPROJECT_TOML = """
[build-system]
requires = [...]
build-backend = "setuptools.build_meta"
...
"""

INIT_PY = """
import pathlib

def get_aotriton_include() -> pathlib.Path:
    return HERE / "include"
...
"""

def build_wheel(...):
    wheel_dir.joinpath("pyproject.toml").write_text(PYPROJECT_TOML.format(version=version))
    wheel_dir.joinpath("setup.py").write_text(SETUP_PY)
    install_dir.joinpath("__init__.py").write_text(INIT_PY)
```
Suggested Helper:
```python
# In fromager
def generate_package_infrastructure(
    package_dir: pathlib.Path,
    package_name: str,
    version: Version,
    *,
    build_requires: list[str] | None = None,
    build_backend: str = "setuptools.build_meta",
    package_data: dict[str, list[str]] | None = None,
    include_paths: list[str] | None = None,
    lib_paths: list[str] | None = None,
) -> None:
    """Generate pyproject.toml, setup.py, and __init__.py for a package.

    Useful for packages that compile native code and need custom
    packaging infrastructure.
    """
```
Benefits:
- Reduces 50+ lines of template strings
- Standardizes package structure
- Easier to maintain templates
Affected Plugins: aotriton.py, tilelang.py
Pattern: Calculate optimal number of parallel jobs for builds.
Current Implementation:
```python
# tilelang.py
cores = os.cpu_count() or 1
make_jobs = max(1, (cores * 75) // 100)
ninja_cmd = ["ninja", f"-j{make_jobs}"]

# pyarrow.py
pbi = ctx.package_build_info(req)
jobs = pbi.parallel_jobs()
environ_vars = {"PYARROW_PARALLEL": str(jobs)}
```
Suggested Helper:
```python
# In fromager
def get_parallel_jobs(
    ctx: context.WorkContext,
    req: Requirement,
    *,
    percentage: int = 100,
    max_jobs: int | None = None,
) -> int:
    """Calculate optimal number of parallel jobs.

    Args:
        ctx: Work context
        req: Package requirement
        percentage: Percentage of cores to use (default 100)
        max_jobs: Maximum number of jobs (default unlimited)

    Returns:
        Number of parallel jobs to use
    """
```
Benefits:
- Standardizes job calculation
- Respects system limits
- Consistent across builds
Affected Plugins: tilelang.py, pyarrow.py, and others using MAX_JOBS
Pattern: Vendor Rust dependencies and apply patches.
Current Implementation:
```python
# outlines_core.py
def prepare_source(...):
    source_root_dir, is_new = sources.unpack_source(...)
    if is_new:
        vendor_rust.vendor_rust(req, source_root_dir)
        if version in {Version("0.2.10"), Version("0.2.11")}:
            _patch_copy_aws_lc_sys(source_root_dir)
        sources.patch_source(ctx, source_root_dir, req, version)
        pyproject.apply_project_override(...)
```
Suggested Helper:
```python
# In fromager
def prepare_rust_source(
    ctx: context.WorkContext,
    req: Requirement,
    source_filename: pathlib.Path,
    version: Version,
    *,
    vendor_first: bool = True,
    patch_crates: dict[str, pathlib.Path] | None = None,
) -> tuple[pathlib.Path, bool]:
    """Prepare Rust source with vendoring and patching.

    Args:
        vendor_first: Vendor before applying patches
        patch_crates: Crates to patch-copy (name -> source path)
    """
```
Benefits:
- Handles Rust-specific workflow
- Reduces repetitive unpacking/vendoring/patching
- Supports cargo patch mechanism
Affected Plugins: outlines_core.py
Pattern: Extract git repository details from download URLs.
Current Implementation:
```python
# torchao.py, tilelang.py
def download_source(...):
    ref = get_tag_from_gitlab_archive_url(download_url)
    download_url = f"https://gitlab.com{PROJECT_PATH}.git"
    return clone_and_make_sdist(..., repo_url=download_url, tag=ref, ...)
```
Suggested Helper:
```python
# In fromager (enhance existing functionality)
def extract_git_info_from_url(
    url: str,
    *,
    provider: Literal["gitlab", "github"] | None = None,
) -> dict[str, str]:
    """Extract git repository info from archive URL.

    Returns:
        Dictionary with 'provider', 'project_path', 'tag', 'clone_url'
    """
```
Benefits:
- Eliminates manual URL parsing
- Handles both GitLab and GitHub
- Returns all needed git information
Affected Plugins: torchao.py, tilelang.py
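For the GitLab case the parsing is a single regex. This sketch assumes the standard GitLab archive layout `https://gitlab.com/<project_path>/-/archive/<tag>/<name>.tar.gz`; the GitHub branch is omitted, and the function is a proposal, not an existing fromager API:

```python
# Sketch of the proposed extract_git_info_from_url helper (GitLab only).
import re


def extract_git_info_from_url(url: str) -> dict[str, str]:
    """Extract provider, project path, tag, and clone URL (sketch)."""
    match = re.match(
        r"https://gitlab\.com/(?P<project_path>.+?)/-/archive/(?P<tag>[^/]+)/",
        url,
    )
    if match is None:
        raise ValueError(f"unrecognized archive URL: {url}")
    project_path = match.group("project_path")
    return {
        "provider": "gitlab",
        "project_path": project_path,
        "tag": match.group("tag"),
        "clone_url": f"https://gitlab.com/{project_path}.git",
    }
```

The torchao.py `download_source` above could then build both the `tag` and `repo_url` arguments from one call.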
Current: Requires Python plugin to specify git sources.
Proposed YAML:
```yaml
# overrides/settings/package_name.yaml
source:
  type: git
  provider: gitlab  # or github
  project_path: /redhat/rhel-ai/core/mirrors/github/org/repo
  # OR for GitHub:
  # organization: org-name
  # repo: repo-name
  tag_pattern: "v{version}"  # optional, default: "v{version}"
  submodules: true
  matcher: "^midstream-cuda-v(.*)"  # optional regex for custom tag matching
```
Benefits:
- Eliminates the need for a `get_resolver_provider` plugin for simple cases
- Declarative git configuration
- Easier to maintain
Would Replace Code In: 15+ plugins with simple git source resolution
Current: Requires plugin to set version env vars.
Proposed YAML:
```yaml
# overrides/settings/package_name.yaml
env:
  version_variables:
    BUILD_VERSION: "{version.base_version}"
    SETUPTOOLS_SCM_PRETEND_VERSION_FOR_{PACKAGE_NAME_UPPER}: "{version.base_version}"
    PYTORCH_BUILD_NUMBER: "{version.post}"
# Variables can use placeholders:
# {version}, {version.base_version}, {version.major}, {version.minor}, etc.
# {PACKAGE_NAME}, {PACKAGE_NAME_UPPER}, {PACKAGE_NAME_LOWER}
```
Benefits:
- No plugin needed for simple version control
- Template syntax for version components
- Clear and declarative
Would Replace Code In: 15+ plugins that only set version env vars
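To pin down the template semantics, here is a sketch of how fromager might expand these placeholders. The function name and the exact placeholder set are illustrative, mirroring the YAML proposal above:

```python
# Sketch of placeholder expansion for the proposed version_variables
# config option (hypothetical; not an existing fromager feature).
from packaging.version import Version


def render_version_variables(
    template_vars: dict[str, str],
    version: Version,
    package_name: str,
) -> dict[str, str]:
    """Expand {version...} and {PACKAGE_NAME...} placeholders (sketch)."""
    subs = {
        "version": str(version),
        "version.base_version": version.base_version,
        "version.major": str(version.major),
        "version.minor": str(version.minor),
        "version.post": str(version.post or 0),
        "PACKAGE_NAME": package_name,
        "PACKAGE_NAME_UPPER": package_name.upper().replace("-", "_"),
        "PACKAGE_NAME_LOWER": package_name.lower(),
    }

    def expand(text: str) -> str:
        # Placeholders may appear in both variable names and values.
        for key, value in subs.items():
            text = text.replace("{" + key + "}", value)
        return text

    return {expand(name): expand(value) for name, value in template_vars.items()}
```

Note that placeholders are expanded in the keys too, which is what makes `SETUPTOOLS_SCM_PRETEND_VERSION_FOR_{PACKAGE_NAME_UPPER}` work.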
Current: Requires plugin with custom build_dir logic.
Proposed YAML:
```yaml
# overrides/settings/package_name.yaml
build:
  build_directory: python  # relative to sdist root
  # OR
  build_directory_search:
    - "."
    - "python"
    - "src"
  marker_files:
    - setup.py
    - pyproject.toml
```
Benefits:
- Handles monorepo layouts declaratively
- No plugin needed for simple cases
- Clear documentation of structure
Would Replace Code In: triton.py (and any future monorepo packages)
Current: Requires plugin with conditional logic.
Proposed YAML:
```yaml
# overrides/settings/package_name.yaml
variants:
  cuda-ubi9:
    env:
      CUDA_ENABLED: "1"
      PYARROW_WITH_CUDA: "1"
  rocm-ubi9:
    env:
      ROCM_ENABLED: "1"
      AOTRITON_INSTALLED_PREFIX: "{platlib}/aotriton"
  cpu-ubi9:
    env:
      CPU_ONLY: "1"
```
Benefits:
- Declarative variant configuration
- No plugin needed for simple env vars
- Easy to add new variants
Would Replace Code In: torch.py, pyarrow.py, bitsandbytes.py, and others
Current: Requires plugin to download external files.
Proposed YAML:
```yaml
# overrides/settings/package_name.yaml
external_dependencies:
  - name: dlpack
    url: "https://gitlab.com/.../dlpack/-/archive/v{dlpack_version}/dlpack-v{dlpack_version}.tar.gz"
    version_file: dlpack_version.txt  # read version from this file
    destination: dlpack/
    extract: true
    strip_components: 1
  - name: cmake-source
    url: "https://github.com/Kitware/CMake/releases/download/v{version}/cmake-{version}.tar.gz"
    destination: "build/py3-none-{platform}/cmake-source.tar.gz"
    extract: false
```
Benefits:
- Declarative dependency management
- No plugin for simple downloads
- Clear dependency documentation
Would Replace Code In: cmake.py, nvidia_cudnn_frontend.py
Current: Requires plugin with replace_lines calls.
Proposed YAML:
```yaml
# overrides/settings/package_name.yaml
source_patches:
  requirements/cpu.txt:
    - pattern: 'torch==2\.6\.0\+cpu; platform_machine == "x86_64"'
      replacement: 'torch==2.7.1; platform_machine == "x86_64"'
    - pattern: '(torch==.+?)\+cpu(.*)'
      replacement: '\1\2'
  requirements/tpu.txt:
    - pattern: '^nixl==.*$'
      replacement: ''  # empty = remove line
  setup.py:
    - pattern: '(\s+)version = (get_version\([^)]+\))'
      replacement: '\1return \2'
```
Benefits:
- Declarative patching
- No plugin for simple replacements
- Version control friendly
Would Replace Code In: vllm.py (multiple patch functions)
Current: Requires plugin to build multiple backends.
Proposed YAML:
```yaml
# overrides/settings/package_name.yaml
build:
  type: cmake_multi_backend
  backends:
    cpu:
      always: true
      cmake_options:
        COMPUTE_BACKEND: cpu
    cuda:
      when_variant: cuda
      cmake_options:
        COMPUTE_BACKEND: cuda
        COMPUTE_CAPABILITY: "{cuda_arch_list}"
    hip:
      when_variant: rocm
      cmake_options:
        COMPUTE_BACKEND: hip
        BNB_ROCM_ARCH: "{rocm_arch}"
```
Benefits:
- Declarative multi-backend builds
- No plugin for standard CMake builds
- Clear build configuration
Would Replace Code In: bitsandbytes.py, tilelang.py
Current: Requires plugin to specify resolver type.
Proposed YAML:
```yaml
# overrides/settings/package_name.yaml
resolver:
  type: gitlab_tag  # or github_tag, pypi, custom
  project_path: /redhat/rhel-ai/core/mirrors/github/org/repo
  # For custom matchers:
  matcher:
    type: regex
    pattern: '^midstream-{variant}-v(\d+\.\d+\.\d+)'
    groups:
      version: 1
  # OR:
  # matcher:
  #   type: function
  #   module: package_plugins.vllm
  #   function: _create_midstream_matcher
```
Benefits:
- Simple cases don't need plugins
- Complex cases can use custom functions
- Clear resolver documentation
Would Replace Code In: vllm.py, torchao.py, and 10+ others (for simple cases)
Current: Requires plugin to run custom commands.
Proposed YAML:
```yaml
# overrides/settings/package_name.yaml
build:
  pre_build_scripts:
    - script: "python generate_self_schema.py"
      working_directory: "{sdist_root_dir}"
      condition: "file_exists('generate_self_schema.py')"
    - script: "python tools/amd_build/build_amd.py"
      working_directory: "{sdist_root_dir}"
      condition: "variant_matches('rocm')"
      env:
        SIX_CLONE_DIR: "{sdist_root_dir}/third_party/six"
```
Benefits:
- Simple pre/post build hooks without plugin
- Conditional execution
- Clear script documentation
Would Replace Code In: pydantic_core.py, torch.py (ROCm build)
Current: Requires plugin for CMake builds.
Proposed YAML:
```yaml
# overrides/settings/package_name.yaml
build:
  type: cmake
  configure:
    source_dir: cpp
    build_dir: cpp/build
    generator: Ninja
    options:
      CMAKE_BUILD_TYPE: Release
      CMAKE_INSTALL_PREFIX: "{dist_dir}"
      ARROW_COMPUTE: ON
      ARROW_CUDA: "ON if variant == 'cuda-ubi9' else OFF"
  build:
    targets: [all]
  install:
    targets: [install]
  env:
    LD_LIBRARY_PATH: "{dist_dir}/lib:{LD_LIBRARY_PATH}"
    CMAKE_PREFIX_PATH: "{dist_dir}:{CMAKE_PREFIX_PATH}"
```
Benefits:
- Declarative CMake builds
- No plugin for standard CMake
- Supports complex builds
Would Replace Code In: pyarrow.py (partially), and future CMake packages
Current: Requires plugin to call ensure_pkg_info.
Proposed YAML:
```yaml
# overrides/settings/package_name.yaml
source:
  type: git
  ensure_pkg_info: true  # automatically call ensure_pkg_info in prepare_source
  pkg_info:
    build_dir: null  # or specify a subdirectory
```
Benefits:
- Automatic PKG-INFO for git sources
- No plugin needed
- Clear metadata requirements
Would Replace Code In: 10+ plugins with just ensure_pkg_info calls
Current: Requires plugin to locate LLVM.
Proposed YAML:
```yaml
# overrides/settings/package_name.yaml
build:
  llvm:
    version_from_file: cmake/llvm-hash.txt
    version_length: 8  # use first 8 chars
    # OR:
    # version_from_env: LLVM_TRITON_VERSION
    base_path: /usr/lib64
    prefix: llvm-triton-
    env:
      LLVM_SYSPATH: "{llvm_dir}"
      LLVM_INCLUDE_DIRS: "{llvm_dir}/include"
      LLVM_LIBRARY_DIR: "{llvm_dir}/lib"
```
Benefits:
- Declarative LLVM configuration
- No plugin for standard LLVM lookup
- Clear dependency documentation
Would Replace Code In: triton.py, aotriton.py, llvmlite.py (partially)
| Pattern | Occurrences | Plugins Affected | Complexity |
|---|---|---|---|
| Version environment variables | 15+ | torch, torchao, torchaudio, vllm, etc. | Low |
| Git source resolution | 15+ | torchao, nvidia_cudnn_frontend, etc. | Medium |
| PKG-INFO generation | 10+ | torchao, vllm, tilelang, etc. | Low |
| Variant-specific env vars | 10+ | torch, vllm, bitsandbytes, etc. | Medium |
| External dependency download | 5+ | cmake, nvidia_cudnn_frontend, etc. | Medium |
| CMake version parsing | 5+ | torch, vllm, triton, etc. | Medium |
| Requirements file patching | 5+ | vllm (4 times alone) | Low |
| Multi-backend builds | 3+ | bitsandbytes, tilelang, aotriton | High |
| LLVM path configuration | 3+ | triton, aotriton, llvmlite | Medium |
| Monorepo build directory | 2+ | triton (multiple hooks) | Medium |
| Custom package infrastructure | 2+ | aotriton, tilelang | High |
| FetchContent parsing | 2+ | vllm (cutlass, flash-attn) | High |
| Rust vendoring | 1+ | outlines_core | Low |
If all suggestions were implemented:
- Helper Functions: Would reduce plugin code by approximately 30-40%
  - Example: torchao.py could go from ~103 lines to ~60 lines
  - Example: vllm.py could go from ~527 lines to ~300 lines
- Configuration Options: Would eliminate approximately 40-50% of simple plugins
  - Plugins that only set env vars or resolve git sources could be pure YAML
  - Example: torchaudio.py (32 lines) → pure YAML config
- Combined: Estimated 50-60% reduction in total plugin code
- Reduced Duplication: Same logic isn't repeated across 15+ files
- Easier Testing: Helper functions can be unit tested in fromager
- Clearer Intent: YAML configuration is more readable than Python code
- Lower Barrier: Adding packages requires less Python knowledge
- Consistency: Standard patterns enforced by helpers
High Priority (immediate impact, low complexity):
1. Version environment variable setter
2. Git resolver provider builder
3. PKG-INFO ensurer wrapper
4. Requirements file patcher
5. Version env vars in YAML
6. Git source configuration in YAML
Medium Priority (good impact, medium complexity):
7. Variant-specific environment builder
8. External dependency downloader
9. CMake version parser
10. Variant-specific env vars in YAML
11. External dependencies in YAML
12. Line replacement rules in YAML

Lower Priority (specialized use cases, high complexity):
13. Multi-backend CMake builder
14. Custom package infrastructure generator
15. FetchContent parser
16. Build backend list in YAML
17. CMake build configuration in YAML
These suggestions would significantly reduce repetition in package plugins while maintaining flexibility for complex cases. The combination of helper functions for reusable logic and YAML configuration for declarative patterns would:
- Reduce total plugin code by 50-60%
- Eliminate the need for Python plugins in many simple cases
- Make the build system more maintainable and accessible
- Standardize common patterns across the ecosystem
- Review and prioritize suggestions with the fromager team
- Implement high-priority helpers and config options
- Migrate existing plugins to use new features
- Document best practices for plugin development
- Create templates for common plugin patterns
- Which patterns are most valuable to standardize in fromager vs. keeping as local utilities?
- Should configuration options support complex expressions or remain simple?
- How should we handle backward compatibility during migration?
- What's the right balance between flexibility and simplification?